Another fresh data release for MyVariant.info is out! In this release, we have updated the ClinVar data to its latest version and added two new fields, one under ClinVar and one under ExAC, to handle two specific cases: genotype sets and multi-allelic variants. Here are more details.
ClinVar was updated to its latest release (the same version for both the hg19 and hg38 assemblies):
Some numbers for GRCh37/hg19 variants:
| | last release | new release | # of variants in last release | # of variants in new release |
| ClinVar | 2017-04 | 2017-06 | 282,772 | 307,101 |
Similarly, some numbers for GRCh38/hg38 variants:
| | last release | new release | # of variants in last release | # of variants in new release |
| ClinVar | 2017-04 | 2017-06 | 282,956 | 307,286 |
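To put the two tables in perspective, the growth between the 2017-04 and 2017-06 ClinVar releases can be worked out directly from the counts above. The following is only a small sketch; `clinvar_counts` and `growth` are illustrative names, not part of any MyVariant.info API.

```python
# Variant counts copied from the two tables above: (last release, new release).
clinvar_counts = {
    "hg19": (282_772, 307_101),
    "hg38": (282_956, 307_286),
}


def growth(last, new):
    """Return the absolute and the relative (percent) increase between two counts."""
    added = new - last
    return added, 100.0 * added / last


for assembly, (last, new) in clinvar_counts.items():
    added, pct = growth(last, new)
    print(f"{assembly}: +{added:,} ClinVar variants ({pct:.1f}% more than in 2017-04)")
```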
ClinVar annotations are available under "clinvar" subfields for each annotated variant. MyVariant.info aggregates annotations from ClinVar, dbSNP, dbNSFP and 12 other sources for each variant, so you can access them all in one request.
The total number of unique variants is now over 424M (424,519,520), slightly higher than the 424,515,266 in our previous release in April 2017. More details about the variant data we provide from MyVariant.info are always available in our documentation. This information is also available programmatically from our metadata endpoint (and hg38 metadata).
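For readers who prefer to read the metadata from a script, here is a minimal sketch using only the Python standard library. It assumes the metadata endpoint lives at `/v1/metadata` and that the hg38 version is selected with an `assembly=hg38` parameter; the helper names `metadata_url` and `fetch_metadata` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://myvariant.info/v1"


def metadata_url(assembly="hg19"):
    """Build the metadata endpoint URL; hg19 is treated as the default assembly."""
    url = f"{BASE_URL}/metadata"
    if assembly != "hg19":
        # Assumption: other assemblies are selected with ?assembly=<name>.
        url += "?" + urlencode({"assembly": assembly})
    return url


def fetch_metadata(assembly="hg19", timeout=30):
    """Download and decode the metadata document (requires network access)."""
    with urlopen(metadata_url(assembly), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_metadata("hg38")` should return the hg38 metadata as a dictionary. The layout of that document is not described in this post, so inspect its top-level keys before relying on any particular field.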
There are a few submissions in ClinVar that represent assertions about simple or complex genotypes. To include this information in MyVariant.info, we have added a new genotypeset field under clinvar. It has two subfields, genotype and type. The "genotype" field lists, as HGVS ids, all variants that share the same genotype with the target variant. The "type" field specifies the kind of genotype these variants share, e.g. "CompoundHeterozygote". You can list all variants that have this field, or retrieve it for a single variant:
curl 'http://myvariant.info/v1/query?q=_exists_:clinvar.genotypeset'
curl 'http://myvariant.info/v1/variant/chr5:g.151208511G>A?fields=clinvar.genotypeset'
{
"_id": "chr5:g.151208511G>A",
"_version": 2,
"clinvar": {
"_license": "https://goo.gl/OaHML9",
"genotypeset": {
"genotype": [
"chr5:g.151239534C>A",
"chr5:g.151208511G>A"
],
"type": "CompoundHeterozygote"
}
}
}
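As a rough illustration of how the genotypeset field might be consumed, the sketch below parses the response shown above and returns the genotype type together with the other variants in the same genotype set. The function name `genotype_partners` is hypothetical, and the handling of a single-string "genotype" value is a defensive assumption rather than documented behavior.

```python
import json

# Response body copied from the example above.
response_text = """
{
  "_id": "chr5:g.151208511G>A",
  "_version": 2,
  "clinvar": {
    "_license": "https://goo.gl/OaHML9",
    "genotypeset": {
      "genotype": ["chr5:g.151239534C>A", "chr5:g.151208511G>A"],
      "type": "CompoundHeterozygote"
    }
  }
}
"""


def genotype_partners(variant):
    """Return (genotype type, other variants in the same genotype set).

    Returns (None, []) when the variant has no clinvar.genotypeset annotation.
    """
    genotypeset = variant.get("clinvar", {}).get("genotypeset")
    if not genotypeset:
        return None, []
    genotype = genotypeset.get("genotype", [])
    if isinstance(genotype, str):
        # Assumption: a lone id could arrive as a plain string instead of a list.
        genotype = [genotype]
    # The list includes the target variant itself, so drop it.
    partners = [hgvs_id for hgvs_id in genotype if hgvs_id != variant["_id"]]
    return genotypeset.get("type"), partners


variant = json.loads(response_text)
genotype_type, partners = genotype_partners(variant)
print(genotype_type, partners)
```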
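The second new field, multi-allelic under exac, handles multi-allelic variants in ExAC. It lists, as HGVS ids, the variants that ExAC reports together with the target variant. Users can therefore query for all multi-allelic variants for a target variant, e.g. chr12:g.103234255C>G, using: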
curl 'http://myvariant.info/v1/variant/chr12:g.103234255C>G?fields=exac.multi-allelic'
{
"_id": "chr12:g.103234255C>G",
"_version": 3,
"exac": {
"_license": "https://goo.gl/MH8b34",
"multi-allelic": [
"chr12:g.103234255C>T",
"chr12:g.103234255C>G"
]
}
}
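The same lookup can be sketched in Python. The example below builds the request URL (percent-encoding the ">" in the HGVS id) and then extracts the other alleles from the response shown above. `variant_url` and `other_alleles` are hypothetical helper names, and the code works on the copied response rather than calling the API.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "http://myvariant.info/v1"


def variant_url(hgvs_id, fields=None):
    """Build a variant endpoint URL; ">" in the HGVS id becomes "%3E"."""
    url = f"{BASE_URL}/variant/{quote(hgvs_id, safe=':')}"
    if fields:
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


def other_alleles(variant):
    """List the other variants found in exac.multi-allelic for this variant."""
    siblings = variant.get("exac", {}).get("multi-allelic", [])
    # The list includes the target variant itself, so drop it.
    return [hgvs_id for hgvs_id in siblings if hgvs_id != variant["_id"]]


# Response body copied from the example above.
variant = json.loads("""
{
  "_id": "chr12:g.103234255C>G",
  "_version": 3,
  "exac": {
    "_license": "https://goo.gl/MH8b34",
    "multi-allelic": ["chr12:g.103234255C>T", "chr12:g.103234255C>G"]
  }
}
""")

print(variant_url(variant["_id"], fields=["exac.multi-allelic"]))
print(other_alleles(variant))
```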
To query for all multi-allelic variants in ExAC, use:
curl 'http://myvariant.info/v1/query?q=_exists_:exac.multi-allelic'
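The `_exists_` queries for both new fields follow the same pattern, so a small helper can build them. This is a sketch under a few assumptions: that the query endpoint accepts `size` and `fields` parameters, that `size=0` is allowed, and that the response reports the number of matches under a "total" key. The names `exists_query_url` and `count_annotated` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://myvariant.info/v1"


def exists_query_url(field, size=10, fields=None):
    """Build a query URL matching every variant that has `field` annotated."""
    params = {"q": f"_exists_:{field}", "size": size}
    if fields:
        params["fields"] = ",".join(fields)
    # Keep ":" and "," readable instead of percent-encoding them.
    return f"{BASE_URL}/query?" + urlencode(params, safe=":,")


def count_annotated(field, timeout=30):
    """Return how many variants carry `field` (requires network access)."""
    # Assumption: size=0 returns no hits but still reports the count as "total".
    with urlopen(exists_query_url(field, size=0), timeout=timeout) as response:
        return json.load(response)["total"]


print(exists_query_url("exac.multi-allelic"))
print(exists_query_url("clinvar.genotypeset", fields=["clinvar.genotypeset"]))
```

For instance, `count_annotated("clinvar.genotypeset")` would report how many variants currently carry a genotype set, provided the assumptions above hold.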
Please note that these two fields do not introduce any incompatible changes to the data structure, so your existing code should continue to work.
That's all! And as always, feel free to reach us at help@myvariant.info or @myvariantinfo if you have any questions or feedback.