MyVariant gets bigger with the June 2018 release

The late June data release almost doubles the number of variants available through the myvariant.info API! Besides migrating to a more recent version of Elasticsearch and bringing lots of optimizations, we added or expanded the following resources:

  • clingen: The "clingen.caid" field now contains the Canonical Allele ID (CAID) from the ClinGen Allele Registry. The ClinGen Allele Registry assigns a single unique identifier to a variant that may be represented by various HGVS names based on different reference sequences, such as genomic, transcript or protein sequences.
  • gnomad_exome and gnomad_genome: We previously included gnomAD variants and annotations under the "gnomad_exome" and "gnomad_genome" fields, but only for variants based on GRCh37/hg19. As gnomAD now provides variant annotations based on GRCh38/hg38 (via liftover) in its new version 2.0.2, we have added the "gnomad_exome" and "gnomad_genome" fields for GRCh38/hg38 variants as well, growing the total number of hg38 variants by more than 260M.
  • dbsnp: The recent dbSNP release (version 151) doubles the number of variants, with over 600M variants now available from dbSNP. As a side note, for some reason the version number 151 stayed the same as at the last data change, even though the actual data doubled in this release.
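
For readers who want to try the new fields, here is a minimal Python sketch of how a request for them might be built. It assumes the public v1 endpoint at https://myvariant.info/v1 and its "fields" and "assembly" query parameters; the helper names build_variant_url and fetch_variant are hypothetical, and the HGVS ID in the usage note is a made-up placeholder, not a variant known to carry these annotations.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: base URL of the public v1 API.
BASE_URL = "https://myvariant.info/v1"


def build_variant_url(hgvs_id, fields, assembly="hg19"):
    """Build a variant-annotation URL restricted to the given fields.

    Hypothetical helper: it only assembles the URL and makes no request.
    """
    if assembly not in ("hg19", "hg38"):
        raise ValueError("assembly must be 'hg19' or 'hg38'")
    # Fields are sent as one comma-separated value, e.g. "clingen.caid,gnomad_genome".
    params = urlencode({"fields": ",".join(fields), "assembly": assembly})
    # Percent-encode the HGVS ID so characters such as ':' and '>' are URL-safe.
    return f"{BASE_URL}/variant/{quote(hgvs_id, safe='')}?{params}"


def fetch_variant(hgvs_id, fields, assembly="hg19"):
    """Fetch the annotation document as a dict (performs a network request)."""
    url = build_variant_url(hgvs_id, fields, assembly)
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Calling fetch_variant("chr1:g.100A>T", ["clingen.caid", "gnomad_exome", "gnomad_genome"], assembly="hg38") would then request only those fields against the hg38 index. Whether a given variant actually carries them depends on the data.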

The total number of variants now reaches ~875M and ~796M for our hg19 and hg38 indices, respectively.

Besides these new additions, we also refreshed both the ClinVar and CGI resources to their latest versions and fixed a few minor data issues reported by our users.

The full release notes can be found, as usual, on this page:
http://docs.myvariant.info/en/latest/doc/release_changes.html

You can always reach us at contact@biothings.io for any questions or comments.