Data release updates and metadata deprecations


Beginning with this latest data release, the structure of the metadata in MyGene.info has changed slightly. Previously, users could retrieve the data version numbers from the “src_version” field of the metadata JSON object, which held the version of each data resource and looked something like this:

 "src_version": {
    "PantherDB": "2017-12-11",
    "cpdb": "34",
    "ensembl": "96",
    "ensembl_fungi": "43",
    "ensembl_genomic_pos_hg19": null,
    …
    …
    "wikipedia": null
 }

Similarly, the number of annotations from each resource could be retrieved from the “stats” JSON object:

 "stats": {
    "total_ensembl_genes": 31358764,
    "total_ensembl_genes_mapped_to_entrez": 3209199,
    "total_ensembl_only_genes": 7465711,
    "total_entrez_genes": 23740786,
    "total_genes": 31206497,
    "total_species": 25238
 }

From this update on, “src_version” has been deprecated and removed. This is because the metadata is (and has been for a while now) contained in a nested “src” JSON object, in which each resource has properties such as “code”, “stats”, and “version” with their corresponding values, as seen in this example:

 "src": {
    "PantherDB": {
      "code": {
        "branch": "v3",
        "commit": "2a4aeca",
        "folder": "src/plugins/PantherDB",
        "repo": "https://github.com/biothings/mygene.info.git",
        "url":  "https://github.com/biothings/..."
      },
      "stats": {
        "PantherDB": 156054
      },
      "version": "2017-12-11"
    }
 }

Users interested in retrieving information on the latest stats or version of the data resources in MyGene.info should adjust their code accordingly.
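
As a rough illustration of that adjustment, the sketch below rebuilds the old flat “src_version” mapping from the nested “src” object. The helper names (`fetch_metadata`, `versions_from_src`, `stats_from_src`) are hypothetical, and the endpoint URL is an assumption rather than something stated in this post. The sample data is trimmed from the PantherDB example above so the code runs without network access.

```python
import json
from urllib.request import urlopen

# Assumed endpoint; this post does not state the metadata URL.
METADATA_URL = "https://mygene.info/v3/metadata"


def fetch_metadata(url=METADATA_URL):
    """Download and parse the metadata JSON object (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def versions_from_src(metadata):
    """Rebuild a flat resource -> version mapping from the nested "src" object."""
    return {
        name: info.get("version")
        for name, info in metadata.get("src", {}).items()
    }


def stats_from_src(metadata):
    """Collect each resource's "stats" object, keyed by resource name."""
    return {
        name: info.get("stats", {})
        for name, info in metadata.get("src", {}).items()
    }


# Sample trimmed from the example above so the sketch runs offline.
sample = {
    "src": {
        "PantherDB": {
            "stats": {"PantherDB": 156054},
            "version": "2017-12-11",
        }
    }
}

print(versions_from_src(sample))  # {'PantherDB': '2017-12-11'}
print(stats_from_src(sample))     # {'PantherDB': {'PantherDB': 156054}}
```

Against the live service, `versions_from_src(fetch_metadata())` would be the equivalent call, assuming the endpoint above is correct.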

While we’re on the subject of the latest stats and versions, these are the most recent updates to the data:

Resource                               build_version: "20190421"   build_version: "20190428"
entrez_accession                       23623571                    23833656
entrez_gene                            23740786                    23950933
entrez_genomic_pos                     2596024                     2593132
entrez_go                              204547                      204413
entrez_refseq                          23586607                    23796228
entrez_retired                         249767                      250635
entrez_unigene                         544868                      544835
generif                                97870                       97929
entrez_ec                              19906                       19909
entrez_genesummary                     27722                       28153
total_ensembl_genes                    31358764                    31568911
total_ensembl_genes_mapped_to_entrez   3209199                     3209192
total_ensembl_only_genes               7465711                     7465713
total_entrez_genes                     23740786                    23950933
total_genes                            31206497                    31416646
total_species                          25238                       25257
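
To see how much each count moved between the two builds, a small sketch like the following computes the difference per resource. The `build_deltas` helper is hypothetical, and only a few rows from the table above are copied in for brevity.

```python
# Counts copied from the table above: (build 20190421, build 20190428).
counts = {
    "entrez_gene": (23740786, 23950933),
    "entrez_genomic_pos": (2596024, 2593132),
    "total_genes": (31206497, 31416646),
    "total_species": (25238, 25257),
}


def build_deltas(counts):
    """Return new-minus-old for each resource; negative means the count shrank."""
    return {name: new - old for name, (old, new) in counts.items()}


for name, delta in build_deltas(counts).items():
    print(f"{name}: {delta:+d}")
# entrez_gene: +210147
# entrez_genomic_pos: -2892
# total_genes: +210149
# total_species: +19
```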