API Documentation

Detailed documentation of the biothings_client package can be found on this page.

get_client

biothings_client.get_client(biothing_type, instance=True, *args, **kwargs)

Function to return a new python client for a Biothings API service.

Parameters:
  • biothing_type – the type of biothing client, currently one of: ‘gene’, ‘variant’, ‘taxon’, ‘chem’, ‘disease’
  • instance – if True, return an instance of the derived client, if False, return the class of the derived client

All other args/kwargs are passed to the derived client instantiation (if applicable)
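
The following is a minimal sketch of how get_client might be used, assuming the biothings_client package is installed. The helper make_clients, the BIOTHING_TYPES tuple, and the demo function are illustrative names, not part of the package. The factory is passed in as an argument so the helper can be tried with any callable.

```python
# Biothing types listed in the get_client documentation above.
BIOTHING_TYPES = ("gene", "variant", "taxon", "chem", "disease")


def make_clients(factory, biothing_types=BIOTHING_TYPES):
    """Build one client per biothing type using the given factory callable."""
    return {name: factory(name) for name in biothing_types}


def demo():
    """Requires biothings_client to be installed; not called automatically."""
    from biothings_client import get_client

    # instance=True is the default, so these are client instances.
    clients = make_clients(get_client)
    # instance=False returns the derived client class instead.
    gene_class = get_client("gene", instance=False)
    return clients, gene_class
```

Calling demo() returns a dictionary of client instances keyed by biothing type, together with the derived gene client class.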

MyGeneInfo

class biothings_client.MyGeneInfo(url=None)

clear_cache()

Clear the globally installed cache.

findgenes(id_li, **kwargs)

Deprecated since version 2.0.0.

Use querymany() instead. It is kept here as an alias of the querymany() method.

get_fields(search_term=None, verbose=True)

Return all available fields that can be returned from MyGene.info services.

This is a wrapper for http://mygene.info/metadata/fields

Parameters:
  • search_term – an optional string to search (case insensitive) for matching field names. If not provided, all available fields will be returned.

Example:

>>> mg.get_fields()
>>> mg.get_fields("uniprot")
>>> mg.get_fields("refseq")
>>> mg.get_fields("kegg")

Hint

This is useful to find out the field names you need to pass to fields parameter of other methods.

getgene(_id, fields=None, **kwargs)

Return the gene object for the given geneid. This is a wrapper for GET query of “/gene/<geneid>” service.

Parameters:
  • geneid – Entrez/Ensembl gene id; an Entrez gene id can be either a string or an integer
  • fields – fields to return, a list or a comma-separated string. If fields=”all”, all available fields are returned
  • species – optionally, you can pass comma-separated species names or taxonomy ids
  • email – optionally, pass your email to help us to track usage
  • filter – alias for fields parameter
Returns:

a gene object as a dictionary, or None if geneid is not valid.

Ref:

http://docs.mygene.info/en/latest/doc/data.html for available fields, extra kwargs and more.

Example:

>>> mg.getgene(1017, email='abc@example.com')
>>> mg.getgene('1017', fields='symbol,name,entrezgene,refseq')
>>> mg.getgene('1017', fields='symbol,name,entrezgene,refseq.rna')
>>> mg.getgene('1017', fields=['symbol', 'name', 'pathway.kegg'])
>>> mg.getgene('ENSG00000123374', fields='all')

Hint

The supported field names passed to fields parameter can be found from any full gene object (when fields=”all”). Note that field name supports dot notation for nested data structure as well, e.g. you can pass “refseq.rna” or “pathway.kegg”.
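
To illustrate what dot notation refers to, here is a small sketch that resolves a dotted field name against a nested dictionary. The helper get_nested is hypothetical and not part of biothings_client, and the gene dictionary is a hand-written stand-in with placeholder values rather than real service output.

```python
def get_nested(doc, dotted_field, default=None):
    """Look up a dotted field name such as "refseq.rna" in a nested dict."""
    current = doc
    for key in dotted_field.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hand-written stand-in for a gene object; the values are placeholders.
gene = {"symbol": "CDK2", "refseq": {"rna": ["NM_EXAMPLE"]}}

print(get_nested(gene, "refseq.rna"))    # ['NM_EXAMPLE']
print(get_nested(gene, "pathway.kegg"))  # None
```

The same dotted names are what you would pass to the fields parameter to ask the service for only that part of the object.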

getgenes(ids, fields=None, **kwargs)

Return the list of gene objects for the given list of geneids. This is a wrapper for POST query of “/gene” service.

Parameters:
  • geneids – a list/tuple/iterable or comma-separated entrez/ensembl gene ids
  • fields – fields to return, a list or a comma-separated string. If fields=”all”, all available fields are returned
  • species – optionally, you can pass comma-separated species names or taxonomy ids
  • email – optionally, pass your email to help us to track usage
  • filter – alias for fields
  • as_dataframe – if True, return object as DataFrame (requires Pandas).
  • df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of gene objects or a pandas DataFrame object (when as_dataframe is True)

Ref:

http://mygene.info/doc/annotation_service.html for available fields, extra kwargs and more.

Example:

>>> mg.getgenes([1017, '1018','ENSG00000148795'], email='abc@example.com')
>>> mg.getgenes([1017, '1018','ENSG00000148795'], fields="entrezgene,uniprot")
>>> mg.getgenes([1017, '1018','ENSG00000148795'], fields="all")
>>> mg.getgenes([1017, '1018','ENSG00000148795'], as_dataframe=True)

Hint

A large list of more than 1000 input ids will be sent to the backend web service in batches (1000 at a time), and then the results will be concatenated together. So, from the user-end, it’s exactly the same as passing a shorter list. You don’t need to worry about saturating our backend servers.
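
The batching described above can be pictured with the sketch below. It is only an illustration of the idea, not the client's actual implementation; batched and fetch_in_batches are hypothetical helper names, and fetch_batch stands in for whatever sends one batch to the service.

```python
from itertools import islice


def batched(ids, size=1000):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fetch_in_batches(fetch_batch, ids, size=1000):
    """Call fetch_batch once per batch and concatenate the results."""
    results = []
    for batch in batched(ids, size):
        results.extend(fetch_batch(batch))
    return results
```

With 2500 input ids and the default size, batched yields lists of 1000, 1000 and 500 ids, and the caller sees a single concatenated result list.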

metadata(verbose=True, **kwargs)

Return a dictionary of MyGene.info metadata.

Example:

>>> metadata = mg.metadata()

query(q, **kwargs)

Return the query result. This is a wrapper for GET query of “/query?q=<query>” service.

Parameters:
  • q – a query string, detailed query syntax here
  • fields – fields to return, a list or a comma-separated string. If fields=”all”, all available fields are returned
  • species – optionally, you can pass comma-separated species names or taxonomy ids. Default: human,mouse,rat.
  • size – the maximum number of results to return (with a cap of 1000 at the moment). Default: 10.
  • skip – the number of results to skip. Default: 0.
  • sort – Prefix with “-” for descending order, otherwise in ascending order. Default: sort by matching scores in descending order.
  • entrezonly – if True, return only matching Entrez genes; otherwise, also include matching Ensembl-only genes (those that have no matching Entrez genes).
  • email – optionally, pass your email to help us to track usage
  • as_dataframe – if True, return object as DataFrame (requires Pandas).
  • df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
  • fetch_all – if True, return a generator to all query results (unsorted). This can provide a very fast return of all hits from a large query. Server requests are done in blocks of 1000 and yielded individually. Each 1000 block of results must be yielded within 1 minute, otherwise the request will expire on the server side.
Returns:

a dictionary with returned gene hits or a pandas DataFrame object (when as_dataframe is True)

Ref:

http://mygene.info/doc/query_service.html for available fields, extra kwargs and more.

Example:

>>> mg.query('cdk2')
>>> mg.query('reporter:1000_at')
>>> mg.query('symbol:cdk2', species='human')
>>> mg.query('symbol:cdk*', species=10090, size=5, as_dataframe=True)
>>> mg.query('chrX:151073054-151383976', species=9606)

querymany(qterms, scopes=None, **kwargs)

Return the batch query result. This is a wrapper for POST query of “/query” service.

Parameters:
  • qterms – a list/tuple/iterable of query terms, or a string of comma-separated query terms.
  • scopes – type or types of identifiers, either a list or a comma-separated string of fields specifying the type of the input qterms, e.g. “entrezgene”, “entrezgene,symbol”, [“ensemblgene”, “symbol”]. Refer to the official MyGene.info docs for the full list of fields.
  • fields – fields to return, a list or a comma-separated string. If fields=”all”, all available fields are returned
  • species – optionally, you can pass comma-separated species names or taxonomy ids. Default: human,mouse,rat.
  • entrezonly – if True, return only matching Entrez genes; otherwise, also include matching Ensembl-only genes (those that have no matching Entrez genes).
  • returnall – if True, return a dict of all related data, including duplicate and missing qterms
  • verbose – if True (default), print out information about duplicate and missing qterms
  • email – optionally, pass your email to help us to track usage
  • as_dataframe – if True, return object as DataFrame (requires Pandas).
  • df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of gene objects or a pandas DataFrame object (when as_dataframe is True)

Ref:

http://mygene.info/doc/query_service.html for available fields, extra kwargs and more.

Example:

>>> mg.querymany(['DDX26B', 'CCDC83'], scopes='symbol', species=9606)
>>> mg.querymany(['1255_g_at', '1294_at', '1316_at', '1320_at'], scopes='reporter')
>>> mg.querymany(['NM_003466', 'CDK2', 695, '1320_at', 'Q08345'],
...              scopes='refseq,symbol,entrezgene,reporter,uniprot', species='human')
>>> mg.querymany(['1255_g_at', '1294_at', '1316_at', '1320_at'], scopes='reporter',
...              fields='ensembl.gene,symbol', as_dataframe=True)

Hint

querymany() is perfect for doing id mappings.

Hint

Just like getgenes(), passing a large list of ids (>1000) to querymany() is perfectly fine.
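
As a sketch of the id-mapping use case, the helper below maps gene symbols to Entrez gene ids with a single querymany() call. The name map_symbols_to_entrez is hypothetical. The sketch assumes that each returned hit echoes its input term under a “query” key and that unmatched terms carry a “notfound” flag; check the output of your installed version before relying on those keys.

```python
def map_symbols_to_entrez(client, symbols, species="human"):
    """Map gene symbols to Entrez gene ids with one querymany() call.

    `client` is expected to be a MyGeneInfo instance or any object with a
    compatible querymany() method.
    """
    hits = client.querymany(symbols, scopes="symbol", fields="entrezgene",
                            species=species, verbose=False)
    mapping = {}
    for hit in hits:
        # Assumption: "query" holds the input term and "notfound" marks
        # terms without a match. Only the first hit per term is kept.
        if hit.get("notfound"):
            mapping.setdefault(hit["query"], None)
        else:
            mapping.setdefault(hit["query"], hit.get("entrezgene"))
    return mapping
```

A call such as map_symbols_to_entrez(mg, ['DDX26B', 'CCDC83']) would then return a plain dictionary keyed by the input symbols.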

set_caching(cache_db=None, verbose=True, **kwargs)

Installs a local cache for all requests.

cache_db is the path to the local sqlite cache database.

stop_caching()

Stop caching.
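
One way to pair set_caching() and stop_caching() is a small context manager, sketched below. The name cached_requests is hypothetical and not part of biothings_client; the sketch only calls the two methods documented above and assumes the client object provides them.

```python
from contextlib import contextmanager


@contextmanager
def cached_requests(client, cache_db):
    """Enable the client's local cache inside a with-block, then stop it."""
    client.set_caching(cache_db)
    try:
        yield client
    finally:
        # Runs even if the body raises, so caching is always switched off.
        client.stop_caching()
```

Used as `with cached_requests(mg, "mygene_cache") as cached: cached.getgene(1017)`, repeated requests inside the block can be served from the local cache, and caching stops when the block exits.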

MyVariantInfo

class biothings_client.MyVariantInfo(url=None)

clear_cache()

Clear the globally installed cache.

format_hgvs(chrom, pos, ref, alt)

Get a valid HGVS name from VCF-style “chrom, pos, ref, alt” data.

Example:

>>> utils.variant.format_hgvs("1", 35366, "C", "T")
>>> utils.variant.format_hgvs("2", 17142, "G", "GA")
>>> utils.variant.format_hgvs("MT", 8270, "CACCCCCTCT", "C")
>>> utils.variant.format_hgvs("X", 107930849, "GGA", "C")

get_fields(search_term=None, verbose=True)

Wrapper for http://myvariant.info/v1/metadata/fields

Parameters:
  • search_term – a case insensitive string to search for in available field names. If not provided, all available fields will be returned.
  • assembly – return the metadata for either hg19 or hg38 variants, “hg19” (default) or “hg38”.

Example:

>>> mv.get_fields()
>>> mv.get_fields("rsid")
>>> mv.get_fields("sift")

Hint

This is useful to find out the field names you need to pass to fields parameter of other methods.

get_hgvs_from_vcf(input_vcf)

From the input VCF file (filename or file handle), return a generator of genomic-based HGVS ids.

Parameters:
  • input_vcf – input VCF file, can be a filename or a file handle
Returns:

a generator of genomic-based HGVS ids. To get back a list instead, use list(get_hgvs_from_vcf(“your_vcf_file”))

Note

This is a lightweight VCF parser that returns valid genomic-based HGVS ids from the input_vcf file. For a more sophisticated VCF parser, consider using the PyVCF module.
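
A possible way to combine this method with getvariants() is sketched below. The helper annotate_vcf is hypothetical, and the client argument is assumed to be a MyVariantInfo instance (or any object exposing the same two methods). The generator of HGVS ids is passed straight through, without building a list first.

```python
def annotate_vcf(client, input_vcf, fields="cadd.phred"):
    """Annotate every variant found in a VCF file.

    `input_vcf` may be a filename or a file handle, as accepted by
    get_hgvs_from_vcf().
    """
    # Lazily produces genomic-based HGVS ids from the VCF records.
    hgvs_ids = client.get_hgvs_from_vcf(input_vcf)
    # getvariants() accepts any iterable of ids, including a generator.
    return client.getvariants(hgvs_ids, fields=fields)
```

For example, annotate_vcf(mv, "your_vcf_file") would return the annotation objects for the variants in that file, restricted to the requested fields.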

getvariant(_id, fields=None, **kwargs)

Return the variant object for the given HGVS-based variant id. This is a wrapper for GET query of “/variant/<hgvsid>” service.

Parameters:
  • vid – an HGVS-based variant id. More about HGVS id.
  • fields

    fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned. See here for all available fields.

  • assembly – specify the human genome assembly used in HGVS-based variant id, “hg19” (default) or “hg38”.
Returns:

a variant object as a dictionary, or None if vid is not found.

Example:

>>> mv.getvariant('chr9:g.107620835G>A')
>>> mv.getvariant('chr9:g.107620835G>A', fields='dbnsfp.genename')
>>> mv.getvariant('chr9:g.107620835G>A', fields=['dbnsfp.genename', 'cadd.phred'])
>>> mv.getvariant('chr9:g.107620835G>A', fields='all')
>>> mv.getvariant('chr1:g.161362951G>A', assembly='hg38')

Hint

The supported field names passed to fields parameter can be found from any full variant object (without fields, or fields=”all”). Note that field name supports dot notation for nested data structure as well, e.g. you can pass “dbnsfp.genename” or “cadd.phred”.

getvariants(ids, fields=None, **kwargs)

Return the list of variant annotation objects for the given list of HGVS-based variant ids. This is a wrapper for POST query of “/variant” service.

Parameters:
  • ids

    a list/tuple/iterable or a string of comma-separated HGVS ids. More about hgvs id.

  • fields

    fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned. See here for all available fields.

  • assembly – specify the human genome assembly used in HGVS-based variant id, “hg19” (default) or “hg38”.
  • as_generator – if True, will yield the results in a generator.
  • as_dataframe – if True, 1, or 2, return object as DataFrame (requires Pandas): True or 1 uses json_normalize, 2 uses DataFrame.from_dict; otherwise the original JSON is returned.
  • df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of variant objects or a pandas DataFrame object (when as_dataframe is True)

Ref:

http://docs.myvariant.info/en/latest/doc/variant_annotation_service.html.

Example:

>>> vars = ['chr1:g.866422C>T',
...         'chr1:g.876664G>A',
...         'chr1:g.69635G>C',
...         'chr1:g.69869T>A',
...         'chr1:g.881918G>A',
...         'chr1:g.865625G>A',
...         'chr1:g.69892T>C',
...         'chr1:g.879381C>T',
...         'chr1:g.878330C>G']
>>> mv.getvariants(vars, fields="cadd.phred")
>>> mv.getvariants('chr1:g.876664G>A,chr1:g.881918G>A', fields="all")
>>> mv.getvariants(['chr1:g.876664G>A', 'chr1:g.881918G>A'], as_dataframe=True)
>>> mv.getvariants(['chr1:g.161362951G>A', 'chr2:g.51032181G>A'], assembly='hg38')

Hint

A large list of more than 1000 input ids will be sent to the backend web service in batches (1000 at a time), and then the results will be concatenated together. So, from the user-end, it’s exactly the same as passing a shorter list. You don’t need to worry about saturating our backend servers.

Hint

If you need to pass a very large list of input ids, you can pass a generator instead of a full list, which is more memory efficient.

metadata(verbose=True, **kwargs)

Return a dictionary of MyVariant.info metadata.

Parameters:
  • assembly – return the metadata for either hg19 or hg38 variants, “hg19” (default) or “hg38”.

Example:

>>> metadata = mv.metadata()
>>> metadata = mv.metadata(assembly='hg38')

query(q, **kwargs)

Return the query result. This is a wrapper for GET query of “/query?q=<query>” service.

Parameters:
  • q

    a query string, detailed query syntax here.

  • fields

    fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned. See here for all available fields.

  • assembly – specify the human genome assembly used for the query, “hg19” (default) or “hg38”.
  • size – the maximum number of results to return (with a cap of 1000 at the moment). Default: 10.
  • skip – the number of results to skip. Default: 0.
  • sort – Prefix with “-” for descending order, otherwise in ascending order. Default: sort by matching scores in descending order.
  • as_dataframe – if True, 1, or 2, return object as DataFrame (requires Pandas): True or 1 uses json_normalize, 2 uses DataFrame.from_dict; otherwise the original JSON is returned.
  • fetch_all – if True, return a generator to all query results (unsorted). This can provide a very fast return of all hits from a large query. Server requests are done in blocks of 1000 and yielded individually. Each 1000 block of results must be yielded within 1 minute, otherwise the request will expire on the server side.
Returns:

a dictionary with returned variant hits or a pandas DataFrame object (when as_dataframe is True) or a generator of all hits (when fetch_all is True)

Ref:

http://docs.myvariant.info/en/latest/doc/variant_query_service.html.

Example:

>>> mv.query('_exists_:dbsnp AND _exists_:cosmic')
>>> mv.query('dbnsfp.polyphen2.hdiv.score:>0.99 AND chrom:1')
>>> mv.query('cadd.phred:>50')
>>> mv.query('dbnsfp.genename:CDK2', size=5)
>>> mv.query('dbnsfp.genename:CDK2', size=5, assembly='hg38')
>>> mv.query('dbnsfp.genename:CDK2', fetch_all=True)
>>> mv.query('chrX:151073054-151383976')

Hint

By default, the query method returns only the first 10 hits when more than 10 hits match. If the total number of hits is less than 1000, you can increase the value of the size parameter. For a query that returns more than 1000 hits, you can pass “fetch_all=True” to return a generator of all matching hits (internally, those hits are requested from the server in blocks of 1000).
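
The fetch_all pattern can be sketched as follows. The helper collect_ids is hypothetical, and the sketch assumes every yielded hit is a dictionary with an “_id” key. Because each block must be consumed within the time limit described for fetch_all, the loop body is kept cheap.

```python
def collect_ids(client, q, limit=None):
    """Collect the _id of every hit for query `q` using fetch_all=True."""
    ids = []
    # With fetch_all=True, query() returns a generator over all hits.
    for hit in client.query(q, fetch_all=True):
        ids.append(hit["_id"])  # assumption: hits carry an "_id" key
        if limit is not None and len(ids) >= limit:
            break
    return ids
```

A call such as collect_ids(mv, 'dbnsfp.genename:CDK2') would walk through all matching hits; the optional limit stops early once enough ids have been gathered.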

querymany(qterms, scopes=None, **kwargs)

Return the batch query result. This is a wrapper for POST query of “/query” service.

Parameters:
  • qterms – a list/tuple/iterable of query terms, or a string of comma-separated query terms.
  • scopes

    specify the type (or types) of identifiers passed to qterms, either a list or a comma-separated string of fields, e.g. “dbsnp.rsid”, “clinvar.rcv_accession”, [“dbsnp.rsid”, “cosmic.cosmic_id”]. See here for full list of supported fields.

  • fields

    fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned. See here for all available fields.

  • assembly – specify the human genome assembly used for the query, “hg19” (default) or “hg38”.
  • returnall – if True, return a dict of all related data, including duplicate and missing qterms
  • verbose – if True (default), print out information about duplicate and missing qterms
  • as_dataframe – if True, 1, or 2, return object as DataFrame (requires Pandas): True or 1 uses json_normalize, 2 uses DataFrame.from_dict; otherwise the original JSON is returned.
  • df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of matching variant objects or a pandas DataFrame object.

Ref:

http://docs.myvariant.info/en/latest/doc/variant_query_service.html for available fields, extra kwargs and more.

Example:

>>> mv.querymany(['rs58991260', 'rs2500'], scopes='dbsnp.rsid')
>>> mv.querymany(['rs58991260', 'rs2500'], scopes='dbsnp.rsid', assembly='hg38')
>>> mv.querymany(['RCV000083620', 'RCV000083611', 'RCV000083584'], scopes='clinvar.rcv_accession')
>>> mv.querymany(['COSM1362966', 'COSM990046', 'COSM1392449'], scopes='cosmic.cosmic_id', fields='cosmic')
>>> mv.querymany(['COSM1362966', 'COSM990046', 'COSM1392449'], scopes='cosmic.cosmic_id',
...              fields='cosmic.tumor_site', as_dataframe=True)

Hint

querymany() is perfect for querying variants by different ids, e.g. rsid, ClinVar ids, etc.

Hint

Just like getvariants(), passing a large list of ids (>1000) to querymany() is perfectly fine.

Hint

If you need to pass a very large list of input qterms, you can pass a generator instead of a full list, which is more memory efficient.
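
The returnall option can be used to separate matched, duplicated and missing terms, as in the sketch below. The helper split_batch_result is hypothetical. The key names “out”, “dup” and “missing” are an assumption about the shape of the returned dict and should be checked against your installed version; missing keys fall back to empty lists.

```python
def split_batch_result(client, qterms, scopes):
    """Run querymany() with returnall=True and split the result into parts."""
    result = client.querymany(qterms, scopes=scopes, returnall=True,
                              verbose=False)
    # Assumption: the returned dict uses the keys "out", "dup" and "missing".
    hits = result.get("out", [])
    duplicates = result.get("dup", [])
    missing = result.get("missing", [])
    return hits, duplicates, missing
```

For instance, split_batch_result(mv, ['rs58991260', 'rs2500'], 'dbsnp.rsid') would give the hits together with any duplicated or unmatched rsids, which is handy for reporting input problems.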

set_caching(cache_db=None, verbose=True, **kwargs)

Installs a local cache for all requests.

cache_db is the path to the local sqlite cache database.

stop_caching()

Stop caching.

MyChemInfo

class biothings_client.MyChemInfo(url=None)

clear_cache()

Clear the globally installed cache.

get_fields(search_term=None, verbose=True)

Wrapper for http://mychem.info/v1/metadata/fields

Parameters:
  • search_term – a case insensitive string to search for in available field names. If not provided, all available fields will be returned.

Example:

>>> mc.get_fields()
>>> mc.get_fields("pubchem")
>>> mc.get_fields("drugbank.targets")

Hint

This is useful to find out the field names you need to pass to fields parameter of other methods.

getchem(_id, fields=None, **kwargs)

Return the chemical/drug object for the given id. This is a wrapper for GET query of “/chem/<chem_id>” service.

Parameters:
  • _id – a chemical/drug id, supports InchiKey, Drugbank ID, ChEMBL ID, ChEBI ID, PubChem CID and UNII. More about chemical/drug id.
  • fields

    fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned. See here for all available fields.

Returns:

a chemical/drug object as a dictionary, or None if _id is not found.

Example:

>>> mc.getchem("ZRALSGWEFCBTJO-UHFFFAOYSA-N")
>>> mc.getchem("DB00553", fields="chebi.name,drugbank.id,pubchem.cid")
>>> mc.getchem("CHEMBL1308", fields=["chebi.name", "drugbank.id", "pubchem.cid"])
>>> mc.getchem("7AXV542LZ4", fields="unii")
>>> mc.getchem("CHEBI:6431", fields="chembl.smiles")

Hint

The supported field names passed to fields parameter can be found from any full chemical/drug object (without fields, or fields=”all”). Note that field name supports dot notation for nested data structure as well, e.g. you can pass “drugbank.id” or “chembl.smiles”.

getchems(ids, fields=None, **kwargs)

Return the list of chemical/drug annotation objects for the given list of chemical/drug ids. This is a wrapper for POST query of “/chem” service.

Parameters:
  • ids – a list/tuple/iterable or a string of comma-separated chem/drug ids. More about chem/drug id.
  • fields

    fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned. See here for all available fields.

  • as_generator – if True, will yield the results in a generator.
  • as_dataframe – if True, 1, or 2, return object as DataFrame (requires Pandas): True or 1 uses json_normalize, 2 uses DataFrame.from_dict; otherwise the original JSON is returned.
  • df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of chemical/drug objects or a pandas DataFrame object (when as_dataframe is True)

Ref:

http://docs.mychem.info/en/latest/doc/chem_annotation_service.html.

Example:

>>> chems = [
...     "KTUFNOKKBVMGRW-UHFFFAOYSA-N",
...     "HXHWSAZORRCQMX-UHFFFAOYSA-N",
...     "DQMZLTXERSFNPB-UHFFFAOYSA-N"
... ]
>>> mc.getchems(chems, fields="pubchem")
>>> mc.getchems('KTUFNOKKBVMGRW-UHFFFAOYSA-N,DB00553', fields="all")
>>> mc.getchems(chems, fields='chembl', as_dataframe=True)

Hint

A large list of more than 1000 input ids will be sent to the backend web service in batches (1000 at a time), and then the results will be concatenated together. So, from the user-end, it’s exactly the same as passing a shorter list. You don’t need to worry about saturating our backend servers.

Hint

If you need to pass a very large list of input ids, you can pass a generator instead of a full list, which is more memory efficient.
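
A sketch of the generator approach: ids are read lazily from a text file, one per line, and handed to getchems() without ever building a full list. Both helpers are hypothetical names. The one-id-per-line file layout and the “#” comment convention are assumptions made for this example.

```python
def iter_ids(lines):
    """Yield one id per non-empty line, skipping blanks and '#' comments."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


def annotate_from_file(client, path, fields="pubchem.cid"):
    """Stream ids from `path` into getchems() and yield the results."""
    with open(path) as handle:
        # as_generator=True asks the client to yield results instead of
        # returning them all at once.
        for doc in client.getchems(iter_ids(handle), fields=fields,
                                   as_generator=True):
            yield doc
```

The file stays open only while results are being consumed, and memory use stays small because neither the ids nor the results are held in a list.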

getdrug(_id, fields=None, **kwargs)

Return the object given id. This is a wrapper for GET query of the biothings annotation service.

Parameters:
  • _id – an entity id.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned.
Returns:

an entity object as a dictionary, or None if _id is not found.

getdrugs(ids, fields=None, **kwargs)

Return the list of annotation objects for the given list of ids. This is a wrapper for POST query of the biothings annotation service.

Parameters:
  • ids – a list/tuple/iterable or a string of ids.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned.
  • as_generator – if True, will yield the results in a generator.
  • as_dataframe – if True, 1, or 2, return object as DataFrame (requires Pandas): True or 1 uses json_normalize, 2 uses DataFrame.from_dict; otherwise the original JSON is returned.
  • df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of objects or a pandas DataFrame object (when as_dataframe is True)

Hint

A large list of more than 1000 input ids will be sent to the backend web service in batches (1000 at a time), and then the results will be concatenated together. So, from the user-end, it’s exactly the same as passing a shorter list. You don’t need to worry about saturating our backend servers.

Hint

If you need to pass a very large list of input ids, you can pass a generator instead of a full list, which is more memory efficient.

metadata(verbose=True, **kwargs)

Return a dictionary of MyChem.info metadata, a wrapper for http://mychem.info/v1/metadata

Example:

>>> metadata = mc.metadata()

query(q, **kwargs)

Return the query result. This is a wrapper for GET query of “/query?q=<query>” service.

Parameters:
  • q

    a query string, detailed query syntax here.

  • fields

    fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned. See here for all available fields.

  • size – the maximum number of results to return (with a cap of 1000 at the moment). Default: 10.
  • skip – the number of results to skip. Default: 0.
  • sort – Prefix with “-” for descending order, otherwise in ascending order. Default: sort by matching scores in descending order.
  • as_dataframe – if True, 1, or 2, return object as DataFrame (requires Pandas): True or 1 uses json_normalize, 2 uses DataFrame.from_dict; otherwise the original JSON is returned.
  • fetch_all – if True, return a generator to all query results (unsorted). This can provide a very fast return of all hits from a large query. Server requests are done in blocks of 1000 and yielded individually. Each 1000 block of results must be yielded within 1 minute, otherwise the request will expire on the server side.
Returns:

a dictionary with returned chemical/drug hits or a pandas DataFrame object (when as_dataframe is True) or a generator of all hits (when fetch_all is True)

Ref:

http://docs.mychem.info/en/latest/doc/chem_query_service.html.

Example:

>>> mc.query('drugbank.name:monobenzone')
>>> mc.query('drugbank.targets.uniprot:P07998')
>>> mc.query('drugbank.targets.uniprot:P07998 AND _exists_:unii')
>>> mc.query('chebi.mass:[300 TO 500]')
>>> mc.query('sider.side_effect.name:anaemia', size=5)
>>> mc.query('sider.side_effect.name:anaemia', fetch_all=True)

Hint

By default, the query method returns only the first 10 hits when more than 10 hits match. If the total number of hits is less than 1000, you can increase the value of the size parameter. For a query that returns more than 1000 hits, you can pass “fetch_all=True” to return a generator of all matching hits (internally, those hits are requested from the server in blocks of 1000).

querymany(qterms, scopes=None, **kwargs)

Return the batch query result. This is a wrapper for POST query of “/query” service.

Parameters:
  • qterms – a list/tuple/iterable of query terms, or a string of comma-separated query terms.
  • scopes

    specify the type (or types) of identifiers passed to qterms, either a list or a comma-separated string of fields, e.g. “drugbank.id”, “chebi.id”, [“drugbank.id”, “unii.unii”]. See here for full list of supported fields.

  • fields

    fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned. See here for all available fields.

  • returnall – if True, return a dict of all related data, including duplicate and missing qterms
  • verbose – if True (default), print out information about duplicate and missing qterms
  • as_dataframe – if True, 1, or 2, return object as DataFrame (requires Pandas): True or 1 uses json_normalize, 2 uses DataFrame.from_dict; otherwise the original JSON is returned.
  • df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of matching chemical/drug objects or a pandas DataFrame object.

Ref:

http://docs.mychem.info/en/latest/doc/chem_query_service.html for available fields, extra kwargs and more.

Example:

>>> mc.querymany(["ZRALSGWEFCBTJO-UHFFFAOYSA-N", "RRUDCFGSUDOHDG-UHFFFAOYSA-N"])
>>> mc.querymany(["DB00536", 'DB00533'], scopes='drugbank.id')
>>> mc.querymany(["CHEBI:95222", 'CHEBI:45924', 'CHEBI:33325'], scopes='chebi.id')
>>> mc.querymany(["CHEMBL1555813", 'CHEMBL22', 'CHEMBL842'], scopes='chembl.molecule_chembl_id')
>>> mc.querymany(["DB00536", '4RZ82L2GY5'], scopes='drugbank.id,unii.unii')
>>> mc.querymany(["DB00536", '4RZ82L2GY5'], scopes=['drugbank.id', 'unii.unii'])
>>> mc.querymany(["DB00536", '4RZ82L2GY5'], scopes=['drugbank.id', 'unii.unii'], fields='drugbank,unii')
>>> mc.querymany(["DB00536", '4RZ82L2GY5'], scopes=['drugbank.id', 'unii.unii'],
...              fields='drugbank.name,unii',as_dataframe=True)

Hint

querymany() is perfect for querying chemicals/drugs by different ids, e.g. DrugBank ids, ChEBI ids, etc.

Hint

Just like getchems(), passing a large list of ids (>1000) to querymany() is perfectly fine.

Hint

If you need to pass a very large list of input qterms, you can pass a generator instead of a full list, which is more memory efficient.

set_caching(cache_db=None, verbose=True, **kwargs)

Installs a local cache for all requests.

cache_db is the path to the local sqlite cache database.

stop_caching()

Stop caching.

MyDiseaseInfo

class biothings_client.MyDiseaseInfo(url=None)

clear_cache()

Clear the globally installed cache.

get_fields(search_term=None, verbose=True)

Wrapper for /metadata/fields

search_term is a case insensitive string to search for in available field names. If not provided, all available fields will be returned.

Hint

This is useful to find out the field names you need to pass to fields parameter of other methods.

getdisease(_id, fields=None, **kwargs)

Return the object given id. This is a wrapper for GET query of the biothings annotation service.

Parameters:
  • _id – an entity id.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned.
Returns:

an entity object as a dictionary, or None if _id is not found.

getdiseases(ids, fields=None, **kwargs)

Return the list of annotation objects for the given list of ids. This is a wrapper for POST query of the biothings annotation service.

Parameters:
  • ids – a list/tuple/iterable or a string of ids.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned.
  • as_generator – if True, will yield the results in a generator.
  • as_dataframe – if True, 1, or 2, return object as DataFrame (requires Pandas): True or 1 uses json_normalize, 2 uses DataFrame.from_dict; otherwise the original JSON is returned.
  • df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of objects or a pandas DataFrame object (when as_dataframe is True)

Hint

A large list of more than 1000 input ids will be sent to the backend web service in batches (1000 at a time), and then the results will be concatenated together. So, from the user-end, it’s exactly the same as passing a shorter list. You don’t need to worry about saturating our backend servers.

Hint

If you need to pass a very large list of input ids, you can pass a generator instead of a full list, which is more memory efficient.

metadata(verbose=True, **kwargs)

Return a dictionary of Biothing metadata.

query(q, **kwargs)

Return the query result. This is a wrapper for GET query of biothings query service.

Parameters:
  • q – a query string.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields=”all”, all available fields are returned.
  • size – the maximum number of results to return (with a cap of 1000 at the moment). Default: 10.
  • skip – the number of results to skip. Default: 0.
  • sort – Prefix with “-” for descending order, otherwise in ascending order. Default: sort by matching scores in descending order.
  • as_dataframe – if True, 1, or 2, return object as DataFrame (requires Pandas): True or 1 uses json_normalize, 2 uses DataFrame.from_dict; otherwise the original JSON is returned.
  • fetch_all – if True, return a generator to all query results (unsorted). This can provide a very fast return of all hits from a large query. Server requests are done in blocks of 1000 and yielded individually. Each 1000 block of results must be yielded within 1 minute, otherwise the request will expire on the server side.
Returns:

a dictionary with returned disease hits or a pandas DataFrame object (when as_dataframe is True) or a generator of all hits (when fetch_all is True)

Hint

By default, the query method returns only the first 10 hits when more than 10 hits match. If the total number of hits is less than 1000, you can increase the value of the size parameter. For a query that returns more than 1000 hits, you can pass “fetch_all=True” to return a generator of all matching hits (internally, those hits are requested from the server in blocks of 1000).
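
For result sets below the 1000-hit cap, the size and skip parameters can be combined into simple paging, sketched below. The helper paged_hits is hypothetical, and the sketch assumes the dictionary returned by query() stores the matching documents under a “hits” key.

```python
def paged_hits(client, q, page_size=100, max_hits=1000, **kwargs):
    """Yield hits page by page using the size and skip parameters."""
    skip = 0
    while skip < max_hits:
        size = min(page_size, max_hits - skip)
        result = client.query(q, size=size, skip=skip, **kwargs)
        hits = result.get("hits", [])  # assumption: hits live under "hits"
        if not hits:
            return
        for hit in hits:
            yield hit
        if len(hits) < size:
            return  # a short page means there is nothing left to fetch
        skip += len(hits)
```

Each page is one request, so a larger page_size means fewer round trips. For anything above the cap, fetch_all=True remains the better fit.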

querymany(qterms, scopes=None, **kwargs)

Return the batch query result. This is a wrapper for POST query of “/query” service.

Parameters:
  • qterms – a list/tuple/iterable of query terms, or a string of comma-separated query terms.
  • scopes – the type (or types) of identifiers passed in qterms, given as a list or a comma-separated string of field names.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields="all", all available fields are returned.
  • returnall – if True, return a dict of all related data, including duplicate and missing qterms.
  • verbose – if True (default), print out information about duplicate and missing qterms.
  • as_dataframe – if True, 1 or 2, return the result as a DataFrame (requires pandas). True or 1 uses json_normalize; 2 uses DataFrame.from_dict. Any other value returns the original JSON.
  • df_index – if True (default), index the returned DataFrame by ‘query’; otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of matching objects or a pandas DataFrame object.

Hint

Passing a large list of ids (>1000) to querymany() is perfectly fine.

Hint

If you need to pass a very large list of input qterms, you can pass a generator instead of a full list, which is more memory efficient.
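A batch query can return zero, one, or several objects per input term, so it is often handy to regroup the flat result list by term. The sketch below assumes that every returned object records its input term under a "query" key (suggested by the df_index description, but still an assumption here). `group_by_query` and `StubClient` are hypothetical names, and the stub only exists so the example runs offline.

```python
from collections import defaultdict


def group_by_query(client, qterms, scopes=None, fields=None):
    """Map each input term to the list of objects returned for it."""
    kwargs = {"verbose": False}
    if fields is not None:
        kwargs["fields"] = fields
    results = client.querymany(qterms, scopes=scopes, **kwargs)
    grouped = defaultdict(list)
    for hit in results:
        # Assumption: every returned object stores its input term under "query".
        grouped[hit["query"]].append(hit)
    return dict(grouped)


class StubClient:
    """Offline stand-in that returns two objects for the term "b"."""

    def querymany(self, qterms, scopes=None, **kwargs):
        out = []
        for term in qterms:
            out.append({"query": term, "_id": term + "-1"})
            if term == "b":
                out.append({"query": term, "_id": term + "-2"})
        return out


grouped = group_by_query(StubClient(), ["a", "b"], scopes="_id")
print({term: len(hits) for term, hits in grouped.items()})  # {'a': 1, 'b': 2}
```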

set_caching(cache_db=None, verbose=True, **kwargs)

Installs a local cache for all requests.

cache_db is the path to the local sqlite cache database.

stop_caching()

Stop caching.
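One possible way to pair set_caching() with stop_caching() is a small context manager, sketched below. `cached_requests` is a hypothetical helper, not part of the package, and `RecordingClient` is an offline stand-in that only records which methods were called.

```python
from contextlib import contextmanager


@contextmanager
def cached_requests(client, cache_db):
    """Enable the local cache for the duration of a with-block."""
    client.set_caching(cache_db=cache_db, verbose=False)
    try:
        yield client
    finally:
        # Runs even if the body raises, so caching is always switched off again.
        client.stop_caching()


class RecordingClient:
    """Offline stand-in that records which caching methods were called."""

    def __init__(self):
        self.calls = []

    def set_caching(self, cache_db=None, verbose=True, **kwargs):
        self.calls.append(("set_caching", cache_db))

    def stop_caching(self):
        self.calls.append(("stop_caching", None))


stub = RecordingClient()
with cached_requests(stub, "example_cache") as client:
    stub.calls.append(("request", None))  # placeholder for real queries
print([name for name, _ in stub.calls])
# ['set_caching', 'request', 'stop_caching']
```

With a real client, queries issued inside the with-block would go through the cache at the given path, and clear_cache() remains available to empty it.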

MyTaxonInfo

class biothings_client.MyTaxonInfo(url=None)[source]
clear_cache()

Clear the globally installed cache.

get_fields(search_term=None, verbose=True)

Wrapper for /metadata/fields

search_term is a case insensitive string to search for in available field names. If not provided, all available fields will be returned.

Hint

This is useful to find out the field names you need to pass to fields parameter of other methods.

gettaxa(ids, fields=None, **kwargs)

Return the list of annotation objects for the given list of ids. This is a wrapper for POST query of the biothings annotation service.

Parameters:
  • ids – a list/tuple/iterable or a string of ids.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields="all", all available fields are returned.
  • as_generator – if True, yield the results from a generator instead of returning a list.
  • as_dataframe – if True, 1 or 2, return the result as a DataFrame (requires pandas). True or 1 uses json_normalize; 2 uses DataFrame.from_dict. Any other value returns the original JSON.
  • df_index – if True (default), index the returned DataFrame by ‘query’; otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of objects or a pandas DataFrame object (when as_dataframe is True)

Hint

A list of more than 1000 input ids is sent to the backend web service in batches of 1000, and the results are then concatenated. From the user’s side, this behaves exactly like passing a shorter list, so you don’t need to worry about saturating our backend servers.

Hint

If you need to pass a very large list of input ids, you can pass a generator instead of a full list, which is more memory efficient.
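The sketch below shows one way gettaxa() might be combined with as_generator=True and a generator of ids, so that neither the input nor the output is held in memory as a full list. It assumes returned objects carry "_id" and "scientific_name" keys; `scientific_names` and `StubTaxonClient` are hypothetical, and the "notfound" entry produced by the stub is also only an assumed shape.

```python
def scientific_names(client, ids):
    """Map each returned id to its scientific name, skipping objects without one."""
    names = {}
    # as_generator=True yields objects one at a time instead of returning a list.
    for obj in client.gettaxa(ids, fields="scientific_name", as_generator=True):
        # Assumption: objects carry "_id" and "scientific_name" keys.
        if "_id" in obj and "scientific_name" in obj:
            names[obj["_id"]] = obj["scientific_name"]
    return names


class StubTaxonClient:
    """Offline stand-in that knows two taxonomy ids."""

    def gettaxa(self, ids, fields=None, as_generator=False, **kwargs):
        known = {"9606": "homo sapiens", "10090": "mus musculus"}
        for _id in ids:
            _id = str(_id)
            if _id in known:
                yield {"_id": _id, "scientific_name": known[_id]}
            else:
                # Assumed shape of an entry for an id that was not found.
                yield {"query": _id, "notfound": True}


names = scientific_names(StubTaxonClient(), (i for i in [9606, 10090, 0]))
print(names)  # {'9606': 'homo sapiens', '10090': 'mus musculus'}
```

With the real package, the stub would be replaced by a client obtained from get_client("taxon").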

gettaxon(_id, fields=None, **kwargs)

Return the object for the given id. This is a wrapper for GET query of the biothings annotation service.

Parameters:
  • _id – an entity id.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields="all", all available fields are returned.
Returns:

an entity object as a dictionary, or None if _id is not found.

metadata(verbose=True, **kwargs)

Return a dictionary of Biothing metadata.

query(q, **kwargs)

Return the query result. This is a wrapper for GET query of biothings query service.

Parameters:
  • q – a query string.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields="all", all available fields are returned.
  • size – the maximum number of results to return (currently capped at 1000). Default: 10.
  • skip – the number of results to skip. Default: 0.
  • sort – the field(s) to sort by. Prefix a field name with “-” for descending order; otherwise the order is ascending. Default: sort by matching score in descending order.
  • as_dataframe – if True, 1 or 2, return the result as a DataFrame (requires pandas). True or 1 uses json_normalize; 2 uses DataFrame.from_dict. Any other value returns the original JSON.
  • fetch_all – if True, return a generator over all query results (unsorted). This can return all hits from a large query very quickly. Server requests are made in blocks of 1000 and the hits are yielded individually. Each block of 1000 results must be yielded within 1 minute; otherwise the request expires on the server side.
Returns:

a dictionary with the returned hits, a pandas DataFrame object (when as_dataframe is True), or a generator of all hits (when fetch_all is True)

Hint

By default, the query method returns only the first 10 hits when more than 10 match. If the total number of hits is less than 1000, you can increase the value of the size parameter. For a query that returns more than 1000 hits, you can pass “fetch_all=True” to get a generator of all matching hits (internally, those hits are requested from the server in blocks of 1000).
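Since fetch_all returns unsorted results, paging with size and skip may be the better fit when the order of the hits matters. The sketch below assumes the returned dictionary stores its hits under a "hits" key; `iter_pages` and `StubClient` are hypothetical, and the stub exists only so the example runs offline.

```python
def iter_pages(client, q, page_size=100, max_pages=5, **kwargs):
    """Yield successive lists of hits using the size and skip parameters."""
    for page in range(max_pages):
        result = client.query(q, size=page_size, skip=page * page_size, **kwargs)
        # Assumption: the returned dictionary stores its hits under "hits".
        hits = result.get("hits", [])
        if not hits:
            return
        yield hits
        if len(hits) < page_size:
            return  # a short page means there is nothing left to fetch


class StubClient:
    """Offline stand-in that pretends a query matches 250 objects."""

    TOTAL = 250

    def query(self, q, size=10, skip=0, **kwargs):
        ids = range(skip, min(skip + size, self.TOTAL))
        return {"total": self.TOTAL, "hits": [{"_id": str(i)} for i in ids]}


page_sizes = [len(page) for page in iter_pages(StubClient(), "example query")]
print(page_sizes)  # [100, 100, 50]
```

Extra keyword arguments such as sort or fields are passed straight through to query(), so the same loop can be reused for sorted paging.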

querymany(qterms, scopes=None, **kwargs)

Return the batch query result. This is a wrapper for POST query of “/query” service.

Parameters:
  • qterms – a list/tuple/iterable of query terms, or a string of comma-separated query terms.
  • scopes – the type (or types) of identifiers passed in qterms, given as a list or a comma-separated string of field names.
  • fields – fields to return, a list or a comma-separated string. If not provided or fields="all", all available fields are returned.
  • returnall – if True, return a dict of all related data, including duplicate and missing qterms.
  • verbose – if True (default), print out information about duplicate and missing qterms.
  • as_dataframe – if True, 1 or 2, return the result as a DataFrame (requires pandas). True or 1 uses json_normalize; 2 uses DataFrame.from_dict. Any other value returns the original JSON.
  • df_index – if True (default), index the returned DataFrame by ‘query’; otherwise, index by number. Only applicable if as_dataframe=True.
Returns:

a list of matching objects or a pandas DataFrame object.

Hint

Passing a large list of ids (>1000) to querymany() is perfectly fine.

Hint

If you need to pass a very large list of input qterms, you can pass a generator instead of a full list, which is more memory efficient.
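When returnall=True, the result is a dict rather than a list, which makes it possible to inspect the terms that did not match. The sketch below assumes that this dict uses the keys "out", "dup" and "missing"; that layout is not stated on this page and should be checked against the installed version. `unmatched_terms` and `StubClient` are hypothetical names.

```python
def unmatched_terms(client, qterms, scopes=None):
    """Return the input terms for which the batch query found nothing."""
    result = client.querymany(qterms, scopes=scopes, returnall=True, verbose=False)
    # Assumption: the returned dict uses the keys "out", "dup" and "missing".
    return list(result.get("missing", []))


class StubClient:
    """Offline stand-in that only recognises the term "9606"."""

    def querymany(self, qterms, scopes=None, returnall=False, **kwargs):
        terms = list(qterms)
        out = [{"query": t, "_id": t} for t in terms if t == "9606"]
        missing = [t for t in terms if t != "9606"]
        if returnall:
            return {"out": out, "dup": [], "missing": missing}
        return out


missing = unmatched_terms(StubClient(), ["9606", "not-a-taxon"], scopes="_id")
print(missing)  # ['not-a-taxon']
```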

set_caching(cache_db=None, verbose=True, **kwargs)

Installs a local cache for all requests.

cache_db is the path to the local sqlite cache database.

stop_caching()

Stop caching.