Introduction

The Dbfetch services provide access to entries from various up-to-date biological databases using entry identifiers or accession numbers. 

Important Note

  • We kindly ask all users of EMBL-EBI Web Services to submit tool jobs in batches of no more than 30 at a time, and not to submit more until processing is complete and the results have been retrieved. Please ensure that a valid email address is provided. Excessive usage of a particular resource will be dealt with in accordance with EMBL-EBI's Terms of Use. Please contact us if you need further information.
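
The batch limit above can be respected on the client side by splitting a long identifier list into groups of at most 30. The following is a minimal sketch only; the function name batches and the example identifiers are hypothetical and not part of any EMBL-EBI client.

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 30  # limit quoted in the note above


def batches(ids: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


if __name__ == "__main__":
    example_ids = [f"ID{n}" for n in range(65)]  # made-up identifiers
    for group in batches(example_ids):
        # Submit one group, wait for its results, then move on to the next.
        print(len(group))
```

Each group would be submitted and its results collected before the next group is sent.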

How to Access Dbfetch

Dbfetch can be accessed via a web form or via Web Services (REST and SOAP).

Web Form

The web interface for Dbfetch is available at: https://www.ebi.ac.uk/Tools/dbfetch/

See the Dbfetch Help page for more details on how to use the tool.


Web Services

Web Services are available using REST and SOAP protocols that enable programmatic access and allow their integration into other applications and analytical workflows and pipelines. 

For an introduction on how to run these clients and use them in workflows please see the webinar series.


REST API

Sample clients for the Representational State Transfer (REST) service are provided for a number of programming languages. For details of how to use a client, download it and run the program without any arguments.

Language  Download    Requirements
Perl      dbfetch.pl  LWP and YAML::Syck
Python    dbfetch.py  xmltramp2

For details see Environment setup for REST Web Services and Examples for Perl REST Web Services Clients pages. 
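
For readers who prefer to call the service directly rather than through the sample clients, a request URL can be assembled as in the sketch below. The endpoint path (the web form URL followed by dbfetch) and the query parameter names db, id, format and style are assumptions that should be checked against the WADL; build_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint: the web form URL given above plus a "dbfetch" path segment.
BASE_URL = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def build_url(db, ids, fmt="default", style="raw"):
    """Return a request URL for one or more identifiers (assumed parameter names)."""
    query = {"db": db, "id": ",".join(ids), "format": fmt, "style": style}
    # Keep commas readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(query, safe=",")


if __name__ == "__main__":
    print(build_url("uniprotkb", ["WAP_RAT", "WAP_MOUSE"], fmt="fasta"))
```

The resulting URL could then be retrieved with any HTTP client, for example urllib.request.urlopen from the standard library.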


WADL

The WADL for dbfetch interfaces can be found at:

Note: due to the wide range of possible response formats for a dbfetch/WSDbfetch call (e.g. EMBL flatfile, EMBLXML, INSDXML, GenBank, UniProtKB flatfile, UniProtKB XML, UniRef XML, UniParc XML, fasta format, etc.) the format of the response cannot be easily specified in the WADL description. For details of the entry formats please see the documentation of the database(s) of interest. If you need to parse the results then you may find bioinformatics tool-kits such as:

Input

Formats: refers to sequence and sequence feature formats, e.g. FASTA, GCG, EMBL, GenBank, PHYLIP or UniProtKB/Swiss-Prot.

Styles: refers to how entries are rendered in the browser or client. The html style includes markup, while raw is plain text. The default is (almost always) html.

For further information, users should refer to the individual data resources listed in the table below.

Search items, for every database: "database name":"id", or upload a file.

Database (name)
  Format                  Styles

EDAM (edam)
  default                 default, html, raw
  obo                     default, html, raw

ENA Coding (ena_coding)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  emblxml-1.1             default, raw
  emblxml-1.2             default, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  seqxml                  default, raw

ENA Geospatial (ena_geospatial)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  emblxml-1.2             default, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  seqxml                  default, raw

ENA Non-coding (ena_noncoding)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  emblxml-1.2             default, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  seqxml                  default, raw

ENA Sequence (ena_sequence)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  emblxml-1.1             default, raw
  emblxml-1.2             default, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  insdxml                 default, raw
  seqxml                  default, raw

ENA Sequence Constructed (ena_sequence_con)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  emblxml-1.1             default, raw
  emblxml-1.2             default, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  insdxml                 default, raw
  seqxml                  default, raw

ENA Sequence Constructed Expanded (ena_sequence_conexp)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  emblxml-1.1             default, raw
  emblxml-1.2             default, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  insdxml                 default, raw
  seqxml                  default, raw

ENA/SVA (ena_sva)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  entrysize               default, html, raw
  fasta                   default, raw

Ensembl Gene (ensemblgene)
  default                 default, raw
  csv                     default, raw
  embl                    default, raw
  fasta                   default, raw
  genbank                 default, raw
  gff2                    default, raw
  gff3                    default, raw
  tab                     default, raw

Ensembl Genomes Gene (ensemblgenomesgene)
  default                 default, raw
  csv                     default, raw
  embl                    default, raw
  fasta                   default, raw
  genbank                 default, raw
  gff2                    default, raw
  gff3                    default, raw
  tab                     default, raw

Ensembl Genomes Transcript (ensemblgenomestranscript)
  default                 default, raw
  fasta                   default, raw

Ensembl Transcript (ensembltranscript)
  default                 default, raw
  fasta                   default, raw

EPO Proteins (epo_prt)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  seqxml                  default, raw

HGNC (hgnc)
  default                 default, html, raw
  tab                     default, html, raw

IMGT/HLA (imgthla)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  entrysize               default, html, raw
  seqxml                  default, raw
  fasta                   default, html, raw

IMGT/LIGM-DB (imgtligm)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  entrysize               default, html, raw
  seqxml                  default, raw
  fasta                   default, html, raw

InterPro (interpro)
  default                 default, html, raw
  interpro                default, html, raw
  interproxml             default, raw
  tab                     default, html, raw

IPD-KIR (ipdkir)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  entrysize               default, html, raw
  seqxml                  default, raw
  fasta                   default, html, raw

IPD-MHC (ipdmhc)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  entrysize               default, html, raw
  seqxml                  default, raw
  fasta                   default, html, raw

IPRMC (iprmc)
  default                 default, html, raw
  gff2                    default, html, raw
  iprmc                   default, html, raw
  iprmctab                default, html, raw
  iprmcxml                default, raw
  dasgff                  default, raw

IPRMC UniParc (iprmcuniparc)
  default                 default, html, raw
  gff2                    default, html, raw
  iprmc                   default, html, raw
  iprmctab                default, html, raw
  iprmcxml                default, raw
  dasgff                  default, raw

JPO Proteins (jpo_prt)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  seqxml                  default, raw
  ddbj                    default, html, raw

KIPO Proteins (kipo_prt)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  seqxml                  default, raw
  ddbj                    default, html, raw

MEDLINE (medline)
  default                 default, html, raw
  medlinefull             default, html, raw
  medlineref              default, html, raw
  bibtex                  default, raw
  endnote                 default, raw
  isi                     default, raw
  modsxml                 default, raw
  pubmedxml               default, raw
  ris                     default, raw
  wordbibxml              default, raw

Patent DNA NRL1 (nrnl1)
  default                 default, html, raw
  annot                   default, html, raw
  entrysize               default, html, raw
  nrl1                    default, html, raw
  seqxml                  default, raw
  fasta                   default, html, raw

Patent DNA NRL2 (nrnl2)
  default                 default, html, raw
  annot                   default, html, raw
  entrysize               default, html, raw
  nrl2                    default, html, raw
  seqxml                  default, raw
  fasta                   default, html, raw

Patent Protein NRL1 (nrpl1)
  default                 default, html, raw
  annot                   default, html, raw
  entrysize               default, html, raw
  nrl1                    default, html, raw
  seqxml                  default, raw
  fasta                   default, html, raw

Patent Protein NRL2 (nrpl2)
  default                 default, html, raw
  annot                   default, html, raw
  entrysize               default, html, raw
  nrl2                    default, html, raw
  seqxml                  default, raw
  fasta                   default, html, raw

Patent Equivalents (patent_equivalents)
  default                 default, html, raw
  patent_equivalents      default, html, raw

PDB (pdb)
  default                 default, html, raw
  fasta                   default, raw
  annot                   default, html, raw
  mmcif                   default, raw
  pdb                     default, html, raw
  pdbml                   default, raw
  rdfxml                  default, raw

RefSeq (nucleotide) (refseqn)
  default                 default, html, raw
  annot                   default, html, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  insdxml                 default, raw
  refseq                  default, html, raw
  seqxml                  default, raw
  tinyseq                 default, raw

RefSeq (protein) (refseqp)
  default                 default, html, raw
  annot                   default, html, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  insdxml                 default, raw
  refseqp                 default, html, raw
  seqxml                  default, raw
  tinyseq                 default, raw

SGT (sgt)
  default                 default, raw
  annot                   default, html, raw
  fasta                   default, html, raw
  seqxml                  default, raw
  sgtxml                  default, raw

Taxonomy (taxonomy)
  default                 default, html, raw
  taxonomy                default, html, raw
  enataxonomyxml          default, raw
  uniprottaxonomyrdfxml   default, raw

Trace Archive (tracearchive)
  default                 default, raw
  fasta                   default, raw
  fastq                   default, raw
  tracexml                default, raw

UniParc (uniparc)
  default                 default, raw
  fasta                   default, raw
  seqxml                  default, raw
  uniparc                 default, raw
  uniprotrdfxml           default, raw

UniProtKB (uniprotkb)
  default                 default, html, raw
  annot                   default, html, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  gff3                    default, html, raw
  seqxml                  default, raw
  uniprot                 default, html, raw
  uniprotrdfxml           default, raw
  uniprotxml              default, raw

UniRef100 (uniref100)
  default                 default, raw
  fasta                   default, raw
  seqxml                  default, raw
  uniprotrdfxml           default, raw
  uniref100               default, raw

UniRef50 (uniref50)
  default                 default, raw
  fasta                   default, raw
  seqxml                  default, raw
  uniprotrdfxml           default, raw
  uniref50                default, raw

UniRef90 (uniref90)
  default                 default, raw
  fasta                   default, raw
  seqxml                  default, raw
  uniprotrdfxml           default, raw
  uniref90                default, raw

UniSave (unisave)
  default                 default, raw
  annot                   default, raw
  entrysize               default, html, raw
  fasta                   default, raw
  uniprot                 default, raw

USPTO Proteins (uspto_prt)
  default                 default, html, raw
  annot                   default, html, raw
  embl                    default, html, raw
  entrysize               default, html, raw
  fasta                   default, html, raw
  seqxml                  default, raw
  genpept                 default, html, raw
  insdxml                 default, raw
  tinyseq                 default, raw
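
A client may want to check a database, format and style combination against the table above before sending a request. The sketch below hard-codes a small excerpt of that table for two databases; the SUPPORTED mapping and the is_supported helper are hypothetical names, and a real client would more likely obtain this information from the service itself.

```python
# Excerpt of the table above for two databases; extend as needed.
SUPPORTED = {
    "uniprotkb": {
        "default": ("default", "html", "raw"),
        "fasta": ("default", "html", "raw"),
        "uniprotxml": ("default", "raw"),
    },
    "uniparc": {
        "default": ("default", "raw"),
        "fasta": ("default", "raw"),
    },
}


def is_supported(db, fmt="default", style="default"):
    """Return True if the excerpt lists this database/format/style combination."""
    return style in SUPPORTED.get(db, {}).get(fmt, ())


if __name__ == "__main__":
    print(is_supported("uniprotkb", "fasta", "html"))
    print(is_supported("uniprotkb", "uniprotxml", "html"))
```

Unknown databases and formats simply yield False rather than raising an error, which keeps the check usable as a guard before a request.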

Format Examples

Formats refer to sequence and sequence feature formats. For further information, users should refer to the individual data resources listed in the Input table.

See examples of values that Dbfetch accepts:

Tool parameters

Parameters

getSupportedDBs      list available databases
getSupportedFormats  list available databases with formats
getSupportedStyles   list available databases with styles
getDbFormats         list formats for a specified database
getFormatStyles      list styles for a specified database and format
fetchData            retrieve a database entry. See below for details of arguments.
fetchBatch           retrieve database entries. See below for details of arguments.

Arguments

dbName:id  database name and entry ID or accession (e.g. UNIPROT:WAP_RAT). Use @fileName to read identifiers from a file, or @- to read identifiers from STDIN.
format     format to retrieve (e.g. uniprot)
style      style to retrieve (e.g. raw)
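
The @fileName and @- conventions described above can be illustrated with a short sketch. This is not the code of the sample clients; read_identifiers is a hypothetical helper, and skipping blank lines is an assumption about how such a list would reasonably be read.

```python
import io
import sys


def read_identifiers(arg, stdin=None):
    """Expand an identifier argument into a list of dbName:id strings."""
    if arg == "@-":
        # "@-" means: read identifiers from standard input.
        stream = stdin if stdin is not None else sys.stdin
        lines = stream.read().splitlines()
    elif arg.startswith("@"):
        # "@fileName" means: read identifiers from the named file.
        with open(arg[1:], encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    else:
        lines = [arg]
    # Assumption: blank lines and surrounding whitespace are ignored.
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    fake_stdin = io.StringIO("UNIPROT:WAP_RAT\nUNIPROT:WAP_MOUSE\n")
    print(read_identifiers("@-", stdin=fake_stdin))
```

Passing the stream explicitly, as in the example, makes the helper easy to exercise without a real terminal.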


SOAP API

Sample clients for the Simple Object Access Protocol (SOAP) service are provided for a number of programming languages. For details of how to use a client, download it and run the program without any arguments.

WSDL

There are three interfaces to the WSDbfetch (SOAP) service, each with its own WSDL:

  1. WSDBFetchDoclitServerService (document/literal): http://www.ebi.ac.uk/ws/services/WSDbfetchDoclit?wsdl
    Recommended for development of new clients. For compatible SOAP tool-kits WSDLs with additional semantic annotations (using SAWSDL) are available:

  2. WSDBFetchServerService (RPC/encoded): http://www.ebi.ac.uk/ws/services/WSDbfetch?wsdl
    Provided for compatibility with SOAP tool-kits which support only RPC/encoded SOAP services and for use by older clients.

  3. WSDBFetchServerLegacyService (RPC/encoded): http://www.ebi.ac.uk/ws/services/urn:Dbfetch?wsdl
    Deprecated: provided solely for compatibility with old clients. Please do not use it for new development.


Usage

fetchBatch(db, ids, format, style)

Fetch a set of entries in a defined format and style.

Arguments:

  • db: the name of the database to obtain the entries from. For example: “uniprotkb”

  • ids: a string containing a comma-separated list of entry identifiers. For example: “WAP_MOUSE,WAP_RAT”

  • format: the name of the format required. To get the default format for the database use the value “default”.

  • style: the name of the style required. To get the default style for the database use the value “default”.

Returns:

The format of the response depends on the interface to the service used:

  • WSDBFetchServerService and WSDBFetchDoclitServerService: the entries as a string.

  • WSDBFetchServerLegacyService: an array of strings containing the entries. Generally this will contain only one item which contains the set of entries.

Throws:

  • DbfConnException: an issue occurred connecting to the database.

  • DbfException: an issue occurred performing the query.

  • DbfParamsException: input parameters failed to validate.

  • DbfNoEntryFoundException: no entries were found matching the parameters.

  • InputException: missing or incorrectly formatted parameters.
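
A client calling fetchBatch has to build the comma-separated identifier string and cope with the two return shapes described above. The helpers below are a minimal sketch; join_ids and entries_as_text are hypothetical names that are not part of WSDbfetch, and no SOAP call is made here.

```python
def join_ids(ids):
    """Build the comma-separated identifier argument from a list of identifiers."""
    return ",".join(identifier.strip() for identifier in ids)


def entries_as_text(result):
    """Normalise a fetchBatch result to a single string.

    The RPC/encoded and document/literal interfaces return a string; the
    legacy interface returns an array of strings.
    """
    if isinstance(result, str):
        return result
    # Concatenate without a separator so the entry text is left unchanged.
    return "".join(result)


if __name__ == "__main__":
    print(join_ids(["WAP_MOUSE", "WAP_RAT"]))
    print(entries_as_text(["entry text\n"]))
```

With these in place, calling code can treat the result as text regardless of which of the three interfaces the SOAP tool-kit was pointed at.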

fetchData(query, format, style)

Fetch an entry in a defined format and style.

Arguments:

  • query: the entry identifier in db:id format. For example: “UniProtKB:WAP_RAT”

  • format: the name of the format required. To get the default format for the database use the value “default”.

  • style: the name of the style required. To get the default style for the database use the value “default”.

Returns:

The format of the response depends on the interface to the service used:

  • WSDBFetchServerService and WSDBFetchDoclitServerService: the entries as a string.

  • WSDBFetchServerLegacyService: an array of strings containing the entries. Generally this will contain only one item which contains the set of entries.

Throws:

  • DbfConnException: an issue occurred connecting to the database.

  • DbfException: an issue occurred performing the query.

  • DbfParamsException: input parameters failed to validate.

  • DbfNoEntryFoundException: no entries were found matching the parameters.

  • InputException: missing or incorrectly formatted parameters.
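
Because fetchData expects the query in db:id form, a client can validate the string before calling the service. The sketch below is an assumption about what a sensible client-side check looks like; split_query is a hypothetical helper, and the service's own validation is not shown in this document.

```python
def split_query(query):
    """Split a db:id query into (database, identifier), validating both parts."""
    if ":" not in query:
        raise ValueError("query must have the form DBNAME:ID")
    # Split on the first colon only, so identifiers may contain colons.
    db, _, identifier = query.partition(":")
    db, identifier = db.strip(), identifier.strip()
    if not db:
        raise ValueError("no database found in the query")
    if not identifier:
        raise ValueError("no ID found in the query")
    return db, identifier


if __name__ == "__main__":
    print(split_query("UniProtKB:WAP_RAT"))
```

Rejecting malformed queries locally avoids a round trip to the service for input that cannot succeed.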

getDatabaseInfo(db)

Get details describing a specific database, including the available data formats and result styles.

Note: WSDBFetchDoclitServerService (document/literal) only.

Arguments:

  • db: database name to get details for. For example: “uniprotkb”

Returns: a data structure describing the database:

  • DatabaseInfo: details of a database.

    • aliasList: list of other names for the database.

      • alias: alias for the database, can be used in requests.

    • databaseTerms: list of ontology terms describing the database. Typically from DRCAT, EDAM and/or MIRIAM.

      • databaseTerm: ontology term describing the database. May be represented as a namespace:id identifier or a URI.

    • dataResourceInfoList: list of data resources used to obtain data for this database.

      • dataResourceInfo: summary information describing the data resource.

        • href: link to the data resource main page or documentation.

        • name: name for/of the data resource.

    • defaultFormat: name of the data format used as the default.

    • description: a short description of the database, suitable for use in help documentation.

    • displayName: name of the database to be displayed in user interfaces. Not to be used for retrieval.

    • exampleIdentifiers: a set of example identifiers for entries appearing in the database.

      • accessionList: list of example accession numbers.

        • accession: example accession number.

      • entryVersionList: list of example entry version identifiers.

        • entryVersion: example entry version identifier.

      • idList: list of example Ids.

        • id: example Id.

      • nameList: list of example entry names.

        • name: example entry name.

      • sequenceVersionList: list of sequence version identifiers.

        • sequenceVersion: sequence version identifier.

    • formatInfoList: list of detailed information describing the available data formats.

      • formatInfo: details of an available data format.

        • aliases: list of other names for this data format, can be used in requests.

          • alias: an alias for this data format.

        • dataTerms: list of ontology terms describing the content of the data format. Typically from EDAM.

          • dataTerm: ontology term describing the content of the data format. May be represented as a namespace:id identifier or a URI.

        • name: name for the data format, used in requests.

        • styleInfoList: list of details of available result styles.

          • styleInfo: details of an available result style.

            • mimeType: MIME type for results in this format and style.

            • name: name of this result style, to be used in requests.

        • syntaxTerms: list of ontology terms describing the syntax of the data format. Typically from EDAM.

          • syntaxTerm: ontology term describing the syntax of the format. May be represented as a namespace:id identifier or a URI.

    • href: URL to the main database site, for use in help or documentation.

    • name: Name for the database, used in requests.

Throws:

  • DbfParamsException: input parameters failed to validate.
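
To make the nesting of the returned structure easier to follow, the sketch below mirrors a subset of the fields listed above as Python dataclasses and derives a format-to-styles mapping from them. How a given SOAP tool-kit actually represents the response is tool-kit specific; the classes, the styles_by_format helper and the example values (including the MIME type) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StyleInfo:
    name: str
    mimeType: str


@dataclass
class FormatInfo:
    name: str
    aliases: List[str] = field(default_factory=list)
    styleInfoList: List[StyleInfo] = field(default_factory=list)


@dataclass
class DatabaseInfo:
    name: str
    displayName: str
    defaultFormat: str
    formatInfoList: List[FormatInfo] = field(default_factory=list)


def styles_by_format(info: DatabaseInfo) -> Dict[str, List[str]]:
    """Map each format name to the names of its available styles."""
    return {
        fmt.name: [style.name for style in fmt.styleInfoList]
        for fmt in info.formatInfoList
    }


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    fasta = FormatInfo("fasta", styleInfoList=[StyleInfo("raw", "text/plain")])
    info = DatabaseInfo("uniprotkb", "UniProtKB", "uniprot", [fasta])
    print(styles_by_format(info))
```

Only the fields needed for the mapping are modelled; the remaining fields in the list above could be added in the same way.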

getDatabaseInfoList()

Get details of all available databases, including the available data formats and result styles.

Note: WSDBFetchDoclitServerService (document/literal) only.

Arguments: none

Returns: a list of data structures describing the databases. See getDatabaseInfo(db) for a description of the data structure.

getDbFormats(db)

Get a list of format names for a given database.

Arguments:

  • db: database name to get available formats for. For example: “uniprotkb”

Returns: an array of strings containing the format names.

Throws:

  • DbfParamsException: input parameters failed to validate.

getFormatStyles(db, format)

Get a list of style names available for a given database and format.

Arguments:

  • db: database name to get available styles for. For example: “uniprotkb”

  • format: the data format to get available styles for. For example: “fasta”

Returns: an array of strings containing the style names.

Throws:

  • DbfParamsException: input parameters failed to validate.

getSupportedDBs()

Get a list of database names usable with WSDbfetch.

Arguments: none

Returns: an array of strings containing the database names.

getSupportedFormats()

Get a list of database and format names usable with WSDbfetch.

Deprecated: use of getDbFormats(db), getDatabaseInfo(db) or getDatabaseInfoList() is recommended.

Arguments: none

Returns: an array of strings containing the database and format names. For example:

uniprotkb       default,fasta,uniprot,uniprotxml

getSupportedStyles()

Get a list of database and style names usable with WSDbfetch.

Deprecated: use of getFormatStyles(db, format), getDatabaseInfo(db) or getDatabaseInfoList() is recommended.

Arguments: none

Returns: an array of strings containing the database and style names. For example:

uniprotkb       default,html,raw
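
The strings returned by getSupportedFormats() and getSupportedStyles() share the shape shown in the examples: a database name, whitespace, then a comma-separated list. A minimal parsing sketch follows; parse_supported is a hypothetical helper, and treating any run of whitespace (spaces or tabs) as the separator is an assumption.

```python
def parse_supported(line):
    """Split 'db<whitespace>name,name,...' into (db, [names])."""
    parts = line.split(None, 1)  # split on the first run of whitespace
    if len(parts) != 2:
        raise ValueError("expected a database name followed by a list: %r" % line)
    db, names = parts
    return db, [name.strip() for name in names.split(",") if name.strip()]


if __name__ == "__main__":
    print(parse_supported("uniprotkb       default,fasta,uniprot,uniprotxml"))
```

Applying the function to each element of the returned array gives a list of (database, names) pairs that can be turned into a dictionary with dict().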

DbfException

Generic exception used for errors thrown by the WSDbfetch service.

Subclasses:

  • DbfParamsException

  • DbfConnException

  • DbfNoEntryFoundException

DbfParamsException

Exception indicating that the input parameters failed to validate. The message contains specific information about the cause. For example:

  • uk.ac.ebi.jdbfetch.exceptions.DbfParamsException: Database, wibble , not found!
  • uk.ac.ebi.jdbfetch.exceptions.DbfParamsException: No database found for the job:
    Job: 
            Database name: wibble
            Style: raw
            Format: default
            IDs:    blah
    

Parent class: DbfException

DbfConnException

Exception indicating that there was a problem contacting the database to retrieve the requested data. For example:

uk.ac.ebi.jdbfetch.exceptions.DbfConnException: Unexpected error when opening stream on the URL, please contact support@ebi.ac.uk

Parent class: DbfException

DbfNoEntryFoundException

Exception indicating that no entries matching the request were found in the database. For example:

uk.ac.ebi.jdbfetch.exceptions.DbfNoEntryFoundException: No result found

Common causes of the error include:

  • Invalid identifier used for the specified database. For example: EMBL-Bank does not contain entries with RefSeq identifiers (e.g. XM_002123246).

  • Use of an identifier which is not supported for the specified database. For example: the EMBL-Bank query used by WSDbfetch does not support sequence versions (e.g. L12345.1). Instead, either use the accession (e.g. L12345) or query the archive database EMBL-SVA, which does support sequence versions.

Parent class: DbfException
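
When a database does not accept sequence versions, a client can fall back to the bare accession. The sketch below assumes that a versioned identifier is an accession followed by a dot and a number, as in the L12345.1 example; strip_sequence_version is a hypothetical helper name.

```python
import re

# Assumed shape: an accession, a dot, then a numeric version (e.g. L12345.1).
_VERSIONED = re.compile(r"^(?P<accession>[^.\s]+)\.(?P<version>\d+)$")


def strip_sequence_version(identifier):
    """Return the bare accession if the identifier carries a sequence version."""
    cleaned = identifier.strip()
    match = _VERSIONED.match(cleaned)
    return match.group("accession") if match else cleaned


if __name__ == "__main__":
    print(strip_sequence_version("L12345.1"))
```

Identifiers without a version suffix are returned unchanged, so the function can be applied to a mixed list.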

InputException

Exception indicating that required parameters were not specified, or were in an incorrect format. For example:

  • uk.ac.ebi.jdbfetch.ws.wsdbfetch.InputException: The 'query' argument doesn't contain ':'. Proper format= DBNAME:ID
  • uk.ac.ebi.jdbfetch.ws.wsdbfetch.InputException: No database found in the query
  • uk.ac.ebi.jdbfetch.ws.wsdbfetch.InputException: No ID found in the query
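
The example messages above all begin with a fully qualified Java class name followed by a colon, so a client can map them onto local exception types. The sketch below defines Python classes mirroring the documented hierarchy; that a SOAP tool-kit exposes the message as a plain string in this form is an assumption, from_fault is a hypothetical helper, and InputException is kept outside the DbfException hierarchy because no parent class is documented for it.

```python
class DbfException(Exception):
    """Generic WSDbfetch error (local mirror of the documented class)."""


class DbfParamsException(DbfException):
    pass


class DbfConnException(DbfException):
    pass


class DbfNoEntryFoundException(DbfException):
    pass


class InputException(Exception):
    pass


_KNOWN = {
    cls.__name__: cls
    for cls in (DbfException, DbfParamsException, DbfConnException,
                DbfNoEntryFoundException, InputException)
}


def from_fault(fault_string):
    """Turn 'package.ClassName: detail' into an instance of the matching class."""
    qualified, sep, detail = fault_string.partition(": ")
    simple_name = qualified.rsplit(".", 1)[-1]
    cls = _KNOWN.get(simple_name) if sep else None
    if cls is None:
        # Unrecognised message: fall back to the generic exception.
        return DbfException(fault_string)
    return cls(detail)


if __name__ == "__main__":
    error = from_fault(
        "uk.ac.ebi.jdbfetch.exceptions.DbfNoEntryFoundException: No result found"
    )
    print(type(error).__name__, error)
```

Calling code can then use ordinary except clauses, for example catching DbfNoEntryFoundException separately from connection problems.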



Common Workflow Language

In addition to these sample clients, users have submitted workflows using these services to the myExperiment workflow repository. See workflows using the WSDbfetch Web Services for a list.

Common Workflow Language (CWL) implementations for consuming the clients of the EMBL-EBI Bioinformatics Web Services tools are available at https://github.com/ebi-wp/webservice-cwl

For details, see the CWL Workflows page.


Reference

EBI Web Services

Lopez R., Cowley A., Li W., McWilliam H. (2014) Using EMBL-EBI Services via Web Interface and Programmatically via Web Services. Curr Protoc Bioinformatics 48: 3.12.1-3.12.50. DOI: 10.1002/0471250953.bi0312s48

McWilliam H., Li W., Uludag M., Squizzato S., Park Y.M., Buso N., Cowley A.P., Lopez R. (2013) Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Research 41: W597-W600. PubMed Id: 23671338. DOI: 10.1093/nar/gkt376
