EBI Search Advanced Documentation

Here are the answers to some more advanced questions you may have about using the EBI search, or technical questions about its implementation.

How do I specify a complex search query in the search box?

The search box supports an advanced query syntax to allow you to construct specific queries. By default, search terms separated by spaces are interpreted as a boolean "AND" operation, and specifying "AND" between terms has the same effect:

  • apoptosis: Results contain the word "apoptosis".
  • BRCA2 human: Results contain both "BRCA2" and "human".
  • BRCA2 AND human: As above.

Specifying "OR" between terms locates results containing either term:

  • BRCA2 OR BRCA1: Results contain either "BRCA2" or "BRCA1".
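To make the default AND and the explicit OR concrete, here is a toy model in Python. It assumes, purely for illustration, that a record is plain text split into case-insensitive alphanumeric words; the real engine's tokenization and matching rules are not described on this page and may differ. The names `tokenize`, `matches_all` and `matches_any` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Toy tokenizer: the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_all(text: str, terms: list[str]) -> bool:
    """Space-separated terms or explicit AND: every term must appear."""
    words = tokenize(text)
    return all(term.lower() in words for term in terms)


def matches_any(text: str, terms: list[str]) -> bool:
    """OR: at least one of the terms must appear."""
    words = tokenize(text)
    return any(term.lower() in words for term in terms)


record = "BRCA2 is a human tumour suppressor gene"
print(matches_all(record, ["BRCA2", "human"]))  # BRCA2 human  -> True
print(matches_any(record, ["BRCA2", "BRCA1"]))  # BRCA2 OR BRCA1 -> True
```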

Specifying "NOT", or prepending a minus (-) character to a term, searches for records without that term. By contrast, prepending a plus (+) character requires the term to be present:

  • cancer NOT breast: Results contain "cancer", but do not contain "breast".
  • cancer -breast: As above.
  • cancer -breast +human: Results contain both "cancer" and "human", but do not contain "breast".
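The exclusion rules can be sketched in the same toy style. This assumes a flat query (no parentheses, phrases or OR) in which bare terms and "+term" are required because spaces mean AND, while "-term" and "NOT term" are excluded. `parse_flat_query` and `matches` are hypothetical names, not part of the search service.

```python
import re


def tokenize(text: str) -> set[str]:
    """Toy tokenizer: the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def parse_flat_query(query: str) -> tuple[set[str], set[str]]:
    """Split a flat query into (required, excluded) term sets."""
    required: set[str] = set()
    excluded: set[str] = set()
    negate_next = False
    for token in query.split():
        if token == "AND":  # explicit AND behaves like a space
            continue
        if token == "NOT":  # NOT excludes the term that follows it
            negate_next = True
            continue
        term = token.lstrip("+-").lower()
        if negate_next or token.startswith("-"):
            excluded.add(term)
        else:  # bare terms and '+term' are both required
            required.add(term)
        negate_next = False
    return required, excluded


def matches(text: str, query: str) -> bool:
    words = tokenize(text)
    required, excluded = parse_flat_query(query)
    return required <= words and not (excluded & words)


record = "Cancer incidence in human liver"
print(matches(record, "cancer NOT breast"))      # True
print(matches(record, "cancer -breast +human"))  # True
print(matches(record, "cancer -human"))          # False
```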

Wildcards can stand in for alphanumeric characters to broaden the matches for a search term. The "*" character matches any number of characters, whereas the "?" character matches exactly one character:

  • gluta*: Matches "glutacin", "glutamate", "glutamic", etc.
  • b?nd: Matches "bind", "bond", "band", etc.
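One way to picture wildcard expansion is to translate each wildcard term into a regular expression and test it against whole words. This is an illustrative sketch only: it assumes "*" means zero or more word characters, "?" exactly one, and case-insensitive matching; `wildcard_match` is a hypothetical helper.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard term into a compiled, case-insensitive regex.

    '*' becomes zero or more word characters and '?' exactly one word
    character; every other character is matched literally.
    """
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(r"\w*")
        elif char == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern: str, word: str) -> bool:
    """True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


words = ["glutacin", "glutamate", "glutamic", "glucose", "bind", "bond", "blind"]
print([w for w in words if wildcard_match("gluta*", w)])  # ['glutacin', 'glutamate', 'glutamic']
print([w for w in words if wildcard_match("b?nd", w)])    # ['bind', 'bond']
```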

Enclosing a series of words in double quotes searches for an exact phrase. Note that boolean operators and wildcards have no effect in a phrase:

  • "liver cancer": Results contain the complete phrase "liver cancer".
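Phrase matching differs from plain AND because word order and adjacency matter. The toy check below assumes a record is an ordered list of lowercase words and looks for the phrase's words as one consecutive run; `contains_phrase` is a hypothetical name, and the real engine's phrase handling may differ in detail.

```python
import re


def words_in_order(text: str) -> list[str]:
    """Toy tokenizer that keeps word order, as phrase matching needs it."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's words occur consecutively and in order.

    Operator words such as AND or OR are treated as ordinary words here,
    and no wildcard expansion is performed inside the phrase.
    """
    words = words_in_order(text)
    target = words_in_order(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(contains_phrase("Primary liver cancer in adults", "liver cancer"))  # True
print(contains_phrase("Cancer of the liver", "liver cancer"))             # False
```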

You may use parentheses to control the combination of multiple AND/OR/NOT operators:

  • (BRCA2 OR BRCA1) human: Results contain "human", and also contain either "BRCA2" or "BRCA1".
  • "liver cancer" AND (human OR mouse): Results contain the phrase "liver cancer", and also contain either "human" or "mouse".
  • cancer AND human NOT (breast OR virus): Results contain both "cancer" and "human", but contain neither "breast" nor "virus".
  • cancer AND human NOT (breast AND virus): Results contain both "cancer" and "human", but do not contain both "breast" and "virus". Note that results containing just one of "breast" or "virus" will still be included.
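How parentheses group operators can be illustrated with a small recursive-descent parser. This is a sketch under stated assumptions, not the engine's actual parser: it assumes uppercase AND/OR/NOT, that OR binds more loosely than AND and implicit AND, and that NOT applies to the single term or parenthesised group after it. Phrases, wildcards and "+"/"-" prefixes are not handled. `QueryParser` and `matches` are hypothetical names.

```python
import re
from typing import Callable, Optional

# A predicate tests the set of lowercase words taken from one record.
Predicate = Callable[[set[str]], bool]


def _and(a: Predicate, b: Predicate) -> Predicate:
    return lambda words: a(words) and b(words)


def _or(a: Predicate, b: Predicate) -> Predicate:
    return lambda words: a(words) or b(words)


def _not(a: Predicate) -> Predicate:
    return lambda words: not a(words)


class QueryParser:
    """Toy parser: OR binds loosest; AND/space tighter; NOT tightest."""

    def __init__(self, query: str) -> None:
        self.tokens = re.findall(r"\(|\)|[^\s()]+", query)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self) -> Predicate:
        predicate = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token: {self.peek()!r}")
        return predicate

    def parse_or(self) -> Predicate:
        left = self.parse_and()
        while self.peek() == "OR":
            self.take()
            left = _or(left, self.parse_and())
        return left

    def parse_and(self) -> Predicate:
        left = self.parse_unary()
        while self.peek() not in (None, "OR", ")"):
            if self.peek() == "AND":
                self.take()  # an explicit AND means the same as a space
            left = _and(left, self.parse_unary())
        return left

    def parse_unary(self) -> Predicate:
        token = self.peek()
        if token is None:
            raise ValueError("query ended unexpectedly")
        if token == "NOT":
            self.take()
            return _not(self.parse_unary())
        if token == "(":
            self.take()
            inner = self.parse_or()
            if self.peek() != ")":
                raise ValueError("missing closing parenthesis")
            self.take()
            return inner
        if token in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token: {token!r}")
        term = self.take().lower()
        return lambda words: term in words


def matches(text: str, query: str) -> bool:
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return QueryParser(query).parse()(words)


record = "Cancer risk in human liver tissue"
print(matches(record, "(BRCA2 OR BRCA1) human"))                 # False
print(matches(record, "cancer AND human NOT (breast OR virus)"))  # True
print(matches("human breast cancer", "cancer AND human NOT (breast AND virus)"))  # True
```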

It is also possible to perform database-specific searches, and search individual fields of a database's records. See the advanced search section for more information.

How do I use the advanced search option?

The "advanced search" link adjacent to the main search box provides an interface to easily specify complex queries. There are separate input fields for:

  • Terms which must all be present for a match.
  • A phrase which must be present in its entirety and order for a match.
  • Terms of which at least one must be present for a match.
  • Terms that must not appear in a match.
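The four input fields correspond naturally to the search-box syntax described earlier. The function below is a hypothetical mapping, not necessarily how the form composes its query internally: "all" terms stay space-separated (implicit AND), the phrase is quoted, "any" terms are OR-ed inside parentheses, and each "none" term gets a leading minus.

```python
def build_query(all_terms: str = "", phrase: str = "",
                any_terms: str = "", none_terms: str = "") -> str:
    """Combine the four advanced-search inputs into one search-box query."""
    parts: list[str] = []
    parts.extend(all_terms.split())            # implicit AND
    if phrase.strip():
        parts.append(f'"{phrase.strip()}"')    # exact phrase
    if any_terms.split():
        parts.append("(" + " OR ".join(any_terms.split()) + ")")
    parts.extend("-" + term for term in none_terms.split())  # exclusions
    return " ".join(parts)


print(build_query(all_terms="cancer human", phrase="liver cancer",
                  any_terms="BRCA1 BRCA2", none_terms="virus breast"))
# cancer human "liver cancer" (BRCA1 OR BRCA2) -virus -breast
```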

You may also restrict your search to a specific database by clicking the "domain-specific search" link. This lets you choose a domain (a single database or a category of databases) to restrict the search to. After selecting the domain, add your search terms as normal.

If you are searching a single database, you can further refine the search by selecting only specific fields from that database to search. For example, if searching PDBe you can choose to search only the "authors" field. Selecting multiple fields will look for your search terms in any of the chosen fields.
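Field-restricted search can be modelled on a single record represented as a mapping from field name to text. The sketch assumes one plausible reading of "any of the chosen fields": the text of the selected fields is pooled, so each term may be found in any of them, while unselected fields are ignored. The record and the `field_match` name are invented for illustration.

```python
import re


def field_match(record: dict[str, str], terms: list[str], fields: list[str]) -> bool:
    """Toy field-restricted search: only the chosen fields are searched.

    The chosen fields' text is pooled, so each term may appear in any of
    them; all terms must still be present (default AND).
    """
    pooled = " ".join(record.get(field, "") for field in fields)
    words = set(re.findall(r"[a-z0-9]+", pooled.lower()))
    return all(term.lower() in words for term in terms)


# Hypothetical PDBe-like record, invented for illustration.
entry = {"authors": "Smith J, Jones K", "title": "Crystal structure of a kinase"}
print(field_match(entry, ["smith"], ["authors"]))            # True
print(field_match(entry, ["kinase"], ["authors"]))           # False
print(field_match(entry, ["kinase"], ["authors", "title"]))  # True
```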

Similar to the "field search", you can also search a single database for entries that have cross-references to record identifiers in a second database. For example, selecting the "UniProt" cross-reference in the "Ensembl Gene" domain allows you to search for Ensembl genes that have a cross-reference to a specific UniProt identifier.
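Conceptually, a cross-reference search filters one database's entries by the identifiers they point to in another. The sketch assumes each entry carries a mapping from target database name to a list of referenced identifiers; the gene records and identifiers are placeholders, not real Ensembl or UniProt data, and `entries_with_xref` is a hypothetical helper.

```python
def entries_with_xref(entries: list[dict], xref_db: str, xref_id: str) -> list[str]:
    """Return IDs of entries that cross-reference `xref_id` in `xref_db`."""
    return [
        entry["id"]
        for entry in entries
        if xref_id in entry.get("xrefs", {}).get(xref_db, [])
    ]


# Hypothetical gene entries with placeholder cross-reference identifiers.
genes = [
    {"id": "GENE_A", "xrefs": {"UniProt": ["UP_0001", "UP_0002"]}},
    {"id": "GENE_B", "xrefs": {"UniProt": ["UP_0003"]}},
    {"id": "GENE_C", "xrefs": {}},
]
print(entries_with_xref(genes, "UniProt", "UP_0003"))  # ['GENE_B']
```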

Do you provide programmatic access to the search?

The search engine can be accessed over the web, or programmatically through a SOAP Web Services interface. This allows its search and retrieval capabilities to be used in workflows and analytical pipelines. See the Search Web Services API.
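For readers unfamiliar with SOAP, the sketch below shows the general shape of a SOAP 1.1 request built with Python's standard library. It is protocol-level only: the service namespace, operation name and parameter names must be taken from the service's WSDL, and the values used here are placeholders, not the real Search Web Services API. It also assumes document/literal style with namespace-qualified parameter elements.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(service_ns: str, operation: str, params: dict[str, str]) -> bytes:
    """Build a SOAP 1.1 envelope calling `operation` with string parameters."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Placeholder namespace, operation and parameters, not taken from the real WSDL.
request = build_soap_request(
    "urn:example:search", "exampleSearch", {"domain": "example", "query": "BRCA2 human"}
)
print(request.decode("utf-8"))
```

In practice the envelope would be sent with an HTTP POST to the endpoint named in the WSDL, or a SOAP client library would generate it from the WSDL directly.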

The SOAP API above does not include the gene and protein summaries. However, the data for these are available as a series of REST XML web services in the form of Distributed Annotation System (DAS) sources. These can be accessed as follows:

  • DAS sources for the gene section
  • DAS source for the expression section
  • DAS sources for the protein section
  • DAS sources for the protein structure section
  • DAS sources for the literature section
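A DAS source is queried with plain HTTP GET requests and answers with XML. The sketch below builds a DAS "features" request URL and extracts basic fields from a DASGFF response. The base URL is a placeholder to be replaced with one of the DAS sources listed above, and the sample XML is a minimal hand-written document, not real service output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def das_features_url(source_url: str, segment: str) -> str:
    """Build a DAS 'features' request URL for one segment (reference ID or range)."""
    # Keep ':' and ',' literal so range syntax such as "X1:1,100" survives.
    return f"{source_url.rstrip('/')}/features?segment={quote(segment, safe=':,')}"


def parse_das_features(xml_text: str) -> list[dict[str, str]]:
    """Extract feature ID, label and type ID from a DASGFF features response."""
    root = ET.fromstring(xml_text)
    features = []
    for feature in root.iter("FEATURE"):
        type_el = feature.find("TYPE")
        features.append({
            "id": feature.get("id", ""),
            "label": feature.get("label", ""),
            "type": type_el.get("id", "") if type_el is not None else "",
        })
    return features


# Placeholder DAS source URL, not a real EBI address.
print(das_features_url("https://das.example.org/das/example_source", "X1:1,100"))

# Minimal hand-written DASGFF document (illustrative, not real service output).
sample = """<DASGFF><GFF version="1.0" href="https://das.example.org">
  <SEGMENT id="X1" start="1" stop="100">
    <FEATURE id="f1" label="Example feature"><TYPE id="domain">domain</TYPE></FEATURE>
  </SEGMENT>
</GFF></DASGFF>"""
print(parse_das_features(sample))
```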