EMDB Search Engine documentation

Text Search

You can use the search bar near the top of all EMDB pages to search across any data deposited at EMDB and EMPIAR.

The EMDB search engine is built on the Apache Solr server, so the Solr query parser tutorial is a good source of further information on how to use it. You can also check the full list of EMDB query fields.

Query Syntax

Free text terms

Most of the text fields are broken up into single words before indexing. This procedure allows you to perform searches based on single terms or phrases (using double quotes).
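
As a rough illustration of why single terms and quoted phrases behave differently, the Python sketch below splits a text into words and then checks for a term or a phrase. It is a simplification, not the analyzer Solr actually uses: the lowercasing, the function names and the example title are all assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into single words (a toy analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_term(text, term):
    """True if the single word `term` occurs anywhere in `text`."""
    return term.lower() in tokenize(text)

def matches_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively and in order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical entry title, used only for illustration.
title = "Zika virus in complex with a monoclonal antibody"
print(matches_term(title, "zika"))                    # True
print(matches_phrase(title, "monoclonal antibody"))   # True
print(matches_phrase(title, "antibody monoclonal"))   # False: wrong word order
```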

The search below returns all the entries that contain the word "zika" anywhere:

zika

The search below returns all the entries that contain the phrase "monoclonal antibody" anywhere:

"monoclonal antibody"

Boolean operators

More complex queries can be created by combining multiple terms and boolean operators.

Boolean operator    Description                                Example
AND                 Both terms are required.                   spliceosome AND human
OR                  At least one of the terms is required.     spliceosome OR ribonucleoprotein
NOT                 The following term must not be present.    NOT Relion

Search terms can be grouped into sub-queries with parentheses, which also define the order in which the sub-queries are evaluated. The query below returns all entries that contain the word spliceosome or ribonucleoprotein, but do not contain the word human.

(spliceosome OR ribonucleoprotein) AND NOT human

Search by specific fields

Searching by specific fields gives more precise results. Specify the field, followed by a colon (":") and the term that you are searching for within that field. All the query syntax previously described is also available for specific-field searches. See the full list of EMDB query fields for their descriptions.

The example below finds all human spliceosomes or ribonucleoproteins, using the sample name and the human NCBI taxonomy ID (9606):

(sample_name:ribonucleoprotein OR sample_name:spliceosome) AND natural_source_ncbi_code:9606
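
When a field query like this one is sent as part of a URL, spaces, colons and parentheses are usually percent-encoded. The Python sketch below builds the query and encodes it with the standard library. The helper name field_clause is hypothetical, and no particular EMDB endpoint is assumed.

```python
from urllib.parse import quote, unquote

def field_clause(name, value):
    """Build a `field:value` clause; values with spaces are quoted as phrases."""
    if " " in value:
        value = '"' + value + '"'
    return f"{name}:{value}"

query = "({} OR {}) AND {}".format(
    field_clause("sample_name", "ribonucleoprotein"),
    field_clause("sample_name", "spliceosome"),
    field_clause("natural_source_ncbi_code", "9606"),
)
print(query)

# Percent-encode every reserved character so the query is safe inside a URL.
encoded = quote(query, safe="")
print(encoded)

# Decoding restores the original query unchanged.
print(unquote(encoded) == query)  # True
```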

Range Search

The search engine allows the user to find entries that fall within a given range. The lower and upper bounds must be provided in the following format: "[X TO Y]". Square brackets make both bounds inclusive. The example below finds all entries with a resolution value between 1Å and 3Å.

resolution:[1 TO 3]

You can also use curly brackets to set exclusive range intervals. In the example below we are retrieving all entries with resolution value between 1Å and 3Å, but excluding entries with exactly 3Å resolution.

resolution:[1 TO 3}

Range queries are not restricted to numeric fields; it is also possible to search over date and text ranges. The Solr documentation describes the accepted date formats. The example below returns all entries that contain samples starting with the letter "X", "Y" or "Z".

sample_name:[X TO Z]

An asterisk ("*") can be used to match any value. Included in a range query, it defines an open-ended interval. The example below finds all entries with a resolution value greater than or equal to 20Å.

resolution:[20 TO *]

Another use of the asterisk is to match every entry that has a value for a given field. The example below lists all entries that contain half-maps:

half_map_filename:[* TO *]
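
The bracket rules above can be summarised in a small Python helper. This is a sketch for illustration only: the function range_clause and its parameters are invented for this example, with None standing for an open bound ("*").

```python
def range_clause(field, low=None, high=None, include_low=True, include_high=True):
    """Build `field:[low TO high]`; None means an open bound, written as `*`."""
    left = "[" if include_low else "{"     # square bracket = inclusive bound
    right = "]" if include_high else "}"   # curly bracket = exclusive bound
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    return f"{field}:{left}{lo} TO {hi}{right}"

print(range_clause("resolution", 1, 3))                      # resolution:[1 TO 3]
print(range_clause("resolution", 1, 3, include_high=False))  # resolution:[1 TO 3}
print(range_clause("resolution", 20))                        # resolution:[20 TO *]
print(range_clause("half_map_filename"))                     # half_map_filename:[* TO *]
```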

Fuzzy Search

The EMDB search engine allows the user to search by similar terms using the Damerau-Levenshtein distance (basically, the number of edit operations necessary to transform one string into another). To perform a fuzzy search, add the tilde symbol ("~") after the query term, followed by the maximum distance. The example below matches any sample term that is at most one edit away from the word DNA. In other words, it returns all entries containing samples of DNA or RNA.

sample_type:dna~1