
Searching ArrayExpress

1. Searching the Experiment Archive
  1.1 Simple searches
  1.2 Advanced searches
    1.2.1 Filter experiments to show data directly submitted to ArrayExpress only (not GEO-imported data)
    1.2.2 Filter experiments by species, array design, molecule or technology
    1.2.3 Combining search terms with AND, OR and NOT
    1.2.4 Specifying fields for searches
    1.2.5 Filtering experiments by counts of assays, samples, experimental factors etc.
  1.3 Using the legacy interface (previously the advanced query interface)
    1.3.1 Experiment attributes
    1.3.2 Array attributes
    1.3.3 Protocol attributes
  1.4 Login to view private data
  1.5 RSS feed
  1.6 Downloading data/FTP archives and programmatic access
    1.6.1 Files available from the FTP site
    1.6.2 MAGE-TAB format description
    1.6.3 Programmatic access
  1.7 Understanding archive search results
 
2. Searching the Gene Expression Atlas
  2.1 Atlas searches
  2.2 Advanced Atlas searches

 

 

1. Searching the Experiment Archive


1.1. Simple Searches

  1. Enter an experiment accession number or keyword (e.g. RNAi, breast cancer) in the query box on the left-hand panel of the ArrayExpress home page http://www.ebi.ac.uk/arrayexpress, or in the query box on a results page.

Archive browse experiments image

Archive query box image

  2. All experiments where your term is found in any of these fields will be returned:
    1. ArrayExpress accession number e.g. E-MEXP-568
    2. secondary accession numbers e.g. GEO series accession GSE5389, ENA study accession number ERP000054
    3. experiment name
    4. submitter's experiment description
    5. sample and experimental factor attribute classifiers and values, including species (e.g. GeneticModification, Mus musculus, DREB2C over-expression)
    6. publication title, authors and journal name, PubMed ID
    7. array design name and accession
  3. Synonyms for terms are always included in searches e.g. 'human' and 'Homo sapiens'.
  4. A drop-down menu will show matching terms in the Experimental Factor Ontology (marked EFO) or terms that exist in any record. The Experimental Factor Ontology is an application-focused ontology modelling the experimental factors in ArrayExpress. The Experimental Factor Ontology expansion affects values that are experiment types, sample attributes, experimental factor values and species. In the search results exact matches are highlighted yellow, synonyms green and child terms pink. See the search results help page here: Understanding archive browse/search results.
Image of archive query expansion

  5. Use * as a wildcard for 0 or more characters and ? for a single character e.g. embryo* will retrieve results with matches to embryo and embryonic, te?t will search for test and text. Wildcards will not work within phrases (see below).
  6. Put quotes around phrases where you want to search for more than one word together e.g. "bone marrow".
  7. US spelling conventions are used e.g. leukemia not leukaemia, although common terms are searched for in both UK and US spellings.
  8. ArrayExpress uses Latin names for species e.g. Homo sapiens. For some species a search for the common name will bring up results, but search for the Latin name to be sure you find all relevant experiments.
  9. Non-standard character sets are not supported, e.g. Greek symbols.
  10. To browse all experiments in ArrayExpress click on the 'Browse experiments' link.

The search results page is described here: Understanding archive browse/search results.
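
The wildcard rules listed above can be tried out locally. The sketch below uses Python's standard fnmatch module, whose * and ? have the same meaning (zero or more characters, exactly one character). It only illustrates the matching rules on a hypothetical word list; it is not how ArrayExpress itself implements searching.

```python
from fnmatch import fnmatchcase


def matching_words(pattern, words):
    """Return the words that match a wildcard pattern.

    '*' stands for zero or more characters and '?' for exactly one,
    mirroring the wildcard rules described in the help text.
    """
    return [w for w in words if fnmatchcase(w, pattern)]


# Hypothetical word list, for illustration only.
WORDS = ["embryo", "embryonic", "test", "text", "tent", "toast"]

print(matching_words("embryo*", WORDS))
print(matching_words("te?t", WORDS))
```

Note that te?t also matches words such as tent, so a single-character wildcard can return more than the two spellings you had in mind.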


1.2. Advanced searches

1.2.1 Filter experiments to show data directly submitted to ArrayExpress only (not GEO-imported data)

We import data from the Gene Expression Omnibus (GEO, www.ncbi.nlm.nih.gov/geo/). To limit the search to experiments submitted directly to ArrayExpress, excluding those imported from GEO, check the box under the search box. For more information about how we import data from GEO see the GEO data help page.

 

image of text box for AE data only


1.2.2 Filter experiments by species, array design, molecule or technology

Experiments can be filtered by species, array design, molecule (DNA, RNA, metabolite, protein) or technology (array, high-throughput sequencing, mass spectrometry) using the drop down menus in the centre of the top search option bar. After selecting a filter, click on the 'Query' box on the right hand side to filter experiments. To remove a filter, either re-select the top option from the list, or click on the '[reset]' link, and then click on the 'Query' box again to requery ArrayExpress.

 

Filtering options


1.2.3. Combining search terms

Enter two or more keywords in the search box with the operators AND, OR or NOT. AND is the default operator; a search for 'prostate breast' will return hits with a match to 'prostate' AND 'breast'.

Search terms of more than one word must be entered inside quotes, otherwise only the first word will be searched for. For example, transcription AND Rattus norvegicus will effectively be a search for transcription AND Rattus; enter transcription AND "Rattus norvegicus" instead.

If a field is not specified (see below) then the term is searched for in any of the experiment fields (experiment description, sample annotation, citation etc.).

 

Operator Searches Example
AND Experiments with more than one term. This is the default operator. AND query
OR Experiments with either term. OR query
NOT Experiments without a term NOT query
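
As a rough illustration of the quoting and operator rules in this section, the following sketch assembles query strings that can be pasted into the search box. The helper names (quote_term, combine, exclude) are hypothetical and are not part of any ArrayExpress tool; the "AND NOT" form follows the examples given later on this page.

```python
def quote_term(term):
    """Wrap a multi-word term in double quotes so it is searched as a phrase."""
    term = term.strip()
    already_quoted = term.startswith('"') and term.endswith('"')
    if " " in term and not already_quoted:
        return f'"{term}"'
    return term


def combine(operator, *terms):
    """Join search terms with AND or OR, quoting phrases where needed."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(quote_term(t) for t in terms)


def exclude(query, term):
    """Append a NOT clause that excludes experiments matching `term`."""
    return f"{query} AND NOT {quote_term(term)}"


print(combine("AND", "transcription", "Rattus norvegicus"))
print(combine("OR", "prostate", "breast"))
print(exclude("leukemia", "bone marrow"))
```

Quoting inside the helper avoids the pitfall described above, where only the first word of an unquoted phrase is searched for.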

 


 

1.2.4. Specifying fields for searches

Particular fields can also be searched by using the format fieldname:value. Again, phrases of more than one word must be entered in quotes, otherwise only the first word will be searched for. The fields that can be searched are shown in the table below.

 

Field name Searches Example
accession Experiment primary or secondary accession accession query
array Array design accession or name array query
ef Experimental factor, the name of the main variables in an experiment. experimental factor query
efv Experimental factor value. Has EFO expansion. experimental factor value query
expdesign Experiment design type experiment design type query
exptype Experiment type. Has EFO expansion. experiment type category
gxa Presence in the Gene Expression Atlas. Only value is gxa:true. atlas query
pmid PubMed identifier pubmed query
sa Sample attribute values. Has EFO expansion. sample attribute query
species Species of the samples. Has EFO expansion. species query
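
The fieldname:value format can be sketched as a small helper that checks the field name against the table above and quotes multi-word values. The function name field_query is hypothetical; the field names and the gxa:true rule are taken from the table.

```python
# Field names as listed in the table above.
SEARCH_FIELDS = {
    "accession", "array", "ef", "efv", "expdesign",
    "exptype", "gxa", "pmid", "sa", "species",
}


def field_query(field, value):
    """Build a 'fieldname:value' search term, quoting multi-word values."""
    if field not in SEARCH_FIELDS:
        raise ValueError(f"unknown search field: {field!r}")
    value = str(value).strip()
    if field == "gxa" and value != "true":
        raise ValueError("the only value accepted for gxa is 'true'")
    if " " in value:
        # Phrases must be quoted; remember that wildcards do not
        # work inside quoted phrases.
        value = f'"{value}"'
    return f"{field}:{value}"


print(field_query("species", "homo sapiens"))
print(field_query("array", "Affymetrix*"))
print(field_query("gxa", "true"))
```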


 

1.2.5. Filtering experiments by counts of a particular attribute

Experiments fulfilling certain count criteria can also be searched for, e.g. experiments having more than 10 assays (hybridizations). These searches use the following syntax:

 

Filter What is filtered
assaycount:[x TO y] filter on the number of assays, where x <= y and both values are between 0 and 99,999 (inclusive). To exclude the boundary values use curly brackets, e.g. assaycount:{1 TO 5} will find experiments with 2-4 assays. Single numbers may also be given, e.g. assaycount:10 will find experiments with 10 assays.
efcount:[x TO y] filter on the number of experimental factors
samplecount:[x TO y] filter on the number of samples
sacount:[x TO y] filter on the number of sample attribute categories
rawcount:[x TO y] filter on the number of raw files
fgemcount:[x TO y] filter on the number of final gene expression matrix (processed data) files
miamescore:[x TO y] filter on the MIAME compliance score (maximum score is 5)
date:yyyy-mm-dd filter by release date

  • date:2009-12-01 - will search for experiments released on 1st of Dec 2009
  • date:2009* - will search for experiments released in 2009
  • date:[2008-01-01 2008-05-31] - will search for experiments released between 1st of Jan and end of May 2008
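
The range syntax in the table can be illustrated with a short sketch. The helper names are hypothetical. The sketch assumes that the 0 to 99,999 limit stated for assaycount also applies to the other count fields, and it writes date ranges without TO, exactly as in the example above.

```python
from datetime import date

# Count fields as listed in the table above.
COUNT_FIELDS = {
    "assaycount", "efcount", "samplecount", "sacount",
    "rawcount", "fgemcount", "miamescore",
}
MAX_COUNT = 99999  # assumption: same upper limit for every count field


def count_filter(field, low, high=None, inclusive=True):
    """Build a count filter such as assaycount:[10 TO 99999].

    With only `low` given, a single-number filter is returned.
    Square brackets include the boundary values, curly brackets exclude them.
    """
    if field not in COUNT_FIELDS:
        raise ValueError(f"unknown count field: {field!r}")
    if high is None:
        if not 0 <= low <= MAX_COUNT:
            raise ValueError("value must be between 0 and 99999")
        return f"{field}:{low}"
    if not 0 <= low <= high <= MAX_COUNT:
        raise ValueError("need 0 <= low <= high <= 99999")
    left, right = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{left}{low} TO {high}{right}"


def in_range(n, low, high, inclusive=True):
    """Local check of which counts a range would select."""
    return low <= n <= high if inclusive else low < n < high


def date_range_filter(start, end):
    """Build a release-date range filter from two datetime.date values."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return f"date:[{start.isoformat()} {end.isoformat()}]"


print(count_filter("assaycount", 10, 99999))
print(count_filter("assaycount", 1, 5, inclusive=False))
print([n for n in range(7) if in_range(n, 1, 5, inclusive=False)])
print(date_range_filter(date(2008, 1, 1), date(2008, 5, 31)))
```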

Examples

Search term What is searched
leukemia AND species:"homo sapiens" AND exptype:"transcription profiling" AND assaycount:[10 TO 99999] Transcription profiling experiments that mention the word 'leukemia' in any field, use human samples and have at least 10 assays
species:"Arabidopsis thaliana" AND NOT array:Affymetrix* AND fgemcount:[1 TO 99999] Arabidopsis experiments that are not on an Affymetrix array, but that have processed data files
species:Saccharomyces* AND date:[2009-06-01 2010*] Yeast experiments released between June 2009 and 2010


 

1.3 Legacy advanced query interface

This legacy advanced query interface is no longer being actively developed, but you can still use it to query for experiments and to download subsets of the available data. In addition you can search for specific array designs and protocols. See also the understanding advanced query interface results page.

You are logged into the database as 'guest' by default and can only see publicly available datasets. To view confidential data you must have a private login account.

Advanced Query Interface
  • The retrieval page is divided into three sections, for experiments, array designs and protocols. Each of these sections has associated query fields, e.g. "Species", "Author" or "Experimental Factor" (see above).
  • The simplest query of the database is for the accession numbers associated with Arrays, Experiments and Protocols. See the accession codes page for more information about accession numbers
  • Queries are created by typing free text into any of the boxes provided, and/or by using the pull-down menus to select terms
  • Searches are case insensitive.
  • Wild card searches can be made using the * symbol
  • Search criteria may be combined in a single query to limit the number of results obtained. For example, a query for "Author=Preiss", "Species=Saccharomyces cerevisiae" and "Array provider=EMBL" will return all experiments by that author, using samples derived from that species and arrays from that array provider
  • The attributes which can be used in database queries are listed and explained below.
  • If your query fails to produce any results, a message similar to the following will be displayed. In such cases you should relax your search constraints to increase the chances of finding the desired results. Note: In some cases, a query based on species will return no results. This is probably because while the data exists in the database, it has not yet been made public.
Failed query
  • The query results page for a given experiment or array design provides links to the array design used in the experiment, or experiments performed using the array design. Experiment results also provide links to the protocols used in that experiment. For example, having found a particular array design, every experiment which uses that array may then be retrieved from the database by clicking on the "Experiments done with this Array" link.

The advanced query interface results page is described here: Understanding advanced query interface results.


1.3.1 Experiment attributes

The panel on the main query page which is used to search the database for Experiments is shown below:

Experiment Queries

The attributes of an experiment which can be queried for are:

Experiment accession number
The unique identifier assigned to each experiment by the curation staff. Accession numbers for experiments have the format E-XXXX-n. An experiment is a grouping of related hybridizations that address a biological question. Note: Single hybridizations are not assigned individual accession numbers and cannot be queried directly. Once you have retrieved an experiment you can then look at its associated hybridizations.

Keyword
This searches for words in the experiment description. The description includes free text information provided by the submitter and an automatically generated summary.

Species
ArrayExpress uses Latin names from the NCBI Taxonomy database. Query results may be limited to a particular species using the pull-down menu provided. If the species you are looking for is not in the list there is no public data available for that species.

Experiment type
The kind of experiment which is of interest. Experiment type uses a controlled vocabulary, for example, "dose response" or "time series", and is queried using the pull-down menu provided. A current list of allowed experiment types is available from the MGED Ontology web site. For further information on the development of controlled vocabularies and their use within ArrayExpress see MIAME-MAGE-Ontology mapping.

Experimental factors
These are the parameters which are varied by the experimenter during the experiment. Experimental factors can relate to the sample, the treatment of the sample during the experiment (e.g. the protocols used) or some other methodological factor such as the equipment used. For example, if a yeast culture was treated with a compound such as rapamycin then the experimental factor would be "compound". Use the pull-down menu provided to list only those experiments in which a given variable is changed.

Description
The experiment description includes free text information provided by the submitter and an automatically generated summary.

Author
The name(s) of the researcher(s) associated with the experiment (may be named in any associated publication), and the original submitter of the experiment.

Laboratory
The institution(s) where the submitters work.

Publication
This is a list of journal names. You can retrieve all experiments where the associated paper was published in a particular journal.

Array accession number/design name/provider
Attributes of the array designs used in the experiment.

The advanced query interface results page is described here: Understanding advanced query interface results.


1.3.2 Array attributes

The panel on the main query page which is used to search the database for Arrays is shown below:

Array query panel

The attributes of an array which can be queried for are:

Array accession number
The unique identifier given to the array design. Array accession numbers have the format A-XXXX-n, where XXXX is a code representing how the array design was submitted to ArrayExpress.

Array design name
The name provided for the array design by the manufacturer. This could be the name of a commercial design, or a name given to an in-house design supplied by a specific institute or lab.

Array provider
The person or institution who supplied the array design. This could be the manufacturer of the array, or the institute or lab of the person who submitted the array design to ArrayExpress.
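
The accession formats described for experiments (E-XXXX-n) and array designs (A-XXXX-n) can be checked with a regular expression. This is a minimal sketch that assumes XXXX is always four upper-case letters and n is a whole number; the help text does not state this explicitly, and the function name parse_accession is hypothetical.

```python
import re

# Assumption: XXXX is four upper-case letters and n is a whole number.
ACCESSION_RE = re.compile(r"^(?P<kind>[EA])-(?P<code>[A-Z]{4})-(?P<number>\d+)$")

KINDS = {"E": "experiment", "A": "array design"}


def parse_accession(accession):
    """Split an accession into (kind, code, number), or return None."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    return KINDS[match["kind"]], match["code"], int(match["number"])


print(parse_accession("E-MEXP-568"))   # experiment accession quoted in this help page
print(parse_accession("A-MEXP-1"))     # made-up accession of the right shape
print(parse_accession("GSE5389"))      # a GEO accession, so not in E-/A- format
```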

The advanced query interface results page is described here: Understanding advanced query interface results.


1.3.3 Protocol attributes

The panel on the main query page which is used to search the database for Protocols is shown below:

Protocol query panel

The attributes of a protocol which can be queried for are:

Protocol accession number
The unique identifier given to the protocol.

Protocol type
The class of protocol that is of interest. Protocol type uses a controlled vocabulary, for example, "nucleic acid extraction" or "hybridization", and is queried using the pull-down menu provided. Protocol type definitions can be found on the MGED Ontology web site.

The advanced query interface results page is described here: Understanding advanced query interface results.


1.4 Login to view private data

Data can be kept private in ArrayExpress until an associated paper is published. After a custom array design or experiment is loaded into ArrayExpress the submitter is sent details of login accounts for themselves, and for journal editors and reviewers, so that they can view the data before it is publicly available. Private data can only be viewed through the advanced query interface. Data is made public when the submitter gives us permission to do so, or if we find that the data has been referenced in a published article.

Submitters and reviewers can log in to view their private experiments and array designs by clicking on the 'Submitter/reviewer login' link in the browse interface (www.ebi.ac.uk/arrayexpress). This will take you to a login box. Enter the login details we have provided you with.

Login box

If the 'Remember me' box is not checked, then you will remain logged in until the browser is closed.

To log in to the advanced interface, go to www.ebi.ac.uk/microarray-as/aer/login or use the "Login" link at the top left-hand side of the advanced query interface (www.ebi.ac.uk/microarray-as/aer/entry).

If you have forgotten your ArrayExpress login and password, email the curation team at miamexpress@ebi.ac.uk to get your login details by email. Please specify the accession number of the experiment or array design you wish to view.

If you are a reviewer and have not been provided with an ArrayExpress login to access private data connected to a publication please contact the data submitter, via the journal, to request this information. We cannot provide access to private data to anyone without first getting authorization from the submitter or journal.

If you have submitted data using our MIAMExpress or Tab2MAGE submission tools please note that your submitter login account cannot be used to login to ArrayExpress. You will be sent a separate ArrayExpress reviewer login account when the processing of your submission is complete.


 

1.5 RSS feed

We provide an RSS service listing experiments as they become public in the ArrayExpress archive so that you can be aware of new experiments that may be of interest to you. The URL for the RSS feed is http://www.ebi.ac.uk/microarray-as/aer/rss/experiments or you can click on the orange RSS icon on the ArrayExpress home page.
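
A feed like this can also be read by a script. The sketch below assumes the feed is a standard RSS 2.0 document with item, title and link elements, which this page does not confirm. The sample feed content is made up so that the example runs without network access.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_data):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


# Made-up sample feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Example experiment one</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Example experiment two</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""

for title, link in parse_rss_items(SAMPLE_FEED):
    print(title, link)

# To read the live feed instead (network access required):
#   from urllib.request import urlopen
#   with urlopen("http://www.ebi.ac.uk/microarray-as/aer/rss/experiments") as resp:
#       items = parse_rss_items(resp.read())
```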


 

1.6 Downloading data and programmatic access for the archive and atlas

All data, experiment descriptions and array annotation in the ArrayExpress experiment archive can be downloaded from our FTP site in MAGE-TAB and MAGE-ML formats. See the following help pages:

 

1.6.1 Files available

Information about the files available for each experiment can be found here: ArrayExpress FTP downloads
 

1.6.2 MAGE-TAB format files

The MAGE-TAB format is described on this page - MAGE-TAB files
 

1.6.3 Programmatic access

Information on how to access the experiment archive and gene expression atlas programmatically is provided here - Programmatic Access


1.7 Understanding Archive search results

See the Search Results page for help about the information displayed about each experiment.


 

2. Searching the Atlas of Gene Expression

2.1. Atlas searches

See the Atlas-specific help page for information about searching the Atlas of Gene Expression.

 

2.2. Advanced Atlas searches

See the Advanced Atlas search help page for information about more complex searches of the Atlas of Gene Expression.

 


Any further questions, please see our FAQ.
