Expression Atlas < EMBL-EBI

Frequently Asked Questions (FAQs)

General questions

What is the Expression Atlas?

The Expression Atlas database provides information on gene expression patterns under different biological conditions. It consists of two components: the 'differential' and 'baseline' atlases. The Differential Atlas, containing both microarray and sequencing data, allows users to query which genes are up- or down-regulated in different experimental conditions, e.g. 'in Arabidopsis shoots, what genes are upregulated in plants treated with X?' The Baseline Atlas, containing exclusively RNA-seq data, displays expression levels of gene products under 'normal' conditions (e.g. normal human tissues).

How are experiments chosen to be in the Expression Atlas?

Experiments in the Expression Atlas are selected from the ArrayExpress database of functional genomics experiments. Different criteria are used for selecting microarray and high-throughput sequencing experiments. Microarray data must be gene expression data (rather than ChIP-chip or CGH, for example), must have a sufficient number of replicates to allow robust statistical analysis, and it should be possible to re-annotate the array design against Ensembl. For RNA-seq data, the sequences must be of high quality, a good-quality reference genome build should be available, there must be sufficient replicates for statistical analysis in differential experiments, and the tissues must be 'normal' if the experiment is to be used in the Baseline Atlas.

How can I obtain the original data for an experiment?

The original raw and processed data files for experiments in the Expression Atlas can be found in the ArrayExpress database. To link to the data in ArrayExpress, click on the experiment accession number (e.g. E-GEOD-15325 or E-MTAB-599) on any experiment page in the Expression Atlas. In ArrayExpress the original submitted data files can be downloaded as zip archives, and sample annotation is available in MAGE-TAB format text files. See the ArrayExpress Quick Start guide for further information.

It doesn't look like the whole of the original experiment is in the Expression Atlas, why not?

We sometimes only include part of an experiment in the Expression Atlas because (1) there are not sufficient replicates of all the sample groups within an experiment, or (2) the hybridization or sequencing was not of high enough quality. If there are still enough assays in the experiment after the removal of those with too few replicates or low quality then we continue processing the experiment for the Expression Atlas.

How is microarray data quality controlled?

Microarray data quality is assessed using the arrayQualityMetrics package in R. Outlier arrays are detected using distance measures, boxplots, and MA plots. If an array is classed as an outlier by all three methods, it is excluded from further analysis. Please see the arrayQualityMetrics documentation for more details on the methods used.
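The exclusion rule above amounts to taking the intersection of the three outlier lists. The following is a minimal Python sketch of that rule only; it is not arrayQualityMetrics code, and the function name and array names are hypothetical.

```python
def arrays_to_exclude(distance_outliers, boxplot_outliers, ma_plot_outliers):
    """Return the arrays flagged as outliers by all three methods."""
    return set(distance_outliers) & set(boxplot_outliers) & set(ma_plot_outliers)


# Hypothetical outlier calls for one experiment.
flagged = arrays_to_exclude(
    distance_outliers={"array_2", "array_5"},
    boxplot_outliers={"array_5"},
    ma_plot_outliers={"array_3", "array_5"},
)
print(sorted(flagged))
```

In this example only array_5 is flagged by every method, so it is the only array that would be dropped.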

How is RNA-seq data quality controlled?

RNA-seq reads are discarded based on several criteria. First, reads with quality scores less than Q10 are removed. Second, the reads are mapped against a contamination reference genome (E. coli for animal data, fungal and microbial non-redundant reference for plants). Any reads that map to the contamination reference are removed. Third, reads with "uncalled" characters (i.e. "N"s) are discarded. Lastly, for paired-end libraries, any reads whose mate was lost in the previous three steps are also discarded. Please see the iRAP documentation for more details on the methods used.
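The four filtering steps can be sketched in Python as below. This is an illustration under stated assumptions, not the iRAP implementation: "quality below Q10" is interpreted here as a mean Phred score below 10, the contamination mapping is represented by a precomputed set of read names, and the Read type and function names are hypothetical.

```python
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    qualities: list  # Phred scores, one per base


def passes_filters(read, contaminant_names, min_quality=10):
    """Apply the first three filters to a single read."""
    if not read.qualities:
        return False
    # Assumption: the Q10 cutoff is applied to the mean Phred score.
    if sum(read.qualities) / len(read.qualities) < min_quality:
        return False
    # Assumption: reads that mapped to the contamination reference
    # were identified beforehand and are passed in by name.
    if read.name in contaminant_names:
        return False
    # Discard reads containing uncalled bases.
    if "N" in read.sequence.upper():
        return False
    return True


def filter_pairs(pairs, contaminant_names, min_quality=10):
    """Keep a paired-end read pair only if both mates pass the filters."""
    return [
        (first, second)
        for first, second in pairs
        if passes_filters(first, contaminant_names, min_quality)
        and passes_filters(second, contaminant_names, min_quality)
    ]
```

Filtering pairs together is one simple way to make sure a read is dropped whenever its mate is lost.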

How is microarray data analysed?

Raw single-channel microarray intensities are normalized using RMA via the oligo package from Bioconductor (Affymetrix data) or using quantile normalization via the limma package (Agilent data). Two-channel Agilent data is normalized using LOESS via the limma package. Pairwise comparisons are performed using a moderated t-test for each gene using limma.

How is RNA-seq data analysed?

RNA-seq data is analysed using the iRAP pipeline. FASTQ files are mapped to the reference genome from Ensembl using TopHat. Raw counts are generated using htseq-count. FPKMs are calculated from the raw counts by the iRAP pipeline. Pairwise comparisons are performed using a conditioned test based on the negative binomial distribution, using DESeq.

How do I see how an individual experiment was analysed?

On any experiment page, you can see a breakdown of the analysis steps from raw data to the results you see in Expression Atlas by clicking the analysis methods button at the top right of the page.

How can I contact you?

If you have any questions, problems or suggestions we would love to hear from you. Please email us at arrayexpress-atlas@ebi.ac.uk.

How can I keep up with the latest Expression Atlas news?

If you would like to stay up-to-date with news about our latest releases and developments, please subscribe to the Expression Atlas mailing list.

Searching

How do I find out which genes are expressed in my favourite tissue?

Use the Experimental conditions search box on the home page to search all of Expression Atlas for organism parts e.g. kidney. Your query is expanded using EFO, so that this search will also return results matching synonyms and child terms of kidney in EFO. You will see results about baseline and differential expression of genes in the organism part(s) you searched for. You can narrow down the search to specific genes by also typing gene identifiers in the Gene query search box.

Use the Organism part box on a Baseline experiment page (e.g. Illumina Body Map) to search within a single experiment. See the Baseline Atlas help for more information about searching within a Baseline experiment.

How do I search for multiple conditions at once?

Searching with space-separated experimental variables or sample characteristics finds experiments with any one of the terms you entered, or all of them. For example, searching with liver heart will find all experiments with both liver and heart as well as ones with either liver or heart.

If you want to only find experiments with all terms you entered, separate them with AND. For example, searching for "wild type" AND Col-0 will find only experiments annotated with wild type and Col-0, but not those with only one of the terms.

How do I search for my favourite gene?

Use the Gene query search box on the home page to search all of Expression Atlas for genes. You will see results about baseline and differential expression for the gene or genes you searched for. You can narrow down the search to specific experimental conditions by also typing something in the Experimental conditions search box.

Use the Gene query search box on an experiment page (e.g. Illumina Body Map) to search for genes within a single experiment.

How do I search for multiple genes at once?

Enter gene IDs and/or keywords separated by spaces. Put multi-word search terms in quotes, e.g. "transcription factor binding".

What gene identifiers can I use to search?

You may use the following identifiers to search using the Gene query box:

  • EMBL e.g. AC000015
  • Ensembl Gene e.g. ENSG00000171658
  • Ensembl Protein e.g. ENSP00000000233
  • Ensembl Transcript e.g. ENST00000000233
  • Gene Ontology ID e.g. GO:0008134
  • Gene Ontology term e.g. "transcription factor binding" (please put multi-word query terms in quotes)
  • Interpro e.g. IPR017892
  • RefSeq e.g. NM_212505
  • UniProt e.g. Q8IZT6
  • UniProt Metabolic Enzyme e.g. O15269
  • gene or protein name e.g. WNT1 or Wnt-1
  • keyword e.g. transcription
  • gene name synonym e.g. Calmbp1
  • Ensembl gene biotype e.g. protein_coding or non_coding. See the glossary at Ensembl for more details.

How do I find a particular experiment?

All experiments currently in Expression Atlas are listed on the Experiments page. Click on the Description to see the experiment in Expression Atlas. Click on the ArrayExpress accession number (e.g. E-MTAB-1066) to see the experiment in ArrayExpress. If you know the ArrayExpress accession of the experiment you want to see, you can link to the experiment in Expression Atlas using the following format:

http://www.ebi.ac.uk/gxa/experiments/<ArrayExpress accession>

E.g. http://www.ebi.ac.uk/gxa/experiments/E-MTAB-1066
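If you need to build many such links, the format above can be wrapped in a small helper. This Python sketch only assembles the URL string; the function name is hypothetical and nothing is fetched.

```python
from urllib.parse import quote

BASE_URL = "http://www.ebi.ac.uk/gxa/experiments/"


def experiment_url(accession):
    """Build the Expression Atlas URL for an ArrayExpress accession."""
    return BASE_URL + quote(accession.strip())


print(experiment_url("E-MTAB-1066"))
# http://www.ebi.ac.uk/gxa/experiments/E-MTAB-1066
```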

Can I search Expression Atlas programmatically?

We are working to implement a formal API to query Expression Atlas. Please check back soon for updates.

In the meantime, if you would like to query differential Expression Atlas data programmatically, you can construct queries using URLs like the ones in the table below. Please be aware that the format of these URLs is subject to change. If your queries stop working, please check back here for the latest standard or email us at arrayexpress-atlas@ebi.ac.uk.

Query: In what conditions is ASPM differentially expressed?
URL: http://www.ebi.ac.uk/gxa/query.tsv?geneQuery=ASPM
Result: A tab-delimited text file containing results of comparisons where ASPM is differentially expressed with adjusted p-value < 0.05.

Query: What genes are differentially expressed in cancer?
URL: http://www.ebi.ac.uk/gxa/query.tsv?condition=cancer
Result: A tab-delimited text file containing results for genes that are differentially expressed in experiments about cancer.

Query: Show me comparisons where zinc finger genes are differentially expressed in mice.
URL: http://www.ebi.ac.uk/gxa/query.tsv?geneQuery=%22zinc+finger%22&condition=%22mus+musculus%22
Result: A tab-delimited text file containing results of comparisons where genes annotated as having zinc finger domains are differentially expressed, in Mus musculus.
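The example URLs above can be assembled with Python's standard library, which also takes care of encoding quotes and spaces. This is a minimal sketch based only on the geneQuery and condition parameters shown above; the function name is hypothetical, and because the URL format is subject to change the output should be checked against the current examples.

```python
from urllib.parse import urlencode

QUERY_ENDPOINT = "http://www.ebi.ac.uk/gxa/query.tsv"


def build_query_url(gene_query=None, condition=None):
    """Build a differential expression query URL from the two known parameters."""
    params = {}
    if gene_query:
        params["geneQuery"] = gene_query
    if condition:
        params["condition"] = condition
    if not params:
        raise ValueError("Provide a gene query, a condition, or both.")
    # urlencode turns '"' into %22 and spaces into '+'.
    return QUERY_ENDPOINT + "?" + urlencode(params)


print(build_query_url(gene_query="ASPM"))
print(build_query_url(gene_query='"zinc finger"', condition='"mus musculus"'))
```

The resulting URL can be passed to urllib.request.urlopen, and the tab-delimited response read with the csv module using delimiter="\t".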

Results

When I search all of Expression Atlas, what do the "Differential Expression" results show?

The heatmap table in the Differential Expression section shows all genes matching your query that are differentially expressed with adjusted p-value < 0.05. If your search included an experimental condition then the table displays comparisons with a sample property or experimental variable that matched that condition.

There is one row for each gene and comparison in which it showed differential expression. The genes are ordered so that the one with the largest log2 fold-change is at the top. If some genes have identical log2 fold-changes, the one with the lower adjusted p-value goes first. In microarray experiments, some genes may be targeted by more than one probe set. For each experiment, the table shows the values for the probe set that had the largest absolute log2 fold-change only. You can see details of all probe sets for a given gene on the experiment page. Click on the comparison name to find out more information about a particular gene/comparison combination.
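The probe set selection and row ordering described above can be sketched in Python as follows. This is an illustration, not the Expression Atlas code: the dictionary keys and function names are hypothetical, and the ordering follows the text literally by ranking on the signed log2 fold-change (replace it with its absolute value if ranking by magnitude is wanted).

```python
def select_probe_sets(rows):
    """Keep, per gene and comparison, the probe set with the largest |log2 fold-change|."""
    best = {}
    for row in rows:
        key = (row["gene"], row["comparison"])
        current = best.get(key)
        if current is None or abs(row["log2fc"]) > abs(current["log2fc"]):
            best[key] = row
    return list(best.values())


def order_rows(rows):
    """Largest log2 fold-change first; ties broken by lower adjusted p-value."""
    return sorted(rows, key=lambda row: (-row["log2fc"], row["adj_p"]))


# Hypothetical results: gene A is targeted by two probe sets.
rows = [
    {"gene": "A", "comparison": "c1", "probe": "p1", "log2fc": 1.0, "adj_p": 0.01},
    {"gene": "A", "comparison": "c1", "probe": "p2", "log2fc": -2.5, "adj_p": 0.02},
    {"gene": "B", "comparison": "c1", "probe": "p3", "log2fc": 3.0, "adj_p": 0.04},
    {"gene": "C", "comparison": "c1", "probe": "p4", "log2fc": 3.0, "adj_p": 0.001},
]
table = order_rows(select_probe_sets(rows))
print([row["probe"] for row in table])
```

Here gene A is represented by p2, and C is listed before B because both have the same log2 fold-change but C has the lower adjusted p-value.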

Please see the Differential Atlas help page for more details.

What is a "design element"?

On microarray experiment pages, you will see the design element name alongside the gene name. A design element is also known as a probe or probe set. This is the oligonucleotide probe (or group thereof) on the microarray that targets that gene.

What is a "comparison"?

A comparison is where two groups of samples are compared in a differential expression experiment. An example of a comparison is "dicer knock-out" vs. "wild type". For each gene, the mean expression level of the test group (e.g. dicer knock-out) is compared with the mean expression level of the reference group (e.g. wild type), and a statistical test is performed to decide whether the two means are significantly different.

Where do the p-values come from?

In a microarray experiment, each gene's mean expression level in the test group is compared with its mean expression level in the reference group using a moderated t-test. This is done using the limma package from Bioconductor.

In an RNA-seq experiment, each gene's mean expression level in the test group is compared with its mean expression level in the reference group using a conditioned test based on the negative binomial distribution, analogous to Fisher's exact test. This is done using the DESeq package in Bioconductor.

Because the same test is done on thousands of genes at once, p-values are adjusted for multiple testing using the Benjamini and Hochberg (1995) false discovery rate (FDR) correction. This is what is meant by adjusted p-value.
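For readers unfamiliar with the correction, the Benjamini and Hochberg adjustment can be written in a few lines of Python. This is a generic sketch of the standard procedure, not the code used by limma or DESeq; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene is then reported as differentially expressed when its adjusted value is below 0.05.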

How are the gene set enrichment plots created?

Gene set enrichment analysis is performed using the Piano package from Bioconductor. For each comparison, enrichment of terms from GO, InterPro, and Reactome is tested for within the set of differentially expressed genes, using a variation on Fisher's exact test. Gene set enrichment plots are only shown when statistically significant enrichment of terms was detected. This means that for some experiments, the menu will not display plots for all three of the aforementioned resources. Please see the Piano documentation for more details.
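To give a feel for the kind of test involved, the sketch below computes a standard one-sided Fisher's exact test p-value (the upper tail of the hypergeometric distribution) in Python. It does not reproduce the variation used by Piano; the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_set, n_de, n_de_in_set):
    """Probability of seeing at least n_de_in_set annotated genes among n_de
    differentially expressed genes, when n_in_set of n_genes carry the term."""
    upper = min(n_in_set, n_de)
    total = comb(n_genes, n_de)
    tail = sum(
        comb(n_in_set, k) * comb(n_genes - n_in_set, n_de - k)
        for k in range(n_de_in_set, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes, 3 annotated with the term,
# 3 differentially expressed, all 3 of them annotated.
print(enrichment_p_value(10, 3, 3, 3))
```

A small p-value means the term is found among the differentially expressed genes more often than expected by chance.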

In Differential Atlas, what do the red and blue colours in the table mean?

A red box indicates that the gene was up-regulated in the test condition. A blue box means that the gene was down-regulated in the test condition.

The colour intensity of filled boxes in the table represents how large the log2 fold-change is for each gene. The larger the log2 fold-change, the more intense the red or blue colour.

How do I see the actual log2 fold-changes in the results table?

Click the Display log2 fold-change button to the top left of the table.

What are the blue and red bars above the results table?

The two bars show the red and blue colour intensities for the top 50 genes shown on the page. The colour intensities represent the log2 fold-changes for the genes shown. To see the actual log2 fold-changes, click the "Display log2 fold-change" button to the top left of the table.

What does the differential expression experiment page show?

Each experiment in Expression Atlas has its own experiment page, e.g. D. melanogaster CDK8 and CycC mutants. Here you can see the results of analysis for each comparison in a single experiment. Please see the Differential Atlas help page for more details.

Can I download the differential expression results?

Yes, click on the analytics download button at the top right of any differential experiment page, e.g. D. melanogaster CDK8 and CycC mutants. For more details about the differential expression experiment pages, please see the Differential Atlas help page.

Can I download the data used for differential expression analysis?

Yes, please use the download buttons at the top right of any experiment page. For single-channel microarray experiments you can download the normalized intensities using the normalized data download button. For two-channel microarray experiments you can download the log2 fold-changes using the download log2 fold-changes button. For RNA-seq experiments you can download the raw counts generated by htseq-count using the download raw RNA-seq counts button. For more details about the differential expression experiment pages, please see the Differential Atlas help page.

When I search all of Expression Atlas, what do the "Baseline Expression" results show?

The table in the Baseline Expression section of the results shows all experiments that matched your search.

If your search was for an Experimental condition then it includes all experiments with a sample property or experimental variable that matched that condition. For example you might be interested in experiments that are from 'liver' samples and so you could search for this term.

If you searched for a particular gene then the experiments returned are ones in which that gene was expressed above the default minimum expression level of 0.5 FPKM (Fragments Per Kilobase of transcript per Million mapped reads).
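As a reminder of what the unit means, the general FPKM formula and the 0.5 cutoff can be sketched in Python. This shows the textbook definition only; the feature length and read totals used by the iRAP pipeline are not described here, so the parameter names and example numbers are hypothetical.

```python
DEFAULT_MIN_FPKM = 0.5  # default minimum expression level quoted above


def fpkm(fragment_count, feature_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (feature_length_bp * total_mapped_fragments)


def is_expressed(fpkm_value, cutoff=DEFAULT_MIN_FPKM):
    """True when the value is above the minimum expression level."""
    return fpkm_value > cutoff


# Hypothetical gene: 100 fragments, 2,000 bp long, 20 million mapped fragments.
value = fpkm(100, 2000, 20_000_000)
print(value, is_expressed(value))
```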

Click on the name of the experiment to view more information about the experiment.

Please see the Baseline Atlas help page for more details.

What does the baseline expression experiment page show?

Each experiment in Expression Atlas has its own experiment page, e.g. RNA-seq of mouse DBA/2J x C57BL/6J heart, hippocampus, liver, lung, spleen and thymus. Here you can see a heat map showing the 50 most specifically expressed genes across all conditions studied. You can further refine the query by narrowing the search to particular genes, or gene sets, or by limiting which organism parts are searched over. Please see the Baseline Atlas help page for more details.

In Baseline Atlas, what do the different shades of blue in the heat map mean?

The heatmap shows expression levels by colour intensity, according to the gradient bar displayed above the heatmap. The intensities correspond to expression levels for the top 50 genes currently displayed (rather than all FPKM values for all genes returned by the query).

How do I see the actual expression levels in the Baseline Atlas heat map?

Click on the Display levels button in the top left corner of the heat map to show the actual FPKM values.

Can I download the baseline expression results?

Yes, click on the Download button to download the full results of your query in tab-delimited format, with no ordering applied.

Can I view the results in the Ensembl browser?

Yes. From the heatmap, select a gene name, and either a comparison (differential) or a condition (baseline) from the column headings. Next, click the Ensembl Genome Browser Open button, to the left of the heatmap.