Frequently Asked Questions
What is Expression Atlas
Expression Atlas is a resource to query gene and protein expression data across species and biological
conditions and to visualise down-stream analysis results to explore co-expression. Queries can be either in
a baseline context, e.g.
genes expressed in the macaque brain, or in a differential context, e.g.
find genes that are up or downregulated in response to auxin in Arabidopsis. Expression Atlas
thousands of selected microarray and RNA-sequencing data that are manually curated and annotated with
ontology terms, checked for high quality and re-analysed using standardised methods.
What experiments can be included in Expression Atlas?
To be included in Expression Atlas each experiment must meet all of the following criteria:
- Experiment measures gene or protein expression
- Raw data are available
- All samples within the dataset belong to a single species
- Samples come from non-bacterial species
- The species genome is available through Ensembl
- Annotations for microarray probes are available
- Sufficient sample annotation is provided
Additionally, we employ several “softer” guidelines to determine whether or not an experiment is eligible
for inclusion into Expression Atlas:
- (differential): The experiment should have at least 2 experimental groups, with 3 biological replicates
each and also have a clear control/reference group
- (baseline): The experiment design does not involve any perturbations and the dataset should have at
least 3 experimental groups with 3 biological replicates each
- The experiment addresses a relevant biological question (is not technical or proof of principle study)
- Experimental metadata are of high quality and confidence
- The experimental design is not too complex (e.g. not too many factors) and allows for straightforward
If an experiment is judged to be of particular interest and its inclusion in Expression Atlas is highly
valuable for the community, we may decide to include it even if it fails some of the above guidelines. We
also actively collaborate with several specialized initiatives such as the Gramene consortium and OpenTargets
and prioritize experiments that are of special interest to our partners.
Please do not hesitate to contact our team if you have any
questions about how we select experiments for Expression Atlas or you wish to recommend a dataset(s) that
you feel should be ingested and displayed in this resource.
How can I obtain the original data for an experiment?
The original raw and processed data files for experiments in Expression Atlas can be found by viewing the
experiment in ArrayExpress. On any Experiment
page, e.g. RNA-seq of 934 human cancer cell lines from the
Cancer Cell Line Encyclopedia click on the
button in the
Resources tab to view the experiment in ArrayExpress. The original submitted data files
can be downloaded as zip archives, and sample annotation is available in MAGE-TAB format text files. See
the online tutorial
ArrayExpress: Discover functional genomics data quickly and easily for further information.
It doesn't look like the whole of the original experiment is in Expression Atlas, why
We sometimes only include part of an experiment in Expression Atlas because (1) there are not sufficient
replicates of all the sample groups within an experiment, or (2) the hybridization or sequencing was not of
high enough quality. If there are still enough assays in the experiment after the removal of those with too
few replicates or low quality then we continue processing the experiment for Expression Atlas.
How is microarray data quality controlled?
Microarray data quality is assessed using the
arrayQualityMetrics package in R. Outlier arrays are
using distance measures, boxplots, and MA plots. If an array is classed as an outlier by all three methods,
it is excluded from further analysis. Please see the arrayQualityMetrics
documentation for more details on the methods used.
How is RNA-seq data quality controlled?
RNA-seq reads are discarded based on several criteria. First, reads with quality scores less than Q10 are
removed. Second, the reads are mapped against a contamination reference genome (E. coli for animal
data, fungal and microbial non-redundant reference for plants). Any reads that map to the contamination
reference are removed. Third, reads with "uncalled" characters (i.e. "N"s) are discarded. Lastly, for
paired-end libraries, any reads whose mate was lost in the previous three steps are also discarded. Please
see the iRAP
documentation for more details on the
How is microarray data analysed?
Raw single-channel microarray intensities are normalized using
RMA via the
oligo package from
Affymetrix data) or using quantile normalization via the
(Agilent data). Two-channel
Agilent data is normalized using LOESS via the
limma package. Pairwise
comparisons are performed using a moderated t-test for each gene using
How is RNA-seq data analysed?
RNA-seq data is analysed using the iRAP pipeline.
Quality-filtered reads are aligned to the latest version of the reference genome from
TopHat2. Raw counts
(number of mapped reads summarized and aggregated over each gene) are generated using
htseq-count. Then, FPKM (fragments per kilobase of exon model
per million mapped reads) and TPM (transcripts per million) are calculated. Pairwise comparisons are
performed using a conditioned test based on the negative binomial distribution, using
What are FPKM and TPM?
FPKM (fragments per kilobase of exon model per million reads mapped) and TPM (transcripts per million) are
the most common units reported to estimate gene expression based on RNA-seq data. Both units are calculated
from the number of reads that mapped to each particular gene sequence and both units are calculated taking
into account two important factors in RNA-seq:
- The number of reads from a gene depends on its length. One expects more reads to be produced from
- The number of reads from a gene depends on the sequencing depth that is the total number of reads you
sequenced. One expects more reads to be produced from the sample that has been sequenced to a greater
FPKM (introduced by Trapnell et al,
2010) are calculated with the following formula:
where qi are raw counts (number of reads that mapped for each gene), li
is gene length and
is the total number of mapped reads. The interpretation of FPKM is that if you sequence your RNA sample
again, you expect to see for gene i, FPKMi reads divided by gene i length
over a thousand and divided by the total number of reads mapped over a million.
Li and Dewey, 2011 introduced the unit TPM and
Pachter, 2011 established the relationship between both
units. It is possible to compute TPM from FPKM as follows:
How do I see how an individual experiment was analysed?
On any Experiment page, e.g.RNA-seq of coding RNA
of eight barley tissues from different developmental stages you can see a breakdown of the analysis
steps from raw data to the results you see in Expression Atlas by selecting the Experiment design or
Supplemmentary information tab on the top of the experiment section.
If you have any questions, problems or suggestions we would love to hear from you. You may reach us through
the EBI Support & feedback form.
How can I keep up with the latest Expression Atlas news?
If you would like to stay up-to-date with news about our latest releases and developments, please subscribe
to the Expression Atlas mailing
Can I use Expression Atlas anatomograms on my website?
Yes! The anatomical diagrams (anatomograms) that you see alongside the baseline data are available from
GitHub. The anatomograms are licenced under
Creative Commons Attribution (CC BY); if you’d
like to use them, all we ask is that you attribute Expression Atlas when you do.
How do I find out what genes are expressed in a particular condition (e.g. in my
favourite tissue, cell line, developmental stage)?
Use the Condition query search box on the home page to search for the
condition you are interested in e.g.
kidney. Click on
Show anatomogram in the top left corner if you want to see the corresponding anatomogram
for each of the species displayed. Your query is expanded using the Experimental Factor Ontology
(EFO), so that this search will also returns matching synonyms and
child terms of kidney in
EFO. You will see both baseline expression and differential expression results in the condition (organism
part in our example) you searched for.
Select one experiment from the Baseline multi-experiment page (e.g.
GTEx) to see the results in that particular
experiment. Use the Select button under Organism parts on the left sidebar to find
what genes are expressed in kidney in GTEx.
How do I search for multiple conditions at once?
Use the Condition query search box on the home page to search for as
many conditions as you want. You will need to type each condition, click enter and when you are done, just
search. For example, searching with
all experiments in which both liver and heart are studied as well as the ones analysing either liver or
How do I find out in which conditions my favourite gene is expressed?
Use the Gene query search box on the home page to search for your
favourite gene (e.g.
SFTPC). Click on Show anatomogram in the top left corner if you want to see the
corresponding anatomogram for each of the species displayed. You will see Baseline
expression results in different conditions (e.g. organism part, cell line, cell type,
developmental stage) and Differential expression results for biologically meaningful
Select one experiment from the Baseline expression results (e.g.
GTEx) to see the results in that particular
experiment. Use the Gene query box (e.g. SFTPC in our example) to find
in which tissues from GTEx gene SFTPC is expressed.
How do I search for multiple genes at once?
Use the Gene query search box on the home page to search for as many
genes as you want. You will need to type each gene, click enter and when you are done, just search.
What gene identifiers can I use to search?
You may use the following identifiers to search using the Gene query box:
How do I find a particular experiment?
You can easily see all experiments in Expression Atlas by clicking in All experiments on
the home page. You can narrow down the list of experiments by selecting Baseline or
Differential in the first column at the bottom of the table. You can also select Plants or Animal and Fungi
experiments (using the second column), experiments from a particular organism (fifth column) or experiments
involving a particular variable (sixth column). Click on the Experiment title to see the experiment in
If you know the ArrayExpress accession of the experiment
you want to see (e.g. E-MTAB-4202
), you can link to the experiment in Expression Atlas using the following format:
Are there any alternative ways of searching in Atlas?
It's best if you contact us and we will be able to advise you on your specific use case.
In the meantime, you can construct queries using URLs like the ones in the table below. Please be aware
that the format of these URLs is subject to change. If your queries stop working, please check back here
for the latest standard or get in touch with us through
the EBI Support & feedback form.
|In what conditions is ASPM differentially expressed?
|What genes are differentially expressed in cancer?
|Show me comparisons where zinc finger
genes are differentially expressed in mice.
When I search in Expression Atlas, what do the Baseline Expression results show?
If you search for a particular gene (e.g. CFHR2), the Baseline Expression
results will display all organisms and conditions in which CFHR2 is expressed above the default minimum
expression level of 0.5 FPKM or 0.5 TPM.
By default, we display expression data for different tissues, e.g
in which tissues is CFHR2
expressed? but you can also use the filters to find gene expression in other conditions, such as
in which mouse cell types is CFHR2 expressed?
If you search for a particular condition (e. g.
liver), in the Baseline
Expression results you will see all organisms and experiments matching your search.
In Baseline expression results, how are baseline expression levels represented?
In Baseline expression results you will see one heatmap per species. Each heatmap shows,
for each species, all conditions (columns) and all experiments (rows) that matched your search. Expression
levels are displayed in the heatmap in five different colours:
Grey box: expression level is below cutoff (0.5 FPKM or 0.5 TPM)
Light blue box: expression level is low (between 0.5 to 10 FPKM or 0.5 to 10 TPM)
Medium blue box: expression level is medium (between 11 to 1000 FPKM or 11 to 1000 TPM)
Dark blue box: expression level is high (more than 1000 FPKM or more than 1000 TPM)
White box: there is no data available
How can I see the results for each experiment in Baseline expression results?
Each Baseline expression experiment in Expression Atlas has its own Experiment page, e.g.
Strand-specific RNA-seq of 13 human tissues from Michael Snyder's
lab for the ENCODE project where you can see a heatmap showing the 50 most specifically expressed
across all conditions studied.
You can further refine the query by narrowing the search to a particular gene (e.g.
CTRB1), or gene sets (e.g.
CELA3A, CELA3B, CTRB1, CTRB2, PRSS1, PRSS2), or by limiting which organism parts are searched over
genes specifically expressed in pancreas).
In the experiment page, how are baseline expression levels represented?
In the Experiment page, e.g. Transcription profiling by high
throughput sequencing of different potato tissues (genotype RH89-039-16)expression levels are
represented in one heatmap that shows gene expression levels for the 50 most specifically expressed genes
(rows) across all conditions studied in the experiment selected (columns).
Expression levels are displayed in the heatmap by colour intensity, according to the gradient bar above the
heatmap. Hover the mouse above a cell to show a tooltip with the numerical values corresponding to each
Can I download the baseline expression results?
Yes, click on the
button above the heatmap to download the data corresponding to your query. For example, if you select
'flower' in the Organism part box, use the Download all results button to
download expression data for the
subset of genes specifically expressed in flower in tab-delimited format with no ordering.
On the other hand, by clicking on the Downloads tab of the experiment page you will download
expression data for all genes and all conditions studied in the experiment.
Can I view the results in the Ensembl browser?
Yes, you can. From the heatmap of the Experiment page, e.g. Baseline
expression from transcriptional profiling of zebrafish developmental stages just select a gene (e.g.
SNORD61) and a condition, developmental stage in that particular experiment (e.g.
gastrula 50%-epiboly) and click on the
button in the left of the heatmap. You will see
gene expression value for gene SNORD61 in developmental stage gastrula 50%-epiboly in the context of the
genomic location of SNORD61.
For plant experiments, e.g. Transcriptomes for hybrids (F1s)
between 18 Arabidopsis thaliana parents of the Multiparent Advanced Generation Inter-Cross (MAGIC) genetic
mapping resource you can also use the
button to see, for example,
gene expression value for gene EPR1 in ecotype Sf-2 x Can-0 in the context of the genomic location of
For experiments performed in Caenorhabditis elegans or in Schistosoma mansoni such as
RNA-Seq of Schistosoma mansoni (flatworms) larva and adult
individuals at different life-stages you can see the results using the
How are similarly expressed genes computed?
Similarly expressed genes across conditions in baseline experiments (e.g. tissues, developmental stages)
are shown where available. They are computed on per-experiment basis, for experiments with three or more
conditions. The method for estimating them involves two steps. The first one involves k-means clustering
(for all possible values of k, capped to 100) of the expression of each gene across (e.g.) tissues. The
second step compares the clusters on a gene-to-gene basis and outputs a ranked list of genes with
decreasing similarity of expression patterns for each gene. The top 50 similarly expressed genes can be
explored on our interface. Lowly expressed genes are filtered out from the calculation. The method has been
implemented within the Bioconductor package
When I search in Expression Atlas, what do the Differential Expression results show?
If you search for a particular gene (e.g. CFHR2), in the Differential
expression results you will see
all comparisons in all species in which
CFHR2 is differentially expressed (absolute value of log2 fold-change > 1 and
adjusted p-value < 0.05). Comparisons in which gene CFHR2 is differentially expressed are ordered so the
one with the largest absolute value of log2 fold-change is at the top. If gene CFHR2 has
identical log2 fold-change in several comparisons, then the one with the lower adjusted
p-value goes first.
If you search for a particular condition (e. g. liver), in the Differential
expression results you will see
all comparisons in all species that
matched your search. When several genes show differential expression you will see one row per each
gene and comparison. Comparisons are ordered following the same criteria as explained before.
In both cases, you can narrow down the results displayed using the filters in the left where you can select
a particular species (e.g. Mus musculus), experimental variable (e.g. compound) to see which comparisons
involve the treatment of liver with a compound in mouse.
Click on the Comparison name (e.g.
'CCl4; 1.6 gram per kilogram' at '24 hour' vs 'none' at '0 hour') to find out more information about
particular gene-comparison combination. In microarray experiments, some genes may be targeted by more than
one probe set. The table in the Differential expression results shows the values for the
probe set with the largest absolute log2 fold-change. You can see details of all probe sets for
a given gene on the Experiment page. For example,
gene Car3 is represented by 4 probe sets in the experiment
Transcriptional responses in liver upon acute CCl4 intoxication.
What is a "design element"?
On microarray experiment pages, you will see the design element name alongside the gene
name. A design element is also known as a probe or probe set. This is the oligonucleotide probe (or group
thereof) on the microarray that targets that gene.
What is a "comparison"?
A comparison is where two groups of samples are compared in a differential expression
experiment. An example of a comparison is 'breast cancer'
For each gene, the mean expression level of the test group (e.g. breast cancer) is
compared with the mean expression level of the reference group (e.normal), and a
statistical test is performed to decide whether the two means are significantly different.
What do the red and blue colours mean in the Differential expression results?
Let's search for the condition
cancer in a differential context. A red box indicates that the gene is up-regulated in
the test condition while a blue box means that the gene is down-regulated in the test
condition. The colour intensity of filled boxes in the table represents how large the log2
fold-change is for each gene. The larger the log2 fold-change, the more intense the red or blue
The two bars above the Differential expression results table show the red and blue colour intensities for
the top 50 genes shown on the page. The colour intensities represent the log2
fold-changes for the genes shown.
How do I see the actual log2 fold-changes in the Differential
expression results table?
Click on the
button to the top left corner of the table to see the numerical values of the log2 fold-change
per each gene in each comparison.
Where do the p-values come from?
In a microarray experiment, each gene's mean expression level in the test group is
compared with its mean expression level in the reference group using a moderated
t-test. This is done using the
limma package from
In an RNA-seq experiment, each gene's mean expression level in the test group is compared
with its mean expression level in the reference group using a conditioned test based on
the negative binomial distribution, analogous to Fisher's exact test. This is done using the
DESeq package in
Because the same test is done on thousands of genes at once, p-values are adjusted for
multiple testing using the Benjamini and
Hochberg (1995) false discovery rate (FDR) correction. This is what is meant by adjusted
How are the gene set enrichment plots created?
Gene set enrichment analysis is performed using the
Piano package from
Bioconductor. For each comparison, enrichment of terms from
GO, InterPro, and
Reactome is tested for within the set of differentially expressed
genes, using a variation on Fisher's exact
test. Gene set enrichment plots are only shown when statistically significant enrichment of terms was
detected. This means that for some experiments, the menu will not display plots for all three of the
Can I download the differential expression results?
Yes, click on the
button above the heatmap to download the data corresponding to your query. For example, in
Transcription profiling by array of pancreatic islets from Sgpp2
knockout mice after high fat diet if you select just one comparison, e.g. 'high fat diet' vs
in 'Sgpp2 knockout' from the Comparison box and click Apply, by clicking on the
Download all results button you will download
expression data for the subset of genes differentially expressed in that particular comparison in
tab-delimited format with no ordering.
On the other hand, by clicking on the Downloads tab of the experiment page you will download
expression data for all genes in all comparisons studied in the experiment.