Differential Expression Atlas help

About the Differential Atlas

The Differential Atlas lets you ask questions about which genes are up– or down–regulated in different experimental conditions. It is based on a set of highly curated differential expression RNA-Seq and microarray experiments from ArrayExpress that have been re-processed using our in–house differential expression statistical analysis pipeline.

Differential Atlas at–a–glance

Differential Atlas experiment page screenshot

Searching the Differential Atlas

Specifying the maximum adjusted p-value and minimum log2 fold-change

By default, the maximum adjusted p-value is 0.05, and the minimum log2 fold-change is 1. This means only genes with an adjusted p-value below 0.05 and log2 fold-change above 1 (or below -1) are displayed.

You may change the two cutoff values to whatever you like, to relax constraints or make them stricter. Changing these values will affect the number of genes displayed. Generally, the higher the chosen adjusted p-value, and the lower the minimum log2 fold-change, the more genes will be displayed.

Type any value from 0 to 1 in the Adjusted p-value cutoff box to use as the maximum false discovery rate (FDR) -adjusted p-value cutoff. Only genes that show differential expression with an adjusted p-value at or below this cutoff will be returned.

Type any numerical value in the Log2 fold-change cutoff box to use as the minimum absolute log2 fold-change. Only genes that have a larger absolute log2 fold-change will be displayed. When we say "absolute" value, we mean the value without a "+" or "-" sign in front. For example, if you choose an absolute value of 2, then genes with log2 fold-change of greater than +2 or less than -2 are shown.
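The way the two cutoffs combine can be sketched in a few lines of Python. This is an illustration only, not the Atlas implementation: the function name and the example gene values are made up, and the boundary handling (adjusted p-value at or below the cutoff, absolute log2 fold-change strictly above it) follows the wording of the two paragraphs above.

```python
def passes_cutoffs(adj_p_value, log2_fold_change,
                   max_adj_p=0.05, min_abs_log2_fc=1.0):
    """Return True if a gene would be kept under the two cutoffs.

    Hypothetical helper: the defaults mirror the defaults described above.
    """
    return (adj_p_value <= max_adj_p
            and abs(log2_fold_change) > min_abs_log2_fc)


# Made-up (gene, adjusted p-value, log2 fold-change) rows for illustration.
genes = [
    ("geneA", 0.01, 2.5),   # kept: significant and up-regulated
    ("geneB", 0.20, 3.0),   # dropped: adjusted p-value too high
    ("geneC", 0.03, -1.7),  # kept: significant and down-regulated
    ("geneD", 0.04, 0.4),   # dropped: fold-change too small
]

kept = [name for name, p, fc in genes if passes_cutoffs(p, fc)]
print(kept)
```

Raising `max_adj_p` or lowering `min_abs_log2_fc` in this sketch lets more rows through, which is the behaviour described above for the two cutoff boxes.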

Searching with genes

You can search with Ensembl gene symbols (e.g. desat1), identifiers (e.g. FBgn0086687) or biotypes (e.g. protein_coding), UniProt accessions (e.g. P35542), GO terms (e.g. "pheromone biosynthetic process") or InterPro terms (e.g. "Fatty acid desaturase, type 1"). A space–separated list of gene attributes will bring back genes that match at least one of the attributes in the query.

Use the radio buttons to the right of the Gene Query box to limit the search to up–regulated genes, down–regulated genes, or both.

If the "Exact match" box is checked, only genes with annotations that fully match your query will be returned. For example, to see if the desat1 gene is differentially expressed, check the "Exact match" box, type "desat1" and click "Search". If the box is not checked, genes with annotations that contain words that fully match your query will be returned. For example, to find results for genes with "desaturase" in their annotations (e.g. "stearoyl-CoA 9-desaturase activity", "Fatty acid/sphingolipid desaturase", …), uncheck the "Exact match" box, type "desaturase" and click search.
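The difference between the two modes can be illustrated with a rough Python sketch for a single-word query. The function name, the case-insensitive comparison and the way annotations are split into words are all assumptions made for the example; the Atlas search itself may tokenise annotations differently.

```python
import re


def annotation_matches(annotation, query, exact_match):
    """Hypothetical matcher for a single-word query against one annotation."""
    annotation = annotation.lower()
    query = query.lower()
    if exact_match:
        # The whole annotation must equal the query.
        return annotation == query
    # Otherwise the query must equal one whole word of the annotation.
    words = [w for w in re.split(r"[^a-z0-9]+", annotation) if w]
    return query in words


print(annotation_matches("desat1", "desat1", exact_match=True))
print(annotation_matches("stearoyl-CoA 9-desaturase activity",
                         "desaturase", exact_match=False))
print(annotation_matches("stearoyl-CoA 9-desaturase activity",
                         "desaturase", exact_match=True))
```

In this sketch a partial word such as "desat" does not match "desaturase" in either mode, because the unchecked mode still requires a whole word to match.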

Searching with comparisons

Many experiments contain more than one comparison. If you do not select any comparisons, results for all comparisons in the experiment will be displayed. To see the results for specific comparison(s) select the ones you wish to see from the Comparison box:

Comparison selection dropdown

Select one or more comparisons to search with, or start typing to see suggestions. To see details of the groups of samples used in each comparison, click the Experiment Design button.

Specific vs. non-specific search

If there is more than one comparison in an experiment, the default Differential Atlas search reports genes with more "specific" differential expression (with the Specific option selected by default). What we mean by this is outlined below.

If no comparisons are selected…

If no comparisons are selected, the Specific search will rank genes so that genes that are differentially expressed in just one comparison come first, followed by genes differentially expressed in two comparisons, then three and so on, reporting genes that are differentially expressed in all available comparisons at the end of the list of results. Within each group of N comparisons, genes are ordered by largest average absolute (i.e. ignoring positive or negative status) log2 fold-change.

Only log2 fold-changes for differential expression in the direction selected are considered for specificity. For example, if you search for only up–regulated genes (by clicking the "up" radio button next to the Gene Query box), only genes with positive log2 fold-changes are used in the calculations for specificity.
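As a minimal sketch of this ordering, assume each gene comes with the list of log2 fold-changes from the comparisons in which it is differentially expressed in the chosen direction (so every list is non-empty). The function name and the data are hypothetical; this is not the Atlas code.

```python
def rank_specific(de_fold_changes):
    """Order genes by number of comparisons, then by mean |log2 fold-change|.

    de_fold_changes maps a gene name to the non-empty list of log2
    fold-changes from the comparisons where it is differentially expressed.
    """
    def sort_key(item):
        _, fold_changes = item
        mean_abs = sum(abs(fc) for fc in fold_changes) / len(fold_changes)
        # Fewer comparisons first; within a group, larger mean first.
        return (len(fold_changes), -mean_abs)

    return [name for name, _ in sorted(de_fold_changes.items(), key=sort_key)]


# Made-up data: gene -> log2 fold-changes in the comparisons where it is DE.
example = {
    "geneA": [2.0, -3.0, 1.5],  # DE in three comparisons
    "geneB": [4.0],             # DE in one comparison
    "geneC": [1.2],             # DE in one comparison
    "geneD": [-2.5, 2.1],       # DE in two comparisons
}

ranked = rank_specific(example)
print(ranked)  # ['geneB', 'geneC', 'geneD', 'geneA']
```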

If at least one comparison is selected…

If at least one comparison is selected, the Specific search will promote genes with larger average absolute log2 fold-change in the selected comparison(s), and at the same time penalize genes with the same or larger log2 fold-change in the un–selected comparison(s). This is done using the "fold difference" between the selected and un–selected absolute log2 fold-changes:

  • Fold difference is calculated by dividing the average absolute log2 fold-change of the selected comparison(s) by the largest absolute log2 fold-change of the un–selected comparisons.
  • If none of the un–selected comparisons' absolute log2 fold-changes are at or above the selected log2 fold-change cutoff, the average of the selected comparisons' absolute log2 fold-changes is divided by the cutoff instead.

Genes with the greatest fold difference in absolute log2 fold-changes between selected and un–selected comparisons will be pushed towards the top of the list. Genes that have a larger absolute log2 fold-change in the un–selected comparisons are not shown.
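The fold difference calculation described above can be written out as a short Python sketch. The function name and the example values are made up for illustration, and the sketch covers only the calculation itself, not the rest of the ranking.

```python
def fold_difference(selected, unselected, cutoff=1.0):
    """Fold difference between selected and un-selected comparisons.

    selected / unselected are lists of log2 fold-changes for one gene;
    cutoff is the chosen minimum absolute log2 fold-change.
    """
    mean_selected = sum(abs(fc) for fc in selected) / len(selected)
    max_unselected = max((abs(fc) for fc in unselected), default=0.0)
    # If no un-selected comparison reaches the cutoff, divide by the cutoff.
    denominator = max_unselected if max_unselected >= cutoff else cutoff
    return mean_selected / denominator


# Un-selected maximum (2.0) is at or above the cutoff: 4.0 / 2.0
print(fold_difference([3.0, -5.0], [2.0, -0.5]))   # 2.0
# Un-selected maximum (0.8) is below the cutoff: 4.0 / 1.0
print(fold_difference([3.0, -5.0], [0.3, -0.8]))   # 4.0
```

A gene with a fold difference of 4.0 would therefore be ranked above one with a fold difference of 2.0.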

Non–specific search

If the Specific option is not selected, the Differential Atlas search will ignore specificity of differential expression when ordering the results. In particular:

  • If no comparisons are selected, genes with largest absolute log2 fold-changes in all comparisons will be reported first.
  • If at least one comparison is selected, genes with larger absolute log2 fold-changes across the selected comparisons are reported first. Log2 fold-changes in the un–selected comparisons are ignored in the ordering of results.

Heatmap results display

Results for your query are shown in a heatmap table. The heatmap rows are labelled with gene symbols (and design elements in the case of microarray data). The columns are labelled with comparison descriptions. Mouseover the column headings to see the full description.

The heatmap ranks genes by absolute (i.e. ignoring up- or down-regulated status) log2 fold-change, with the largest at the top. If some genes have identical log2 fold-changes, these are then ordered by adjusted p-value, with the gene with the lowest adjusted p-value first. Greater colour intensity means larger absolute log2 fold-change, as illustrated by the gradient bars (e.g. heatmap colour gradient) displayed above the heatmap. Blue boxes indicate the gene is down–regulated, while red ones mean up–regulated. Mouseover a filled box in the heatmap to see the adjusted p-value and log2 fold-change (and t-statistic for microarray data). To see the log2 fold-changes for all genes, click the Display log2 fold-changes button. This also reveals the maximum and minimum log2 fold-changes in the experiment either side of the gradient bars. The gradient shows intensities corresponding to up to the top 50 differentially expressed genes currently displayed (rather than all genes returned by the query).
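The ranking rule at the start of this paragraph (largest absolute log2 fold-change first, ties broken by lowest adjusted p-value) can be sketched as a single sort. The example assumes one log2 fold-change per gene, i.e. a single comparison, and uses made-up gene names and values.

```python
def heatmap_order(rows):
    """Sort (gene, log2 fold-change, adjusted p-value) rows for display.

    Largest |log2 fold-change| first; ties go to the lower adjusted p-value.
    """
    return sorted(rows, key=lambda row: (-abs(row[1]), row[2]))


rows = [
    ("geneA", 2.0, 0.01),
    ("geneB", -3.5, 0.04),
    ("geneC", -2.0, 0.001),  # same |log2 fold-change| as geneA, lower p-value
]

ordered = [gene for gene, _, _ in heatmap_order(rows)]
print(ordered)  # ['geneB', 'geneC', 'geneA']
```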

If you mouseover a gene name in the heatmap, synonyms, Gene Ontology terms, and InterPro terms are displayed:

Differential Atlas gene mouseover

Any terms that match your Gene Query term(s) are highlighted in yellow here. Click the gene name to see more information about that gene, including other Expression Atlas experiments in which it was found.

MA plots and gene set overlap summaries

Two types of plots can be visualised by clicking one of the histogram buttons in the header of the heatmap table. This brings up a menu like that shown below. Different numbers of plots may be available for different experiments.

plots menu

An MA plot is generated for each comparison in every experiment. To see it, select the MA plot item from the menu. This plot displays the average expression level for each gene (normalized microarray intensity level or RNA-Seq log2 counts-per-million) on the x-axis against log2 fold-change on the y-axis. This plot gives an overview of the relationship between expression level and fold-change, and can be used to assess any systematic biases in the data. Genes that were called as differentially expressed at FDR < 0.05 are shown in red in the plot.
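To make the two axes concrete, here is a minimal sketch of how one point on an MA plot can be computed from log2-scale expression values, using the textbook definition (A is the average of the two group means, M is their difference). The function name and values are hypothetical, and the Atlas pipeline derives its averages and fold-changes from its own statistical models rather than from this simple calculation.

```python
def ma_point(test_log2, reference_log2):
    """Return (A, M) for one gene from log2-scale expression values.

    A (x-axis) is the average expression; M (y-axis) is the log2 fold-change
    of the test group relative to the reference group.
    """
    mean_test = sum(test_log2) / len(test_log2)
    mean_reference = sum(reference_log2) / len(reference_log2)
    a = (mean_test + mean_reference) / 2
    m = mean_test - mean_reference
    return a, m


# Made-up log2 expression values for two test and two reference samples.
a, m = ma_point([8.0, 9.0], [6.0, 7.0])
print(a, m)  # 7.5 2.0
```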

Gene set overlap analysis is performed using the Piano package from Bioconductor. For each comparison, the overlap between the set of differentially expressed genes and terms from GO, InterPro, and Reactome is tested using Fisher's exact test with multiple testing correction (FDR < 0.1). Gene set overlap plots are available only when statistically significant overlap of terms was detected. This means that for some experiments, the menu will not display plots for all three of the aforementioned resources. To see a plot, select one of the following options from the menu:

  • GO plot menu item to see GO term overlap,
  • InterPro plot menu item to see InterPro term overlap,
  • Reactome plot menu item to see Reactome pathway term overlap.

Doing so will bring up a plot like the one below, showing up to 10 of the top overlapping terms (nodes) from a list sorted by effect size (i.e. the number of observed genes divided by the number of expected genes annotated with a given term within the differentially expressed set of genes). The terms are linked by edges representing genes shared between them: the more genes shared between two terms, the thicker the edge. The size of each node represents the proportion of differentially expressed genes annotated with each term. Please see the Piano documentation for more details.

GO term overlap plot
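The effect size used to sort the terms (observed divided by expected) can be illustrated with a small worked example. The expected count here is assumed to be the number of annotated genes you would see by chance if the differentially expressed genes were a random sample of the background; this is a common convention, not a confirmed detail of how Piano computes it. The function name and the counts are made up.

```python
def overlap_effect_size(n_de_with_term, n_de, n_term, n_background):
    """Observed / expected number of DE genes annotated with a term.

    n_de_with_term: DE genes annotated with the term (observed)
    n_de:           all DE genes in the comparison
    n_term:         all background genes annotated with the term
    n_background:   all genes in the background set
    """
    expected = n_de * n_term / n_background  # assumed chance expectation
    return n_de_with_term / expected


# 200 DE genes out of 10,000; 500 genes carry the term; 20 DE genes carry it.
# Expected by chance: 200 * 500 / 10000 = 10, so the effect size is 20 / 10.
effect = overlap_effect_size(20, 200, 500, 10000)
print(effect)  # 2.0
```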

Visualising differential expression at Ensembl

If you select a gene name and a comparison heading from the heatmap, the Ensembl Genome Browser Open button, to the left of the heatmap table, will become clickable.

screenshots showing how to visualise data in Ensembl

Clicking the Open button will take you to the Ensembl or Ensembl Genomes browser, which will display the log2 fold-change for the comparison you selected in the context of the genomic location of the gene you selected. For more details about how to use the genome browsers, please see the relevant documentation for Ensembl or Ensembl Genomes.

Downloading query results

The top 50 differentially expressed genes resulting from your search are displayed on the page. Click on the Download button to download the full results of your query in tab–delimited format with no ordering.

Downloading other experimental data

You can download RNA-seq raw counts (click the raw counts download button), normalized microarray intensity data (click the normalized data download button), and all statistical analytics results for all comparisons in the experiment (click the analytics download button). All data is provided as tab-delimited text files. You can also download data for each experiment ready to load into R by clicking the R data download button on any experiment page. See our help page about Expression Atlas data in R for more details.

QC reports

Click the QC report button to see the results of quality assessment for the experiment data files.

For microarray experiments, this report is generated by the arrayQualityMetrics package from Bioconductor in R. Please see the arrayQualityMetrics documentation for more details about the methods used. Briefly, outlier arrays are detected using distance measures, box plots, and MA plots. Any array that is found to be an outlier by all three of these methods is excluded from further analysis.
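The exclusion rule (an array is dropped only if all three methods flag it) amounts to a set intersection. The sketch below uses made-up array names and outlier calls purely to illustrate the rule; it does not run arrayQualityMetrics.

```python
# Hypothetical outlier calls from each of the three detection methods.
distance_outliers = {"array1", "array3"}
boxplot_outliers = {"array3", "array4"}
ma_plot_outliers = {"array2", "array3"}

# An array is excluded only if every method flags it as an outlier.
excluded = distance_outliers & boxplot_outliers & ma_plot_outliers
print(sorted(excluded))  # ['array3']
```

In this example array1, array2 and array4 are each flagged by only one method, so they stay in the analysis.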

For RNA-seq experiments, the QC report is generated by the iRAP pipeline. Please see the iRAP documentation for more details on the methods used.

Analysis methods page

The Analysis Methods page (e.g. for D. melanogaster CDK8 and CycC mutants or Arabidopsis lncRNA-mediated transcriptional silencing mutants) lists the data analysis methods applied to the raw experimental data (FASTQ files or microarray intensity data), to obtain the differential expression statistics shown in the Differential Atlas. You can access it by clicking the Analysis Methods button.

Experiment design page

The Experiment Design page (e.g. for D. melanogaster CDK8 and CycC mutants) is accessible via the Experiment Design button. Here you can see RNA-Seq processing run accessions (from ENA) or microarray assay accessions, along with their corresponding biological sample characteristics and experimental variables. If you select a comparison from the "Comparison" dropdown menu, the runs/assays used in that comparison will be highlighted in the table. Runs/assays not used in the selected comparison are not highlighted. Runs/assays assigned to the "Reference" group are highlighted in one colour, and those assigned to the "Test" group are highlighted in another. The up– or down–regulated expression calls shown in the heatmap are always from the perspective of the "Test" group.

Differential Atlas experiment design page

Sorting the experiment design

You can sort the experiment design table by clicking on a column. Click the column again to reverse the sort order. To then sort by another column while retaining the ordering of the first column, hold down the shift key and click another column. The first selected column forms the primary sort order and the shift+click–ed columns form secondary, tertiary, etc. sort orders in turn. If you shift+click a column to sort by it and then wish to undo this, just shift+click it again until the column is unsorted (shown by the unsorted arrows).
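Primary and secondary sort orders behave like a multi-key sort. The sketch below reproduces that behaviour in Python by applying stable sorts from the lowest-priority column to the highest. The function name, column names and run accessions are hypothetical, and this is not how the page itself is implemented.

```python
def sort_design(rows, columns):
    """Sort table rows by several columns in priority order.

    columns is a list of (column_name, descending) pairs; the first pair is
    the primary sort order, the second the secondary, and so on.
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the least important column
    # first and the most important column last gives a multi-key sort.
    for name, descending in reversed(columns):
        result.sort(key=lambda row: row[name], reverse=descending)
    return result


# Made-up experiment design rows.
design = [
    {"run": "SRR3", "genotype": "wild type", "sex": "male"},
    {"run": "SRR1", "genotype": "mutant", "sex": "male"},
    {"run": "SRR2", "genotype": "wild type", "sex": "female"},
    {"run": "SRR4", "genotype": "mutant", "sex": "female"},
]

# Click "genotype", then shift+click "sex" (both ascending).
by_genotype_then_sex = sort_design(design, [("genotype", False), ("sex", False)])
runs = [row["run"] for row in by_genotype_then_sex]
print(runs)  # ['SRR4', 'SRR1', 'SRR2', 'SRR3']
```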

The experiment design table is also searchable: entering a keyword automatically selects the subsection of the table that matches the keyword. Click on the Download button on the right–hand side to download the full results of the current query (with no ordering applied) in tab–delimited format.

Providing feedback

We value your feedback on what Expression Atlas is doing right, what does not work or what could be achieved more intuitively. We would also be grateful for any feedback on the analysis methods we have adopted: we are passionate not only about intuitive data presentation and the highest level of experimental metadata curation, but also about the quality and biological validity of the data presented in the Differential Atlas.

To send us your feedback, please fill in the form accessed by clicking on the Feedback link in the top–right of the page:

Feedback Pop-up