Baseline Expression Atlas help

About the Baseline Atlas

The Baseline Atlas shows which gene products are present, and at what abundance, in "normal" conditions (e.g. tissues, cell types). The Baseline Atlas consists of highly curated RNA-seq experiments from ArrayExpress, such as:

These experiments have been re-analyzed using an in-house RNA-seq processing pipeline. The resulting data matrix stores gene expression profiles across the analyzed experimental variables as normalized expression levels in FPKM (fragments per kilobase of transcript per million mapped reads). You can search the data by any combination of gene attributes, experimental variables, and a minimum FPKM cutoff. See below for more details on searching.
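The sketch below computes FPKM from its standard definition for readers who have not met the unit before. It counts the fragments mapped to a gene and scales them per kilobase of gene length and per million mapped fragments. It is only a textbook illustration. The in-house pipeline's counting rules and gene-length handling are not described here and may differ. The function name and the example numbers are hypothetical.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Textbook FPKM: fragments * 1e9 / (gene length in bp * total mapped fragments).

    Hypothetical helper for illustration; not the Atlas pipeline's code.
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (per kilobase of gene length) * 1e6 (per million mapped fragments)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 2,000 bp gene, library of 25 million fragments.
print(fpkm(500, 2000, 25_000_000))  # → 10.0
```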

Baseline Atlas at–a–glance

Baseline Atlas Experiment View 'At a Glance'

Filtering Baseline Atlas data

filtering

Some experiments have more than one type of experimental variable. For example, in experiment E-GEOD-26284, three different types of RNA were extracted from six cellular components of 23 human cell lines. Results for these experiments are limited to one combination of variables, e.g. expression of total RNA in the whole cell. To limit the search results differently, use the "Change filters" menu. In a 3-factor experiment, you can choose one value for each of any two factors, and then search among the values of the remaining factor. For example, if you limit the results to poly(A) RNA extracted from the cytosol, you can then search among the cell lines for which this data is available.
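Here is a minimal sketch of how a 3-factor filter narrows the data. You fix one value for two of the factors, and the function lists the values of the third factor that have data for that combination. The assay records, the factor names, and the `remaining_values` helper are hypothetical illustrations. They are not the Atlas data model or the actual contents of E-GEOD-26284.

```python
from typing import Dict, List

# Hypothetical assay records: each maps a factor name to a factor value.
ASSAYS: List[Dict[str, str]] = [
    {"RNA type": "poly(A) RNA", "cellular component": "cytosol", "cell line": "K562"},
    {"RNA type": "poly(A) RNA", "cellular component": "cytosol", "cell line": "GM12878"},
    {"RNA type": "poly(A) RNA", "cellular component": "nucleus", "cell line": "HeLa-S3"},
    {"RNA type": "total RNA", "cellular component": "whole cell", "cell line": "K562"},
]


def remaining_values(assays: List[Dict[str, str]], fixed: Dict[str, str],
                     free_factor: str) -> List[str]:
    """Return the values of `free_factor` that have data for the fixed combination."""
    if free_factor in fixed:
        raise ValueError("the free factor must not also be fixed")
    values = {
        assay[free_factor]
        for assay in assays
        if all(assay.get(factor) == value for factor, value in fixed.items())
    }
    return sorted(values)


print(remaining_values(
    ASSAYS,
    fixed={"RNA type": "poly(A) RNA", "cellular component": "cytosol"},
    free_factor="cell line",
))  # → ['GM12878', 'K562']
```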

Searching Baseline Atlas

Specifying the minimum expression level

You can specify a minimum expression level to act as a cutoff, so that only genes expressed above this level are displayed. This helps distinguish "background noise" from "real" gene expression. The default FPKM cutoff is 0.5. To change it, enter a new value in the "Expression level cutoff" input box, or drag the slider until the box shows the value you want. A histogram showing the number of genes expressed above a range of cutoffs is also available, so you can see how many genes a given cutoff excludes. If a cutoff excludes many genes, you may wish to lower it. Click the histogram enable/disable button to show or hide the histogram.

  • If you have not selected any experimental variables, the histogram shows the number of genes expressed in any experimental variable for each of the shown cutoffs.
  • If you have selected at least one factor value, the histogram shows the number of genes expressed in at least one of the selected factor values, for each of the shown cutoffs.
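The histogram counts described above can be sketched as follows. The sketch assumes an in-memory mapping from each gene to its FPKM in each factor value. It reads "expressed above" as strictly greater than the cutoff; whether the Atlas uses `>` or `>=` is an assumption. The gene and tissue names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def genes_above_cutoffs(
    expression: Dict[str, Dict[str, float]],
    cutoffs: Iterable[float],
    selected: Optional[List[str]] = None,
) -> Dict[float, int]:
    """For each cutoff, count genes whose FPKM exceeds it in at least one
    factor value. If `selected` is given, only those factor values count."""
    counts: Dict[float, int] = {}
    for cutoff in cutoffs:
        n = 0
        for fpkms in expression.values():
            columns = selected if selected else list(fpkms)
            if any(fpkms.get(c, 0.0) > cutoff for c in columns):
                n += 1
        counts[cutoff] = n
    return counts


EXPRESSION = {
    "GENE_A": {"liver": 12.0, "brain": 0.2},
    "GENE_B": {"liver": 0.3, "brain": 0.4},
    "GENE_C": {"liver": 0.0, "brain": 3.5},
}
print(genes_above_cutoffs(EXPRESSION, [0.0, 0.5, 5.0]))             # → {0.0: 3, 0.5: 2, 5.0: 1}
print(genes_above_cutoffs(EXPRESSION, [0.0, 0.5, 5.0], ["liver"]))  # → {0.0: 2, 0.5: 1, 5.0: 1}
```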

Searching with genes

You can search with Ensembl gene symbols (e.g. SAA4), identifiers (e.g. ENSG00000148965) or biotypes (e.g. protein_coding), UniProt accessions (e.g. P35542), GO terms (e.g. "phosphoglycerate kinase activity") or InterPro terms (e.g. phosphatase). A space-separated list of gene attributes returns genes that match at least one of the attributes in the query. Put multi-word query terms in quotes, e.g. "transcription factor binding".
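The query rules can be illustrated with Python's `shlex.split`. Space-separated terms are combined with OR, and quoted phrases stay together as one term. `shlex.split` splits on whitespace but keeps double-quoted phrases intact. This is a simplified, hypothetical sketch. Note that `shlex` also treats single quotes and backslashes specially. The annotation sets below are invented and are not real Atlas annotations.

```python
import shlex
from typing import Dict, List, Set


def parse_query(query: str) -> List[str]:
    """Split on whitespace, keeping double-quoted phrases as single terms."""
    return shlex.split(query)


def match_any(terms: List[str], annotations: Dict[str, Set[str]]) -> List[str]:
    """Return genes annotated with at least one query term (OR semantics)."""
    wanted = {t.lower() for t in terms}
    return sorted(
        gene for gene, anns in annotations.items()
        if wanted & {a.lower() for a in anns}
    )


# Hypothetical annotation sets, for illustration only.
ANNOTATIONS = {
    "SAA4": {"SAA4", "protein_coding"},
    "CALM1": {"CALM1", "protein_coding"},
    "GENE_X": {"transcription factor binding"},
}

terms = parse_query('SAA4 "transcription factor binding"')
print(terms)                          # → ['SAA4', 'transcription factor binding']
print(match_any(terms, ANNOTATIONS))  # → ['GENE_X', 'SAA4']
```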

If the "Exact match" box is checked, only genes with annotations that fully match your query are returned. For example, to find the expression of the calmodulin 1 gene, check the "Exact match" box, type "CALM1" and click "Search". If the box is not checked, genes with annotations containing words that fully match your query are returned. For example, to find the expression of genes with "calmodulin" in their annotations (e.g. "calmodulin binding", "calmodulin regulated", …), uncheck the "Exact match" box, type "calmodulin" and click "Search".
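The difference between exact and non-exact matching can be sketched in two ways. Exact matching compares the whole annotation string with the query. Non-exact matching searches the annotation for whole words. Case-insensitive comparison and the use of regex word boundaries (`\b`) are assumptions made for illustration. The `matches` helper is hypothetical.

```python
import re


def matches(query: str, annotation: str, exact: bool) -> bool:
    """Exact: the whole annotation equals the query.
    Non-exact: the query occurs as whole word(s) inside the annotation."""
    q = query.strip().lower()
    a = annotation.strip().lower()
    if exact:
        return a == q
    # \b anchors the query at word boundaries, so "calm" does not match "calmodulin".
    return re.search(r"\b" + re.escape(q) + r"\b", a) is not None


print(matches("CALM1", "CALM1", exact=True))                     # → True
print(matches("calmodulin", "calmodulin binding", exact=True))   # → False
print(matches("calmodulin", "calmodulin binding", exact=False))  # → True
print(matches("calm", "calmodulin binding", exact=False))        # → False
```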

If you search with a single gene, you can also see how its expression varies among the biological replicates for each tissue (or other condition) in the experiment. Select the "Display variation" option to display a box plot showing the minimum, maximum, median, and upper and lower quartiles for each set of biological replicates. Mouse over the box plots to see the actual values.
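The five values drawn in each box plot can be computed with Python's `statistics.quantiles`. Tools use different quartile conventions, and the help text does not say which one the Atlas uses. This sketch assumes the `"inclusive"` method. The replicate values are hypothetical.

```python
import statistics
from typing import Dict, List


def replicate_summary(fpkms: List[float]) -> Dict[str, float]:
    """Five-number summary of one condition's biological replicates."""
    if len(fpkms) < 2:
        raise ValueError("need at least two replicates")
    # quantiles() sorts its input; n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(fpkms, n=4, method="inclusive")
    return {"min": min(fpkms), "q1": q1, "median": median, "q3": q3, "max": max(fpkms)}


print(replicate_summary([10.0, 4.0, 12.0, 5.0, 7.0]))
# → {'min': 4.0, 'q1': 5.0, 'median': 7.0, 'q3': 10.0, 'max': 12.0}
```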

Gene expression variation in biological replicates

Similar expression

Another option for single-gene queries is to view genes that show a similar expression pattern across the tested conditions. Click the "Add similarly expressed genes" button below the heatmap to add similar genes to the same heatmap, and adjust the number of genes shown using the slider at the bottom. The additional genes are ordered by similarity, with the most similar expression pattern at the top.
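The help text does not say which similarity measure is used. The sketch below assumes Pearson correlation between FPKM profiles across conditions. It only illustrates ranking genes by similarity with the most similar first. The real measure, any transformation of the FPKMs (e.g. log scaling), and the gene profiles here are all assumptions.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has undefined correlation; treat it as unrelated
    return cov / (sx * sy)


def most_similar(query: str, profiles: Dict[str, List[float]],
                 k: int) -> List[Tuple[str, float]]:
    """Rank the other genes by similarity to `query`, most similar first."""
    target = profiles[query]
    scored = [(g, pearson(target, p)) for g, p in profiles.items() if g != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical FPKM profiles across four conditions.
PROFILES = {
    "QUERY": [1.0, 2.0, 3.0, 4.0],
    "A": [2.0, 4.0, 6.0, 8.0],
    "B": [4.0, 3.0, 2.0, 1.0],
    "C": [1.0, 2.0, 3.0, 5.0],
}
print([g for g, _ in most_similar("QUERY", PROFILES, k=2)])  # → ['A', 'C']
```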

Show similarly expressed genes

Gene sets

If you search for a term that matches multiple genes, such as the Reactome pathway ID "R-HSA-5693565", the heatmap shows one row per gene matching your term:

Gene set query before

You can summarize the results for each "gene set" term by clicking "show by gene set":

Gene set query after

Click "show individual genes" to return to the original view.

Searching with experimental variables

Use the experimental variable input box to search for, e.g., organism part or cell line values, or leave it blank to search all of them. You can search:

When you click on the experiment factor value input box, a dropdown menu appears, containing the list of all factor values available for selection:

Factor values dropdown

Select one or more factor values to search with, or start typing to see suggestions. To see all factor values in an experiment, click the Experiment Design button.

Specific vs. non–specific search

By default (with the Specific option selected), the Baseline Atlas search reports genes with more "specific" expression first. What we mean by "specific" is outlined below.

If the query contains no factor values…

If the query contains no experimental variables, the Specific search reports genes expressed in just one factor value first, then genes expressed in just two factor values, then three, and so on. Any genes expressed in all the available factor values come at the end of the list. Within each group of genes expressed in exactly N factor values, genes are sorted by their average FPKM across those N factor values, highest first.
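The ordering rule for a Specific search with no experimental variables can be sketched as follows. The sketch assumes that "expressed" means an FPKM strictly above the cutoff and that genes expressed in no factor value are dropped. Both are assumptions, and so are the gene and tissue names. Ties are broken by gene name only so that the output is deterministic.

```python
from typing import Dict, List


def specific_order(expression: Dict[str, Dict[str, float]], cutoff: float) -> List[str]:
    """Genes expressed in fewer factor values come first. Within a group of genes
    expressed in exactly N factor values, a higher average FPKM over those N
    values comes first."""
    keyed = []
    for gene, fpkms in expression.items():
        expressed = [v for v in fpkms.values() if v > cutoff]
        if not expressed:
            continue  # not expressed anywhere above the cutoff (assumed dropped)
        average = sum(expressed) / len(expressed)
        keyed.append((len(expressed), -average, gene))  # gene name breaks ties
    keyed.sort()
    return [gene for _, _, gene in keyed]


EXPRESSION = {
    "A": {"liver": 50.0, "brain": 0.1, "heart": 0.0},
    "B": {"liver": 8.0, "brain": 0.0, "heart": 0.2},
    "C": {"liver": 30.0, "brain": 40.0, "heart": 0.0},
    "D": {"liver": 5.0, "brain": 5.0, "heart": 5.0},
    "E": {"liver": 0.1, "brain": 0.2, "heart": 0.3},
}
print(specific_order(EXPRESSION, cutoff=0.5))  # → ['A', 'B', 'C', 'D']
```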

If the query contains at least one factor value…

If the query contains at least one experimental variable, the Specific search will promote genes with higher average FPKM in the queried factor value(s), and at the same time penalize genes with higher FPKMs in the non–queried factor values. This is done using the "fold difference" between the query and non-query FPKMs:

  • Fold difference is calculated by dividing the average FPKM of the queried factor values by the highest FPKM of the non-query factor values.
  • If none of the non-query FPKMs are above the selected expression level cutoff, the average of the query FPKMs is divided by the cutoff instead.
  • If none of the non-query FPKMs are above the cutoff and the cutoff is zero, 0.09 is used as the denominator instead of the cutoff.

Genes with the greatest fold difference in expression between query and non–query factor values will be pushed towards the top of the list. Genes that have higher expression in non-query factor values than in the query factor values are not shown.
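The fold-difference rules listed above translate directly into code. Two points are assumptions rather than documented behaviour. First, "above the cutoff" is read as strictly greater. Second, genes "with higher expression in non-query factor values" are taken to be those with a fold difference below 1. Gene and tissue names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def fold_difference(fpkms: Dict[str, float], query: Set[str], cutoff: float) -> float:
    """Average query FPKM divided by the denominator chosen by the rules above."""
    query_values = [fpkms[f] for f in query]
    query_avg = sum(query_values) / len(query_values)
    non_query = [v for f, v in fpkms.items() if f not in query]
    if any(v > cutoff for v in non_query):
        denominator = max(non_query)  # highest non-query FPKM
    elif cutoff > 0:
        denominator = cutoff          # no non-query FPKM above the cutoff
    else:
        denominator = 0.09            # cutoff of zero: fixed fallback value
    return query_avg / denominator


def specific_ranking(expression: Dict[str, Dict[str, float]], query: Set[str],
                     cutoff: float) -> List[Tuple[str, float]]:
    """Rank genes by fold difference, greatest first."""
    scored = []
    for gene, fpkms in expression.items():
        fd = fold_difference(fpkms, query, cutoff)
        if fd < 1.0:
            continue  # assumed reading of "higher expression in non-query factor values"
        scored.append((gene, fd))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


EXPRESSION = {
    "A": {"liver": 20.0, "brain": 2.0, "heart": 1.0},
    "B": {"liver": 4.0, "brain": 0.2, "heart": 0.1},
    "C": {"liver": 3.0, "brain": 9.0, "heart": 0.0},
}
print(specific_ranking(EXPRESSION, {"liver"}, cutoff=0.5))  # → [('A', 10.0), ('B', 8.0)]
```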

Non–specific search

If the Specific option is not selected, the Baseline Atlas search will ignore specificity of expression when ordering the results. In particular:

  • If the query contains no experimental variables, genes with the highest expression across all factor values will be reported first.
  • If the query contains at least one experimental variable, genes with higher FPKM across the selected factor values are reported first. Expression levels across the non–selected factor values are ignored in the ordering of results.
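The sketch below shows the non-specific ordering. It assumes that "highest expression" means the average FPKM over the relevant factor values: all of them, or only the selected ones. The help text does not say whether an average, sum, or maximum is used. Cutoff filtering is left out for brevity, and the names are hypothetical.

```python
from typing import Dict, List, Optional


def non_specific_order(
    expression: Dict[str, Dict[str, float]],
    selected: Optional[List[str]] = None,
) -> List[str]:
    """Highest average FPKM first, over the selected factor values if any,
    otherwise over all of them. Non-selected values play no part."""
    def score(fpkms: Dict[str, float]) -> float:
        columns = selected if selected else list(fpkms)
        return sum(fpkms[c] for c in columns) / len(columns)

    return sorted(expression, key=lambda gene: (-score(expression[gene]), gene))


EXPRESSION = {
    "A": {"liver": 1.0, "brain": 30.0},
    "B": {"liver": 10.0, "brain": 10.0},
    "C": {"liver": 2.0, "brain": 0.0},
}
print(non_specific_order(EXPRESSION))             # → ['A', 'B', 'C']
print(non_specific_order(EXPRESSION, ["liver"]))  # → ['B', 'C', 'A']
```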

Heatmap results display

Results for your query are shown in a heatmap table. The heatmap rows are labelled with gene symbols and the columns with experimental variables. Expression is shown by colour intensity, according to the gradient bar displayed above the heatmap; greater colour intensity means higher relative gene expression. To see the exact FPKM for all genes, click the Display levels button. This also reveals the minimum and maximum FPKMs at each end of the gradient bar. The gradient covers the expression levels of the top 50 genes currently displayed, rather than all FPKM values for all genes returned by the query.

When the displayed factor values are organism parts, mousing over a heatmap column highlights the corresponding tissue in the anatomogram on the left (if available). If the selected organism part is gender-specific and an anatomogram is available for each gender, you can switch between them by clicking the gender selection button.

If you mouse over a gene name in the heatmap, its synonyms, Gene Ontology terms, and InterPro terms are displayed:

gene mouseover popup

Any terms that match your search query term(s) are highlighted in yellow here. Click the gene name to see more information about that gene, including other Expression Atlas experiments in which it was found.

Visualising baseline expression at Ensembl

If you select a gene name and a condition heading from the heatmap, the Ensembl Genome Browser Open button, to the left of the heatmap, will become clickable.

screenshots showing how to visualise data in Ensembl

Clicking the Open button will take you to the Ensembl or Ensembl Genomes browser, which will display the FPKM for the condition (e.g. tissue) that you selected in the context of the genomic location of the gene you selected. For more details about how to use the genome browsers, please see the relevant documentation for Ensembl or Ensembl Genomes.

Hierarchical clustering visualisation

Click the hierarchical clustering button to see the results of hierarchical clustering of the top 100 most variable genes across all conditions (e.g. tissues, strains, developmental stages) in the experiment. Clustering is performed using the heatmap.2 function from the gplots package in R. For more information about this method, please refer to the package documentation.
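The gene-selection step that comes before clustering can be sketched in Python. The sketch measures variability as the population variance of each gene's FPKMs across conditions, which is an assumption. The actual variability measure, and any log transformation, are not stated here. The clustering itself is done in R with `heatmap.2`, and this sketch does not reproduce it.

```python
import statistics
from typing import Dict, List


def most_variable(profiles: Dict[str, List[float]], n: int = 100) -> List[str]:
    """Return up to n genes with the largest FPKM variance across conditions."""
    return sorted(
        profiles,
        key=lambda gene: statistics.pvariance(profiles[gene]),
        reverse=True,
    )[:n]


# Hypothetical profiles across three conditions.
PROFILES = {
    "A": [1.0, 1.0, 1.0],
    "B": [0.0, 10.0, 20.0],
    "C": [5.0, 6.0, 7.0],
}
print(most_variable(PROFILES, n=2))  # → ['B', 'C']
```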

Anatomogram

The anatomogram provides a visualisation of the tissues studied in a given experiment. Mouse over a tissue in the anatomogram to highlight the corresponding column in the heatmap table.

Downloading query results

The top 50 genes resulting from your search are displayed on the page. Click the Download button to download the full, unordered results of your query in tab-delimited format.
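The downloaded file is unordered, so you may want to sort it yourself. Here is a minimal sketch using Python's `csv` module. It assumes one header row containing a gene ID column, a gene name column, and one FPKM column per factor value. That layout, the skipping of `#` comment lines, and the sample values are all assumptions, so check the header of your actual file.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_results(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a tab-delimited results file into one dict per gene row.
    Lines starting with '#' are skipped in case the file has comment headers."""
    data_lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


def order_by(rows: List[Dict[str, str]], column: str) -> List[Dict[str, str]]:
    """Sort rows by one factor value's FPKM, highest first.
    Empty cells are treated as 0.0 (assumption)."""
    return sorted(rows, key=lambda r: float(r[column] or 0.0), reverse=True)


# Hypothetical file contents; the real column layout may differ.
SAMPLE = (
    "Gene ID\tGene Name\tliver\tbrain\n"
    "GENE_ID_1\tGENE_A\t12.5\t0.3\n"
    "GENE_ID_2\tGENE_B\t0.8\t40.1\n"
)

rows = read_results(io.StringIO(SAMPLE))
print([r["Gene Name"] for r in order_by(rows, "brain")])  # → ['GENE_B', 'GENE_A']
```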

Experiment design page

The Experiment Design page (e.g. Illumina Body Map) shows RNA-seq processing run accessions (from ENA), along with their corresponding biological sample characteristics and factor values. You can access it from the experiment page by clicking the Experiment Design button. Only the runs used in the Baseline Atlas are shown, although the dataset they were taken from may contain more. To see all runs in the dataset, uncheck "Show Analysed only".

Sorting the experiment design

You can sort the experiment design table by clicking a column header. Click the column again to reverse the sort order. To then sort by another column while keeping the ordering of the first, hold down the Shift key and click another column. The first selected column forms the primary sort order, and the shift-clicked columns form the secondary, tertiary, etc. sort orders in turn. To undo sorting by a shift-clicked column, shift-click it again until the column is unsorted.
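Shift-click sorting is a multi-key sort. Python's sort is stable, so you can get the same ordering by sorting on the least significant column first and on the primary column last. The run labels and column names below are hypothetical.

```python
from typing import Dict, List, Tuple


def multi_sort(rows: List[Dict[str, str]],
               orders: List[Tuple[str, bool]]) -> List[Dict[str, str]]:
    """`orders` lists (column, ascending) pairs in priority order, primary first."""
    result = list(rows)
    # Stable sorts applied from least to most significant key give the combined order.
    for column, ascending in reversed(orders):
        result.sort(key=lambda row: row[column], reverse=not ascending)
    return result


ROWS = [
    {"Run": "RUN3", "organism part": "liver"},
    {"Run": "RUN1", "organism part": "brain"},
    {"Run": "RUN2", "organism part": "liver"},
    {"Run": "RUN4", "organism part": "brain"},
]
# Primary: organism part ascending; secondary (the "shift-click"): Run descending.
print([r["Run"] for r in multi_sort(ROWS, [("organism part", True), ("Run", False)])])
# → ['RUN4', 'RUN1', 'RUN3', 'RUN2']
```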

The experiment design table is also searchable: entering a keyword automatically selects the part of the table that matches the keyword. Click the Download button on the right-hand side to download the full results of the current query (with no ordering applied) in tab-delimited format.
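The keyword search can be approximated as a case-insensitive substring match against every cell in a row. It is an assumption whether the real table matches substrings, whole words, or only some columns. The rows are hypothetical.

```python
from typing import Dict, List


def search_table(rows: List[Dict[str, str]], keyword: str) -> List[Dict[str, str]]:
    """Keep rows in which any cell contains the keyword (case-insensitive)."""
    needle = keyword.strip().lower()
    if not needle:
        return list(rows)  # empty search box: show every row
    return [row for row in rows if any(needle in str(v).lower() for v in row.values())]


ROWS = [
    {"Run": "RUN1", "organism part": "brain"},
    {"Run": "RUN2", "organism part": "liver"},
    {"Run": "RUN3", "organism part": "liver"},
]
print([r["Run"] for r in search_table(ROWS, "LIVER")])  # → ['RUN2', 'RUN3']
```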

Analysis methods page

The Analysis Methods page (e.g. Illumina Body Map) lists the data analysis methods we applied to the raw experimental data in FASTQ format to obtain the gene expression levels shown in the Baseline Atlas. You can access it by clicking the Analysis Methods button.

Providing feedback

We value your feedback on what Expression Atlas is doing right, what does not work, and what could be done more intuitively. We would also be grateful for any feedback on the analysis methods we have adopted. We are passionate not only about intuitive data presentation and the highest level of experimental metadata curation, but also about the quality and biological validity of the expression data presented in the Baseline Atlas.

To send us your feedback, please fill in the form accessed by clicking on the Feedback link on the top right of the page:

Feedback Pop-up