Baseline Expression Atlas help
The Baseline Atlas represents information about which gene
products are present (and at what abundance) in which "normal" condition (e.g.,
tissue, cell type). The Baseline Atlas consists of
highly curated RNA–seq experiments from ArrayExpress, such as:
The above experiments have been re–analyzed using an in–house RNA–seq
processing pipeline. The resulting data matrix stores gene
expression profiles across the analyzed experimental variables as a set of
normalized expression levels (FPKMs).
You can search the data by a combination of gene attributes, experimental
variables, and a minimum FPKM cutoff. See below for more details on searching.
Some experiments have more than one experimental variable type. For example, in experiment
three different types of RNA were extracted from six cellular components from 23
human cell lines. Results for these experiments are limited to a certain
combination of variables, e.g. showing expression of total RNA in the
whole cell. To limit the search results differently, use the "Change
filters" menu. In a 3–factor experiment, you can choose one value from any two
of the factors, and then search among the values for the remaining factor.
E.g., if you choose to limit to poly(A) RNA extracted from the
cytosol, you can then search among the cell lines for which this
data is available.
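The filtering behaviour described above can be sketched as follows. This is only an illustration, not the Atlas implementation: the factor names, the `assays` records, and the `remaining_factor_values` helper are all hypothetical.

```python
# Hypothetical assay records for a 3-factor experiment: each assay has one
# value per factor. Names and values are illustrative only.
assays = [
    {"rna": "total RNA", "cellular_component": "whole cell", "cell_line": "A"},
    {"rna": "total RNA", "cellular_component": "whole cell", "cell_line": "B"},
    {"rna": "poly(A) RNA", "cellular_component": "cytosol", "cell_line": "A"},
    {"rna": "poly(A) RNA", "cellular_component": "nucleus", "cell_line": "C"},
]


def remaining_factor_values(assays, filters, remaining_factor):
    """Return the sorted values of `remaining_factor` among the assays that
    match every (factor, value) pair in `filters`."""
    matching = [
        assay for assay in assays
        if all(assay[factor] == value for factor, value in filters.items())
    ]
    return sorted({assay[remaining_factor] for assay in matching})


# Fix one value for each of two factors, then list what is available
# for the third factor.
filters = {"rna": "poly(A) RNA", "cellular_component": "cytosol"}
print(remaining_factor_values(assays, filters, "cell_line"))
```

Here the two chosen values play the role of the "Change filters" selection, and the returned list corresponds to the factor values you can then search among.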
You can specify a minimum expression level to act as a cutoff, so that only genes expressed above this level are displayed. This helps to distinguish "real" gene expression from "background noise". The default FPKM cutoff is 0.5. To change it, enter a new value into the "Expression level cutoff" input box, or drag the slider until the input box shows your desired value. A histogram showing the number of genes expressed above a range of cutoffs is also available, so you can see how many genes a given cutoff excludes. If you are excluding a lot of genes, you may wish to decrease the cutoff. Clicking on the button toggles display of the histogram.
- If you have not selected any experimental variables, the histogram shows the number of genes expressed in any experimental variable for each of the shown cutoffs.
- If you have selected at least one factor value, the histogram shows the number of genes expressed in at least one of the selected factor values, for each of the shown cutoffs.
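The counts behind such a histogram can be sketched as below. This is a minimal illustration under stated assumptions: the `fpkms` data, the function names, and the use of a strict "greater than" comparison against the cutoff are all hypothetical choices, not the Atlas code.

```python
# Hypothetical FPKMs: gene -> {factor value -> FPKM}. Illustrative only.
fpkms = {
    "GENE1": {"liver": 0.2, "heart": 0.7},
    "GENE2": {"liver": 12.0, "heart": 0.0},
    "GENE3": {"liver": 0.4, "heart": 0.3},
}


def genes_above_cutoff(fpkms, cutoff, selected=None):
    """Count genes whose FPKM exceeds `cutoff` in at least one factor value.

    If `selected` is None or empty, every factor value is considered;
    otherwise only the selected factor values are.
    """
    count = 0
    for levels in fpkms.values():
        considered = [
            fpkm for factor_value, fpkm in levels.items()
            if not selected or factor_value in selected
        ]
        if any(fpkm > cutoff for fpkm in considered):
            count += 1
    return count


def cutoff_histogram(fpkms, cutoffs, selected=None):
    """Map each cutoff to the number of genes expressed above it."""
    return {c: genes_above_cutoff(fpkms, c, selected) for c in cutoffs}


print(cutoff_histogram(fpkms, [0, 0.5, 1, 10]))
print(cutoff_histogram(fpkms, [0, 0.5, 1, 10], selected={"heart"}))
```

Comparing the counts for neighbouring cutoffs shows how many genes each increase in the cutoff would exclude.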
You can search with Ensembl gene symbols (e.g. SAA4), identifiers (e.g. ENSG00000148965) or biotypes (e.g. protein_coding), UniProt accessions (e.g. P35542), GO terms (e.g. "phosphoglycerate kinase activity") or InterPro terms (e.g. phosphatase). A space–separated list of gene attributes will bring back genes that match at least one of the attributes in the query. Put multi-word query terms in quotes, e.g. "transcription factor binding".
If the "Exact match" box is checked, only genes with annotations that fully match your query will be returned. For example, to find the expression of the calmodulin 1 gene, check the "Exact match" box, type "CALM1" and click "Search". If the box is not checked, genes with annotations that contain words that fully match your query will be returned. For example, to find the expression of genes with "calmodulin" in their annotations (e.g. "calmodulin binding", "calmodulin regulated", …), uncheck the "Exact match" box, type "calmodulin" and click "Search".
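The difference between the two matching modes, and the handling of quoted multi-word terms, can be sketched as follows. The `annotations` data and all function names are hypothetical, and the case-insensitive comparison is an assumption of this sketch rather than documented Atlas behaviour.

```python
import re
import shlex

# Hypothetical annotations per gene; illustrative only.
annotations = {
    "CALM1": ["CALM1", "calmodulin 1", "calmodulin binding"],
    "CAMK2A": ["CAMK2A", "calmodulin regulated kinase"],
    "PGK1": ["PGK1", "phosphoglycerate kinase activity"],
}


def parse_query(query):
    """Split a query on spaces, keeping quoted multi-word terms together."""
    return shlex.split(query)


def matches(annotation, term, exact):
    """Exact: the whole annotation equals the term.
    Otherwise: the term appears in the annotation as whole word(s).
    Both comparisons ignore case (an assumption of this sketch)."""
    if exact:
        return annotation.lower() == term.lower()
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, annotation, flags=re.IGNORECASE) is not None


def search(annotations, terms, exact):
    """Return genes with at least one annotation matching at least one term."""
    return sorted(
        gene for gene, gene_annotations in annotations.items()
        if any(matches(a, t, exact) for a in gene_annotations for t in terms)
    )


print(search(annotations, parse_query("CALM1"), exact=True))
print(search(annotations, parse_query("calmodulin"), exact=False))
```

With "Exact match" on, "calmodulin" alone matches nothing in this toy data, because no annotation consists of that single word.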
If you search with a single gene, you have the option of seeing the variation of expression among the biological replicates for each tissue (or other condition) in the experiment. Select "Display variation" to display a plot showing the minimum, maximum, median, and upper and lower quartiles for each set of biological replicates. Mouse over the box plots to see the actual values.
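The five values shown in each box plot can be computed along these lines. This is a sketch only: the `variation_summary` name is hypothetical, and the quartile method used here (Python's "inclusive" method) is an assumption, since the method used for the plots is not stated.

```python
import statistics


def variation_summary(replicate_fpkms):
    """Return the five values a box plot displays for one set of biological
    replicates. Needs at least two replicates."""
    lower, median, upper = statistics.quantiles(
        replicate_fpkms, n=4, method="inclusive"
    )
    return {
        "min": min(replicate_fpkms),
        "lower_quartile": lower,
        "median": median,
        "upper_quartile": upper,
        "max": max(replicate_fpkms),
    }


# Hypothetical FPKMs for five biological replicates of one tissue.
print(variation_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
```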
Another option available for single-gene queries is to view a list of genes that show a similar expression pattern across the tested conditions. Click the button "Add similarly expressed genes" below the heatmap to display similar genes in the same heatmap, and adjust the number of presented genes using the slider at the bottom. The order of the additional genes is determined by the degree of similarity, with the most similar expression pattern at the top.
If you search for a term that matches multiple genes, e.g. a Reactome pathway ID such as "R-HSA-5693565", the heatmap will show one row per gene matching your term:
You can summarize the results for each "gene set" term by clicking "show by gene set":
Click "show individual genes" to return to the original view.
Use the experimental variable input box to search for e.g. organism part or cell line values, or leave it blank to search in all. You can search:
When you click on the experiment factor value input box, a dropdown menu appears, containing the list of all factor values available for selection:
Select one or more factor values to search with, or start typing to see suggestions. To see all factor values in an experiment, click the button.
Specific vs. non–specific search
The default Baseline Atlas search reports genes with more "specific" expression (with the Specific option selected by default). What we mean by this is outlined below.
If the query contains no factor values…
If the query contains no factor values, the Specific search will report genes expressed in just one factor value, then genes expressed in just two factor values, then three and so on, reporting any genes expressed in all the available factor values at the end of the list of results. Within each group of genes expressed in exactly N factor values, genes are sorted by their average FPKM across those N factor values, highest first.
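This ordering can be sketched as a two-level sort. The data and the `specific_order_no_query` function are hypothetical, and treating "expressed" as "FPKM strictly above the cutoff" is an assumption of the sketch.

```python
# Hypothetical FPKMs: gene -> {factor value -> FPKM}. Illustrative only.
fpkms = {
    "A": {"liver": 10.0, "heart": 0.0, "brain": 0.0},
    "B": {"liver": 0.0, "heart": 50.0, "brain": 0.0},
    "C": {"liver": 5.0, "heart": 5.0, "brain": 0.1},
    "D": {"liver": 2.0, "heart": 2.0, "brain": 2.0},
    "E": {"liver": 0.1, "heart": 0.2, "brain": 0.0},
}


def specific_order_no_query(fpkms, cutoff=0.5):
    """Order genes by the number of factor values in which they are expressed
    (fewest first), then by average FPKM over those factor values (highest
    first). Genes not expressed anywhere are left out."""
    ranked = []
    for gene, levels in fpkms.items():
        expressed = [fpkm for fpkm in levels.values() if fpkm > cutoff]
        if not expressed:
            continue
        average = sum(expressed) / len(expressed)
        # Negating the average makes an ascending sort put high values first.
        ranked.append((len(expressed), -average, gene))
    ranked.sort()
    return [gene for _, _, gene in ranked]


print(specific_order_no_query(fpkms))
```

In this toy data, B and A are each expressed in one factor value only, so they come first, with B ahead because of its higher FPKM.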
If the query contains at least one factor value…
If the query contains at least one experimental variable, the Specific search will promote genes with higher average FPKM in the queried factor value(s), and at the same time penalize genes with higher FPKMs in the non–queried factor values. This is done using the "fold difference" between the query and non-query FPKMs:
- Fold difference is calculated by dividing the average FPKM of the queried factor values by the highest FPKM of the non-query factor values.
- If none of the non-query FPKMs are above the selected expression level cutoff, the average of the query FPKMs is divided by the cutoff instead.
- If none of the non-query FPKMs are above the cutoff and the cutoff is zero, 0.09 is used as the denominator instead of the cutoff.
Genes with the greatest fold difference in expression between query and non–query factor values will be pushed towards the top of the list. Genes that have higher expression in non-query factor values than in the query factor values are not shown.
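The fold difference rules above can be sketched as follows. The function names and data are hypothetical. Hiding genes whose fold difference is below 1 is this sketch's reading of "higher expression in non-query factor values", not a documented threshold, and the sketch assumes at least one queried factor value is present for each gene.

```python
def fold_difference(levels, queried, cutoff=0.5):
    """Average FPKM over the queried factor values divided by the highest
    non-query FPKM, with the fallback denominators described above."""
    query = [fpkm for fv, fpkm in levels.items() if fv in queried]
    non_query = [fpkm for fv, fpkm in levels.items() if fv not in queried]
    average_query = sum(query) / len(query)
    highest_non_query = max(non_query, default=0.0)
    if highest_non_query > cutoff:
        denominator = highest_non_query
    elif cutoff > 0:
        denominator = cutoff  # no non-query FPKM is above the cutoff
    else:
        denominator = 0.09    # cutoff is zero and no non-query expression
    return average_query / denominator


def specific_order(fpkms, queried, cutoff=0.5):
    """Rank genes by fold difference, greatest first, hiding genes whose
    fold difference is below 1 (assumption, see lead-in)."""
    scored = [
        (fold_difference(levels, queried, cutoff), gene)
        for gene, levels in fpkms.items()
    ]
    kept = [(fold, gene) for fold, gene in scored if fold >= 1]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [gene for _, gene in kept]


# Hypothetical FPKMs; the query is the single factor value "liver".
fpkms = {
    "G1": {"liver": 10.0, "heart": 2.0, "brain": 1.0},
    "G2": {"liver": 10.0, "heart": 0.1, "brain": 0.0},
    "G3": {"liver": 1.0, "heart": 8.0, "brain": 0.0},
}
print(specific_order(fpkms, {"liver"}))
```

G2 ranks above G1 even though both have the same liver FPKM, because G2 has no non-query expression above the cutoff and is therefore divided by the cutoff itself.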
If the Specific option is not selected, the Baseline Atlas search will ignore specificity of expression when ordering the results. In particular:
- If the query contains no experimental variables, genes with the highest expression across all factor values will be reported first.
- If the query contains at least one experimental variable, genes with higher FPKM across the selected factor values are reported first. Expression levels across the non–selected factor values are ignored in the ordering of results.
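A non-specific ordering can be sketched like this. The data and the `non_specific_order` function are hypothetical, and scoring each gene by its average FPKM over the considered factor values is an assumption: the text above does not say exactly how expression is aggregated.

```python
# Hypothetical FPKMs: gene -> {factor value -> FPKM}. Illustrative only.
fpkms = {
    "G1": {"liver": 10.0, "heart": 2.0},
    "G2": {"liver": 3.0, "heart": 30.0},
    "G3": {"liver": 20.0, "heart": 0.0},
}


def non_specific_order(fpkms, queried=None):
    """Order genes by average FPKM, highest first. With no query, all factor
    values count; otherwise only the queried ones do, and FPKMs in the other
    factor values are ignored."""
    def score(levels):
        considered = [
            fpkm for fv, fpkm in levels.items()
            if not queried or fv in queried
        ]
        return sum(considered) / len(considered)

    return sorted(fpkms, key=lambda gene: score(fpkms[gene]), reverse=True)


print(non_specific_order(fpkms))
print(non_specific_order(fpkms, queried={"liver"}))
```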
Results for your query are shown in a heatmap table. The heatmap rows and columns are labelled with gene symbols and experimental variables, respectively. The heatmap shows expression levels by colour intensity, according to the gradient bar displayed above the heatmap. Greater colour intensity means higher relative gene expression. To see the exact FPKM for all genes, click on the button. This also reveals minimum and maximum FPKMs on each side of the gradient bar. The gradient shows intensities corresponding to expression levels for the top 50 genes currently displayed (rather than all FPKM values for all genes returned by the query).
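One way to picture the gradient is as a rescaling of the displayed FPKMs between their minimum and maximum. The sketch below assumes a linear scale, which is not stated above; the `colour_intensities` function and the data are hypothetical.

```python
def colour_intensities(displayed_fpkms):
    """Scale every displayed FPKM to the range 0..1, using the minimum and
    maximum over the displayed genes only (linear scaling is an assumption)."""
    values = [
        fpkm for levels in displayed_fpkms.values() for fpkm in levels.values()
    ]
    low, high = min(values), max(values)
    span = high - low
    return {
        gene: {
            fv: (fpkm - low) / span if span else 1.0
            for fv, fpkm in levels.items()
        }
        for gene, levels in displayed_fpkms.items()
    }


# Hypothetical FPKMs for the genes currently displayed.
displayed = {
    "G1": {"liver": 1.0, "heart": 3.0},
    "G2": {"liver": 5.0, "heart": 2.0},
}
print(colour_intensities(displayed))
```

Because the scale depends only on the displayed genes, the same FPKM can appear with a different intensity when a different set of genes is shown.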
When the displayed factor values are organism parts, mousing over each heatmap column highlights the corresponding tissue in the anatomogram on the left (if available). If the selected organism part is gender–specific, and an anatomogram is available for each gender, you can switch between them by clicking the button.
If you mouse over a gene name in the heatmap, synonyms, Gene Ontology terms, and InterPro terms are displayed:
Any terms that match your search query term(s) are highlighted in yellow here. Click the gene name to see more information about that gene, including other Expression Atlas experiments in which it was found.
If you select a gene name and a condition heading from the heatmap, the Ensembl Genome Browser Open button, to the left of the heatmap, will become clickable.
Clicking the Open button will take you to the Ensembl or Ensembl Genomes browser, which will
display the FPKM for the condition (e.g. tissue) that you selected in the
context of the genomic location of the gene you selected. For more details
about how to use the genome browsers, please see the relevant documentation for
Ensembl or Ensembl Genomes.
Click the button to see the results of hierarchical
clustering using the top 100 most variable genes across all conditions (e.g.
tissues, strains, developmental stages, etc) in the experiment. Clustering is
performed using the heatmap.2
function from the gplots package in R. For more information about this method,
please refer to the package
The anatomogram provides a visualisation of the tissues studied in a given experiment. Mouse over a tissue in the anatomogram to highlight the corresponding column in the heatmap table.
The top 50 genes resulting from your search are displayed on the page. Click on the button to download the full results of your query in tab–delimited format with no ordering.
The Experiment Design page (e.g. Illumina Body Map)
shows RNA–seq processing run accessions (from ENA), along with their corresponding
biological sample characteristics and factor values. You can access it from the
experiment page by clicking the button. Only the
runs used in the Baseline Atlas are shown, although the dataset these were
taken from may contain more. To see all runs in the dataset, uncheck Show
You can sort the experiment design table by clicking on a column. Click the
column again to reverse the sort order. To then sort by another column while
retaining the ordering of the first column, hold down the shift key and
click another column. The first selected column forms the primary sort order
and the shift+click–ed columns form secondary, tertiary, etc. sort orders
in turn. If you shift+click a column to sort by it and then wish to undo this,
just shift+click it again until the column is unsorted.
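The primary, secondary, tertiary ordering described above behaves like a multi-key sort. A minimal sketch, using hypothetical rows and column names (the run identifiers are placeholders, not real ENA accessions), relies on Python's stable sort and applies the keys from last to first:

```python
# Hypothetical experiment design rows; identifiers are placeholders.
rows = [
    {"run": "RUN3", "organism_part": "liver", "sex": "male"},
    {"run": "RUN1", "organism_part": "heart", "sex": "female"},
    {"run": "RUN2", "organism_part": "liver", "sex": "female"},
    {"run": "RUN4", "organism_part": "heart", "sex": "male"},
]


def sort_rows(rows, sort_spec):
    """Sort by several columns. `sort_spec` is a list of (column, descending)
    pairs with the primary column first, like a click followed by
    shift+clicks. Sorting by the last key first works because list.sort is
    stable: rows that tie on a later pass keep their earlier order."""
    result = list(rows)
    for column, descending in reversed(sort_spec):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


# Primary: organism part ascending; secondary: run descending.
ordered = sort_rows(rows, [("organism_part", False), ("run", True)])
print([row["run"] for row in ordered])
```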
The experiment design table is also searchable — entering a keyword
automatically selects the subsection of the table that matches the keyword.
Click on the button on the
right–hand side to download the full results of the current query (with
no ordering applied) in tab–delimited format.
The Analysis Methods page (e.g. Illumina Body
Map) lists the data analysis methods we applied to the raw experimental
data in FASTQ
format, to obtain the gene expression levels shown in the Baseline Atlas. You can
access it by clicking the button.
We value your feedback on what Expression Atlas is doing right, what does not work, and what could be achieved more intuitively. We would also be grateful for any feedback on the analysis methods we have adopted: we are passionate not only about intuitive data presentation and the highest level of experimental metadata curation, but also about the quality and biological validity of the expression data presented in the Baseline Atlas.
To send us your feedback, please fill in the form accessed by clicking on the Feedback link on the top right of the page: