Searching and visualising data in the EBI Metagenomics portal

Simple sample searching

Data in EBI Metagenomics is structured into projects, samples and runs. Each project contains one or more samples, and each sample can have one or more experiments associated with it (e.g. metagenomic and metatranscriptomic). The data for an experiment may come from an individual sequencing run or from several pooled runs.
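To make the nesting concrete, here is a minimal sketch of the hierarchy as Python dataclasses. The class and field names (`Project`, `Sample`, `Run`, `experiment_type`) are hypothetical and do not reflect the portal's own data schema. The sketch also simplifies the model by recording the experiment type directly on each run.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the project -> sample -> run hierarchy.
# Class and field names are illustrative, not the portal's own schema.

@dataclass
class Run:
    run_id: str
    experiment_type: str  # e.g. "metagenomic" or "metatranscriptomic"

@dataclass
class Sample:
    sample_id: str
    runs: list[Run] = field(default_factory=list)

@dataclass
class Project:
    project_id: str
    samples: list[Sample] = field(default_factory=list)

    def all_runs(self) -> list[Run]:
        """Flatten the runs of every sample in this project."""
        return [run for sample in self.samples for run in sample.runs]

    def experiment_types(self) -> set[str]:
        """Distinct experiment types found across the project's runs."""
        return {run.experiment_type for run in self.all_runs()}

# Made-up identifiers, used only to show the nesting.
example = Project("PROJECT_A", [
    Sample("SAMPLE_1", [Run("RUN_1", "metagenomic"), Run("RUN_2", "metatranscriptomic")]),
    Sample("SAMPLE_2", [Run("RUN_3", "metagenomic")]),
])
```

Here one sample carries both a metagenomic and a metatranscriptomic run, which mirrors the example given in the text.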

You can access projects and samples via the navigation menu on the website. The resulting lists can be filtered to narrow down the information shown. For example, searching with a word or phrase (e.g. ‘Gut’) will show only those projects or samples with metadata containing that term. You can also explore the project and sample lists by biome using a drop-down menu (Figure 2a).
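The two kinds of filtering can be illustrated with a small sketch over in-memory records. The record fields and the colon-separated biome lineage strings are assumptions made for illustration. The portal's actual search logic is not described here and may behave differently, for example in how it matches partial words.

```python
# Hypothetical project records; the "biome" value mimics a colon-separated
# lineage and is an assumption made for illustration.
projects = [
    {"name": "Human gut microbiome time series", "biome": "root:Host-associated:Human:Digestive system"},
    {"name": "Coastal seawater survey", "biome": "root:Environmental:Aquatic:Marine"},
    {"name": "Mouse gut diet study", "biome": "root:Host-associated:Mammals:Digestive system"},
]

def filter_by_term(records, term):
    """Keep records where any metadata value contains `term`, ignoring case."""
    needle = term.casefold()
    return [r for r in records if any(needle in str(v).casefold() for v in r.values())]

def filter_by_biome(records, biome):
    """Keep records whose biome lineage equals `biome` or sits beneath it."""
    return [r for r in records if r["biome"] == biome or r["biome"].startswith(biome + ":")]
```

`filter_by_term(projects, "Gut")` keeps the two gut studies. Selecting a broad biome such as `"root:Host-associated"` also keeps every more specific biome nested beneath it.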

Figure 2a  Search text fields and drop-down menus allow easy searching on the Projects and Samples pages (example here is a Projects page).

Project overview pages

In the Project section, you can see the list of projects and the number of samples associated with each project. Clicking on a project name displays the Project Overview page, which provides a description of the project, links to any related publications, links to the ENA archive and contact details for the submitter (Figure 2b). The samples associated with the project are listed, with links to the Sample Overview and Run Analysis result pages. In addition, the ‘Analysis summary’ tab gives access to abundance tables that summarise the read counts for functional and taxonomic categories (where available) across all samples in the project. This makes it easy to download project-level summary data and to identify patterns across the project's samples.
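An abundance table of this kind is essentially a category × sample matrix of counts. The sketch below shows how per-sample counts could be merged into that layout and written as tab-separated text. The `category` column name, the sample identifiers and the counts are all hypothetical. The real summary files may use different headers and may include extra annotation columns.

```python
import csv
import io

def build_abundance_table(per_sample_counts):
    """Merge per-sample {category: count} dicts into a category x sample table.

    Categories absent from a sample are filled with 0 so every row has a value
    for every sample.
    """
    samples = sorted(per_sample_counts)
    categories = sorted({c for counts in per_sample_counts.values() for c in counts})
    header = ["category"] + samples  # hypothetical column name
    rows = [[cat] + [per_sample_counts[s].get(cat, 0) for s in samples] for cat in categories]
    return header, rows

def to_tsv(header, rows):
    """Serialise the table as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

Filling missing categories with zero is what allows every sample to be compared row by row, since a taxon or function detected in one sample is still given a value in all the others.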

Figure 2b  Example of a Project Overview page. Links are provided to the Sample and Run Overviews, as well as to the Run Analysis pages.

Sample and Run Overview pages

The Sample Overview page displays the run(s) associated with a sample, as well as the type of analysis performed (metagenomic, metatranscriptomic, etc.) and the pipeline version used to analyse the run(s). A link to the ENA archive is also provided. By selecting a run ID, users can access the run description and associated contextual data (such as the latitude and longitude where sampling was performed, the temperature, etc.).

The analysis results for the run are also provided, through four navigation tabs (Figure 2c):

1) Quality control: describes the quality control steps performed on the run and the read count after each of these steps.

2) Taxonomy analysis: contains interactive charts displaying the prokaryotic content of the run data. Users can also choose alternative representations, with the taxonomic data summarised at the phylum or species level (Figure 2d). All of these charts can be downloaded in a range of formats.

3) Functional analysis: includes general statistics about the features determined from the run. It also displays protein function annotations, in the form of InterPro analysis results, and a representation of the Gene Ontology (GO) terms predicted for the run (Figure 2e). All of these representations can be downloaded.

4) Download: provides links to the raw sequence data that were analysed. It also lists a number of sequence files generated during the different stages of the analysis, in FASTA format, and a number of functional and taxonomic analysis result files, in tab-separated and other machine-readable formats.
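Sequence files from the Download tab can be inspected with a few lines of standard-library Python. The sketch below is a minimal FASTA reader that counts sequences and bases. It assumes that files may be gzip-compressed and signals this with a `.gz` suffix, and the helper names are hypothetical. Counting records in this way is one way to cross-check a downloaded file against the read counts reported on the Quality control tab.

```python
import gzip
from pathlib import Path

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Multi-line sequences are joined; blank lines are skipped.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def open_text(path):
    """Open a FASTA file as text, decompressing it if the name ends in .gz (assumed)."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path)

def summarise(records):
    """Count sequences and bases in an iterable of (header, sequence) pairs."""
    lengths = [len(seq) for _, seq in records]
    return {
        "sequences": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }
```

A typical use would be `with open_text("downloaded_file.fasta.gz") as fh: print(summarise(read_fasta(fh)))`, where the file name is a placeholder.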

Figure 2c  Example of a Run Overview page and contextual data. Links to the Run Analysis result pages are provided as tabs.

Figure 2d  Taxonomy analysis page showing static and interactive charts of taxonomy assignments.

Figure 2e  Functional analysis page showing feature statistics with InterPro and GO annotations.

Comparing EBI Metagenomics analysis results

The analysis summary files provided at the project level allow you to compare read abundances for InterPro and GO terms, as well as taxonomic assignments, across all samples in a project. The data are in tab-separated format, which can be opened in widely available spreadsheet software such as Microsoft Excel or LibreOffice Calc.
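Raw read counts are difficult to compare directly when samples were sequenced to different depths. A common first step is to convert each sample's counts into proportions. The sketch below makes two assumptions about the file layout: the first column holds the term or taxon, and the count columns start at `first_sample_col`. Real summary files may include extra description columns, so that parameter would need to be adjusted. The example table is made up.

```python
import csv
import io

def read_abundance_tsv(handle, first_sample_col=1):
    """Parse a category x sample TSV into {sample: {category: count}}.

    Assumes column 0 is the category and count columns start at first_sample_col.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[first_sample_col:]
    table = {s: {} for s in samples}
    for row in reader:
        if not row:
            continue
        category = row[0]
        for sample, value in zip(samples, row[first_sample_col:]):
            table[sample][category] = float(value)
    return table

def relative_abundance(table):
    """Divide each sample's counts by its total so samples of different depth are comparable."""
    result = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        result[sample] = {c: (v / total if total else 0.0) for c, v in counts.items()}
    return result

# Tiny made-up table in the assumed layout.
EXAMPLE_TSV = "category\tS1\tS2\nGO:0008152\t30\t10\nGO:0006810\t70\t30\n"
example = relative_abundance(read_abundance_tsv(io.StringIO(EXAMPLE_TSV)))
```

After the conversion, S1 and S2 can be compared term by term even though their total counts differ (100 and 40 reads here).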

In addition, EBI Metagenomics has developed a Comparison tool that allows you to compare the GO terms predicted for runs within a project (Figure 3a). Once a project has been selected, you can choose data to compare from a list of available metagenomic or metatranscriptomic runs. The tool then generates the corresponding comparative charts, such as bar charts, stacked columns, heatmaps and Principal Component Analysis (PCA) charts (Figure 3b). These charts are available to download, along with the underlying data in table form.
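To illustrate what a PCA chart summarises, the sketch below computes the first principal component of a run × GO-term matrix in pure Python, using power iteration on the covariance matrix. This is a generic textbook approach. It is not the Comparison tool's own implementation, which is not described here. The sketch computes only the first component, and its sign is arbitrary.

```python
import math

def center_columns(matrix):
    """Subtract each column's mean so PCA captures variation around the average run."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def covariance(centered):
    """Sample covariance matrix of the (already centred) columns."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols] for ci in cols]

def first_principal_component(matrix, iterations=500, tol=1e-12):
    """Return (PC1 loadings, PC1 score per run) via power iteration.

    matrix: rows are runs, columns are features (e.g. GO term relative abundances).
    """
    if len(matrix) < 2:
        raise ValueError("need at least two runs")
    centered = center_columns(matrix)
    cov = covariance(centered)
    # Start from the covariance row (equal to the column, as cov is symmetric) with
    # the largest norm. An all-ones start would fail for relative abundances,
    # because their centred rows sum to zero.
    v = max(cov, key=lambda row: sum(x * x for x in row))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("no variation between runs")
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    # Project each centred run onto the loading vector to get its PC1 score.
    scores = [sum(x * y for x, y in zip(row, v)) for row in centered]
    return v, scores
```

Each row could hold one run's relative abundances of GO terms. Runs with similar functional profiles then receive similar PC1 scores, and this is the kind of grouping that a PCA chart makes visible.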

Figure 3a  EBI Metagenomics Comparison tool front page.

Figure 3b  Example of EBI Metagenomics Comparison tool outputs.