What can I do with Ensembl Genomes?

Using Ensembl Genomes data

The genome is a natural entry point for many types of bioinformatics data, providing both a reference framework and a scientific context within which the results of transcriptomic or proteomic data can be interpreted.

Using Ensembl Genomes, you can:

  • Retrieve all or part of a genome sequence.
  • Use the sequence alignment search tool BLAST against any genome.
  • Link to genome annotation from microarray results.
  • Examine features (e.g. protein coding genes, non-protein coding genes and SNPs) in a chromosomal region.
  • View all alternative transcripts (including variants caused by alternative splicing and promoter usage) for a gene.
  • View positions and sequences of mRNAs and proteins that align with a gene.
  • Explore homologues and phylogenetic trees across, within and between clades.
  • View sequence alignments and conserved regions across species.
  • Export sequence, or create a table of gene or SNP information using BioMart (see Getting data from Ensembl Genomes).
  • Upload your own data, including next-generation sequencing reads (in common alignment formats such as BAM) to view in the context of public annotation.
  • Predict the effect of nucleotide sequence variants on genes and their products.
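
Retrieving part of a genome sequence, the first item above, depends on the coordinate convention. The sketch below assumes Ensembl-style 1-based, inclusive coordinates and a genome held in memory as a Python dict. `fetch_region`, `reverse_complement` and the toy sequence are hypothetical illustrations and not part of any Ensembl interface. In practice you would obtain the sequence from Ensembl Genomes itself.

```python
# Minimal sketch, not an Ensembl API: extract a region from a genome held
# in memory, using 1-based, inclusive coordinates. The genome dict and the
# function names are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_region(genome: dict, seq_region: str, start: int, end: int,
                 strand: int = 1) -> str:
    """Return bases start..end (1-based, inclusive) of seq_region.

    strand=-1 returns the reverse complement, i.e. the region read on the
    reverse strand.
    """
    sequence = genome[seq_region]
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"{seq_region}:{start}-{end} is outside 1..{len(sequence)}")
    # Convert to Python's 0-based, half-open slice.
    sub = sequence[start - 1:end]
    return reverse_complement(sub) if strand == -1 else sub


genome = {"chr1": "ATGCGTACGTTAGC"}  # toy sequence
print(fetch_region(genome, "chr1", 3, 8))      # GCGTAC
print(fetch_region(genome, "chr1", 3, 8, -1))  # GTACGC
```

The explicit `- 1` on `start` is the whole of the conversion between 1-based inclusive and Python's 0-based half-open slices; getting it wrong shifts every retrieved region by one base.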

The taxonomic spread of Ensembl Genomes allows you to view comparative genomic information at many levels, from a narrow taxonomic range (potentially as small as one bacterial species) to a pan-taxonomic scale (from microbe to man). Additionally, because data for pathogenic species, their vectors and their hosts are provided through a common interface, Ensembl Genomes is a unique resource for studying pathogen-mediated diseases of medical and agricultural importance.

Genome browser

The genome browser is the main point of access for most users of Ensembl Genomes. It displays gene structures, supporting evidence, cross-references, genome-wide assays and a range of other features. It is easily configurable and functions as a client for the Distributed Annotation System (DAS) protocol, allowing you to rapidly integrate additional data.

The browser lets you upload your own next-generation RNA sequencing data in common file formats (such as BAM) and visualise your alignments alongside the reference annotation (Figures 2 and 3).


Figure 2 Visualising a BAM file in Ensembl Genomes. In this example, conventional EST data (green tracks) are shown against the reference gene annotation, together with a BAM file of RNA-seq data. The figure shows individual read alignments (grey tracks, with red ends on each read) and the overall read coverage at each position (top track, grey).
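
The coverage track in Figure 2 counts how many reads align over each reference position. The sketch below shows one simplified way such a track can be derived from alignment records. It parses CIGAR strings by hand instead of reading a real BAM file (a library such as pysam would normally do that), and the `coverage` function is a hypothetical illustration, not how Ensembl Genomes computes its track. As a simplifying assumption, only aligned bases count towards depth; deletions and spliced gaps (for example introns in RNA-seq reads) are stepped over without counting.

```python
import re

# Minimal sketch of a per-position coverage track. Each read is
# (start, cigar), where start is the 1-based leftmost aligned reference
# position, as in the POS field of SAM/BAM. Assumption: only M, = and X
# add coverage; D and N advance along the reference without counting;
# I, S, H and P do not consume the reference at all.

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
COUNTED = set("M=X")
SKIPPED = set("DN")


def coverage(reads, region_start, region_end):
    """Return read depths for positions region_start..region_end."""
    depth = [0] * (region_end - region_start + 1)
    for start, cigar in reads:
        pos = start
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op in COUNTED:
                for p in range(pos, pos + length):
                    if region_start <= p <= region_end:
                        depth[p - region_start] += 1
                pos += length
            elif op in SKIPPED:
                pos += length
    return depth


# Toy reads: a plain match, a spliced read (2N gap) and a soft-clipped read.
reads = [(1, "5M"), (3, "2M2N3M"), (4, "1S4M")]
print(coverage(reads, 1, 10))  # [1, 1, 2, 3, 2, 1, 2, 1, 1, 0]
```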

 

Figure 3 Variant annotation in Location View. This example shows a simple visualisation of the sequence variations identified at the displayed locus, colour-coded by their functional effect on the nearest gene. This is one of many views that provide access to variation data in Ensembl Genomes.
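
Colouring each variant by its effect on the nearest gene first requires finding that gene, which at its simplest is an interval-distance problem. The sketch below assumes genes on a single sequence region, given as 1-based inclusive intervals. `Gene`, `distance` and `nearest_gene` are hypothetical names, and a real annotation pipeline would typically also take strand, individual transcripts and a maximum upstream/downstream distance into account.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def distance(pos: int, gene: Gene) -> int:
    """Distance in bp from a variant position to a gene; 0 if inside it."""
    if pos < gene.start:
        return gene.start - pos
    if pos > gene.end:
        return pos - gene.end
    return 0


def nearest_gene(pos: int, genes: list) -> tuple:
    """Return (gene, distance) for the closest gene.

    Ties are resolved in favour of the gene listed first.
    """
    best = min(genes, key=lambda g: distance(pos, g))
    return best, distance(pos, best)


# Hypothetical genes on one sequence region.
genes = [Gene("geneA", 100, 500), Gene("geneB", 900, 1500), Gene("geneC", 2000, 2600)]
for variant_pos in (250, 650, 1800):
    gene, dist = nearest_gene(variant_pos, genes)
    print(variant_pos, gene.name, dist)
```

A linear scan is fine for a handful of genes; for a whole chromosome, sorting genes by start and using a binary search (or an interval tree) avoids comparing every variant with every gene.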

Bacterial genomes

Ensembl Bacteria provides special graphical representations to support the correct depiction of bacterial genomes, including circular chromosomes, alternative translational initiation and polycistronic transcripts (Figure 4).

Figure 4 Ensembl Bacteria web page showing the circular navigation tool. Users can click on the round handles at the end of the selected sector (shown in red) to adjust the region shown in the 'Region in detail' window.
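
On a circular chromosome a selected region can span the origin, so the usual rule that the start coordinate precedes the end no longer holds. The sketch below assumes that a start greater than the end denotes a region wrapping through the origin. This convention and the function names are assumptions made for illustration, not a description of how Ensembl Bacteria stores such regions.

```python
def circular_region(sequence: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) of a circular sequence.

    Assumption: if start > end the region wraps through the origin, i.e. it
    runs from start to the last base and then continues from base 1.
    """
    n = len(sequence)
    if not (1 <= start <= n and 1 <= end <= n):
        raise ValueError("coordinates must lie within 1..len(sequence)")
    if start <= end:
        return sequence[start - 1:end]
    return sequence[start - 1:] + sequence[:end]


def region_length(n: int, start: int, end: int) -> int:
    """Length of the region start..end on a circle of n bases."""
    return end - start + 1 if start <= end else (n - start + 1) + end


chromosome = "ACGTTGCAAGGC"  # 12 bp toy 'chromosome'
print(circular_region(chromosome, 10, 3))      # GGCACG
print(region_length(len(chromosome), 10, 3))   # 6
```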

Tools for comparative genomics

The Ensembl comparative genomics analyses (Ensembl Compara) are applied to all domains within Ensembl and Ensembl Genomes. The DNA comparison modules are suitable for use over a narrow taxonomic range, and can be used to support multiple sequence alignments and to predict ancestral DNA sequences. The protein comparison modules compute the evolutionary history of a gene family and can be applied over much larger evolutionary distances. A 'gene tree' summarises this information and is the anchor point for outward links to pairwise and multiple sequence alignments and to orthologue/paralogue lists (Figure 5).

Figure 5 Visualisation of homology relationships in Ensembl Genomes: viewing only the current gene CRP34 and its immediate orthologues. Using the view options, you can expand selected nodes to see more distant branches of the tree in more detail.
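
A gene tree lets orthologues and paralogues be read off directly: two genes are orthologues if the node where their lineages meet is a speciation event, and paralogues if it is a duplication. The sketch below labels nodes with the simple species-overlap rule (a node whose child subtrees share a species is treated as a duplication). Ensembl Compara's production pipeline is considerably more involved, since it reconciles gene trees against a species tree, so treat `Node`, `homology_type` and the toy tree as hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    gene: Optional[str] = None      # set on leaves only
    species: Optional[str] = None   # set on leaves only
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> Iterator["Node"]:
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()

    def species_set(self) -> Set[str]:
        return {leaf.species for leaf in self.leaves()}


def is_duplication(node: Node) -> bool:
    """Species-overlap rule: a node is a duplication if any two of its
    children share a species; otherwise it is treated as a speciation."""
    sets = [child.species_set() for child in node.children]
    return any(sets[i] & sets[j]
               for i in range(len(sets)) for j in range(i + 1, len(sets)))


def lca(node: Node, a: str, b: str) -> Node:
    """Lowest node whose leaves include both gene a and gene b."""
    for child in node.children:
        genes = {leaf.gene for leaf in child.leaves()}
        if a in genes and b in genes:
            return lca(child, a, b)
    return node


def homology_type(root: Node, a: str, b: str) -> str:
    return "paralogue" if is_duplication(lca(root, a, b)) else "orthologue"


def leaf(gene: str, species: str) -> Node:
    return Node(gene=gene, species=species)


# Toy tree: an ancestral duplication followed by a human/mouse speciation
# in each copy. Gene names are hypothetical.
tree = Node(children=[
    Node(children=[leaf("H1", "human"), leaf("M1", "mouse")]),
    Node(children=[leaf("H2", "human"), leaf("M2", "mouse")]),
])
for a, b in [("H1", "M1"), ("H1", "H2"), ("H1", "M2")]:
    print(a, b, homology_type(tree, a, b))
```

In this toy tree H1 and M2 come out as paralogues even though they are in different species, because their lineages last met at the duplication node.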

Visualising genomic polymorphism

The Ensembl variation schema is used to capture information from population-wide surveys of sequence polymorphism, and is currently populated for one mosquito, two fungi, one protist, and four plant species. Data are available for download or can be visualised in the genome browser (Figure 6).

Figure 6 Visualising genomic variation in a genetic context. The figure shows the location and functional consequence of genomic polymorphisms in the context of a neighbouring gene, and the domain structure of the protein it encodes.
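
Placing a polymorphism's functional consequence next to the protein's domain structure comes down to translating the affected codon and locating the altered residue. The sketch below is a deliberately simplified, hypothetical classifier for a single-nucleotide change in a coding sequence. It translates the reference and alternative codons with the standard genetic code, names the result with Sequence Ontology-style terms, and checks whether the residue falls inside a protein domain. It is not the Ensembl Variant Effect Predictor: it ignores strand, splice sites, start codons, UTRs and indels, and the sequence and domain coordinates are made up for illustration.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def snv_consequence(cds: str, cds_pos: int, alt: str) -> tuple:
    """Classify a single-nucleotide change at 1-based position cds_pos of a
    coding sequence. Returns (consequence, protein_position, 'REF/ALT')."""
    cds, alt = cds.upper(), alt.upper()
    if not 1 <= cds_pos <= len(cds) or alt not in ("A", "C", "G", "T"):
        raise ValueError("position outside the CDS or invalid base")
    codon_index = (cds_pos - 1) // 3   # 0-based codon number
    offset = (cds_pos - 1) % 3         # position within the codon
    ref_codon = cds[3 * codon_index:3 * codon_index + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        consequence = "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    elif alt_aa == "*":
        consequence = "stop_gained"
    elif ref_aa == "*":
        consequence = "stop_lost"
    else:
        consequence = "missense_variant"
    return consequence, codon_index + 1, f"{ref_aa}/{alt_aa}"


def in_domain(protein_pos: int, domains: list) -> list:
    """Names of the (name, start, end) domains containing protein_pos."""
    return [name for name, start, end in domains if start <= protein_pos <= end]


cds = "ATGAAAGGCTGGTAA"                      # toy CDS: M K G W stop
domains = [("hypothetical_domain_1", 2, 3)]  # made-up domain coordinates
for pos, alt in [(4, "G"), (9, "T"), (12, "A")]:
    consequence, aa_pos, change = snv_consequence(cds, pos, alt)
    print(pos, alt, consequence, aa_pos, change, in_domain(aa_pos, domains))
```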