Ensembl genes

The gene set

The Ensembl gene set is based on experimental evidence and includes manual annotation for our most-used species (Figure 6).

Figure 6. Sequences in public databases are aligned to the genome in order to determine positions of genes, along with splice variants.

 

Help

How can I view genes and related information, such as their sequence? Jump to this section, or watch the video below.


The GeneBuild

The first step is to obtain sequenced genomes from official centres. These genomes are then annotated by the Ensembl pipeline (also known as the Ensembl genebuild) using automatic annotation and, for some species, manual curation. The human, mouse, and zebrafish gene sets include manual annotation from the HAVANA project. The Ensembl gene set for human, including HAVANA transcripts, is the GENCODE set.

All Ensembl transcripts are based on experimental evidence, drawing on mRNA and protein sequences deposited in public databases (such as UniProtKB and NCBI RefSeq) by the scientific community. The Ensembl gene set also includes automatically annotated pseudogenes, non-coding RNAs, and alternative splicing events for model organisms. The results of these analyses are stored in the Ensembl databases and can be accessed via the Ensembl website, through BioMart, and programmatically.
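
As an illustration of programmatic access, the sketch below uses the public Ensembl REST API (one of several programmatic routes) to look up a gene by symbol and count its transcripts by biotype. It is a minimal example rather than a recommended client: the helper names (build_lookup_url, fetch_gene, count_transcript_biotypes) are hypothetical, and the code assumes that an expanded lookup response lists transcripts under a "Transcript" key, each with a "biotype" field.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Public Ensembl REST server; the helper names below are illustrative only.
REST_SERVER = "https://rest.ensembl.org"


def build_lookup_url(species, symbol, expand=True):
    """Build a lookup/symbol URL, e.g. for homo_sapiens and BRCA2."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    # expand=1 asks the server to include transcripts in the gene record.
    query = urllib.parse.urlencode(
        {"content-type": "application/json", "expand": 1 if expand else 0}
    )
    return REST_SERVER + path + "?" + query


def fetch_gene(species, symbol):
    """Fetch and decode a gene record (requires network access)."""
    url = build_lookup_url(species, symbol)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def count_transcript_biotypes(gene):
    """Count transcripts per biotype in an expanded gene record.

    Assumes transcripts are listed under "Transcript", each with a "biotype".
    """
    transcripts = gene.get("Transcript", [])
    return Counter(t.get("biotype", "unknown") for t in transcripts)
```

With network access, count_transcript_biotypes(fetch_gene("homo_sapiens", "BRCA2")) would summarise the splice variants annotated for that gene. The counts depend on the Ensembl release being served, so none are quoted here.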

Help

See the annotation article for more about the Ensembl genebuild pipeline, gene names and annotation.

 

Information

For more information you can read about the mouse genebuild.