Exploring an EMBL-Bank entry

Overview of an EMBL-Bank entry

EMBL-Bank provides an easy-to-read view of the data, in which information such as taxonomy and annotation features is grouped into separate sections. In addition, there is a graphical view of assembly and annotation features. EMBL-Bank also has a plain text view that is useful for programmatic access (Figure 33).

Figure 33. EMBL-Bank entry for BN000065; on the left is the default view and on the right the plain text view.

Notes

[A] In the default view, the entry summary provides information on the organism, data class, taxonomic division and sequence; you can also download the sequence and change the view of the entry.

[B] Navigation provides links to other resources, including the taxonomy portal, Ensembl and the Sequence Version Archive (which holds old versions of the entry).

[C] Overview provides a graphical display of assembly and annotation data.

[D] Source Feature(s) provides information on the source of the sequence, such as the organism, organelle or country it was isolated in.

[E] Other Features provides detailed information about the function of different regions of the sequence.

[F] Assembly provides detailed information on how the sequence has been constructed from lower level sequences.

[G] References enable you to view the paper(s) citing the sequence and its annotation.

[H] Sequence can be used to search for similar sequences in the database.

[I] The text view of the same entry; this can be accessed by clicking on 'TEXT' in section [A]. This view is useful if you are writing programs, as every line begins with a line code that identifies the line type; for example, 'DE' identifies the 'Description' line.
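As a rough illustration of why the line codes help when writing programs, the sketch below groups the lines of a text-view entry by their two-letter code. The record is invented for this example (it is not a real EMBL-Bank entry), and the function name group_by_line_code is hypothetical. It assumes the usual flat-file layout: the code in the first two columns, content from column six, 'XX' spacer lines and a '//' terminator.

```python
from collections import defaultdict

# Invented record for illustration only; it is not a real EMBL-Bank entry.
SAMPLE_ENTRY = """\
ID   ZZ000001; SV 1; linear; genomic DNA; STD; UNC; 12 BP.
XX
AC   ZZ000001;
XX
DE   Hypothetical example sequence
DE   used to illustrate line codes.
XX
KW   example; demo.
XX
SQ   Sequence 12 BP; 4 A; 2 C; 3 G; 3 T; 0 other;
     atggcttaaa cg                                                            12
//
"""


def group_by_line_code(text):
    """Return {line code: [content, ...]} for one flat-file entry."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-entry marker
            break
        code = line[:2]                # sequence data lines have a blank code
        if code == "XX":               # spacer line, carries no data
            continue
        groups[code].append(line[5:].strip())
    return dict(groups)


record = group_by_line_code(SAMPLE_ENTRY)
# A description may span several DE lines, so join them.
description = " ".join(record["DE"])
print(description)
```

Grouping by code first keeps the later steps simple: each section of the default view corresponds to one or more of these codes.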

EMBL-Bank entry - General summary section

The top of an EMBL-Bank entry provides a general summary of the data, the ability to change the view of the entry and to download information (Figure 34).

Figure 34. EMBL-Bank entry BX548174 displaying general entry information.

Notes

[A] Accession and Description of the entry (line codes AC and DE); the number after the dot is the sequence version, so 'BN000065.1', for example, is version 1 of sequence BN000065.

[B] View enables you to change the view to plain TEXT or XML, or just the sequence in FASTA format.

[C] Download enables you to download the entry as TEXT or XML, or the sequence in FASTA format.

[D] Navigation bar allows you to jump to a specific section of the entry.

[E] Summary of entry data, including the date the entry became public and the date of the last revision (line code DT).

[F] Keywords can be used in a text search (line code KW).

[G] Secondary Accession(s) allow tracking of split/merged entries as well as entries used to construct a sequence (line code AC).

[H] Lineage provides the full lineage of the organism; clicking on any node of the lineage will take you to the taxonomy portal (line code OC).
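The summary fields in the notes above map onto a handful of line codes, so they can be pulled out of the text view with very little code. The following is a minimal sketch, not a complete parser: the header lines are invented, parse_summary is a hypothetical helper, and it assumes that the first accession on the AC line is the primary one and that the ID line carries the sequence version as 'SV n'.

```python
# Invented header lines for illustration only; not a real EMBL-Bank entry.
SAMPLE_HEADER = """\
ID   ZZ000001; SV 1; linear; genomic DNA; STD; UNC; 12 BP.
AC   ZZ000001; ZZ000002; ZZ000003;
KW   example; demo.
OC   Eukaryota; Metazoa;
OC   Chordata.
"""


def parse_summary(text):
    """Collect accessions, version, keywords and lineage from header lines."""
    lists = {"AC": [], "KW": [], "OC": []}
    version = None
    for line in text.splitlines():
        code, content = line[:2], line[5:].strip()
        if code == "ID":
            # Assumption: the version appears as a token such as "SV 1".
            for token in content.split(";"):
                token = token.strip()
                if token.startswith("SV "):
                    version = int(token[3:])
        elif code in lists:
            # Items are separated by ';' and the last line ends with '.'.
            items = content.rstrip(".").split(";")
            lists[code].extend(i.strip() for i in items if i.strip())
    primary, secondary = lists["AC"][0], lists["AC"][1:]
    return {
        "accession": primary,
        "versioned": f"{primary}.{version}",
        "secondary": secondary,
        "keywords": lists["KW"],
        "lineage": lists["OC"],
    }


summary = parse_summary(SAMPLE_HEADER)
print(summary["versioned"], summary["secondary"], summary["lineage"])
```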

EMBL-Bank entry - Navigation section

EMBL-Bank provides cross-references to almost forty other databases, including Ensembl, UniProtKB, InterPro, Rfam, WormBase, GrainGenes, dictyBase, FlyBase, VectorBase, GOA, PDB and IMGT/HLA. An entry will contain links to the external database(s) that have information on the sequence, providing a valuable source of additional annotation (Figure 35).

Figure 35. EMBL-Bank entry BX548174 showing Navigation section (DR line); each cross-reference has a link to the relevant database entry, such as the RNA database Rfam.

Notes

[A] Up arrow allows you to navigate up to a higher level record.

[B] Tree symbol allows you to navigate the taxonomy tree (line code FT).

[C] Across arrow provides a cross-reference database link (line code DR).
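In the text view, each cross-reference sits on its own DR line, which makes the links easy to collect by database. The sketch below assumes the common layout 'database; primary identifier; secondary identifier.' for a DR line. The database names and identifiers are invented, and parse_cross_references is a hypothetical helper.

```python
# Invented DR lines for illustration; database names and IDs are not real.
SAMPLE_DR = """\
DR   ExampleDB; EX0001; demo_gene.
DR   ExampleDB; EX0002; other_gene.
DR   OtherDB; OT9; demo_family.
"""


def parse_cross_references(text):
    """Return {database: [(primary id, secondary id), ...]} from DR lines."""
    xrefs = {}
    for line in text.splitlines():
        if not line.startswith("DR"):
            continue
        # Drop the trailing '.', then split the ';'-separated fields.
        fields = [f.strip() for f in line[5:].rstrip().rstrip(".").split(";")]
        database = fields[0]
        identifiers = tuple(f for f in fields[1:] if f)
        xrefs.setdefault(database, []).append(identifiers)
    return xrefs


xrefs = parse_cross_references(SAMPLE_DR)
print(sorted(xrefs))
```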

EMBL-Bank entry - Overview section

The overview section provides an at-a-glance graphical display of the assembly and annotation features of the sequence (Figure 36). Annotation features describe where genes, mRNA, exons, introns, CDS (coding sequence) and other features are located on the sequence. This information is supplied by the author, or occasionally as third party annotation (see section 'How is the sequence annotated').

Figure 36. EMBL-Bank entry BN000065 showing the Overview section.

Notes

[A] Base range enables you to zoom in to a specific region of the sequence.

[B] Overview shows the full length of the sequence as a grey bar, with a red box around the region being described below.

[C] Assembly shows the clones used in the assembly of the sequence (line code AS).

[D] Source describes the source of the sequence (line code FT).

[E] Features such as genes, mRNA, exon, CDS and intron are shown relative to their position on the sequence (line code FT).
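Zooming to a base range, as in note [A], amounts to asking which feature locations overlap that range. The sketch below shows one way to do this for simple location strings. The features are invented, and location_spans and in_base_range are hypothetical helpers. It handles only plain spans, single bases, and spans wrapped in join(...) or complement(...); locations that point into another entry are not supported.

```python
import re


def location_spans(location):
    """Return (start, end) pairs found in a feature location string."""
    spans = []
    # Matches "7" as well as "1..12"; a '<' or '>' partial marker is ignored.
    for start, end in re.findall(r"(\d+)(?:\.\.[<>]?(\d+))?", location):
        spans.append((int(start), int(end or start)))
    return spans


def in_base_range(location, range_start, range_end):
    """True if any span of the location overlaps the requested base range."""
    return any(start <= range_end and end >= range_start
               for start, end in location_spans(location))


# Invented features for illustration only.
features = [("gene", "1..12"), ("CDS", "join(1..3,7..9)"), ("intron", "4..6")]
visible = [key for key, loc in features if in_base_range(loc, 4, 6)]
print(visible)  # ['gene', 'intron']
```

The CDS is filtered out here because neither of its two joined spans touches bases 4 to 6, even though the feature as a whole straddles that region.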

EMBL-Bank entry - Source features section

The source features section details where the sequence came from (Figure 37). For more information please see the section on 'How to search ENA with taxonomy'.

Figure 37. EMBL-Bank entry Z71230 showing the Source Feature(s) section.

Notes

[A] Taxon provides a link to the taxonomy portal, which provides a summary of all the sequences available for an organism (line code FT).

EMBL-Bank entry - Other features section

In addition to the graphical display of the annotation features we saw in the Overview section, EMBL-Bank also provides a detailed description of each feature in the 'Other Feature(s)' section (Figure 38). There are over fifty different features that can provide annotation for a sequence, and over seventy different qualifiers that help refine these features. Which features are described in a particular entry depends on the data the author submitted (ENA curators do not add features; they are provided by either the author or by third party annotation).

Figure 38. EMBL-Bank entry BN000065 showing the Other Feature(s) section.

Notes

[A] Base range allows you to restrict the annotation features to those within a specific sequence range.

[B] Show main features only restricts the display to main features such as mRNA and CDS.

[C] Features describe the annotation for the sequence; there are >50 features, including CDS, mRNA, exon and intron (line code FT).

[D] Qualifiers refine each feature (line code FT).

In this example, the feature 'CDS' is further refined by the qualifiers 'gene', 'product' and 'translation'.

[E] Navigation provides cross-references to other databases, including UniProtKB, InterPro and GOA (line code FT).
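The relationship between features and qualifiers is easiest to see in the FT lines of the text view: a feature key and location start a feature, and the '/name="value"' lines that follow refine it. The sketch below reads such lines into a simple structure. The FT lines are invented, parse_features is a hypothetical helper, and the MAIN_FEATURES set is an assumption based only on the two examples named in note [B].

```python
# Invented FT lines for illustration only; not taken from a real entry.
SAMPLE_FT = """\
FT   source          1..12
FT                   /organism="Example organism"
FT                   /mol_type="genomic DNA"
FT   CDS             1..9
FT                   /gene="demo"
FT                   /product="hypothetical example
FT                   protein"
FT                   /translation="MA"
"""

# Assumption: only the two "main" features named in note [B].
MAIN_FEATURES = {"mRNA", "CDS"}


def parse_features(text):
    """Return a list of {'key', 'location', 'qualifiers'} dictionaries."""
    features, last = [], None
    for line in text.splitlines():
        if not line.startswith("FT"):
            continue
        content = line[5:]
        if content[:1].strip():                # text in the key column
            key, location = content.split(None, 1)
            features.append({"key": key, "location": location.strip(),
                             "qualifiers": {}})
            last = None
            continue
        rest = content.strip()
        if not rest or not features:
            continue
        quals = features[-1]["qualifiers"]
        if rest.startswith("/"):               # a new qualifier
            last, _, value = rest[1:].partition("=")
            quals.setdefault(last, []).append(value)
        elif last is None:                     # location continued on next line
            features[-1]["location"] += rest
        else:                                  # qualifier value continued
            sep = "" if last == "translation" else " "
            quals[last][-1] += sep + rest
    for feature in features:                   # strip the surrounding quotes
        feature["qualifiers"] = {name: [v.strip('"') for v in values]
                                 for name, values in feature["qualifiers"].items()}
    return features


features = parse_features(SAMPLE_FT)
main_only = [f["key"] for f in features if f["key"] in MAIN_FEATURES]
print(main_only, features[1]["qualifiers"]["product"])
```

Qualifier values are kept as lists because a qualifier can occur more than once on the same feature.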

Information

For a full list of the features and qualifiers available, please see here.

EMBL-Bank entry - References section

Literature references relating to the submitted sequence, including third party annotation, are provided in the reference list (Figure 39). Cited literature should be considered as a pointer to scientific information and not a credit for the elucidation of the sequence.

Figure 39. EMBL-Bank entry BN000065 showing the reference section.

Notes

[A] Abstract can be expanded to view the abstract.

[B] Links are provided to the full paper as PDF, DOI or HTML, or as a cross-reference to CiteXplore.

EMBL-Bank entry - Sequence section

Either the full sequence or part of it can be viewed in FASTA format (Figure 40).

Figure 40. EMBL-Bank entry showing the sequence section.

Notes

[A] Base range allows you to restrict the sequence to a specified range.

[B] Find similar sequences will launch a sequence search on the displayed sequence.

[C] Sequence is shown in FASTA format (line code SQ).
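The same result can be approximated from the text view: the sequence follows the SQ line, and a base range can be cut from it before writing FASTA. The sketch below uses an invented 12-base sequence, and extract_sequence and to_fasta are hypothetical helpers. It assumes that sequence lines consist of blocks of letters followed by a running position count.

```python
# Invented sequence block for illustration only.
SAMPLE_SQ = """\
SQ   Sequence 12 BP; 4 A; 2 C; 3 G; 3 T; 0 other;
     atggcttaaa cg                                                            12
//
"""


def extract_sequence(text):
    """Join the sequence blocks found between the SQ line and '//'."""
    blocks, in_sequence = [], False
    for line in text.splitlines():
        if line.startswith("SQ"):
            in_sequence = True
        elif line.startswith("//"):
            break
        elif in_sequence:
            # Keep the letter blocks and drop the trailing position count.
            blocks.extend(tok for tok in line.split() if tok.isalpha())
    return "".join(blocks)


def to_fasta(header, sequence, start=1, end=None, width=60):
    """Format bases start..end (1-based, inclusive) as a FASTA record."""
    end = len(sequence) if end is None else end
    part = sequence[start - 1:end]
    lines = [f">{header}"]
    lines += [part[i:i + width] for i in range(0, len(part), width)]
    return "\n".join(lines)


sequence = extract_sequence(SAMPLE_SQ)
fasta = to_fasta("ZZ000001.1", sequence, start=1, end=9)
print(fasta)
```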