
Nucleotide Databases



The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) is divided into sections that reflect major taxonomic divisions.

These taxonomic divisions include:

  • Invertebrates
  • Other Mammals
  • Mus musculus
  • Organelles
  • Bacteriophage
  • Plants
  • Prokaryotes
  • Rodents
  • Unclassified Viruses
  • Other Vertebrates
Non-taxonomic sequence groups that are also part of EMBL-Bank are:
  • patents
  • htg
  • htc
  • gss
  • wgs
  • est (although some are within species-specific files)
Each entry in a database must have a unique identifier: a string of letters and/or numbers that belongs to that record alone. This unique identifier, known as the accession number, can be quoted in the scientific literature, as it will never change.

Because the accession number must always remain the same, a second code, the version number, is used to distinguish the different versions of an entry that arise from sequence corrections. You should therefore always take care to quote both the accession number and the version number when referring to records in a nucleotide sequence database.


The nucleotide sequence identifier has the form 'Accession.Version' (e.g. AJ000012.1). The first part is the accession number, which never changes; it is followed by a period and a version number. The accession number is stable, but the version number is incremented whenever the sequence changes.
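The 'Accession.Version' form described above can be split mechanically. The following is a minimal Python sketch, not part of any EMBL tool: the names `SequenceId`, `parse_sequence_id` and `is_newer_version` are hypothetical, and the regular expression assumes that the accession part consists only of letters, digits and underscores.

```python
import re
from typing import NamedTuple


class SequenceId(NamedTuple):
    """A sequence identifier split into its two parts (hypothetical type)."""
    accession: str  # stable part, never changes
    version: int    # incremented when the sequence changes


# Assumption: accession = letters, digits or underscores; version = digits.
_ID_PATTERN = re.compile(r"^([A-Za-z0-9_]+)\.(\d+)$")


def parse_sequence_id(text: str) -> SequenceId:
    """Split an 'Accession.Version' string such as 'AJ000012.1'."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not of the form Accession.Version: {text!r}")
    return SequenceId(match.group(1), int(match.group(2)))


def is_newer_version(candidate: SequenceId, reference: SequenceId) -> bool:
    """Return True if candidate is a later version of the same accession."""
    if candidate.accession != reference.accession:
        raise ValueError("identifiers refer to different accession numbers")
    return candidate.version > reference.version


if __name__ == "__main__":
    first = parse_sequence_id("AJ000012.1")
    print(first.accession, first.version)
```

Keeping the version as an integer rather than a string means that version 10 compares as later than version 2, which a plain string comparison would get wrong.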

Although the nucleotide sequence data are checked for integrity and obvious errors by the data library staff, the quality of the data is the responsibility of the submitter. As a consequence, there are many errors in the database: many sequence entries are mislabelled, contaminated, or incompletely or erroneously annotated, or they contain sequencing errors. In addition, the database is highly redundant, in the sense that the same sequence from the same organism may be included many times, simply reflecting the redundancy of the original scientific reports.
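To illustrate the kind of redundancy described above, the sketch below groups entries whose organism and sequence are identical. It is a simplified illustration only: the function name `find_redundant_entries` and the `(identifier, organism, sequence)` tuple layout are assumptions, the identifiers in the usage example are made up, and only exact duplicates are detected (near-identical sequences or sequences that contain one another are not).

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Assumed record layout: (identifier, organism name, nucleotide sequence).
Entry = Tuple[str, str, str]


def find_redundant_entries(entries: Iterable[Entry]) -> List[List[str]]:
    """Group identifiers that share the same organism and exact sequence.

    Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for identifier, organism, sequence in entries:
        # Normalise case and surrounding whitespace before comparing.
        key = (organism.strip().lower(), sequence.strip().upper())
        groups[key].append(identifier)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Made-up identifiers and sequences, for illustration only.
    example = [
        ("ID0001.1", "Mus musculus", "ATGGCC"),
        ("ID0002.1", "Mus musculus", "atggcc"),
        ("ID0003.1", "Mus musculus", "ATGGCA"),
    ]
    print(find_redundant_entries(example))
```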
