Nucleotide Databases
Explanation of an EMBL entry
The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) is divided into sections that
reflect major taxonomic divisions.
These taxonomic divisions include:
- Invertebrates
- Other Mammals
- Mus musculus
- Organelles
- Bacteriophage
- Plants
- Prokaryotes
- Rodents
- Unclassified
- Viruses
- Other Vertebrates
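Non-taxonomic sequence groups that are also part of EMBL-Bank are:
- patents
- htg (high-throughput genomic sequences)
- htc (high-throughput cDNA sequences)
- gss (genome survey sequences)
- wgs (whole genome shotgun sequences)
- est (expressed sequence tags, although some are within species-specific files)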
Each entry in a database must have a unique identifier: a string of letters and/or numbers that
belongs to that record alone. This unique identifier, known as the accession number, never changes
and can therefore be quoted in the scientific literature.
Because the accession number must always remain the same, a separate version number is used to
distinguish the successive versions of a record that result from sequence corrections. You should
therefore always quote both the accession number and the version number when referring to records
in a nucleotide sequence database.
The nucleotide sequence identifier has the form 'Accession.Version' (e.g. AJ000012.1).
The first part is the accession number, which never changes; it is followed by a period and a
version number. The version number is incremented whenever the sequence changes.
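As a rough illustration of the 'Accession.Version' form, the following Python sketch splits an identifier into its two parts. The function names and the regular expression are assumptions made for this example: the pattern only covers identifiers that consist of letters followed by digits, a period and an integer version, and real accession formats may vary more widely.

```python
import re

# Assumed pattern (illustrative only): letters, then digits, a period, then an
# integer version number, as in AJ000012.1.
_ID_PATTERN = re.compile(r"^([A-Za-z]+[0-9]+)\.([0-9]+)$")


def split_sequence_id(seq_id):
    """Split an 'Accession.Version' string into (accession, version)."""
    match = _ID_PATTERN.match(seq_id.strip())
    if match is None:
        raise ValueError(f"not an Accession.Version identifier: {seq_id!r}")
    return match.group(1), int(match.group(2))


def same_record(id_a, id_b):
    """Return True if both identifiers share an accession number,
    regardless of their version numbers."""
    acc_a, _ = split_sequence_id(id_a)
    acc_b, _ = split_sequence_id(id_b)
    return acc_a == acc_b
```

For example, `split_sequence_id("AJ000012.1")` yields the accession `AJ000012` and the version `1`, while `same_record("AJ000012.1", "AJ000012.2")` is true because only the version differs.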
Although the nucleotide sequence data are checked for integrity and obvious errors by the data
library staff, the quality of the data is the responsibility of the submitter. As a consequence,
there are many errors in the database: many sequence entries are mislabelled, contaminated,
incompletely or erroneously annotated, or contain sequencing errors. In addition, the database is
highly redundant, in the sense that the same sequence from the same organism may be included many
times, simply reflecting the redundancy of the original scientific reports.
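The kind of redundancy described above can be made concrete with a small Python sketch that groups entries sharing the same organism and the same sequence. The function name, the input layout (identifier, organism, sequence) and the identifiers in the usage note are hypothetical and chosen only for illustration; the sketch finds exact duplicates and makes no attempt to detect near-identical sequences.

```python
from collections import defaultdict


def group_redundant(entries):
    """Group entry identifiers that share the same organism and sequence.

    `entries` is an iterable of (identifier, organism, sequence) tuples;
    this layout is an assumption made for the example. Only groups with
    more than one member are returned.
    """
    groups = defaultdict(list)
    for seq_id, organism, sequence in entries:
        # Compare sequences case-insensitively; exact matches only.
        key = (organism, sequence.upper())
        groups[key].append(seq_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Given three made-up entries for Mus musculus, two of which carry the same sequence, the function returns a single group holding the two identifiers that duplicate each other.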