EMBL Nucleotide Sequence Database Release Notes
Release 75 June 2003
EMBL Outstation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
Telephone: +44-1223-494400
Telefax : +44-1223-494468
Email: datalib@ebi.ac.uk
URL: http://www.ebi.ac.uk
CONTENTS
1 RELEASE 75
1.1 Feature Table Definition Document v5
1.2 Database Files
1.2.1 Naming Conventions
1.2.2 EST Database Files
1.2.3 GSS Database Files
1.2.4 INV Database Files
1.2.5 HUM Database Files
1.2.6 HTG Database Files
1.2.7 PAT Database Files
1.2.8 CON Database Files
1.2.9 CRC Values for Distributed Files
1.3 Cross-Reference Information
1.3.1 Cross-references to GeneDB
1.3.2 PUBMED/MEDLINE references
1.3.3 Cross-references statistics
1.4 Sequence Retrieval System (SRS)
1.5 EMBL Database FAQ
1.6 Disclaimer
2 FORTHCOMING CHANGES
2.1 Sequence length limit : update
2.2 New accession number formats
2.3 Molecule type information : update
2.4 New feature : gap
2.5 New qualifier : /ecotype
2.6 Line length in the flatfiles
2.7 Retrofits : /segment and /variety
2.8 Electronic resources: modification of RL line
3 SEQUENCE SUBMISSION SYSTEMS
4 CITING THE EMBL NUCLEOTIDE SEQUENCE DATABASE
5 EBI NETWORK SERVICES
5.1 Electronic Mail Server
5.2 Anonymous FTP Server
5.3 World Wide Web (WWW) Server
5.4 Sequence Version Archive
5.5 Sequence Similarity Search Servers
6 DISTRIBUTION FILES
6.1 Release 75 Files
APPENDIX A : DATABASE GROWTH TABLE
1 RELEASE 75
The EMBL Nucleotide Sequence Database was frozen to make Release 75 on
30-MAY-2003. The release contains 25,214,767 sequence entries comprising
32,195,012,823 nucleotides. This represents an increase of about 10% over
Release 74.
A breakdown of Release 75 by division is shown below:
| Division | Entries | Nucleotides |
| Constructed | 184 | 838,699,179 (see Note) |
| ESTs | 16,928,340 | 8,595,205,394 |
| Fungi | 77,835 | 121,563,447 |
| GSSs | 5,360,409 | 3,158,737,181 |
| HTC | 151,653 | 199,532,408 |
| HTG | 68,836 | 11,785,274,832 |
| Human | 248,193 | 3,969,969,766 |
| Invertebrates | 121,828 | 609,642,910 |
| Other Mammals | 48,751 | 103,252,725 |
| Mus musculus | 74,469 | 1,135,155,874 |
| Organelles | 204,489 | 168,241,510 |
| Patents | 1,122,385 | 616,778,214 |
| Bacteriophage | 2,332 | 8,445,422 |
| Plants | 176,501 | 568,303,125 |
| Prokaryotes | 190,734 | 638,372,821 |
| Rodents | 24,936 | 45,228,463 |
| STSs | 165,775 | 69,132,571 |
| Synthetic | 8,750 | 16,009,034 |
| Unclassified | 1,485 | 2,124,638 |
| Viruses | 188,213 | 167,797,523 |
| Other Vertebrates | 48,669 | 216,244,965 |
| Total | 25,214,767 | 32,195,012,823 |
EMBL database statistics are available at
URL: http://www.ebi.ac.uk/embl/Services/DBStats/
Note: The nucleotide count for CON(structed) entries is shown in the
table but is not included in the total. Starting with this release,
the nucleotide count for CON entries is excluded from the sum,
because CON entries do not contain sequence as such, only assembly
information for their segments. The nucleotides of the segments of a
CON entry are counted in the taxonomic divisions and are therefore
already included in the total.
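The rule described in the note can be checked against the table with a few lines of Python. This is only an illustrative sketch: the figures are copied from the division table above, and the names NUCLEOTIDES, REPORTED_TOTAL and total_excluding_con are assumptions made for the example, not part of any EMBL tool.

```python
# Nucleotide counts per division, copied from the Release 75 table.
NUCLEOTIDES = {
    "Constructed": 838_699_179,
    "ESTs": 8_595_205_394,
    "Fungi": 121_563_447,
    "GSSs": 3_158_737_181,
    "HTC": 199_532_408,
    "HTG": 11_785_274_832,
    "Human": 3_969_969_766,
    "Invertebrates": 609_642_910,
    "Other Mammals": 103_252_725,
    "Mus musculus": 1_135_155_874,
    "Organelles": 168_241_510,
    "Patents": 616_778_214,
    "Bacteriophage": 8_445_422,
    "Plants": 568_303_125,
    "Prokaryotes": 638_372_821,
    "Rodents": 45_228_463,
    "STSs": 69_132_571,
    "Synthetic": 16_009_034,
    "Unclassified": 2_124_638,
    "Viruses": 167_797_523,
    "Other Vertebrates": 216_244_965,
}

# Total nucleotide count reported for Release 75.
REPORTED_TOTAL = 32_195_012_823


def total_excluding_con(counts):
    """Sum all divisions except CON, whose segments are counted elsewhere."""
    return sum(n for division, n in counts.items() if division != "Constructed")


# True when the table agrees with the reported total.
print(total_excluding_con(NUCLEOTIDES) == REPORTED_TOTAL)
```

Including the CON figure in the sum would count the segment nucleotides twice, which is why it is left out.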
1.1 Feature Table Definition Document v5
The latest version of the Feature Table Definition Document (FTv5) was
implemented on 15-DEC-2002. The document is available from the EBI
servers at:
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/
The next edition of the Feature Table Definition Document will become
available in October 2003.
1.2 Database Files
In order to keep the size of the data files within reasonable limits for
handling purposes, additional division files will be added in subsequent
releases as appropriate.
1.2.1 Naming Conventions
When a division is split into several files, the files are named so that
they sort sequentially, e.g. est_hum01.dat, est_hum02.dat, ...,
est_hum22.dat, est_hum23.dat, etc.
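The naming scheme can be sketched in Python as follows. The helper name division_file_names is hypothetical, and the two-digit zero padding is inferred from the example file names rather than stated as a rule, so treat this as an illustration only.

```python
def division_file_names(prefix, count):
    """Return names such as est_hum01.dat, est_hum02.dat, ... for `count` files.

    Assumption: the counter is zero-padded to two digits, as in the examples.
    """
    return [f"{prefix}{number:02d}.dat" for number in range(1, count + 1)]


names = division_file_names("est_hum", 53)

# Zero padding makes plain lexicographic order equal to numeric order.
print(names == sorted(names))

# Without padding the order breaks: "est_hum10.dat" sorts before "est_hum2.dat".
print(sorted(["est_hum2.dat", "est_hum10.dat"]))
```

With this padding the property holds only up to 99 files per subdivision; a longer series would need a wider counter.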
1.2.2 EST Database Files
ESTs (single pass cDNA reads) constitute a major source of sequence
records.
EST files are split according to taxonomic subdivisions, following
the model of the taxonomic split of all other EMBL database divisions.
For example, files est_fun01.dat - est_fun02.dat contain fungal EST data.
For the full list of distribution files see the table in section 6.1.
1.2.3 GSS Database Files
Genome Survey Sequences (GSS) are of similar nature to EST data, except
that sequences are genomic rather than cDNA (mRNA). The GSS division
contains e.g. random `single pass read' genome survey sequences, single
pass reads from cosmid/BAC/YAC ends, exon trapped genomic sequences and
Alu PCR sequences.
GSS division files are also split according to taxonomic subdivisions.
Example: gss_hum01.dat - gss_hum09.dat contain human GSS data
For the full list of distribution files see table in section 6.1
1.2.4 INV Database Files
The INV division has been split into 3 files (inv01.dat - inv03.dat).
1.2.5 HUM Database Files
The HUM division has been split into 25 files (hum01.dat-hum25.dat).
1.2.6 HTG Database Files
'Unfinished' DNA sequences generated by the high-throughput sequencing
centres are represented in the HTG division and are rapidly made available
to the scientific community for homology searches. Entries in this
division all contain keywords to indicate the status of the sequencing
(e.g., HTGS_PHASE1). A single accession number is assigned to one clone,
and as sequencing progresses and the entry passes from one phase to
another, it will retain the same accession number. Once 'finished', HTG
sequences are moved into the appropriate primary EMBL taxonomic division.
HTGS_PHASE0 entries typically consist of one-to-few pass reads of a single
clone, have not been assembled into contigs and are unoriented, unordered,
unannotated and contain gaps of unknown length.
Low-pass sequence sampling is useful for identifying clones that may be
gene-rich. Phase0 sequences are used to check whether another centre is
already sequencing this clone. If not, it will be sequenced through phase
1 and phase 2. When records are updated, the accession numbers will be
preserved.
HTG division files are split according to taxonomic subdivisions.
Example: htg_hum01.dat - htg_hum05.dat - Human HTG data
For the full list of distribution files see table in section 6.1
1.2.7 PAT Database Files
PAT files include sequence data incorporated from the European patent
literature (EPO) and complemented by American and Japanese patent
data integrated from NCBI (USA) and DDBJ (Japan).
The Patent division has been split into 6 files (pat01.dat - pat06.dat).
1.2.8 CON Database Files
CON files include construct information for building contig sequences
of chromosomes, genomes and other long DNA sequences. CON division
data is distributed in the file embl.con; the entries in this file do
not contain sequence data per se.
1.2.9 CRC values for distributed files
To help users verify the integrity of release data files, we supply files
containing 32-bit Cyclic Redundancy Check (CRC) checksum values, plus byte
counts, for both compressed and uncompressed release files.
CRC values are calculated according to the IEEE Std 1003.2-1992
(POSIX 1003.2) and X/Open CAE specifications. These values are generated
by default by the 'cksum' command on Irix, RedHat Linux, SunOS and Solaris.
On Tru64 UNIX, the environment variable CMD_ENV needs to be set to xpg4.
File: crc_gz.txt for compressed data files
File: crc.txt for uncompressed data files
Example from crc.txt
1820298576 265044866 est_fun01.dat
This output shows that the checksum of the file est_fun01.dat is
1820298576 and the file contains 265044866 bytes.
1.3 Cross-Reference Information
1.3.1 Cross-references to GeneDB
Cross-references to the GeneDB database have been added to EMBL records.
GeneDB is a curated database resource covering mainly three organisms:
Schizosaccharomyces pombe, Leishmania major and Trypanosoma brucei.
Cross-linking to GeneDB was done via the protein_id and the corresponding
Swiss-Prot record.
1.3.2 PUBMED/MEDLINE references
PubMed, a service of the National Library of Medicine, provides access
to over 12 million MEDLINE citations. In release 75, 152,598
PUBMED/MEDLINE references have been included in EMBL database entries.
Example (entry AJ269470):
RX PUBMED; 12000641.
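A reader processing flatfiles might extract such references with a small parser. The sketch below assumes only the layout visible in the example above (line code RX, a resource name, a semicolon, an identifier and a closing full stop); the function name parse_rx_line is hypothetical.

```python
import re

# Layout assumed from the example: "RX", resource name, ";", identifier, ".".
RX_LINE = re.compile(r"^RX\s+(\w+);\s*(\S+?)\.$")


def parse_rx_line(line):
    """Return (resource, identifier) for an RX line, or None if it does not match."""
    match = RX_LINE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_rx_line("RX PUBMED; 12000641."))
```

The pattern accepts any amount of whitespace after the line code, so it works whether or not the original column spacing has been preserved.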
1.3.3 Cross-references statistics
Links to external databases allow integration with specialised data
collections, such as protein databases, species-specific databases,
taxonomy databases etc. The WWW-based sequence retrieval system SRS
enables users to easily navigate between cross-referenced database
entries.
EMBL Release 75 includes 32563155 cross-references to related
databases. Of these, 2228238 refer to individual features,
e.g. CDS (coding sequences), via the /db_xref feature qualifier
in EMBL entries.
EMBL cross-references to other databases:
| Database | Cross-references |
| UNILIB | 16795735 |
| RZPD | 12273786 |
| TrEMBL | 1010433 |
| GOA | 862585 |
| GrainGenes | 792048 |
| SWISS-PROT | 234474 |
| MaizeDB | 211206 |
| RemTrEMBL | 123469 |
| ENSEMBL | 74297 |
| IMGT/LIGM | 63782 |
| MGD | 37415 |
| FLYBASE | 22751 |
| MENDEL | 21033 |
| SGD | 10974 |
| GDB | 8430 |
| GENEDB | 6750 |
| TRANSFAC | 6620 |
| IMGT/HLA | 3832 |
| EPD | 3384 |
| Total | 32563004 |
1.4 Sequence Retrieval System (SRS)
EBI's SRS server is available at URL http://srs.ebi.ac.uk
All external services are available via the 'Toolbox' tab
on EBI's Web pages. If you have any comments and/or suggestions
please send these to support@ebi.ac.uk
1.5 EMBL Database FAQ
EMBL Database FAQ are available from the EBI at URL
http://www.ebi.ac.uk/embl/Documentation/FAQ/
1.6 Disclaimer
No guarantee is given and no legal liability or responsibility is assumed
for the completeness and accuracy of the database entries, in particular
the conformity of sequence data in the database with the journal
publication where the sequence is also disclosed.
2 FORTHCOMING CHANGES
2.1 Sequence length limit
Currently, sequence length in database entries is limited to 350 kb.
The size restriction will be removed completely in one year's time
(June 2004).
This will allow representation of complete genomic units
(such as complete chromosomes) in one entry.
The practical size limit for sequences in EMBL entries will
be defined by the size of the longest sequenced chromosome.
The introduction of the "gap" feature (see section 2.4) will allow
representation of sequences with gaps in a single entry.
2.2 New accession number formats
Two new accession number formats will be introduced for
WGS (whole genome shotgun) data entries in June 2004.
WGS data will be included from the June 2004 release onwards.
Formats:
4 letters + 2 digits for assembly number + 6 digits
and
4 letters + 2 digits for assembly number + 7 digits
Examples: AABA01000001 or BBAC010000001
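The two formats can be recognised with one regular expression. The sketch below assumes upper-case letters, as in the examples; the function name parse_wgs_accession is hypothetical and the code is an illustration, not an official validator.

```python
import re

# 4 letters + 2 digits for the assembly number + 6 or 7 digits.
WGS_ACCESSION = re.compile(r"([A-Z]{4})(\d{2})(\d{6,7})")


def parse_wgs_accession(accession):
    """Return (prefix, assembly number, serial) or None if the format differs."""
    match = WGS_ACCESSION.fullmatch(accession)
    if match is None:
        return None
    prefix, assembly, serial = match.groups()
    return prefix, int(assembly), serial


print(parse_wgs_accession("AABA01000001"))
print(parse_wgs_accession("BBAC010000001"))
print(parse_wgs_accession("X56734"))
```

Because the whole string must match, a 12-character accession is always read as the 6-digit form and a 13-character accession as the 7-digit form.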
2.3 Molecule type information : update
From 01-JUL-2003 /mol_type will be a mandatory qualifier to the "source"
feature key and will consistently display the in vivo molecule type of
the sequence.
The qualifier will be mandatory in all source features in the next
release (Sep 2003)
List of mol_type values:
"genomic DNA", "genomic RNA", "mRNA" (incl. EST), "tRNA",
"rRNA", "snoRNA", "snRNA", "scRNA", "pre-mRNA",
"other RNA" (incl. synthetic),"other DNA" (incl. synthetic),
"unassigned DNA" (incl. unknown),"unassigned RNA" (incl. unknown)
Molecule type in ID lines:
Starting from the next release (Sep 2003), molecule type information in
the flatfile ID line will correspond to the value of the
/mol_type qualifier.
Examples:
ID AB016606 standard; circular genomic DNA; ORG; 17407 BP.
ID TRBG361 standard; mRNA; PLN; 1859 BP.
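The ID line examples above can be taken apart as sketched below. The field names (name, data class, division) and the handling of the "circular" prefix are assumptions based on the two examples only; the list of molecule types is copied from this section.

```python
import re

# Molecule types listed for the /mol_type qualifier in this section.
MOL_TYPES = {
    "genomic DNA", "genomic RNA", "mRNA", "tRNA", "rRNA", "snoRNA",
    "snRNA", "scRNA", "pre-mRNA", "other RNA", "other DNA",
    "unassigned DNA", "unassigned RNA",
}

ID_LINE = re.compile(
    r"^ID\s+(?P<name>\S+)\s+(?P<dataclass>[^;]+);\s*(?P<molecule>[^;]+);"
    r"\s*(?P<division>[^;]+);\s*(?P<length>\d+)\s+BP\.$"
)


def parse_id_line(line):
    """Split an ID line into its fields; raise ValueError if it does not match."""
    match = ID_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an ID line: {line!r}")
    molecule = match.group("molecule").strip()
    circular = molecule.startswith("circular ")
    mol_type = molecule[len("circular "):] if circular else molecule
    return {
        "name": match.group("name"),
        "dataclass": match.group("dataclass").strip(),
        "circular": circular,
        "mol_type": mol_type,
        "division": match.group("division").strip(),
        "length": int(match.group("length")),
    }


entry = parse_id_line("ID AB016606 standard; circular genomic DNA; ORG; 17407 BP.")
print(entry["mol_type"], entry["mol_type"] in MOL_TYPES)
```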
2.4 New feature : gap
At the recent collaborative meeting it was agreed that a new "gap"
feature will be introduced with the next edition of the Feature
Table document (Oct 2003).
The purpose of the feature is to distinguish ambiguous sequence from
a gap in the sequence. The format of the feature is:
Feature Key gap
Qualifiers /gap_length=number
/gap_type=
where the /gap_type qualifier value can be
"estimated" or "arbitrary".
Example:
FT gap 101..200
FT /gap_length=30000
FT /gap_type=estimated
This feature describes a gap with an estimated length of 30000 bp
(the /gap_length value), represented in the sequence itself by a run
of "N" characters from position 101 to position 200.
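The example feature can be read as sketched below, which also shows that the number of placeholder positions in the sequence need not equal /gap_length. The parser is a simplified illustration that relies on whitespace splitting rather than the fixed columns of the feature table; the function name parse_gap_feature is hypothetical.

```python
GAP_TYPES = {"estimated", "arbitrary"}  # values named in this section


def parse_gap_feature(ft_lines):
    """Parse the FT lines of a single gap feature into a dictionary."""
    gap = {}
    for line in ft_lines:
        fields = line.split()
        if fields[:2] == ["FT", "gap"]:
            # Location line, e.g. "FT gap 101..200".
            start, end = fields[2].split("..")
            gap["start"], gap["end"] = int(start), int(end)
        elif len(fields) == 2 and fields[1].startswith("/"):
            # Qualifier line, e.g. "FT /gap_length=30000".
            key, _, value = fields[1][1:].partition("=")
            gap[key] = int(value) if value.isdigit() else value.strip('"')
    return gap


gap = parse_gap_feature([
    "FT gap 101..200",
    "FT /gap_length=30000",
    "FT /gap_type=estimated",
])

# Positions occupied in the sequence versus the described gap length.
placeholder_length = gap["end"] - gap["start"] + 1
print(placeholder_length, gap["gap_length"], gap["gap_type"] in GAP_TYPES)
```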
2.5 New qualifier : /ecotype
A new qualifier, /ecotype, will be introduced in the next edition of
the Feature Table Document (Oct 2003).
Description of the qualifier:
Qualifier: /ecotype
Definition: A distinct population of organisms of a widespread
species that has adapted genetically to its own local
habitat. Nevertheless they can still reproduce with
members of other ecotypes of the same species.
Value Format: "text"
Example: /ecotype="Columbia"
Comment: 'Ecotype' is often applied to standard genetic stocks
of Arabidopsis thaliana, but it can be applied to any
organism, especially sessile organisms like plants.
2.6 Line length in the flatfiles
The 80-character line length limit is relaxed for some of the lines in
the flatfile. The lines affected are the SQ line and, in the future, the
ID line. The change is due to the very long sequences that are now being
stored in the database.
2.7 Retrofits : /segment and /variety
The retrofit of old entries to the current standard of annotation for
"source" feature qualifiers /segment and /variety is now finished.
The /segment qualifier stores the name of the viral or phage
segment sequenced. Example:
/segment="6"
The /variety qualifier stores the variety (= varietas, a formal Linnaean
rank) of the organism from which the sequence was derived. Example:
/variety="insularis"
2.8 Electronic resources: modification of RL line
From December 2003, a new publication type will become legal in
EMBL. The new format will be used specifically to describe a
publication in an electronic resource (such as an e-journal).
The new format is:
RL (er) Free text
("RL" followed by three spaces, followed by "(er)" for
electronic resource, followed by a space, followed by the
free-text reference).
The format will be used for electronic journals whose citation
information does not have the same format as for "normal" journals
(with journal abbreviation, volume, issue, page numbers and year).
Many electronic journals in practice follow the same
conventions as paper publications, and these citations will not
contain the "(er)".
Example:
RL (er) Microbial Ecology DOI: 10.1007/s00248-002-2038-4
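The spacing rule described above ("RL", three spaces, "(er)", one space, free text) can be written down as a short sketch. The helper names are hypothetical, and the code assumes the original column spacing of the flatfile is intact.

```python
# "RL" + three spaces + "(er)" + one space, as described in this section.
ER_PREFIX = "RL   (er) "


def format_er_reference(free_text):
    """Build an RL line for a publication in an electronic resource."""
    return ER_PREFIX + free_text


def is_electronic_resource(line):
    """Tell whether an RL line uses the electronic resource format."""
    return line.startswith(ER_PREFIX)


line = format_er_reference("Microbial Ecology DOI: 10.1007/s00248-002-2038-4")
print(line)
print(is_electronic_resource(line))
```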
3 SEQUENCE SUBMISSION SYSTEMS
Information on submission of sequence data to the EMBL Nucleotide
Sequence Database is available at:
http://www.ebi.ac.uk/embl/Submission/
For further information on submission of sequence data to the
EMBL Nucleotide Sequence Database please contact database staff at:
EMBL Nucleotide Sequence Submissions
e-mail: datasubs@ebi.ac.uk
telephone: +44-1223-494499
telefax: +44-1223-494472
4 CITING THE EMBL NUCLEOTIDE SEQUENCE DATABASE
We encourage authors to include a reference to the EMBL Database in
publications related to their research.
When citing data in the EMBL Database, we suggest giving the
primary accession number and the publication in which the sequence first
appeared. For unpublished data, we suggest contacting the original
submitters for recent publication information or revisions of the data.
We also suggest providing a reference for the EMBL Database itself. Our
recent publication describing the EMBL database should be cited:
Stoesser G., Baker W., van den Broek A., Garcia-Pastor M., Kanz C.,
Kulikova T., Leinonen R., Lin Q., Lombard V., Lopez R., Mancuso R.,
Nardone F., Stoehr P., Tuli M., Tzouvara K. and Vaughan R. (2003)
"The EMBL Nucleotide Sequence Database: major new developments"
Nucleic Acids Research 31(1): 17-22.
Example: The numbers in parentheses refer to the reference citation in
the EMBL database entry, and to the EMBL citation above.
"Sequence entry X56734 (1) has been retrieved from the EMBL Database (2)
and showed significant sequence similarity to ..."
(1) Oxtoby, E., et al., Plant Mol. Biol. 17:209-219(1991).
(2) Stoesser G. et al., Nucleic Acids Res 31:17-22(2003).
5 EBI NETWORK SERVICES
5.1 Electronic Mail Server
Copies of database entries and other information can be obtained by
sending commands via email to a server running at EBI. New and updated
EMBL nucleotide sequence entries are made available on the server on a
daily basis.
Send file server commands to the address
netserv@ebi.ac.uk. Each line of the mail message should consist of a
single file server request.
The most important file server request, to get started, is:
HELP
If the file server receives this command, it will return a helpfile to
the sender, explaining in some detail how to use the facility. For
example, to request a copy of the nucleotide sequence with accession
number X55652, use the command:
GET NUC:X55652
The file server offers various other services (e.g. access to nucleotide
and protein sequence data, protein structure data, software), details of
which are provided in the HELP file.
5.2 Anonymous FTP Server
An alternative method of accessing the EBI archives is to use the file
transfer protocol (ftp). Researchers with direct access to the Internet
can use the FTP program on their local machine to connect to the host
FTP.EBI.AC.UK and enter the username "anonymous" and their email address
as password.
The directory pub/help contains detailed information about the data
available from the EBI anonymous FTP server which includes the complete
EMBL Nucleotide Sequence Database releases as well as daily and weekly
updates and a cumulative update file (gzip compressed format) in the
following directories:
EMBL quarterly release: pub/databases/embl/release
EMBL updates: pub/databases/embl/new
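For scripted downloads, the anonymous login described above can be sketched with Python's standard ftplib module. The host name and directories are taken from this section; the function names, the placeholder email address and the assumption that a file is stored on the server under exactly the given name (for example without a compression suffix) are illustrative only.

```python
from ftplib import FTP

HOST = "ftp.ebi.ac.uk"
RELEASE_DIR = "pub/databases/embl/release"
UPDATES_DIR = "pub/databases/embl/new"


def remote_path(directory, file_name):
    """Join a server directory and a file name."""
    return f"{directory}/{file_name}"


def fetch(file_name, directory=RELEASE_DIR, email="user@example.org"):
    """Download one file, logging in as "anonymous" with an email as password."""
    with FTP(HOST) as ftp:
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(directory)
        with open(file_name, "wb") as out:
            ftp.retrbinary(f"RETR {file_name}", out.write)


# No connection is made until fetch() is called.
print(remote_path(RELEASE_DIR, "relnotes.txt"))
```

A call such as fetch("relnotes.txt") would then store the file in the current working directory, provided the server and file are available.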
5.3 World Wide Web (WWW) Server
The EBI operates a WWW server at URL http://www.ebi.ac.uk/
providing information about the EBI and its products and services.
Data Retrieval:
Nucleotide sequences can be retrieved by a simple query by
accession number at http://www.ebi.ac.uk/cgi-bin/emblfetch
More complex queries can be constructed using the SRS
databank browser at http://srs.ebi.ac.uk
Data Submission: Nucleotide sequences can be submitted to the
database using the interactive submission system Webin at
http://www.ebi.ac.uk/embl/Submission/webin.html
5.4 Sequence Version Archive
The EMBL Sequence Version Archive (SVA) is a new publicly available
database containing all versions of any entry which has ever appeared
in the EMBL database.
The archive can be accessed programmatically via dbfetch at
http://www.ebi.ac.uk/cgi-bin/dbfetch or interactively via a Web
interface at http://www.ebi.ac.uk/embl/sva/
5.5 Sequence Similarity Search Servers
The EBI offers two network servers for sequence similarity searches via
electronic mail or interactive WWW forms:
FASTA based on W. Pearson's FASTA algorithm. Allows local
similarity searches of protein and nucleotide sequence databases.
Send "help" to fasta@ebi.ac.uk or use URL http://www.ebi.ac.uk/fasta33/
Complete genomes (and proteomes) can be searched at the following URL:
http://www.ebi.ac.uk/fasta33/genomes.html
Alternatively, send the "help" command to gpfasta@ebi.ac.uk
BLAST based on the NCBI and WU-BLAST software
Send "help" to blast@ebi.ac.uk or use URL http://www.ebi.ac.uk/blast2/
6 DISTRIBUTION FILES
6.1 Release 75 Files
The release contains the files shown below.
| File Number | File Name | Description |
| 1 | crc.txt | Checksum CRC uncompressed files |
| 2 | crc_gz.txt | Checksum CRC compressed files |
| 3 | deleteac.txt | Deleted accession numbers |
| 4 | embl.con | Constructed Sequences |
| 5 | ftable.txt | Feature Table Documentation |
| 6 | relnotes.txt | Release Notes (this document) |
| 7 | subinfo.txt | Data Submission Documentation |
| 8 | update.txt | Data Update Form |
| 9 | usrman.txt | User Manual |
| 10 | acnumber.ndx | Accession Number Index |
| 11 | citation.ndx | Citation Index |
| 12 | division.ndx | Division Index |
| 13 | keyword.ndx | Keyword Index |
| 14 | shortdir.ndx | Short Directory Index |
| 15 | species.ndx | Species Index |
| 16 | est_fun01.dat | EST Sequences |
| 17 | est_fun02.dat | EST Sequences |
| 18 | est_hum01.dat | EST Sequences |
| 19 | est_hum02.dat | EST Sequences |
| 20 | est_hum03.dat | EST Sequences |
| 21 | est_hum04.dat | EST Sequences |
| 22 | est_hum05.dat | EST Sequences |
| 23 | est_hum06.dat | EST Sequences |
| 24 | est_hum07.dat | EST Sequences |
| 25 | est_hum08.dat | EST Sequences |
| 26 | est_hum09.dat | EST Sequences |
| 27 | est_hum10.dat | EST Sequences |
| 28 | est_hum11.dat | EST Sequences |
| 29 | est_hum12.dat | EST Sequences |
| 30 | est_hum13.dat | EST Sequences |
| 31 | est_hum14.dat | EST Sequences |
| 32 | est_hum15.dat | EST Sequences |
| 33 | est_hum16.dat | EST Sequences |
| 34 | est_hum17.dat | EST Sequences |
| 35 | est_hum18.dat | EST Sequences |
| 36 | est_hum19.dat | EST Sequences |
| 37 | est_hum20.dat | EST Sequences |
| 38 | est_hum21.dat | EST Sequences |
| 39 | est_hum22.dat | EST Sequences |
| 40 | est_hum23.dat | EST Sequences |
| 41 | est_hum24.dat | EST Sequences |
| 42 | est_hum25.dat | EST Sequences |
| 43 | est_hum26.dat | EST Sequences |
| 44 | est_hum27.dat | EST Sequences |
| 45 | est_hum28.dat | EST Sequences |
| 46 | est_hum29.dat | EST Sequences |
| 47 | est_hum30.dat | EST Sequences |
| 48 | est_hum31.dat | EST Sequences |
| 49 | est_hum32.dat | EST Sequences |
| 50 | est_hum33.dat | EST Sequences |
| 51 | est_hum34.dat | EST Sequences |
| 52 | est_hum35.dat | EST Sequences |
| 53 | est_hum36.dat | EST Sequences |
| 54 | est_hum37.dat | EST Sequences |
| 55 | est_hum38.dat | EST Sequences |
| 56 | est_hum39.dat | EST Sequences |
| 57 | est_hum40.dat | EST Sequences |
| 58 | est_hum41.dat | EST Sequences |
| 59 | est_hum42.dat | EST Sequences |
| 60 | est_hum43.dat | EST Sequences |
| 61 | est_hum44.dat | EST Sequences |
| 62 | est_hum45.dat | EST Sequences |
| 63 | est_hum46.dat | EST Sequences |
| 64 | est_hum47.dat | EST Sequences |
| 65 | est_hum48.dat | EST Sequences |
| 66 | est_hum49.dat | EST Sequences |
| 67 | est_hum50.dat | EST Sequences |
| 68 | est_hum51.dat | EST Sequences |
| 69 | est_hum52.dat | EST Sequences |
| 70 | est_hum53.dat | EST Sequences |
| 71 | est_inv01.dat | EST Sequences |
| 72 | est_inv02.dat | EST Sequences |
| 73 | est_inv03.dat | EST Sequences |
| 74 | est_inv04.dat | EST Sequences |
| 75 | est_inv05.dat | EST Sequences |
| 76 | est_inv06.dat | EST Sequences |
| 77 | est_inv07.dat | EST Sequences |
| 78 | est_inv08.dat | EST Sequences |
| 79 | est_inv09.dat | EST Sequences |
| 80 | est_inv10.dat | EST Sequences |
| 81 | est_inv11.dat | EST Sequences |
| 82 | est_inv12.dat | EST Sequences |
| 83 | est_inv13.dat | EST Sequences |
| 84 | est_inv14.dat | EST Sequences |
| 85 | est_inv15.dat | EST Sequences |
| 86 | est_inv16.dat | EST Sequences |
| 87 | est_inv17.dat | EST Sequences |
| 88 | est_inv18.dat | EST Sequences |
| 89 | est_inv19.dat | EST Sequences |
| 90 | est_mam01.dat | EST Sequences |
| 91 | est_mam02.dat | EST Sequences |
| 92 | est_mam03.dat | EST Sequences |
| 93 | est_mam04.dat | EST Sequences |
| 94 | est_mam05.dat | EST Sequences |
| 95 | est_mam06.dat | EST Sequences |
| 96 | est_mus01.dat | EST Sequences |
| 97 | est_mus02.dat | EST Sequences |
| 98 | est_mus03.dat | EST Sequences |
| 99 | est_mus04.dat | EST Sequences |
| 100 | est_mus05.dat | EST Sequences |
| 101 | est_mus06.dat | EST Sequences |
| 102 | est_mus07.dat | EST Sequences |
| 103 | est_mus08.dat | EST Sequences |
| 104 | est_mus09.dat | EST Sequences |
| 105 | est_mus10.dat | EST Sequences |
| 106 | est_mus11.dat | EST Sequences |
| 107 | est_mus12.dat | EST Sequences |
| 108 | est_mus13.dat | EST Sequences |
| 109 | est_mus14.dat | EST Sequences |
| 110 | est_mus15.dat | EST Sequences |
| 111 | est_mus16.dat | EST Sequences |
| 112 | est_mus17.dat | EST Sequences |
| 113 | est_mus18.dat | EST Sequences |
| 114 | est_mus19.dat | EST Sequences |
| 115 | est_mus20.dat | EST Sequences |
| 116 | est_mus21.dat | EST Sequences |
| 117 | est_mus22.dat | EST Sequences |
| 118 | est_mus23.dat | EST Sequences |
| 119 | est_mus24.dat | EST Sequences |
| 120 | est_mus25.dat | EST Sequences |
| 121 | est_mus26.dat | EST Sequences |
| 122 | est_mus27.dat | EST Sequences |
| 123 | est_mus28.dat | EST Sequences |
| 124 | est_mus29.dat | EST Sequences |
| 125 | est_mus30.dat | EST Sequences |
| 126 | est_mus31.dat | EST Sequences |
| 127 | est_mus32.dat | EST Sequences |
| 128 | est_mus33.dat | EST Sequences |
| 129 | est_mus34.dat | EST Sequences |
| 130 | est_mus35.dat | EST Sequences |
| 131 | est_mus36.dat | EST Sequences |
| 132 | est_mus37.dat | EST Sequences |
| 133 | est_mus38.dat | EST Sequences |
| 134 | est_pln01.dat | EST Sequences |
| 135 | est_pln02.dat | EST Sequences |
| 136 | est_pln03.dat | EST Sequences |
| 137 | est_pln04.dat | EST Sequences |
| 138 | est_pln05.dat | EST Sequences |
| 139 | est_pln06.dat | EST Sequences |
| 140 | est_pln07.dat | EST Sequences |
| 141 | est_pln08.dat | EST Sequences |
| 142 | est_pln09.dat | EST Sequences |
| 143 | est_pln10.dat | EST Sequences |
| 144 | est_pln11.dat | EST Sequences |
| 145 | est_pln12.dat | EST Sequences |
| 146 | est_pln13.dat | EST Sequences |
| 147 | est_pln14.dat | EST Sequences |
| 148 | est_pln15.dat | EST Sequences |
| 149 | est_pln16.dat | EST Sequences |
| 150 | est_pln17.dat | EST Sequences |
| 151 | est_pln18.dat | EST Sequences |
| 152 | est_pln19.dat | EST Sequences |
| 153 | est_pln20.dat | EST Sequences |
| 154 | est_pln21.dat | EST Sequences |
| 155 | est_pln22.dat | EST Sequences |
| 156 | est_pln23.dat | EST Sequences |
| 157 | est_pln24.dat | EST Sequences |
| 158 | est_pln25.dat | EST Sequences |
| 159 | est_pln26.dat | EST Sequences |
| 160 | est_pln27.dat | EST Sequences |
| 161 | est_pln28.dat | EST Sequences |
| 162 | est_pln29.dat | EST Sequences |
| 163 | est_pln30.dat | EST Sequences |
| 164 | est_pln31.dat | EST Sequences |
| 165 | est_pln32.dat | EST Sequences |
| 166 | est_pln33.dat | EST Sequences |
| 167 | est_pln34.dat | EST Sequences |
| 168 | est_pro.dat | EST Sequences |
| 169 | est_rod01.dat | EST Sequences |
| 170 | est_rod02.dat | EST Sequences |
| 171 | est_rod03.dat | EST Sequences |
| 172 | est_rod04.dat | EST Sequences |
| 173 | est_rod05.dat | EST Sequences |
| 174 | est_rod06.dat | EST Sequences |
| 175 | est_vrt01.dat | EST Sequences |
| 176 | est_vrt02.dat | EST Sequences |
| 177 | est_vrt03.dat | EST Sequences |
| 178 | est_vrt04.dat | EST Sequences |
| 179 | est_vrt05.dat | EST Sequences |
| 180 | est_vrt06.dat | EST Sequences |
| 181 | est_vrt07.dat | EST Sequences |
| 182 | est_vrt08.dat | EST Sequences |
| 183 | est_vrt09.dat | EST Sequences |
| 184 | est_vrt10.dat | EST Sequences |
| 185 | est_vrt11.dat | EST Sequences |
| 186 | est_vrt12.dat | EST Sequences |
| 187 | est_vrt13.dat | EST Sequences |
| 188 | est_vrt14.dat | EST Sequences |
| 189 | est_vrt15.dat | EST Sequences |
| 190 | est_vrt16.dat | EST Sequences |
| 191 | fun.dat | Fungi Sequences |
| 192 | gss_fun.dat | Genome Survey Sequences |
| 193 | gss_hum01.dat | Genome Survey Sequences |
| 194 | gss_hum02.dat | Genome Survey Sequences |
| 195 | gss_hum03.dat | Genome Survey Sequences |
| 196 | gss_hum04.dat | Genome Survey Sequences |
| 197 | gss_hum05.dat | Genome Survey Sequences |
| 198 | gss_hum06.dat | Genome Survey Sequences |
| 199 | gss_hum07.dat | Genome Survey Sequences |
| 200 | gss_hum08.dat | Genome Survey Sequences |
| 201 | gss_hum09.dat | Genome Survey Sequences |
| 202 | gss_inv01.dat | Genome Survey Sequences |
| 203 | gss_inv02.dat | Genome Survey Sequences |
| 204 | gss_inv03.dat | Genome Survey Sequences |
| 205 | gss_inv04.dat | Genome Survey Sequences |
| 206 | gss_inv05.dat | Genome Survey Sequences |
| 207 | gss_inv06.dat | Genome Survey Sequences |
| 208 | gss_inv07.dat | Genome Survey Sequences |
| 209 | gss_mam01.dat | Genome Survey Sequences |
| 210 | gss_mam02.dat | Genome Survey Sequences |
| 211 | gss_mam03.dat | Genome Survey Sequences |
| 212 | gss_mus01.dat | Genome Survey Sequences |
| 213 | gss_mus02.dat | Genome Survey Sequences |
| 214 | gss_mus03.dat | Genome Survey Sequences |
| 215 | gss_mus04.dat | Genome Survey Sequences |
| 216 | gss_mus05.dat | Genome Survey Sequences |
| 217 | gss_mus06.dat | Genome Survey Sequences |
| 218 | gss_mus07.dat | Genome Survey Sequences |
| 219 | gss_mus08.dat | Genome Survey Sequences |
| 220 | gss_mus09.dat | Genome Survey Sequences |
| 221 | gss_mus10.dat | Genome Survey Sequences |
| 222 | gss_phg.dat | Genome Survey Sequences |
| 223 | gss_pln01.dat | Genome Survey Sequences |
| 224 | gss_pln02.dat | Genome Survey Sequences |
| 225 | gss_pln03.dat | Genome Survey Sequences |
| 226 | gss_pln04.dat | Genome Survey Sequences |
| 227 | gss_pln05.dat | Genome Survey Sequences |
| 228 | gss_pln06.dat | Genome Survey Sequences |
| 229 | gss_pln07.dat | Genome Survey Sequences |
| 230 | gss_pln08.dat | Genome Survey Sequences |
| 231 | gss_pln09.dat | Genome Survey Sequences |
| 232 | gss_pln10.dat | Genome Survey Sequences |
| 233 | gss_pln11.dat | Genome Survey Sequences |
| 234 | gss_pln12.dat | Genome Survey Sequences |
| 235 | gss_pln13.dat | Genome Survey Sequences |
| 236 | gss_pln14.dat | Genome Survey Sequences |
| 237 | gss_pln15.dat | Genome Survey Sequences |
| 238 | gss_pln16.dat | Genome Survey Sequences |
| 239 | gss_pln17.dat | Genome Survey Sequences |
| 240 | gss_pro.dat | Genome Survey Sequences |
| 241 | gss_rod01.dat | Genome Survey Sequences |
| 242 | gss_rod02.dat | Genome Survey Sequences |
| 243 | gss_rod03.dat | Genome Survey Sequences |
| 244 | gss_rod04.dat | Genome Survey Sequences |
| 245 | gss_vrl.dat | Genome Survey Sequences |
| 246 | gss_vrt01.dat | Genome Survey Sequences |
| 247 | gss_vrt02.dat | Genome Survey Sequences |
| 248 | gss_vrt03.dat | Genome Survey Sequences |
| 249 | gss_vrt04.dat | Genome Survey Sequences |
| 250 | gss_vrt05.dat | Genome Survey Sequences |
| 251 | gss_vrt06.dat | Genome Survey Sequences |
| 252 | htc.dat | High throughput cDNAs |
| 253 | htg_hum01.dat | High Throughput Genome Sequences |
| 254 | htg_hum02.dat | High Throughput Genome Sequences |
| 255 | htg_hum03.dat | High Throughput Genome Sequences |
| 256 | htg_hum04.dat | High Throughput Genome Sequences |
| 257 | htg_hum05.dat | High Throughput Genome Sequences |
| 258 | htg_inv01.dat | High Throughput Genome Sequences |
| 259 | htg_inv02.dat | High Throughput Genome Sequences |
| 260 | htg_mam01.dat | High Throughput Genome Sequences |
| 261 | htg_mam02.dat | High Throughput Genome Sequences |
| 262 | htg_mus01.dat | High Throughput Genome Sequences |
| 263 | htg_mus02.dat | High Throughput Genome Sequences |
| 264 | htg_mus03.dat | High Throughput Genome Sequences |
| 265 | htg_mus04.dat | High Throughput Genome Sequences |
| 266 | htg_mus05.dat | High Throughput Genome Sequences |
| 267 | htg_other.dat | High Throughput Genome Sequences |
| 268 | htg_pln.dat | High Throughput Genome Sequences |
| 269 | htg_rod01.dat | High Throughput Genome Sequences |
| 270 | htg_rod02.dat | High Throughput Genome Sequences |
| 271 | htg_rod03.dat | High Throughput Genome Sequences |
| 272 | htg_rod04.dat | High Throughput Genome Sequences |
| 273 | htg_rod05.dat | High Throughput Genome Sequences |
| 274 | htg_rod06.dat | High Throughput Genome Sequences |
| 275 | htg_rod07.dat | High Throughput Genome Sequences |
| 276 | htg_rod08.dat | High Throughput Genome Sequences |
| 277 | htg_vrt.dat | High Throughput Genome Sequences |
| 278 | htgo_hum.dat | High Throughput Genome Sequences phase 0 |
| 279 | htgo_mus.dat | High Throughput Genome Sequences phase 0 |
| 280 | htgo_other.dat | High Throughput Genome Sequences phase 0 |
| 281 | hum01.dat | Human Sequences |
| 282 | hum02.dat | Human Sequences |
| 283 | hum03.dat | Human Sequences |
| 284 | hum04.dat | Human Sequences |
| 285 | hum05.dat | Human Sequences |
| 286 | hum06.dat | Human Sequences |
| 287 | hum07.dat | Human Sequences |
| 288 | hum08.dat | Human Sequences |
| 289 | hum09.dat | Human Sequences |
| 290 | hum10.dat | Human Sequences |
| 291 | hum11.dat | Human Sequences |
| 292 | hum12.dat | Human Sequences |
| 293 | hum13.dat | Human Sequences |
| 294 | hum14.dat | Human Sequences |
| 295 | hum15.dat | Human Sequences |
| 296 | hum16.dat | Human Sequences |
| 297 | hum17.dat | Human Sequences |
| 298 | hum18.dat | Human Sequences |
| 299 | hum19.dat | Human Sequences |
| 300 | hum20.dat | Human Sequences |
| 301 | hum21.dat | Human Sequences |
| 302 | hum22.dat | Human Sequences |
| 303 | hum23.dat | Human Sequences |
| 304 | hum24.dat | Human Sequences |
| 305 | hum25.dat | Human Sequences |
| 306 | inv01.dat | Invertebrate Sequences |
| 307 | inv02.dat | Invertebrate Sequences |
| 308 | inv03.dat | Invertebrate Sequences |
| 309 | mam.dat | Other Mammal Sequences |
| 310 | mus.dat | Mus musculus Sequences |
| 311 | org.dat | Organelle Sequences |
| 312 | pat01.dat | Patent Sequences |
| 313 | pat02.dat | Patent Sequences |
| 314 | pat03.dat | Patent Sequences |
| 315 | pat04.dat | Patent Sequences |
| 316 | pat05.dat | Patent Sequences |
| 317 | pat06.dat | Patent Sequences |
| 318 | phg.dat | Bacteriophage Sequences |
| 319 | pln.dat | Plant Sequences |
| 320 | pro01.dat | Prokaryote Sequences |
| 321 | pro02.dat | Prokaryote Sequences |
| 322 | pro03.dat | Prokaryote Sequences |
| 323 | pro04.dat | Prokaryote Sequences |
| 324 | rod.dat | Rodent Sequences |
| 325 | sts.dat | STS Sequences |
| 326 | syn.dat | Synthetic Sequences |
| 327 | unc.dat | Unclassified Sequences |
| 328 | vrl.dat | Viral Sequences |
| 329 | vrt.dat | Other Vertebrate Sequences |
APPENDIX A
DATABASE GROWTH TABLE
The following table shows the total number of entries and nucleotides in the
EMBL Nucleotide Sequence Database at each release.
| Release | Date (MM/YYYY) | Entries | Nucleotides |
| 1 | 06/1982 | 568 | 585433 |
| 2 | 04/1983 | 811 | 1114447 |
| 3 | 12/1983 | 1481 | 1654863 |
| 4 | 08/1984 | 1698 | 2147205 |
| 5 | 04/1985 | 2378 | 2874493 |
| 6 | 08/1985 | 4835 | 4567592 |
| 7 | 12/1985 | 5789 | 5622638 |
| 8 | 04/1986 | 6395 | 6353040 |
| 9 | 09/1986 | 7630 | 7813214 |
| 10 | 12/1986 | 8817 | 9766948 |
| 11 | 04/1987 | 11621 | 12189783 |
| 12 | 07/1987 | 12706 | 13638061 |
| 13 | 10/1987 | 14397 | 16023478 |
| 14 | 01/1988 | 15344 | 17272160 |
| 15 | 05/1988 | 17961 | 20318442 |
| 16 | 08/1988 | 19592 | 22625941 |
| 17 | 11/1988 | 20695 | 24211054 |
| 18 | 02/1989 | 22938 | 27249830 |
| 19 | 05/1989 | 24365 | 29066676 |
| 20 | 08/1989 | 26223 | 31240948 |
| 21 | 11/1989 | 28679 | 34748087 |
| 22 | 02/1990 | 31508 | 38165786 |
| 23 | 05/1990 | 34902 | 42923803 |
| 24 | 08/1990 | 37784 | 47354438 |
| 25 | 11/1990 | 41580 | 52900354 |
| 26 | 02/1991 | 43745 | 55859549 |
| 27 | 05/1991 | 46871 | 59915244 |
| 28 | 09/1991 | 54558 | 70448052 |
| 29 | 12/1991 | 57655 | 75400487 |
| 30 | 03/1992 | 63378 | 83574342 |
| 31 | 06/1992 | 72481 | 94390065 |
| 32 | 09/1992 | 79377 | 101292310 |
| 33 | 12/1992 | 89100 | 111413979 |
| 34 | 03/1993 | 99591 | 121420828 |
| 35 | 06/1993 | 108973 | 131880111 |
| 36 | 09/1993 | 127933 | 145401156 |
| 37 | 12/1993 | 146576 | 158171400 |
| 38 | 03/1994 | 167777 | 177550115 |
| 39 | 06/1994 | 182615 | 192195819 |
| 40 | 09/1994 | 209352 | 211017104 |
| 41 | 12/1994 | 230950 | 226259607 |
| 42 | 03/1995 | 303206 | 262559786 |
| 43 | 06/1995 | 420111 | 315840053 |
| 44 | 09/1995 | 506190 | 363273777 |
| 45 | 12/1995 | 622566 | 427620278 |
| 46 | 03/1996 | 701246 | 473691480 |
| 47 | 06/1996 | 827174 | 550739395 |
| 48 | 09/1996 | 928067 | 608931850 |
| 49 | 12/1996 | 1047263 | 696183789 |
| 50 | 03/1997 | 1187455 | 789755858 |
| 51 | 06/1997 | 1432941 | 931351601 |
| 52 | 10/1997 | 1787004 | 1181167498 |
| 53 | 12/1997 | 1917868 | 1281391651 |
| 54 | 03/1998 | 2125225 | 1427634373 |
| 55 | 06/1998 | 2330040 | 1607673907 |
| 56 | 09/1998 | 2689618 | 1904091473 |
| 57 | 12/1998 | 3046471 | 2164718256 |
| 58 | 03/1999 | 3272064 | 2355200790 |
| 59 | 06/1999 | 3952878 | 2924568545 |
| 60 | 09/1999 | 4719266 | 3543553093 |
| 61 | 12/1999 | 5303436 | 4508169737 |
| 62 | 03/2000 | 5865742 | 6120908677 |
| 63 | 06/2000 | 6760113 | 8255674441 |
| 64 | 09/2000 | 8344436 | 9650223037 |
| 65 | 12/2000 | 9549382 | 10710321435 |
| 66 | 03/2001 | 11169673 | 11916112872 |
| 67 | 06/2001 | 12044420 | 12821742622 |
| 68 | 09/2001 | 12964797 | 13727100206 |
| 69 | 12/2001 | 14366182 | 15383451165 |
| 70 | 03/2002 | 15851373 | 17807926047 |
| 71 | 06/2002 | 17226422 | 20020556107 |
| 72 | 09/2002 | 18324246 | 23090186146 |
| 73 | 12/2002 | 20857746 | 27903283528 |
| 74 | 03/2003 | 23234788 | 30356786718 |
| 75 | 06/2003 | 25214767 | 32195012823 |
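The growth between two releases can be derived from the table rows above. The
following is a minimal sketch in Python, assuming each row keeps the
pipe-delimited layout shown in this appendix; the function names
parse_growth_row and percent_increase are hypothetical helpers introduced for
illustration and are not part of any EMBL tool.

```python
def parse_growth_row(line):
    """Parse one '| release | MM/YYYY | entries | nucleotides |' table row.

    Assumes the pipe-delimited layout used in this appendix.
    """
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    release, month, entries, nucleotides = cells
    return int(release), month, int(entries), int(nucleotides)


def percent_increase(old, new):
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100.0


# The last two rows of the growth table, copied verbatim.
rows = [
    "| 74 | 03/2003 | 23234788 | 30356786718 |",
    "| 75 | 06/2003 | 25214767 | 32195012823 |",
]

prev, curr = (parse_growth_row(row) for row in rows)

print(f"Release {prev[0]} -> Release {curr[0]}")
print(f"  entries:     {percent_increase(prev[2], curr[2]):.1f}%")
print(f"  nucleotides: {percent_increase(prev[3], curr[3]):.1f}%")
```

The same two helpers can be applied to any pair of rows, for example to compare
releases one year apart.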