EMBL Nucleotide Sequence Database
Release Notes
Release 75 June 2003

EMBL Outstation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom

Telephone: +44-1223-494400
Telefax  : +44-1223-494468
Email    : datalib@ebi.ac.uk
URL      : http://www.ebi.ac.uk

CONTENTS

1 RELEASE 75
  1.1 Feature Table Definition Document v5
  1.2 Database Files
    1.2.1 Naming Conventions
    1.2.2 EST Database Files
    1.2.3 GSS Database Files
    1.2.4 INV Database Files
    1.2.5 HUM Database Files
    1.2.6 HTG Database Files
    1.2.7 PAT Database Files
    1.2.8 CON Database Files
    1.2.9 CRC Values for Distributed Files
  1.3 Cross-Reference Information
    1.3.1 Cross-references to GeneDB
    1.3.2 PUBMED/MEDLINE references
    1.3.3 Cross-references statistics
  1.4 Sequence Retrieval System (SRS)
  1.5 EMBL Database FAQ
  1.6 Disclaimer
2 FORTHCOMING CHANGES
  2.1 Sequence length limit : update
  2.2 New accession number formats
  2.3 Molecule type information : update
  2.4 New feature : gap
  2.5 New qualifier : /ecotype
  2.6 Line length in the flatfiles
  2.7 Retrofits : /segment and /variety
  2.8 Electronic resources: modification of RL line
3 SEQUENCE SUBMISSION SYSTEMS
4 CITING THE EMBL NUCLEOTIDE SEQUENCE DATABASE
5 EBI NETWORK SERVICES
  5.1 Electronic Mail Server
  5.2 Anonymous FTP Server
  5.3 World Wide Web (WWW) Server
  5.4 Sequence Version Archive
  5.5 Sequence Similarity Search Servers
6 DISTRIBUTION FILES
  6.1 Release 75 Files
APPENDIX A : DATABASE GROWTH TABLE


1 RELEASE 75

The EMBL Nucleotide Sequence Database was frozen to make Release 75 on
30-MAY-2003. The release contains 25,214,767 sequence entries comprising
32,195,012,823 nucleotides. This represents an increase of about 8.5% in
entries and about 6% in nucleotides over Release 74 (see Appendix A).

A breakdown of Release 75 by division is shown below:
Division               Entries      Nucleotides
Constructed                184      838,699,179 (see Note)
ESTs                16,928,340    8,595,205,394
Fungi                   77,835      121,563,447
GSSs                 5,360,409    3,158,737,181
HTC                    151,653      199,532,408
HTG                     68,836   11,785,274,832
Human                  248,193    3,969,969,766
Invertebrates          121,828      609,642,910
Other Mammals           48,751      103,252,725
Mus musculus            74,469    1,135,155,874
Organelles             204,489      168,241,510
Patents              1,122,385      616,778,214
Bacteriophage            2,332        8,445,422
Plants                 176,501      568,303,125
Prokaryotes            190,734      638,372,821
Rodents                 24,936       45,228,463
STSs                   165,775       69,132,571
Synthetic                8,750       16,009,034
Unclassified             1,485        2,124,638
Viruses                188,213      167,797,523
Other Vertebrates       48,669      216,244,965
Total               25,214,767   32,195,012,823

EMBL database statistics are available at 
URL: http://www.ebi.ac.uk/embl/Services/DBStats/ 

Note: The nucleotide count for CON(structed) entries is shown in the
table but is not included in the total. Starting with this release,
CON entries are excluded from the nucleotide sum because they contain
no sequence of their own, only assembly information for their segments.
The nucleotides of those segments are counted in the taxonomic
divisions and are therefore already part of the total.
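
The Note can be checked against the table itself. The following Python
sketch is not part of the release; it simply copies the figures from the
table above and confirms that the entry total includes the CON entries
while the nucleotide total leaves out the CON nucleotide count. The
names DIVISIONS and totals() are illustrative only.

```python
# (division, entries, nucleotides), copied from the Release 75 table above.
DIVISIONS = [
    ("Constructed", 184, 838_699_179),
    ("ESTs", 16_928_340, 8_595_205_394),
    ("Fungi", 77_835, 121_563_447),
    ("GSSs", 5_360_409, 3_158_737_181),
    ("HTC", 151_653, 199_532_408),
    ("HTG", 68_836, 11_785_274_832),
    ("Human", 248_193, 3_969_969_766),
    ("Invertebrates", 121_828, 609_642_910),
    ("Other Mammals", 48_751, 103_252_725),
    ("Mus musculus", 74_469, 1_135_155_874),
    ("Organelles", 204_489, 168_241_510),
    ("Patents", 1_122_385, 616_778_214),
    ("Bacteriophage", 2_332, 8_445_422),
    ("Plants", 176_501, 568_303_125),
    ("Prokaryotes", 190_734, 638_372_821),
    ("Rodents", 24_936, 45_228_463),
    ("STSs", 165_775, 69_132_571),
    ("Synthetic", 8_750, 16_009_034),
    ("Unclassified", 1_485, 2_124_638),
    ("Viruses", 188_213, 167_797_523),
    ("Other Vertebrates", 48_669, 216_244_965),
]
TOTAL_ENTRIES = 25_214_767
TOTAL_NUCLEOTIDES = 32_195_012_823


def totals(rows, no_nucleotides=("Constructed",)):
    """Sum entries for all rows; skip nucleotides for the named divisions."""
    entries = sum(e for _, e, _ in rows)
    nucleotides = sum(n for name, _, n in rows if name not in no_nucleotides)
    return entries, nucleotides


entries, nucleotides = totals(DIVISIONS)
print(entries == TOTAL_ENTRIES, nucleotides == TOTAL_NUCLEOTIDES)
```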
 
1.1 Feature Table Definition Document v5     

The latest version of the Feature Table Definition Document (FTv5) was
implemented on 15-DEC-2002. The document is available from the EBI
servers at:

http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/

The next edition of the Feature Table document will become available in
October 2003.

1.2 Database Files

In order to keep the size of the data files within reasonable limits for 
handling purposes, additional division files will be added in subsequent
releases as appropriate.


1.2.1 Naming Conventions

When a division is split into several files, these are named so that 
they sort sequentially, e.g. est_hum01.dat, est_hum02.dat,......,
est_hum22.dat, est_hum23.dat etc 
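
As an illustration of why the two-digit, zero-padded suffix matters, the
Python sketch below (not part of the release; split_file_names() is a
hypothetical helper) builds names in the style shown above and checks
that a plain lexical sort keeps them in numeric order.

```python
from typing import List


def split_file_names(prefix: str, count: int, width: int = 2) -> List[str]:
    """Build names such as est_hum01.dat, est_hum02.dat, ... (zero-padded)."""
    return [f"{prefix}{number:0{width}d}.dat" for number in range(1, count + 1)]


names = split_file_names("est_hum", 53)
print(names[0], names[-1])      # est_hum01.dat est_hum53.dat
print(sorted(names) == names)   # True: lexical order equals numeric order

# Without padding, a lexical sort would put file 10 before file 2.
print(sorted(["est_hum2.dat", "est_hum10.dat"]))
```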


1.2.2 EST Database Files

ESTs (single pass cDNA reads) constitute a major source of sequence 
records. 

EST files are split according to taxonomic subdivisions, following 
the model of the taxonomic split of all other EMBL database divisions. 
For example, files est_fun01.dat - est_fun02.dat contain fungal EST data.
For the full list of distribution files see the table in section 6.1.
 
1.2.3 GSS Database Files

Genome Survey Sequences (GSS) are of similar nature to EST data, except 
that sequences are genomic rather than cDNA (mRNA). The GSS division 
contains e.g. random `single pass read' genome survey sequences, single 
pass reads from cosmid/BAC/YAC ends, exon trapped genomic sequences and 
Alu PCR sequences.

GSS division files are also split according to taxonomic subdivisions.
Example: gss_hum01.dat - gss_hum09.dat contain human GSS data
For the full list of distribution files see table in section 6.1
 
1.2.4 INV Database Files

The INV division has been split into 3 files (inv01.dat - inv03.dat).
 
1.2.5 HUM Database Files 

The HUM division has been split into 25 files (hum01.dat-hum25.dat).


1.2.6 HTG Database Files

'Unfinished' DNA sequences generated by the high-throughput sequencing 
centres are represented in the HTG division and are rapidly made 
available to the scientific community for homology searches.  Entries in 
this 
division all contain keywords to indicate the status of the sequencing 
(e.g., HTGS_PHASE1). A single accession number is assigned to one clone,
and as sequencing progresses and the entry passes from one phase to 
another, it will retain the same accession number.  Once 'finished', HTG 
sequences are moved into the appropriate primary EMBL taxonomic division.

HTGS_PHASE0 entries typically consist of one-to-few pass reads of a single
clone, have not been assembled into contigs and are unoriented, unordered,
unannotated and contain gaps of unknown length.
Low-pass sequence sampling is useful for identifying clones that may be 
gene-rich. Phase0 sequences are used to check whether another centre is 
already sequencing this clone. If not, it will be sequenced through phase
1 and phase 2. When records are updated, the accession numbers will be 
preserved.

HTG division files are split according to taxonomic subdivisions. 
Example: htg_hum01.dat - htg_hum05.dat - Human HTG data
For the full list of distribution files see table in section 6.1

1.2.7 PAT Database Files 

PAT files include sequence data incorporated from the European patent 
literature (EPO) and complemented by American and Japanese patent 
data integrated from NCBI (USA) and DDBJ (Japan). 
The Patent division has been split into 6 files (pat01.dat - pat06.dat).

1.2.8 CON Database Files

CON files include construct information for building contig sequences 
of chromosomes, genomes and other long DNA sequences. CON division 
data is distributed in the file 'embl.con'; its entries do not contain 
sequence data per se.

1.2.9 CRC values for distributed files

To help users verify the integrity of release data files, we supply files
containing 32-bit checksum Cyclic Redundancy Check (CRC) values, plus byte
counts, for both compressed and uncompressed release files.

CRC values are calculated based on the IEEE Std 1003.2-1992 (POSIX 1003.2)
and X/Open CAE specifications. These values are generated by default by 
the 'cksum' command on Irix, RedHat Linux, SunOS and Solaris. On Tru64 
UNIX, the environment variable CMD_ENV needs to be set to xpg4.

File: crc_gz.txt      for compressed data files 
File: crc.txt         for uncompressed data files 

Example from crc.txt 
1820298576 265044866 est_fun01.dat

This output shows that the checksum of the file est_fun01.dat is 
1820298576 and the file contains 265044866 bytes.  


1.3 Cross-Reference Information

1.3.1 Cross-references to GeneDB

Cross-references to the GeneDB database have been added to EMBL records.
GeneDB is a curated database resource mainly for three organisms: 
Schizosaccharomyces pombe, Leishmania major and Trypanosoma brucei.
Cross-linking to GeneDB was done via the protein_id and the 
corresponding Swiss-Prot record. 

1.3.2 PUBMED/MEDLINE references 

PubMed, a service of the National Library of Medicine, provides access 
to over 12 million MEDLINE citations. In release 75, 152,598 
PUBMED/MEDLINE references have been included in EMBL database entries.
Example (entry AJ269470): 
RX   PUBMED; 12000641.
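
A minimal Python sketch for reading such a line is given below. It
assumes the layout of the example above ("RX", three spaces, a resource
name, a semicolon, an identifier and a closing full stop); parse_rx() is
a hypothetical helper, not part of any EBI software.

```python
import re

# Layout assumed from the example: "RX   PUBMED; 12000641."
RX_PATTERN = re.compile(r"^RX {3}(?P<resource>[A-Za-z]+); (?P<identifier>\S+?)\.$")


def parse_rx(line):
    """Return (resource, identifier) for an RX line, or None if it does not match."""
    match = RX_PATTERN.match(line.rstrip())
    if match is None:
        return None
    return match.group("resource"), match.group("identifier")


print(parse_rx("RX   PUBMED; 12000641."))
```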


1.3.3 Cross-references statistics

Links to external databases allow integration with specialised data 
collections, such as protein databases, species-specific databases, 
taxonomy databases etc. The WWW-based sequence retrieval system SRS 
enables users to easily navigate between cross-referenced database 
entries.

EMBL Release 75 includes 32563155 cross-references to related 
databases. 2228238 of these also refer to individual features, 
e.g. CDS (coding sequences), via the /db_xref feature qualifier 
in EMBL entries. 

EMBL cross-references to other databases:
UNILIB 16795735
RZPD 12273786
TrEMBL 1010433
GOA 862585
GrainGenes 792048
SWISS-PROT 234474
MaizeDB 211206
RemTrEMBL 123469
ENSEMBL 74297
IMGT/LIGM 63782
MGD 37415
FLYBASE 22751
MENDEL 21033
SGD 10974
GDB 8430
GENEDB 6750
TRANSFAC 6620
IMGT/HLA 3832
EPD 3384
Total 32563004
 
1.4 Sequence Retrieval System (SRS) 

EBI's SRS server is available at URL http://srs.ebi.ac.uk 
All external services are available via the 'Toolbox' tab  
on EBI's Web pages. If you have any comments and/or suggestions 
please send these to support@ebi.ac.uk


1.5 EMBL Database FAQ

The EMBL Database FAQ is available from the EBI at URL 
http://www.ebi.ac.uk/embl/Documentation/FAQ/

1.6 Disclaimer

No guarantee is given and no legal liability or responsibility is assumed
for the completeness and accuracy of the database entries, in particular
the conformity of sequence data in the database with the journal 
publication where the sequence is also disclosed.
 
2 FORTHCOMING CHANGES

2.1 Sequence length limit

Currently, sequence length in database entries is limited to 350 kb. 
The size restriction will be removed completely in one year's time 
(June 2004). This will allow complete genomic units (such as complete 
chromosomes) to be represented in a single entry. 
The practical size limit for sequences in EMBL entries will then 
be defined by the size of the longest sequenced chromosome. 
The introduction of the "gap" feature (see section 2.4) will allow 
sequences with gaps to be represented in a single entry.  

2.2 New accession number formats  

Two new accession number formats will be introduced for 
WGS (whole genome shotgun) data entries in June 2004. 
WGS data will be included from the June 2004 release onwards.

Formats: 
4 letters + 2 digits for assembly number + 6 digits 
and 
4 letters + 2 digits for assembly number + 7 digits 

Examples: AABA01000001 or BBAC010000001
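
For software that has to recognise the new formats, the two patterns can
be combined into one regular expression. The Python sketch below is
illustrative only; it assumes upper-case letters, as in the examples,
and the group names are chosen for this example.

```python
import re

# 4 letters + 2 digits (assembly number) + 6 or 7 digits.
WGS_ACCESSION = re.compile(
    r"^(?P<prefix>[A-Z]{4})(?P<assembly>\d{2})(?P<serial>\d{6,7})$"
)


def parse_wgs_accession(accession):
    """Return the parts of a WGS accession number, or None if it does not match."""
    match = WGS_ACCESSION.match(accession)
    return match.groupdict() if match else None


for acc in ("AABA01000001", "BBAC010000001", "X56734"):
    print(acc, parse_wgs_accession(acc))
```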

2.3 Molecule type information : update

From 01-JUL-2003 /mol_type will be a mandatory qualifier to the "source" 
feature key and will consistently display the in vivo molecule type of 
the sequence.
The qualifier will be mandatory in all source features in the next 
release (Sep 2003)
List of mol_type values:
"genomic DNA", "genomic RNA", "mRNA" (incl. EST), "tRNA",
"rRNA", "snoRNA", "snRNA", "scRNA", "pre-mRNA", 
"other RNA" (incl. synthetic),"other DNA" (incl. synthetic), 
"unassigned DNA" (incl. unknown),"unassigned RNA" (incl. unknown)

Molecule type in ID lines:
Starting from the next release (Sep 2003), molecule type information in
the flatfile ID line will correspond to the value of the
/mol_type qualifier. 
Examples: 
ID   AB016606   standard; circular genomic DNA; ORG; 17407 BP.
ID   TRBG361    standard; mRNA; PLN; 1859 BP.
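
Parsers that read the ID line may need adjusting for the new molecule
type values. The Python sketch below shows one way to split the two
example lines above; it assumes four ';'-separated fields and an
optional leading "circular", as seen in the examples, and is not a full
specification of the ID line. parse_id_line() and its result keys are
names invented for this illustration.

```python
# /mol_type values listed in section 2.3.
MOL_TYPES = {
    "genomic DNA", "genomic RNA", "mRNA", "tRNA", "rRNA", "snoRNA",
    "snRNA", "scRNA", "pre-mRNA", "other RNA", "other DNA",
    "unassigned DNA", "unassigned RNA",
}


def parse_id_line(line):
    """Split an ID line of the form shown in the examples into its parts."""
    if not line.startswith("ID   "):
        raise ValueError("not an ID line: %r" % line)
    fields = [f.strip() for f in line[5:].strip().rstrip(".").split(";")]
    if len(fields) != 4:
        raise ValueError("expected 4 ';'-separated fields: %r" % line)
    entry_name, data_class = fields[0].split()
    molecule = fields[1]
    circular = molecule.startswith("circular ")
    if circular:
        molecule = molecule[len("circular "):]
    return {
        "entry_name": entry_name,
        "data_class": data_class,
        "circular": circular,
        "mol_type": molecule,
        "mol_type_listed": molecule in MOL_TYPES,
        "division": fields[2],
        "length_bp": int(fields[3].split()[0]),
    }


print(parse_id_line("ID   AB016606   standard; circular genomic DNA; ORG; 17407 BP."))
print(parse_id_line("ID   TRBG361    standard; mRNA; PLN; 1859 BP."))
```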
   
2.4 New feature : gap

At the recent collaborative meeting it was agreed that a new "gap" 
feature will be introduced with the next edition of the Feature 
Table document (Oct 2003).
The purpose of the feature is to distinguish ambiguous sequence from 
a gap in the sequence. The format of the feature is:

Feature Key         gap

Qualifiers          /gap_length=number
		    /gap_type=

where the value of the /gap_type qualifier can be either 
"estimated" or "arbitrary".

Example: 

FT   gap             101..200
FT                   /gap_length=30000
FT                   /gap_type=estimated

This feature describes a gap with an estimated length of 30000 bp, 
represented in the sequence itself by a run of "N" characters from 
position 101 to position 200. 
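
The difference between the span occupied in the sequence and the value
of /gap_length can be made explicit with a small Python sketch. It reads
the three example FT lines above; since the feature is still a proposal,
the layout assumed here (feature key and location on the first line, one
/qualifier=value per following line) is taken from the example only, and
parse_feature() is a hypothetical helper.

```python
def parse_feature(lines):
    """Parse one feature from FT lines: key and location, then qualifiers."""
    key, location = lines[0][5:].split()
    start, end = (int(x) for x in location.split(".."))
    qualifiers = {}
    for line in lines[1:]:
        name, _, value = line[5:].strip().lstrip("/").partition("=")
        qualifiers[name] = value.strip('"')
    return key, start, end, qualifiers


example = [
    "FT   gap             101..200",
    "FT                   /gap_length=30000",
    "FT                   /gap_type=estimated",
]
key, start, end, qualifiers = parse_feature(example)
placeholder_run = end - start + 1          # length of the N run in the entry
stated_gap = int(qualifiers["gap_length"])  # length given by /gap_length
print(key, placeholder_run, stated_gap, qualifiers["gap_type"])
```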

2.5 New qualifier : /ecotype
A new qualifier, /ecotype, will be introduced in the next edition of
the Feature Table Document (Oct 2003). 

Description of the qualifier: 
Qualifier:	/ecotype
Definition:	A distinct population of organisms of a widespread 
                species that has adapted genetically to its own local
                habitat. Nevertheless they can still reproduce with 
                members of other ecotypes of the same species.
Value Format:	"text"
Example:	/ecotype="Columbia"
Comment: 	'Ecotype' is often applied to standard genetic stocks 
                of Arabidopsis thaliana, but it can be applied to any 
                organism, especially sessile organisms like plants.

2.6 Line length in the flatfiles

The 80-character line length limit is relaxed for some lines in the 
flatfile. The lines affected are the SQ line and, in the future, the ID 
line. The change is due to the very long sequences that are now being 
stored in the database. 

2.7 Retrofits : /segment and /variety

The retrofit of old entries to the current standard of annotation for  
"source" feature qualifiers /segment and /variety is now finished.  
The /segment qualifier stores the name of the viral or phage 
segment sequenced. Example: 
/segment="6"
The /variety qualifier stores the variety (= varietas, a formal 
Linnaean rank) of the organism from which the sequence was derived. 
Example: 
/variety="insularis"

2.8 Electronic resources: modification of RL line

From December 2003, a new publication type will become legal in 
EMBL. The new format will be used specifically to describe a 
publication in an electronic resource (such as an e-journal). 

The new format 
RL   (er) Free text

("RL" followed by three spaces, followed by "(er)" for 
electronic resource, followed by space, followed by the
free-text reference). 

The format will be used in the case of an electronic journal 
where the citation information doesn't have the same format 
as for "normal" journals (with journal abbreviation, volume, 
issue, page numbers and year). 
In practice, many electronic journals follow the same
conventions as paper publications, and their citations will not 
contain the "(er)" tag.

Example: 
RL   (er) Microbial Ecology DOI: 10.1007/s00248-002-2038-4
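
A parser can tell the two kinds of RL line apart by testing for the
"(er)" tag. The Python sketch below follows the layout described above
("RL", three spaces, "(er)", a space, free text); parse_rl() is a
hypothetical helper, and the second sample line is only an illustration
of a conventional journal citation.

```python
def parse_rl(line):
    """Classify an RL line as electronic resource or conventional citation."""
    if not line.startswith("RL   "):
        raise ValueError("not an RL line: %r" % line)
    body = line[5:].strip()
    if body.startswith("(er) "):
        return {"electronic": True, "text": body[len("(er) "):].strip()}
    return {"electronic": False, "text": body}


print(parse_rl("RL   (er) Microbial Ecology DOI: 10.1007/s00248-002-2038-4"))
print(parse_rl("RL   Plant Mol. Biol. 17:209-219(1991)."))
```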

 
3 SEQUENCE SUBMISSION SYSTEM

Information on submission of sequence data to the EMBL Nucleotide 
Sequence Database is available at:
http://www.ebi.ac.uk/embl/Submission/

For further information on submission of sequence data to the 
EMBL Nucleotide Sequence Database please contact database staff at:

EMBL Nucleotide Sequence Submissions
e-mail: datasubs@ebi.ac.uk
telephone: +44-1223-494499
telefax: +44-1223-494472


4 CITING THE EMBL NUCLEOTIDE SEQUENCE DATABASE

We encourage authors to include a reference to the EMBL Database in 
publications related to their research.

When citing data in the EMBL Database, we suggest giving the
primary accession number and the publication in which the sequence first 
appeared. For unpublished data, we suggest contacting the original 
submitters for recent publication information or revisions of the data.

We also suggest providing a reference for the EMBL Database itself. Our 
most recent publication describing the EMBL database should be cited:

Stoesser G., Baker W., van den Broek A., Garcia-Pastor M., Kanz C., 
Kulikova T., Leinonen R., Lin Q., Lombard V., Lopez R., Mancuso R., 
Nardone F., Stoehr P., Tuli M., Tzouvara K. and Vaughan R. (2003) 
"The EMBL Nucleotide Sequence Database: major new developments"
Nucleic Acids Research 31(1): 17-22.

Example: The numbers in parentheses refer to the reference citation in 
the EMBL database entry, and to the EMBL citation above.

"Sequence entry X56734 (1) has been retrieved from the EMBL Database (2) 
and showed significant sequence similarity to ..."

(1) Oxtoby, E., et al., Plant Mol. Biol. 17:209-219(1991).
(2) Stoesser G. et al., Nucleic Acids Res 31:17-22(2003).
 

5 EBI NETWORK SERVICES

5.1 Electronic Mail Server

Copies of database entries and other information can be obtained by 
sending commands via email to a server running at EBI. New and updated 
EMBL nucleotide sequence entries are made available on the server on a 
daily basis.

Send file server commands to the address 
netserv@ebi.ac.uk. Each line of the mail message should consist of a 
single file server request.

The most important file server request, to get started, is:

HELP

If the file server receives this command, it will return a helpfile to 
the sender, explaining in some detail how to use the facility. For 
example, to request a copy of the nucleotide sequence with accession 
number X55652, use the command:

GET NUC:X55652

The file server offers various other services (e.g. access to nucleotide 
and protein sequence data, protein structure data, software), details of 
which are provided in the HELP file.


5.2 Anonymous FTP Server

An alternative method of accessing the EBI archives is to use the file 
transfer protocol (ftp). Researchers with direct access to the Internet 
can use the FTP program on their local machine to connect to the host 
FTP.EBI.AC.UK and enter the username "anonymous" and their email address 
as password. 
The directory pub/help contains detailed information about the data 
available from the EBI anonymous FTP server which includes the complete 
EMBL Nucleotide Sequence Database releases as well as daily and weekly 
updates and a cumulative update file (gzip compressed format) in the 
following directories:

EMBL quarterly release:   pub/databases/embl/release
EMBL updates:             pub/databases/embl/new


5.3 World Wide Web (WWW) Server

The EBI operates a WWW server at URL http://www.ebi.ac.uk/ 
providing information about the EBI and its products and services. 

Data Retrieval: 

Nucleotide sequences can be retrieved by a simple query by 
accession number at http://www.ebi.ac.uk/cgi-bin/emblfetch

More complex queries can be constructed using the SRS 
databank browser at http://srs.ebi.ac.uk 

Data Submission: Nucleotide sequences can be submitted to the 
database using the interactive submission system Webin at
http://www.ebi.ac.uk/embl/Submission/webin.html 


5.4 Sequence Version Archive 

The EMBL Sequence Version Archive (SVA) is a new publicly available 
database containing all versions of any entry which has ever appeared
in the EMBL database.

The archive can be accessed programmatically via dbfetch at 
http://www.ebi.ac.uk/cgi-bin/dbfetch or interactively via a Web 
interface at http://www.ebi.ac.uk/embl/sva/

5.5 Sequence Similarity Search Servers

The EBI offers two network servers for sequence similarity searches via 
electronic mail or interactive WWW forms:
      
FASTA based on W. Pearson's FASTA algorithm. Allows local 
similarity searches of protein and nucleotide sequence databases. 
Send "help" to fasta@ebi.ac.uk or use URL http://www.ebi.ac.uk/fasta33/      
Complete genomes (and proteomes) can be searched at the following URL: 
http://www.ebi.ac.uk/fasta33/genomes.html 
Alternatively, send "help" command to gpfasta@ebi.ac.uk

BLAST based on the NCBI and WU-BLAST software 
Send "help" to blast@ebi.ac.uk or use URL http://www.ebi.ac.uk/blast2/ 
         
6 DISTRIBUTION FILES

6.1 Release 75 Files

The release contains the files shown below.
File Number  File Name        Description
1            crc.txt          Checksum CRC uncompressed files
2            crc_gz.txt       Checksum CRC compressed files
3            deleteac.txt     Deleted accession numbers
4            embl.con         Constructed Sequences
5            ftable.txt       Feature Table Documentation
6            relnotes.txt     Release Notes (this document)
7            subinfo.txt      Data Submission Documentation
8            update.txt       Data Update Form
9            usrman.txt       User Manual
10           acnumber.ndx     Accession Number Index
11           citation.ndx     Citation Index
12           division.ndx     Division Index
13           keyword.ndx      Keyword Index
14           shortdir.ndx     Short Directory Index
15           species.ndx      Species Index
16           est_fun01.dat    EST Sequences
17           est_fun02.dat    EST Sequences
18           est_hum01.dat    EST Sequences
19           est_hum02.dat    EST Sequences
20           est_hum03.dat    EST Sequences
21           est_hum04.dat    EST Sequences
22           est_hum05.dat    EST Sequences
23           est_hum06.dat    EST Sequences
24           est_hum07.dat    EST Sequences
25           est_hum08.dat    EST Sequences
26           est_hum09.dat    EST Sequences
27           est_hum10.dat    EST Sequences
28           est_hum11.dat    EST Sequences
29           est_hum12.dat    EST Sequences
30           est_hum13.dat    EST Sequences
31           est_hum14.dat    EST Sequences
32           est_hum15.dat    EST Sequences
33           est_hum16.dat    EST Sequences
34           est_hum17.dat    EST Sequences
35           est_hum18.dat    EST Sequences
36           est_hum19.dat    EST Sequences
37           est_hum20.dat    EST Sequences
38           est_hum21.dat    EST Sequences
39           est_hum22.dat    EST Sequences
40           est_hum23.dat    EST Sequences
41           est_hum24.dat    EST Sequences
42           est_hum25.dat    EST Sequences
43           est_hum26.dat    EST Sequences
44           est_hum27.dat    EST Sequences
45           est_hum28.dat    EST Sequences
46           est_hum29.dat    EST Sequences
47           est_hum30.dat    EST Sequences
48           est_hum31.dat    EST Sequences
49           est_hum32.dat    EST Sequences
50           est_hum33.dat    EST Sequences
51           est_hum34.dat    EST Sequences
52           est_hum35.dat    EST Sequences
53           est_hum36.dat    EST Sequences
54           est_hum37.dat    EST Sequences
55           est_hum38.dat    EST Sequences
56           est_hum39.dat    EST Sequences
57           est_hum40.dat    EST Sequences
58           est_hum41.dat    EST Sequences
59           est_hum42.dat    EST Sequences
60           est_hum43.dat    EST Sequences
61           est_hum44.dat    EST Sequences
62           est_hum45.dat    EST Sequences
63           est_hum46.dat    EST Sequences
64           est_hum47.dat    EST Sequences
65           est_hum48.dat    EST Sequences
66           est_hum49.dat    EST Sequences
67           est_hum50.dat    EST Sequences
68           est_hum51.dat    EST Sequences
69           est_hum52.dat    EST Sequences
70           est_hum53.dat    EST Sequences
71           est_inv01.dat    EST Sequences
72           est_inv02.dat    EST Sequences
73           est_inv03.dat    EST Sequences
74           est_inv04.dat    EST Sequences
75           est_inv05.dat    EST Sequences
76           est_inv06.dat    EST Sequences
77           est_inv07.dat    EST Sequences
78           est_inv08.dat    EST Sequences
79           est_inv09.dat    EST Sequences
80           est_inv10.dat    EST Sequences
81           est_inv11.dat    EST Sequences
82           est_inv12.dat    EST Sequences
83           est_inv13.dat    EST Sequences
84           est_inv14.dat    EST Sequences
85           est_inv15.dat    EST Sequences
86           est_inv16.dat    EST Sequences
87           est_inv17.dat    EST Sequences
88           est_inv18.dat    EST Sequences
89           est_inv19.dat    EST Sequences
90           est_mam01.dat    EST Sequences
91           est_mam02.dat    EST Sequences
92           est_mam03.dat    EST Sequences
93           est_mam04.dat    EST Sequences
94           est_mam05.dat    EST Sequences
95           est_mam06.dat    EST Sequences
96           est_mus01.dat    EST Sequences
97           est_mus02.dat    EST Sequences
98           est_mus03.dat    EST Sequences
99           est_mus04.dat    EST Sequences
100          est_mus05.dat    EST Sequences
101          est_mus06.dat    EST Sequences
102          est_mus07.dat    EST Sequences
103          est_mus08.dat    EST Sequences
104          est_mus09.dat    EST Sequences
105          est_mus10.dat    EST Sequences
106          est_mus11.dat    EST Sequences
107          est_mus12.dat    EST Sequences
108          est_mus13.dat    EST Sequences
109          est_mus14.dat    EST Sequences
110          est_mus15.dat    EST Sequences
111          est_mus16.dat    EST Sequences
112          est_mus17.dat    EST Sequences
113          est_mus18.dat    EST Sequences
114          est_mus19.dat    EST Sequences
115          est_mus20.dat    EST Sequences
116          est_mus21.dat    EST Sequences
117          est_mus22.dat    EST Sequences
118          est_mus23.dat    EST Sequences
119          est_mus24.dat    EST Sequences
120          est_mus25.dat    EST Sequences
121          est_mus26.dat    EST Sequences
122          est_mus27.dat    EST Sequences
123          est_mus28.dat    EST Sequences
124          est_mus29.dat    EST Sequences
125          est_mus30.dat    EST Sequences
126          est_mus31.dat    EST Sequences
127          est_mus32.dat    EST Sequences
128          est_mus33.dat    EST Sequences
129          est_mus34.dat    EST Sequences
130          est_mus35.dat    EST Sequences
131          est_mus36.dat    EST Sequences
132          est_mus37.dat    EST Sequences
133          est_mus38.dat    EST Sequences
134          est_pln01.dat    EST Sequences
135          est_pln02.dat    EST Sequences
136          est_pln03.dat    EST Sequences
137          est_pln04.dat    EST Sequences
138          est_pln05.dat    EST Sequences
139          est_pln06.dat    EST Sequences
140          est_pln07.dat    EST Sequences
141          est_pln08.dat    EST Sequences
142          est_pln09.dat    EST Sequences
143          est_pln10.dat    EST Sequences
144          est_pln11.dat    EST Sequences
145          est_pln12.dat    EST Sequences
146          est_pln13.dat    EST Sequences
147          est_pln14.dat    EST Sequences
148          est_pln15.dat    EST Sequences
149          est_pln16.dat    EST Sequences
150          est_pln17.dat    EST Sequences
151          est_pln18.dat    EST Sequences
152          est_pln19.dat    EST Sequences
153          est_pln20.dat    EST Sequences
154          est_pln21.dat    EST Sequences
155          est_pln22.dat    EST Sequences
156          est_pln23.dat    EST Sequences
157          est_pln24.dat    EST Sequences
158          est_pln25.dat    EST Sequences
159          est_pln26.dat    EST Sequences
160          est_pln27.dat    EST Sequences
161          est_pln28.dat    EST Sequences
162          est_pln29.dat    EST Sequences
163          est_pln30.dat    EST Sequences
164          est_pln31.dat    EST Sequences
165          est_pln32.dat    EST Sequences
166          est_pln33.dat    EST Sequences
167          est_pln34.dat    EST Sequences
168          est_pro.dat      EST Sequences
169          est_rod01.dat    EST Sequences
170          est_rod02.dat    EST Sequences
171          est_rod03.dat    EST Sequences
172          est_rod04.dat    EST Sequences
173          est_rod05.dat    EST Sequences
174          est_rod06.dat    EST Sequences
175          est_vrt01.dat    EST Sequences
176          est_vrt02.dat    EST Sequences
177          est_vrt03.dat    EST Sequences
178          est_vrt04.dat    EST Sequences
179          est_vrt05.dat    EST Sequences
180          est_vrt06.dat    EST Sequences
181          est_vrt07.dat    EST Sequences
182          est_vrt08.dat    EST Sequences
183          est_vrt09.dat    EST Sequences
184          est_vrt10.dat    EST Sequences
185          est_vrt11.dat    EST Sequences
186          est_vrt12.dat    EST Sequences
187          est_vrt13.dat    EST Sequences
188          est_vrt14.dat    EST Sequences
189          est_vrt15.dat    EST Sequences
190          est_vrt16.dat    EST Sequences
191          fun.dat          Fungi Sequences
192          gss_fun.dat      Genome Survey Sequences
193          gss_hum01.dat    Genome Survey Sequences
194          gss_hum02.dat    Genome Survey Sequences
195          gss_hum03.dat    Genome Survey Sequences
196          gss_hum04.dat    Genome Survey Sequences
197          gss_hum05.dat    Genome Survey Sequences
198          gss_hum06.dat    Genome Survey Sequences
199          gss_hum07.dat    Genome Survey Sequences
200          gss_hum08.dat    Genome Survey Sequences
201          gss_hum09.dat    Genome Survey Sequences
202          gss_inv01.dat    Genome Survey Sequences
203          gss_inv02.dat    Genome Survey Sequences
204          gss_inv03.dat    Genome Survey Sequences
205          gss_inv04.dat    Genome Survey Sequences
206          gss_inv05.dat    Genome Survey Sequences
207          gss_inv06.dat    Genome Survey Sequences
208          gss_inv07.dat    Genome Survey Sequences
209          gss_mam01.dat    Genome Survey Sequences
210          gss_mam02.dat    Genome Survey Sequences
211          gss_mam03.dat    Genome Survey Sequences
212          gss_mus01.dat    Genome Survey Sequences
213          gss_mus02.dat    Genome Survey Sequences
214          gss_mus03.dat    Genome Survey Sequences
215          gss_mus04.dat    Genome Survey Sequences
216          gss_mus05.dat    Genome Survey Sequences
217          gss_mus06.dat    Genome Survey Sequences
218          gss_mus07.dat    Genome Survey Sequences
219          gss_mus08.dat    Genome Survey Sequences
220          gss_mus09.dat    Genome Survey Sequences
221          gss_mus10.dat    Genome Survey Sequences
222          gss_phg.dat      Genome Survey Sequences
223          gss_pln01.dat    Genome Survey Sequences
224          gss_pln02.dat    Genome Survey Sequences
225          gss_pln03.dat    Genome Survey Sequences
226          gss_pln04.dat    Genome Survey Sequences
227          gss_pln05.dat    Genome Survey Sequences
228          gss_pln06.dat    Genome Survey Sequences
229          gss_pln07.dat    Genome Survey Sequences
230          gss_pln08.dat    Genome Survey Sequences
231          gss_pln09.dat    Genome Survey Sequences
232          gss_pln10.dat    Genome Survey Sequences
233          gss_pln11.dat    Genome Survey Sequences
234          gss_pln12.dat    Genome Survey Sequences
235          gss_pln13.dat    Genome Survey Sequences
236          gss_pln14.dat    Genome Survey Sequences
237          gss_pln15.dat    Genome Survey Sequences
238          gss_pln16.dat    Genome Survey Sequences
239          gss_pln17.dat    Genome Survey Sequences
240          gss_pro.dat      Genome Survey Sequences
241          gss_rod01.dat    Genome Survey Sequences
242          gss_rod02.dat    Genome Survey Sequences
243          gss_rod03.dat    Genome Survey Sequences
244          gss_rod04.dat    Genome Survey Sequences
245          gss_vrl.dat      Genome Survey Sequences
246          gss_vrt01.dat    Genome Survey Sequences
247          gss_vrt02.dat    Genome Survey Sequences
248          gss_vrt03.dat    Genome Survey Sequences
249          gss_vrt04.dat    Genome Survey Sequences
250          gss_vrt05.dat    Genome Survey Sequences
251          gss_vrt06.dat    Genome Survey Sequences
252          htc.dat          High throughput cDNAs
253          htg_hum01.dat    High Throughput Genome Sequences
254          htg_hum02.dat    High Throughput Genome Sequences
255          htg_hum03.dat    High Throughput Genome Sequences
256          htg_hum04.dat    High Throughput Genome Sequences
257          htg_hum05.dat    High Throughput Genome Sequences
258          htg_inv01.dat    High Throughput Genome Sequences
259          htg_inv02.dat    High Throughput Genome Sequences
260          htg_mam01.dat    High Throughput Genome Sequences
261          htg_mam02.dat    High Throughput Genome Sequences
262          htg_mus01.dat    High Throughput Genome Sequences
263          htg_mus02.dat    High Throughput Genome Sequences
264          htg_mus03.dat    High Throughput Genome Sequences
265          htg_mus04.dat    High Throughput Genome Sequences
266          htg_mus05.dat    High Throughput Genome Sequences
267          htg_other.dat    High Throughput Genome Sequences
268          htg_pln.dat      High Throughput Genome Sequences
269          htg_rod01.dat    High Throughput Genome Sequences
270          htg_rod02.dat    High Throughput Genome Sequences
271          htg_rod03.dat    High Throughput Genome Sequences
272          htg_rod04.dat    High Throughput Genome Sequences
273          htg_rod05.dat    High Throughput Genome Sequences
274          htg_rod06.dat    High Throughput Genome Sequences
275          htg_rod07.dat    High Throughput Genome Sequences
276          htg_rod08.dat    High Throughput Genome Sequences
277          htg_vrt.dat      High Throughput Genome Sequences
278          htgo_hum.dat     High Throughput Genome Sequences phase 0
279          htgo_mus.dat     High Throughput Genome Sequences phase 0
280          htgo_other.dat   High Throughput Genome Sequences phase 0
281          hum01.dat        Human Sequences
282          hum02.dat        Human Sequences
283          hum03.dat        Human Sequences
284          hum04.dat        Human Sequences
285          hum05.dat        Human Sequences
286          hum06.dat        Human Sequences
287          hum07.dat        Human Sequences
288          hum08.dat        Human Sequences
289          hum09.dat        Human Sequences
290          hum10.dat        Human Sequences
291          hum11.dat        Human Sequences
292          hum12.dat        Human Sequences
293          hum13.dat        Human Sequences
294          hum14.dat        Human Sequences
295          hum15.dat        Human Sequences
296          hum16.dat        Human Sequences
297          hum17.dat        Human Sequences
298          hum18.dat        Human Sequences
299          hum19.dat        Human Sequences
300          hum20.dat        Human Sequences
301          hum21.dat        Human Sequences
302          hum22.dat        Human Sequences
303          hum23.dat        Human Sequences
304          hum24.dat        Human Sequences
305          hum25.dat        Human Sequences
306          inv01.dat        Invertebrate Sequences
307          inv02.dat        Invertebrate Sequences
308          inv03.dat        Invertebrate Sequences
309          mam.dat          Other Mammal Sequences
310          mus.dat          Mus musculus Sequences
311          org.dat          Organelle Sequences
312          pat01.dat        Patent Sequences
313          pat02.dat        Patent Sequences
314          pat03.dat        Patent Sequences
315          pat04.dat        Patent Sequences
316          pat05.dat        Patent Sequences
317          pat06.dat        Patent Sequences
318          phg.dat          Bacteriophage Sequences
319          pln.dat          Plant Sequences
320          pro01.dat        Prokaryote Sequences
321          pro02.dat        Prokaryote Sequences
322          pro03.dat        Prokaryote Sequences
323          pro04.dat        Prokaryote Sequences
324          rod.dat          Rodent Sequences
325          sts.dat          STS Sequences
326          syn.dat          Synthetic Sequences
327          unc.dat          Unclassified Sequences
328          vrl.dat          Viral Sequences
329          vrt.dat          Other Vertebrate Sequences
 
APPENDIX A

DATABASE GROWTH TABLE

The following table shows the growth of the EMBL Nucleotide Sequence Database
at each release.
Release Month Entries Nucleotides
1 06/1982 568 585433
2 04/1983 811 1114447
3 12/1983 1481 1654863
4 08/1984 1698 2147205
5 04/1985 2378 2874493
6 08/1985 4835 4567592
7 12/1985 5789 5622638
8 04/1986 6395 6353040
9 09/1986 7630 7813214
10 12/1986 8817 9766948
11 04/1987 11621 12189783
12 07/1987 12706 13638061
13 10/1987 14397 16023478
14 01/1988 15344 17272160
15 05/1988 17961 20318442
16 08/1988 19592 22625941
17 11/1988 20695 24211054
18 02/1989 22938 27249830
19 05/1989 24365 29066676
20 08/1989 26223 31240948
21 11/1989 28679 34748087
22 02/1990 31508 38165786
23 05/1990 34902 42923803
24 08/1990 37784 47354438
25 11/1990 41580 52900354
26 02/1991 43745 55859549
27 05/1991 46871 59915244
28 09/1991 54558 70448052
29 12/1991 57655 75400487
30 03/1992 63378 83574342
31 06/1992 72481 94390065
32 09/1992 79377 101292310
33 12/1992 89100 111413979
34 03/1993 99591 121420828
35 06/1993 108973 131880111
36 09/1993 127933 145401156
37 12/1993 146576 158171400
38 03/1994 167777 177550115
39 06/1994 182615 192195819
40 09/1994 209352 211017104
41 12/1994 230950 226259607
42 03/1995 303206 262559786
43 06/1995 420111 315840053
44 09/1995 506190 363273777
45 12/1995 622566 427620278
46 03/1996 701246 473691480
47 06/1996 827174 550739395
48 09/1996 928067 608931850
49 12/1996 1047263 696183789
50 03/1997 1187455 789755858
51 06/1997 1432941 931351601
52 10/1997 1787004 1181167498
53 12/1997 1917868 1281391651
54 03/1998 2125225 1427634373
55 06/1998 2330040 1607673907
56 09/1998 2689618 1904091473
57 12/1998 3046471 2164718256
58 03/1999 3272064 2355200790
59 06/1999 3952878 2924568545
60 09/1999 4719266 3543553093
61 12/1999 5303436 4508169737
62 03/2000 5865742 6120908677
63 06/2000 6760113 8255674441
64 09/2000 8344436 9650223037
65 12/2000 9549382 10710321435
66 03/2001 11169673 11916112872
67 06/2001 12044420 12821742622
68 09/2001 12964797 13727100206
69 12/2001 14366182 15383451165
70 03/2002 15851373 17807926047
71 06/2002 17226422 20020556107
72 09/2002 18324246 23090186146
73 12/2002 20857746 27903283528
74 03/2003 23234788 30356786718
75 06/2003 25214767 32195012823
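
Growth between any two releases can be derived from the table. The
Python sketch below is an illustration only; it copies the rows for
Releases 74 and 75 from the table above and computes the percentage
increase in entries and in nucleotides. The names GROWTH and
percent_increase() are chosen for this example.

```python
# release -> (entries, nucleotides), copied from the growth table above.
GROWTH = {
    74: (23234788, 30356786718),
    75: (25214767, 32195012823),
}


def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


old_entries, old_nucleotides = GROWTH[74]
new_entries, new_nucleotides = GROWTH[75]
print("entries:     %.1f%%" % percent_increase(old_entries, new_entries))
print("nucleotides: %.1f%%" % percent_increase(old_nucleotides, new_nucleotides))
```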