EMBL Outstation - The European Bioinformatics Institute

EMBL Nucleotide Sequence Database
Release Notes
Release 89 December 2006

EMBL Outstation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: 1223-494400
Telefax  : 1223-494468
URL: http://www.ebi.ac.uk/embl
Feedback form : http://www.ebi.ac.uk/support/

CONTENTS

1 RELEASE 89
  1.1 Feature Table Definition Document v6.6
  1.2 Database Files
    1.2.1 Naming Conventions
    1.2.2 CRC Values for Distributed Files
  1.3 Cross-Reference Information
  1.4 Digital Object Identifiers (DOI) and PubMed references
  1.5 EMBL Database FAQ
  1.6 Disclaimer
  1.7 Acknowledgements
2 CHANGES IN THIS RELEASE
  2.1 Introduction of TGN taxonomic division
  2.2 Use of 'O' for pyrrolysine
  2.3 Location descriptor "single base from a range" (n.m) is discontinued
  2.4 Usage of qualifier /operon changed
  2.5 New qualifier /mobile_element and dropping of two old qualifiers
3 FORTHCOMING CHANGES
  3.1 New line type for project IDs
4 SEQUENCE SUBMISSION SYSTEMS
5 CITING THE EMBL NUCLEOTIDE SEQUENCE DATABASE
6 EBI NETWORK SERVICES
  6.1 Electronic Mail Server
  6.2 Anonymous FTP Server
  6.3 World Wide Web (WWW) Server
  6.4 Sequence Version Archive
  6.5 Sequence Similarity Search Servers
7 RELEASE 89 FILES
APPENDIX A DATABASE GROWTH TABLE

1 RELEASE 89

The EMBL Nucleotide Sequence Database was frozen to make Release 89 on
30-NOV-2006. The release contains 83,666,567 sequence entries comprising
150,163,403,742 nucleotides, of which 18,535,353 entries (81,521,656,204
nucleotides) are WGS (whole genome shotgun) data. The Release 89 files
total 69 GB compressed and 376 GB uncompressed.

A breakdown of Release 89 by dataclass and taxonomic division is shown
below:

Breakdown by dataclass

Class                                      entries      nucleotides
-------------------------------------------------------------------
CON:Constructed                            849,126   62,900,561,886
EST:Expressed Sequence Tag              39,808,988   21,920,348,867
GSS:Genome Survey Sequence              16,073,640   10,211,729,434
HTC:High Throughput cDNA sequencing        450,564      553,627,818
HTG:High Throughput Genome sequencing       97,088   16,352,298,997
PAT:Patents                              3,556,157    2,137,088,250
STD:Standard                             3,399,493   16,630,347,569
STS:Sequence Tagged Site                   890,995      500,504,557
TPA:Third Party Annotation                   5,163      335,802,046
WGS:Whole Genome Shotgun                18,535,353   81,521,656,204
                                        ----------   --------------
Total                                   83,666,567  150,163,403,742

Breakdown by taxonomic division

Division                                   entries      nucleotides
-------------------------------------------------------------------
ENV:Environmental Samples                2,162,420    1,545,072,677
FUN:Fungi                                1,488,539    2,230,258,753
HUM:Human                               11,475,120   21,333,688,751
INV:Invertebrates                        9,870,449   15,374,595,645
MAM:Other Mammals                       17,362,895   50,366,774,550
MUS:Mus musculus                         8,167,813   13,853,936,225
PHG:Bacteriophage                            4,081       22,102,277
PLN:Plants                              19,239,706   15,375,178,666
PRO:Prokaryotes                            516,770    3,219,947,843
ROD:Rodents                              3,615,200   15,116,847,721
SYN:Synthetic                              717,339      272,882,442
TGN:Transgenic                                 789          458,737
UNC:Unclassified                         1,289,894      590,194,931
VRL:Viruses                                428,087      428,811,568
VRT:Other Vertebrates                    7,327,465   10,432,652,956
                                        ----------   --------------
Total                                   83,666,567  150,163,403,742

A breakdown by both taxonomic division and dataclass can be found in
divisions.ndx, distributed together with the release.

EMBL database statistics are available at URL:
http://www.ebi.ac.uk/embl/Services/DBStats/

Note: The nucleotide count for CON(structed) entries is shown in the
tables but is not added to the nucleotide total, because it is already
included in the statistics for the segments of each constructed entry.
CON entries are counted in the entry total.
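
The note on CON entries can be checked with simple arithmetic. The short
Python sketch below copies the figures from the dataclass table of this
release (the variable and function names are illustrative only) and
confirms that the entry total counts every class, while the nucleotide
total leaves out the CON figure.

```python
# Figures copied from the "Breakdown by dataclass" table of Release 89.
# Each value is (entries, nucleotides). Names here are illustrative only.
DATACLASSES = {
    "CON": (849_126, 62_900_561_886),
    "EST": (39_808_988, 21_920_348_867),
    "GSS": (16_073_640, 10_211_729_434),
    "HTC": (450_564, 553_627_818),
    "HTG": (97_088, 16_352_298_997),
    "PAT": (3_556_157, 2_137_088_250),
    "STD": (3_399_493, 16_630_347_569),
    "STS": (890_995, 500_504_557),
    "TPA": (5_163, 335_802_046),
    "WGS": (18_535_353, 81_521_656_204),
}

# Totals as printed in the release notes.
TOTAL_ENTRIES = 83_666_567
TOTAL_NUCLEOTIDES = 150_163_403_742


def summed_entries(table):
    """Sum the entry counts of every dataclass, CON included."""
    return sum(entries for entries, _ in table.values())


def summed_nucleotides(table, skip=("CON",)):
    """Sum nucleotide counts, leaving out the classes named in `skip`."""
    return sum(nuc for name, (_, nuc) in table.items() if name not in skip)


print("entries match:", summed_entries(DATACLASSES) == TOTAL_ENTRIES)
print("nucleotides match (CON skipped):",
      summed_nucleotides(DATACLASSES) == TOTAL_NUCLEOTIDES)
```

Summing the nucleotide column without skipping CON would overshoot the
published total by exactly the CON figure, which is the double counting
the note warns about.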

1.1 Feature Table Definition Document v6.6

The latest version of the Feature Table Definition Document (FTv6.6) was
implemented in October 2006. The document is available from the EBI
servers at:

   http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
   ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/

The next edition of the Feature Table document will become available in
April 2007.

1.2 Database Files

For the full list of distribution files, see the table in section 7.

1.2.1 Naming Conventions

For all data apart from WGS, the data file names in the release take the
following form:

   rel_dtc_tax_nn_rRN.dat

where
   "dtc" is a three-letter lowercase abbreviation for the dataclass
   "tax" is a three-letter lowercase abbreviation for the taxonomic
         division
   "nn"  is the number of the file in a particular sequence (starting
         from "01")
   "RN"  is the number of the release to which the file belongs

Examples:
   rel_est_hum_01_r89.dat
   rel_htg_mus_04_r89.dat

Dataclass list : EST, GSS, HTC, HTG, PAT, STS, STD, TPA, CON

Taxonomic division list : HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV,
INV, SYN, TGN, UNC, VRL, PHG

The STD dataclass abbreviation stands for "standard" entries.

File size is kept under 4 Gbytes by regulating the number of entries in
each file (this does not apply to WGS files).

For WGS data, one data file is formed per WGS project. File names
incorporate the project prefix and the taxonomic division of the
entries, e.g. wgs_caae_vrt.dat.

1.2.2 CRC Values for Distributed Files

To help users verify the integrity of release data files, we supply
files containing 32-bit Cyclic Redundancy Check (CRC) checksum values,
plus byte counts, for both compressed and uncompressed release files.
CRC values are calculated according to the POSIX standard, which is the
default behaviour of the 'cksum' command on most modern Unix and Linux
platforms. However, it has been found that some implementations wrap the
file byte count to zero when the file size exceeds 4 Gbytes.
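
For readers who want to see what the POSIX calculation involves, the
following is a minimal Python sketch of the 'cksum' algorithm as defined
by POSIX (polynomial 0x04C11DB7, with the byte count folded into the CRC
after the data). It is an illustration under that assumption, not the
tool used to build the release; the function names posix_cksum and
cksum_file are hypothetical. Python integers do not overflow, so the
byte count cannot wrap at 4 Gbytes here.

```python
def _make_table():
    """Build the 256-entry lookup table for polynomial 0x04C11DB7."""
    table = []
    for i in range(256):
        crc = i << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
        table.append(crc)
    return table


_TABLE = _make_table()


def posix_cksum(chunks):
    """Return (checksum, byte_count) for an iterable of bytes objects."""
    crc = 0
    length = 0
    for chunk in chunks:
        length += len(chunk)
        for byte in chunk:
            crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ byte]
    # POSIX appends the length, least significant octet first,
    # using only as many octets as are needed to represent it.
    n = length
    while n:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ (n & 0xFF)]
        n >>= 8
    return crc ^ 0xFFFFFFFF, length


def cksum_file(path, chunk_size=1 << 20):
    """Checksum a file by reading it in chunks of `chunk_size` bytes."""
    with open(path, "rb") as handle:
        return posix_cksum(iter(lambda: handle.read(chunk_size), b""))


# Usage (file name taken from the release file list):
#   checksum, size = cksum_file("rel_con_env_01_r89.dat")
#   print(checksum, size, "rel_con_env_01_r89.dat")
```

The pure-Python loop is slow on multi-gigabyte files; it is meant to
show the calculation rather than to replace a native 'cksum'.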
We now use 'cksum' from the fileutils package to do the calculations. If
you are in any doubt whether your 'cksum' is POSIX-compliant, or whether
it has the 4 Gbyte limit (32-bit unsigned integer), you can download the
utility from http://www.gnu.org/software/fileutils/ and install it.

File: crc_gz.txt   for compressed data files
File: crc.txt      for uncompressed data files

Example from crc.txt:

   1985969636 415093160 rel_con_env_01_r89.dat

This output shows that the checksum of the file rel_con_env_01_r89.dat
is 1985969636 and that the file contains 415093160 bytes.

1.3 Cross-Reference Information

Links to external databases allow integration with specialised data
collections, such as protein databases, species-specific databases,
taxonomy databases etc. The WWW-based sequence retrieval system SRS
enables users to easily navigate between cross-referenced database
entries.

EMBL Release 89 includes 64357622 cross-references to related databases.
Of these, 14235630 also refer to individual features, e.g. CDS (coding
sequences), via the /db_xref feature qualifier in EMBL entries.

EMBL cross-references to other databases:

DATABASES             Nr of Links
--------------------  -----------
UniProtKB/TrEMBL          3852587
SGD                         14423
PDB                        171502
CABRI                       67256
Flybase                    135745
GeneDB                       7171
GrainGenes                1080193
GOA                       2914298
HGNC                       145625
H-InvDB                    167992
HSSP                       724211
EPD                          7529
IMGT/HLA                     5864
Interpro                  5846489
IMGT/LIGM                  107738
TRANSFAC                     6620
RZPD                      8123298
UniProtKB/Swiss-Prot       450748
SubtiList                    4106
UNILIB                   38566191
Unite                          73
VectorBase                  30279
VBASE2                       2884
WormBase                    23732
GDB                       1663824
MGI                        220666
ZFIN                        16578
                      -----------
Total                    64357622

Cross-references in the feature table

DATABASES             Nr of Links
--------------------  -----------
UniProtKB/TrEMBL          3804320
SGD                         14422
PDB                        171502
Flybase                     63288
GeneDB                       7171
GOA                       2882024
HGNC                       145625
HSSP                       724211
Interpro                  5846489
UniProtKB/Swiss-Prot       421342
SubtiList                    4106
VectorBase                  29868
GDB                         48091
MGI                         56593
ZFIN                        16578
                      -----------
Total                    14235630

Apart from cross-references to the external resources listed above,
internal cross-references can be present in the header of EMBL entries.
Such "intradatabase" cross-references include EMBL-TPA, EMBL-ANN,
EMBL-CON, EMBL-ALIGN and EMBL-JOIN.

Formats and explanation:

DR   EMBL-TPA; acc#.
     used in a standard entry that serves as primary source for a TPA
     entry acc#

DR   EMBL-ANN; acc#.
     used in a standard entry that serves as segment in an annotated CON
     entry acc#

DR   EMBL-CON; acc#.
     used in a standard entry that serves as segment in a CON entry acc#

DR   EMBL-ALIGN; acc#.
     used in a standard entry that participates in an alignment entry in
     EMBL Alignment database acc#

DR   EMBL-JOIN; acc#.
     used in a standard entry any part of sequence of which is used in a
     "join" operator in a different entry acc#

1.4 Digital Object Identifiers (DOI) and PubMed references

Digital Object Identifiers (DOIs) provide unique references to the URLs
of full text versions of cited publications. DOI identifiers are
provided for 98897 citations. The number of EMBL entries containing at
least one citation with a DOI is 18185398.

PubMed references are provided for 180222 citations. The number of EMBL
entries containing at least one citation with a PubMed reference is
19932232.

1.5 EMBL Database FAQ

The EMBL Database FAQ is available from the EBI at URL
http://www.ebi.ac.uk/embl/Documentation/FAQ/

1.6 Disclaimer

No guarantee is given and no legal liability or responsibility is
assumed for the completeness and accuracy of the database entries, in
particular the conformity of sequence data in the database with the
journal publication where the sequence is also disclosed.

1.7 Acknowledgements

The EMBL database is maintained by:

Ruth Akhtar, Philippe Aldebert, Nicola Althorpe, Alastair Baldwin,
Kirsty Bates, Sumit Bhattacharyya, Lawrence Bower, Paul Browne,
Matias Castro, Guy Cochrane, Nadeem Faruque, Gemma Hoad, Carola Kanz,
Tamara Kulikova, Rasko Leinonen, Quan Lin, Dariusz Lorenc,
Rodrigo Lopez, Hamish McWilliam, Gaurab Mukherjee, Francesco Nardone,
Sheila Plaister, Siamak Sobhany, Robert Vaughan, Dan Wu, Weimin Zhu,
and Rolf Apweiler

2 CHANGES IN THIS RELEASE

2.1 Introduction of TGN taxonomic division

A new taxonomic division, Transgenic (TGN), was created in Release 89.
Entries representing transgenic organisms (indicated by the inclusion of
the /transgenic qualifier in one of the source features) are now stored
in the new TGN division.

2.2 Use of 'O' for pyrrolysine

Since October 2006, the single-letter amino acid abbreviation "O" has
been used to represent pyrrolysine in CDS translations.

2.3 Location descriptor "single base from a range" (n.m) is discontinued

Use of the location descriptor defined as "a single base chosen from a
range of bases", which was indicated by the first base number and the
last base number of the range separated by a single period (e.g.,
'12.21'), was discontinued in October 2006.

2.4 Usage of qualifier /operon changed

Qualifier /operon became valid on the rRNA feature in October 2006.
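
Users who hold older entries may want to find locations that still use
the discontinued n.m form described in section 2.3. The Python sketch
below is a heuristic under stated assumptions, not a full location
parser: it looks for two numbers separated by exactly one period, and
skips the n..m range operator as well as accession.version prefixes. The
pattern and the function name find_legacy_locations are illustrative.

```python
import re

# Heuristic pattern for the discontinued "single base from a range" form:
#   digits, exactly one period, digits.
# The lookbehind rejects a match that starts inside a word or right after
# a period (so "n..m" ranges and accession.version prefixes are skipped);
# the lookahead rejects a match that is followed by a digit or a period.
LEGACY_SINGLE_BASE = re.compile(r"(?<![\w.])(\d+)\.(\d+)(?![\d.])")


def find_legacy_locations(location):
    """Return (first, last) pairs for each n.m descriptor in `location`."""
    return [(int(a), int(b)) for a, b in LEGACY_SINGLE_BASE.findall(location)]


for text in ("12.21", "12..21", "join(1..5,(7.9)..20)"):
    print(text, "->", find_legacy_locations(text))
```

A location string that yields an empty list contains no n.m descriptor
that this heuristic can see.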

2.5 New qualifier /mobile_element and dropping of two old qualifiers

The new qualifier /mobile_element was introduced in December 2006 to
hold the type and the name or identifier of the mobile element described
by the parent feature. At the same time, two less generic qualifiers,
/transposon and /insertion_seq, were dropped and all existing instances
of them were retrofitted to use the new qualifier.

Please note that this change does not affect the release itself, only
update files produced after 8 December 2006.

3 FORTHCOMING CHANGES

3.1 New line type for project IDs

A new line type, with the provisional two-character line type code PR,
will be introduced into EMBL flatfiles with the March release of the
EMBL database. The line will contain the INSDC-assigned ID for the
sequencing project.

4 SEQUENCE SUBMISSION SYSTEMS

Information on submission of sequence data to the EMBL Nucleotide
Sequence Database is available at:

   http://www.ebi.ac.uk/embl/Submission/

For further information on submission of sequence data to the EMBL
Nucleotide Sequence Database, please contact database staff at:

   EMBL Nucleotide Sequence Submissions
   e-mail:    datasubs@ebi.ac.uk
   telephone: 1223-494499
   telefax:   1223-494472

5 CITING THE EMBL NUCLEOTIDE SEQUENCE DATABASE

We encourage authors to include a reference to the EMBL Database in
publications related to their research. When citing data in the EMBL
Database, we suggest authors provide the primary accession number and
the publication in which the sequence first appeared. For unpublished
data, we suggest authors contact the original submitters for recent
publication information or revisions of the data. We suggest authors
also provide a reference to the EMBL Database itself.
Our recent publication describing the EMBL database should be cited:

   Tamara Kulikova, Ruth Akhtar, Philippe Aldebert, Nicola Althorpe,
   Mikael Andersson, Alastair Baldwin, Kirsty Bates, Sumit Bhattacharyya,
   Lawrence Bower, Paul Browne, Matias Castro, Guy Cochrane,
   Karyn Duggan, Ruth Eberhardt, Nadeem Faruque, Gemma Hoad,
   Carola Kanz, Charles Lee, Rasko Leinonen, Quan Lin, Vincent Lombard,
   Rodrigo Lopez, Dariusz Lorenc, Hamish McWilliam, Gaurab Mukherjee,
   Francesco Nardone, Maria Pilar Garcia Pastor, Sheila Plaister,
   Siamak Sobhany, Peter Stoehr, Robert Vaughan, Dan Wu, Weimin Zhu
   and Rolf Apweiler
   EMBL Nucleotide Sequence Database in 2006
   Nucleic Acids Research, doi:10.1093/nar/gkl913

6 EBI NETWORK SERVICES

6.1 Electronic Mail Server

Copies of database entries and other information can be obtained by
sending commands via email to a server running at the EBI. New and
updated EMBL nucleotide sequence entries are made available on the
server on a daily basis.

Send file server commands to the address netserv@ebi.ac.uk. Each line of
the mail message should consist of a single file server request. The
first request to send, to get started, is:

   HELP

When the file server receives this command, it will return a help file
to the sender, explaining in some detail how to use the facility.

6.2 Anonymous FTP Server

The file transfer protocol (FTP) can be used to access the EBI data
archives. Researchers with direct access to the Internet can use the FTP
program on their local machine to connect to the host ftp.ebi.ac.uk,
entering the username "anonymous" and their email address as the
password.
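
The anonymous login described above can also be scripted. The sketch
below uses Python's standard ftplib module; the host and the release
directory are the ones named in this section, while the function name
list_release_files and the ftp_factory parameter (which only exists so
the function can be exercised without a network connection) are
assumptions made for illustration.

```python
from ftplib import FTP

EBI_FTP_HOST = "ftp.ebi.ac.uk"
RELEASE_DIR = "/pub/databases/embl/release/"  # EMBL quarterly release


def list_release_files(email, host=EBI_FTP_HOST, directory=RELEASE_DIR,
                       ftp_factory=FTP):
    """Log in anonymously and return the sorted names found in `directory`.

    `email` is sent as the password, as the release notes request.
    `ftp_factory` is a hypothetical hook for substituting a stand-in
    FTP class; by default the real ftplib.FTP is used.
    """
    ftp = ftp_factory(host)           # connects to the host
    try:
        ftp.login("anonymous", email)
        return sorted(ftp.nlst(directory))
    finally:
        ftp.quit()                    # always close the session politely


# Usage (needs network access, so it is left as a comment):
#   for name in list_release_files("your.name@example.org"):
#       print(name)
```

Whether the server still accepts this kind of session depends on its
current configuration, so treat the call as an example of the procedure
rather than a guaranteed recipe.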

The directory pub/help contains detailed information about the data
available from the EBI anonymous FTP server. This includes the complete
EMBL Nucleotide Sequence Database releases as well as daily and weekly
updates and a cumulative update file (gzip compressed format) in the
following directories:

   EMBL quarterly release: /pub/databases/embl/release/
   EMBL updates:           /pub/databases/embl/new

Other EMBL database datasets are also available. Please check
ftp://ftp.ebi.ac.uk/pub/databases/embl/README for more detailed
information.

6.3 World Wide Web (WWW) Server

EMBL database data directory at the EBI:
   http://www.ebi.ac.uk/embl/

Data Retrieval:
Nucleotide sequences can be retrieved with a simple query by accession
number at
   http://www.ebi.ac.uk/cgi-bin/emblfetch
Entries can be retrieved in flatfile format, fasta format and in two XML
formats, emblxml and insdxml.

For programmatic access to dbfetch, the new Web Services version,
implemented using SOAP and HTTP, is recommended:
   http://www.ebi.ac.uk/Tools/webservices/

More complex queries can be constructed using the SRS databank browser
at
   http://srs.ebi.ac.uk

Data Submission:
Nucleotide sequences can be submitted to the database using the
interactive submission system Webin at
   http://www.ebi.ac.uk/embl/Submission/webin.html

6.4 Sequence Version Archive

The EMBL Sequence Version Archive (SVA) is a publicly available database
containing all versions of any entry which has ever appeared in the EMBL
database. The archive can be accessed programmatically via dbfetch at
   http://www.ebi.ac.uk/cgi-bin/dbfetch
or interactively via a Web interface at
   http://www.ebi.ac.uk/embl/sva/
Batch retrieval from the SVA is available at
   http://www.ebi.ac.uk/cgi-bin/sva/sva.pl?&do_batch=1

Note: Expanded versions of CON entries can be downloaded from the SVA
via the batch retrieval form.
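
As a rough illustration of programmatic access through dbfetch, the
Python sketch below builds a request URL for the dbfetch address given
above. The query parameter names (db, id, format, style) and the
database name "embl" are assumptions based on common dbfetch usage and
are not stated in these release notes; the accession X56734 is only a
placeholder, and dbfetch_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

DBFETCH_URL = "http://www.ebi.ac.uk/cgi-bin/dbfetch"


def dbfetch_url(accession, db="embl", fmt="fasta", style="raw"):
    """Build a dbfetch request URL for one accession number.

    The parameter names db/id/format/style are assumed, not taken
    from the release notes; check the dbfetch help before relying
    on them.
    """
    query = urlencode({"db": db, "id": accession,
                       "format": fmt, "style": style})
    return f"{DBFETCH_URL}?{query}"


# Example: URLs for the fasta format and for one of the XML formats
# named in section 6.3. The accession is a placeholder.
print(dbfetch_url("X56734"))
print(dbfetch_url("X56734", fmt="emblxml"))
```

The resulting string can be passed to any HTTP client; no request is
sent by the sketch itself.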

6.5 Sequence Similarity Search Servers

The EBI offers two network servers for sequence similarity searches via
electronic mail or interactive WWW forms:

FASTA   based on W. Pearson's FASTA algorithm. Allows local similarity
        searches of protein and nucleotide sequence databases.
        Send "help" to fasta@ebi.ac.uk or use URL
        http://www.ebi.ac.uk/fasta33/

        Complete genomes and whole genome shotgun datasets can be
        searched at the following URLs:
        http://www.ebi.ac.uk/fasta33/genomes.html
        http://www.ebi.ac.uk/fasta33/wgs.html
        Alternatively, send "help" command to gpfasta@ebi.ac.uk

BLAST   based on the NCBI and WU-BLAST software.
        Send "help" to blast@ebi.ac.uk or use URL
        http://www.ebi.ac.uk/blast2/

7 RELEASE 89 FILES

The release contains the files shown below.

File
Number  File Name               Description

1    crc.txt                 Checksum CRC uncompressed files
2    crc_gz.txt              Checksum CRC compressed files
3    deleteac.txt            Deleted accession numbers
4    ftable.txt              Feature Table Documentation
5    relnotes.txt            Release Notes (this document)
6    subinfo.txt             Data Submission Documentation
7    update.txt              Data Update Form
8    usrman.txt              User Manual
9    division.ndx            Division Index
10   rel_est_env_01_r89.dat  EST Sequences
11   rel_est_fun_01_r89.dat  EST Sequences
12   rel_est_fun_02_r89.dat  EST Sequences
13   rel_est_hum_01_r89.dat  EST Sequences
14   rel_est_hum_02_r89.dat  EST Sequences
15   rel_est_hum_03_r89.dat  EST Sequences
16   rel_est_hum_04_r89.dat  EST Sequences
17   rel_est_hum_05_r89.dat  EST Sequences
18   rel_est_hum_06_r89.dat  EST Sequences
19   rel_est_hum_07_r89.dat  EST Sequences
20   rel_est_hum_08_r89.dat  EST Sequences
21   rel_est_hum_09_r89.dat  EST Sequences
22   rel_est_hum_10_r89.dat  EST Sequences
23   rel_est_hum_11_r89.dat  EST Sequences
24   rel_est_hum_12_r89.dat  EST Sequences
25   rel_est_hum_13_r89.dat  EST Sequences
26   rel_est_hum_14_r89.dat  EST Sequences
27   rel_est_inv_01_r89.dat  EST Sequences
28   rel_est_inv_02_r89.dat  EST Sequences
29   rel_est_inv_03_r89.dat  EST Sequences
30   rel_est_inv_04_r89.dat  EST Sequences
31   rel_est_inv_05_r89.dat  EST Sequences
32   rel_est_inv_06_r89.dat  EST Sequences
33   rel_est_inv_07_r89.dat  EST Sequences
34   rel_est_inv_08_r89.dat  EST Sequences
35   rel_est_inv_09_r89.dat  EST Sequences
36   rel_est_inv_10_r89.dat  EST Sequences
37   rel_est_mam_01_r89.dat  EST Sequences
38   rel_est_mam_02_r89.dat  EST Sequences
39   rel_est_mam_03_r89.dat  EST Sequences
40   rel_est_mam_04_r89.dat  EST Sequences
41   rel_est_mam_05_r89.dat  EST Sequences
42   rel_est_mus_01_r89.dat  EST Sequences
43   rel_est_mus_02_r89.dat  EST Sequences
44   rel_est_mus_03_r89.dat  EST Sequences
45   rel_est_mus_04_r89.dat  EST Sequences
46   rel_est_mus_05_r89.dat  EST Sequences
47   rel_est_mus_06_r89.dat  EST Sequences
48   rel_est_mus_07_r89.dat  EST Sequences
49   rel_est_mus_08_r89.dat  EST Sequences
50   rel_est_pln_01_r89.dat  EST Sequences
51   rel_est_pln_02_r89.dat  EST Sequences
52   rel_est_pln_03_r89.dat  EST Sequences
53   rel_est_pln_04_r89.dat  EST Sequences
54   rel_est_pln_05_r89.dat  EST Sequences
55   rel_est_pln_06_r89.dat  EST Sequences
56   rel_est_pln_07_r89.dat  EST Sequences
57   rel_est_pln_08_r89.dat  EST Sequences
58   rel_est_pln_09_r89.dat  EST Sequences
59   rel_est_pln_10_r89.dat  EST Sequences
60   rel_est_pln_11_r89.dat  EST Sequences
61   rel_est_pln_12_r89.dat  EST Sequences
62   rel_est_pln_13_r89.dat  EST Sequences
63   rel_est_pln_14_r89.dat  EST Sequences
64   rel_est_pln_15_r89.dat  EST Sequences
65   rel_est_pln_16_r89.dat  EST Sequences
66   rel_est_pln_17_r89.dat  EST Sequences
67   rel_est_pln_18_r89.dat  EST Sequences
68   rel_est_pln_19_r89.dat  EST Sequences
69   rel_est_pro_01_r89.dat  EST Sequences
70   rel_est_rod_01_r89.dat  EST Sequences
71   rel_est_rod_02_r89.dat  EST Sequences
72   rel_est_unc_01_r89.dat  EST Sequences
73   rel_est_vrt_01_r89.dat  EST Sequences
74   rel_est_vrt_02_r89.dat  EST Sequences
75   rel_est_vrt_03_r89.dat  EST Sequences
76   rel_est_vrt_04_r89.dat  EST Sequences
77   rel_est_vrt_05_r89.dat  EST Sequences
78   rel_est_vrt_06_r89.dat  EST Sequences
79   rel_est_vrt_07_r89.dat  EST Sequences
80   rel_est_vrt_08_r89.dat  EST Sequences
81   rel_est_vrt_09_r89.dat  EST Sequences
82   rel_est_vrt_10_r89.dat  EST Sequences
83   rel_gss_env_01_r89.dat  Genome Survey Sequences
84   rel_gss_fun_01_r89.dat  Genome Survey Sequences
85   rel_gss_hum_01_r89.dat  Genome Survey Sequences
86   rel_gss_hum_02_r89.dat  Genome Survey Sequences
87   rel_gss_inv_01_r89.dat  Genome Survey Sequences
88   rel_gss_inv_02_r89.dat  Genome Survey Sequences
89   rel_gss_mam_01_r89.dat  Genome Survey Sequences
90   rel_gss_mam_02_r89.dat  Genome Survey Sequences
91   rel_gss_mam_03_r89.dat  Genome Survey Sequences
92   rel_gss_mam_04_r89.dat  Genome Survey Sequences
93   rel_gss_mus_01_r89.dat  Genome Survey Sequences
94   rel_gss_mus_02_r89.dat  Genome Survey Sequences
95   rel_gss_mus_03_r89.dat  Genome Survey Sequences
96   rel_gss_phg_01_r89.dat  Genome Survey Sequences
97   rel_gss_pln_01_r89.dat  Genome Survey Sequences
98   rel_gss_pln_02_r89.dat  Genome Survey Sequences
99   rel_gss_pln_03_r89.dat  Genome Survey Sequences
100  rel_gss_pln_04_r89.dat  Genome Survey Sequences
101  rel_gss_pln_05_r89.dat  Genome Survey Sequences
102  rel_gss_pln_06_r89.dat  Genome Survey Sequences
103  rel_gss_pln_07_r89.dat  Genome Survey Sequences
104  rel_gss_pln_08_r89.dat  Genome Survey Sequences
105  rel_gss_pro_01_r89.dat  Genome Survey Sequences
106  rel_gss_rod_01_r89.dat  Genome Survey Sequences
107  rel_gss_vrl_01_r89.dat  Genome Survey Sequences
108  rel_gss_vrt_01_r89.dat  Genome Survey Sequences
109  rel_htc_fun_01_r89.dat  High throughput cDNAs
110  rel_htc_hum_01_r89.dat  High throughput cDNAs
111  rel_htc_inv_01_r89.dat  High throughput cDNAs
112  rel_htc_mam_01_r89.dat  High throughput cDNAs
113  rel_htc_mus_01_r89.dat  High throughput cDNAs
114  rel_htc_pln_01_r89.dat  High throughput cDNAs
115  rel_htc_pro_01_r89.dat  High throughput cDNAs
116  rel_htc_rod_01_r89.dat  High throughput cDNAs
117  rel_htc_vrt_01_r89.dat  High throughput cDNAs
118  rel_htg_env_01_r89.dat  High Throughput Genome Sequences
119  rel_htg_fun_01_r89.dat  High Throughput Genome Sequences
120  rel_htg_hum_01_r89.dat  High Throughput Genome Sequences
121  rel_htg_hum_02_r89.dat  High Throughput Genome Sequences
122  rel_htg_inv_01_r89.dat  High Throughput Genome Sequences
123  rel_htg_inv_02_r89.dat  High Throughput Genome Sequences
124  rel_htg_mam_01_r89.dat  High Throughput Genome Sequences
125  rel_htg_mam_02_r89.dat  High Throughput Genome Sequences
126  rel_htg_mam_03_r89.dat  High Throughput Genome Sequences
127  rel_htg_mus_01_r89.dat  High Throughput Genome Sequences
128  rel_htg_phg_01_r89.dat  High Throughput Genome Sequences
129  rel_htg_pln_01_r89.dat  High Throughput Genome Sequences
130  rel_htg_pro_01_r89.dat  High Throughput Genome Sequences
131  rel_htg_rod_01_r89.dat  High Throughput Genome Sequences
132  rel_htg_rod_02_r89.dat  High Throughput Genome Sequences
133  rel_htg_rod_03_r89.dat  High Throughput Genome Sequences
134  rel_htg_vrl_01_r89.dat  High Throughput Genome Sequences
135  rel_htg_vrt_01_r89.dat  High Throughput Genome Sequences
136  rel_pat_env_01_r89.dat  Patent Sequences
137  rel_pat_fun_01_r89.dat  Patent Sequences
138  rel_pat_hum_01_r89.dat  Patent Sequences
139  rel_pat_hum_02_r89.dat  Patent Sequences
140  rel_pat_inv_01_r89.dat  Patent Sequences
141  rel_pat_mam_01_r89.dat  Patent Sequences
142  rel_pat_mus_01_r89.dat  Patent Sequences
143  rel_pat_phg_01_r89.dat  Patent Sequences
144  rel_pat_pln_01_r89.dat  Patent Sequences
145  rel_pat_pro_01_r89.dat  Patent Sequences
146  rel_pat_rod_01_r89.dat  Patent Sequences
147  rel_pat_syn_01_r89.dat  Patent Sequences
148  rel_pat_unc_01_r89.dat  Patent Sequences
149  rel_pat_unc_02_r89.dat  Patent Sequences
150  rel_pat_vrl_01_r89.dat  Patent Sequences
151  rel_pat_vrt_01_r89.dat  Patent Sequences
152  rel_std_env_01_r89.dat  Standard Sequences
153  rel_std_fun_01_r89.dat  Standard Sequences
154  rel_std_hum_01_r89.dat  Standard Sequences
155  rel_std_hum_02_r89.dat  Standard Sequences
156  rel_std_hum_03_r89.dat  Standard Sequences
157  rel_std_hum_04_r89.dat  Standard Sequences
158  rel_std_hum_05_r89.dat  Standard Sequences
159  rel_std_hum_06_r89.dat  Standard Sequences
160  rel_std_hum_07_r89.dat  Standard Sequences
161  rel_std_hum_08_r89.dat  Standard Sequences
162  rel_std_hum_09_r89.dat  Standard Sequences
163  rel_std_hum_10_r89.dat  Standard Sequences
164  rel_std_hum_11_r89.dat  Standard Sequences
165  rel_std_hum_12_r89.dat  Standard Sequences
166  rel_std_hum_13_r89.dat  Standard Sequences
167  rel_std_hum_14_r89.dat  Standard Sequences
168  rel_std_hum_15_r89.dat  Standard Sequences
169  rel_std_hum_16_r89.dat  Standard Sequences
170  rel_std_hum_17_r89.dat  Standard Sequences
171  rel_std_hum_18_r89.dat  Standard Sequences
172  rel_std_hum_19_r89.dat  Standard Sequences
173  rel_std_hum_20_r89.dat  Standard Sequences
174  rel_std_hum_21_r89.dat  Standard Sequences
175  rel_std_hum_22_r89.dat  Standard Sequences
176  rel_std_hum_23_r89.dat  Standard Sequences
177  rel_std_hum_24_r89.dat  Standard Sequences
178  rel_std_hum_25_r89.dat  Standard Sequences
179  rel_std_hum_26_r89.dat  Standard Sequences
180  rel_std_hum_27_r89.dat  Standard Sequences
181  rel_std_inv_01_r89.dat  Standard Sequences
182  rel_std_inv_02_r89.dat  Standard Sequences
183  rel_std_mam_01_r89.dat  Standard Sequences
184  rel_std_mus_01_r89.dat  Standard Sequences
185  rel_std_mus_02_r89.dat  Standard Sequences
186  rel_std_mus_03_r89.dat  Standard Sequences
187  rel_std_mus_04_r89.dat  Standard Sequences
188  rel_std_phg_01_r89.dat  Standard Sequences
189  rel_std_pln_01_r89.dat  Standard Sequences
190  rel_std_pln_02_r89.dat  Standard Sequences
191  rel_std_pln_03_r89.dat  Standard Sequences
192  rel_std_pro_01_r89.dat  Standard Sequences
193  rel_std_pro_02_r89.dat  Standard Sequences
194  rel_std_rod_01_r89.dat  Standard Sequences
195  rel_std_syn_01_r89.dat  Standard Sequences
196  rel_std_tgn_01_r89.dat  Standard Sequences
197  rel_std_unc_01_r89.dat  Standard Sequences
198  rel_std_vrl_01_r89.dat  Standard Sequences
199  rel_std_vrl_02_r89.dat  Standard Sequences
200  rel_std_vrt_01_r89.dat  Standard Sequences
201  rel_sts_fun_01_r89.dat  STS Sequences
202  rel_sts_hum_01_r89.dat  STS Sequences
203  rel_sts_inv_01_r89.dat  STS Sequences
204  rel_sts_mam_01_r89.dat  STS Sequences
205  rel_sts_mus_01_r89.dat  STS Sequences
206  rel_sts_pln_01_r89.dat  STS Sequences
207  rel_sts_pro_01_r89.dat  STS Sequences
208  rel_sts_rod_01_r89.dat  STS Sequences
209  rel_sts_vrt_01_r89.dat  STS Sequences
210  rel_tpa_fun_01_r89.dat  Third Party Annotation
211  rel_tpa_hum_01_r89.dat  Third Party Annotation
212  rel_tpa_inv_01_r89.dat  Third Party Annotation
213  rel_tpa_mam_01_r89.dat  Third Party Annotation
214  rel_tpa_mus_01_r89.dat  Third Party Annotation
215  rel_tpa_phg_01_r89.dat  Third Party Annotation
216  rel_tpa_pln_01_r89.dat  Third Party Annotation
217  rel_tpa_pro_01_r89.dat  Third Party Annotation
218  rel_tpa_rod_01_r89.dat  Third Party Annotation
219  rel_tpa_syn_01_r89.dat  Third Party Annotation
220  rel_tpa_vrl_01_r89.dat  Third Party Annotation
221  rel_tpa_vrt_01_r89.dat  Third Party Annotation
222  rel_con_env_01_r89.dat  Constructed Sequences
223  rel_con_fun_01_r89.dat  Constructed Sequences
224  rel_con_hum_01_r89.dat  Constructed Sequences
225  rel_con_inv_01_r89.dat  Constructed Sequences
226  rel_con_mam_01_r89.dat  Constructed Sequences
227  rel_con_mus_01_r89.dat  Constructed Sequences
228  rel_con_pln_01_r89.dat  Constructed Sequences
229  rel_con_pro_01_r89.dat  Constructed Sequences
230  rel_con_rod_01_r89.dat  Constructed Sequences
231  rel_con_vrt_01_r89.dat  Constructed Sequences
232  wgs_aaaa_pln.dat        WGS - Oryza sativa (indica cultivar-group)
233  wgs_aaab_inv.dat        WGS - Anopheles gambiae strain PEST
234  wgs_aaac_pro.dat        WGS - Bacillus anthracis A2012
235  wgs_aaah_pro.dat        WGS - Chloroflexus aurantiacus
236  wgs_aaak_pro.dat        WGS - Enterococcus faecium
237  wgs_aaal_pro.dat        WGS - Xylella fastidiosa Dixon
238  wgs_aaam_pro.dat        WGS - Xylella fastidiosa Ann-1
239  wgs_aaap_pro.dat        WGS - Magnetospirillum magnetotacticum
240  wgs_aaau_pro.dat        WGS - Azotobacter vinelandii
241  wgs_aaaw_pro.dat        WGS - Desulfitobacterium hafniense
242  wgs_aaay_pro.dat        WGS - Nostoc punctiforme
243  wgs_aabc_pro.dat        WGS - Ferroplasma acidarmanus
244  wgs_aabf_pro.dat        WGS - Fusobacterium nucleatum vincentii ATCC49256
245  wgs_aabg_pro.dat        WGS - Clostridium thermocellum ATCC 27405
246  wgs_aabl_inv.dat        WGS - Plasmodium yoelii yoelii
247  wgs_aabm_pro.dat        WGS - Bifidobacterium longum DJO10A
248  wgs_aabr_rod.dat        WGS - Rattus norvegicus
249  wgs_aabs_inv.dat        WGS - Ciona intestinalis
250  wgs_aabt_fun.dat        WGS - Aspergillus terreus
251  wgs_aabu_inv.dat        WGS - Drosophila melanogaster strain y
252  wgs_aabw_pro.dat        WGS - Rickettsia sibirica
253  wgs_aabx_fun.dat        WGS - Neurospora crassa strain OR74A
254  wgs_aaby_fun.dat        WGS - Saccharomyces paradoxus
255  wgs_aabz_fun.dat        WGS - Saccharomyces mikatae
256  wgs_aaca_fun.dat        WGS - Saccharomyces bayanus
257  wgs_aacb_inv.dat        WGS - Giardia lamblia ATCC 50803
258  wgs_aacc_hum.dat        WGS - Homo sapiens chromosome 7
259  wgs_aacd_fun.dat        WGS - Aspergillus nidulans FGSC A4
260  wgs_aace_fun.dat        WGS - Saccharomyces kluyveri NRRL Y-12651
261  wgs_aacf_fun.dat        WGS - Saccharomyces castellii NRRL Y-12630
262  wgs_aacg_fun.dat        WGS - Saccharomyces bayanus 623-6C
263  wgs_aach_fun.dat        WGS - Saccharomyces mikatae IFO 1815
264  wgs_aaci_fun.dat        WGS - Saccharomyces kudriavzevii IFO 1802
265  wgs_aacj_pro.dat        WGS - Haemophilus somnus 2336
266  wgs_aack_pro.dat        WGS - Actinobacillus pleuropneumoniae
267  wgs_aacm_fun.dat        WGS - Gibberella zeae PH-1
268  wgs_aacn_mam.dat        WGS - Canis familiaris
269  wgs_aaco_fun.dat        WGS - Cryptococcus neoformans var. grubii H99
270  wgs_aacp_fun.dat        WGS - Ustilago maydis 521
271  wgs_aacq_fun.dat        WGS - Candida albicans SC5314
272  wgs_aacr_pro.dat        WGS - Pasteuria nishizawae
273  wgs_aacs_fun.dat        WGS - Coprinopsis cinerea okayama7#130
274  wgs_aact_inv.dat        WGS - Ciona savignyi
275  wgs_aacu_fun.dat        WGS - Magnaporthe grisea
276  wgs_aacv_pln.dat        WGS - Oryza sativa (japonica cultivar-group)
277  wgs_aacw_fun.dat        WGS - Rhizopus oryzae RA 99-880
278  wgs_aacy_env.dat        WGS - multiple-organism environmental
279  wgs_aacz_mam.dat        WGS - Pan troglodytes WU
280  wgs_aada_mam.dat        WGS - Pan troglodytes
281  wgs_aadb_hum.dat        WGS - Homo sapiens Celera WGA
282  wgs_aadc_hum.dat        WGS - Homo sapiens Celera CSA
283  wgs_aadd_hum.dat        WGS - Homo sapiens Celera WGSA
284  wgs_aade_inv.dat        WGS - Drosophila pseudoobscura
285  wgs_aadg_inv.dat        WGS - Apis mellifera
286  wgs_aadj_pro.dat        WGS - Rickettsia rickettsii
287  wgs_aadk_inv.dat        WGS - Bombyx mori
288  wgs_aadl_env.dat        WGS - environmental-sampling
289  wgs_aadm_fun.dat        WGS - Kluyveromyces waltii NCYC 2644
290  wgs_aadn_vrt.dat        WGS - Gallus gallus
291  wgs_aado_pro.dat        WGS - Haemophilus influenzae R2846
292  wgs_aadp_pro.dat        WGS - Haemophilus influenzae R2866
293  wgs_aadq_pro.dat        WGS - Listeria monocytogenes str. 1/2a
294  wgs_aadr_pro.dat        WGS - Listeria monocytogenes str. 4b
295  wgs_aads_fun.dat        WGS - Phanerochaete chrysosporium RP-78
296  wgs_aadv_pro.dat        WGS - Crocosphaera watsonii WH 8501
297  wgs_aadw_pro.dat        WGS - Exiguobacterium sp. 255-15
298  wgs_aaec_fun.dat        WGS - Coccidioides immitis RS
299  wgs_aaee_inv.dat        WGS - Cryptosporidium parvum chromosome 7
300  wgs_aaef_pro.dat        WGS - Rubrobacter xylanophilus DSM 9941
301  wgs_aaeg_fun.dat        WGS - Saccharomyces cerevisiae RM11-1a
302  wgs_aaeh_pro.dat        WGS - Burkholderia cepacia R1808
303  wgs_aaek_pro.dat        WGS - Bacillus cereus G9241
304  wgs_aael_inv.dat        WGS - Cryptosporidium hominis strain TU502
305  wgs_aaem_pro.dat        WGS - Rubrivivax gelatinosus PM1
306  wgs_aaen_pro.dat        WGS - Bacillus anthracis str. CNEVA-9066
307  wgs_aaeo_pro.dat        WGS - Bacillus anthracis str. A1055
308  wgs_aaep_pro.dat        WGS - Bacillus anthracis str. Vollum
309  wgs_aaeq_pro.dat        WGS - Bacillus anthracis str. Kruger B
310  wgs_aaer_pro.dat        WGS - Bacillus anthracis str. USA6153
311  wgs_aaes_pro.dat        WGS - Bacillus anthracis str. Australia 94
312  wgs_aaeu_inv.dat        WGS - Drosophila yakuba
313  wgs_aaew_pro.dat        WGS - Desulfuromonas acetoxidans DSM 684
314  wgs_aaex_mam.dat        WGS - Canis familiaris
315  wgs_aaey_fun.dat        WGS - Cryptococcus neoformans var. neoformans
316  wgs_aafa_pro.dat        WGS - Streptococcus suis 89/1591
317  wgs_aafb_inv.dat        WGS - Entamoeba histolytica HM-1:IMSS
318  wgs_aafc_mam.dat        WGS - Bos taurus
319  wgs_aafd_pln.dat        WGS - Thalassiosira pseudonana CCMP1335
320  wgs_aafe_pro.dat        WGS - Rickettsia akari str. Hartford
321  wgs_aaff_pro.dat        WGS - Rickettsia canadensis str. McKiel
322  wgs_aafi_inv.dat        WGS - Dictyostelium discoideum
323  wgs_aafj_pro.dat        WGS - Campylobacter upsaliensis RM3195
324  wgs_aafk_pro.dat        WGS - Campylobacter lari RM2100
325  wgs_aafl_pro.dat        WGS - Campylobacter coli RM2228
326  wgs_aafm_fun.dat        WGS - Pichia guilliermondii ATCC 6260
327  wgs_aafn_fun.dat        WGS - Candida tropicalis T1
328  wgs_aafo_fun.dat        WGS - Candida albicans WO-1
329  wgs_aafp_fun.dat        WGS - Cryptococcus neoformans R265
330  wgs_aafr_mam.dat        WGS - Monodelphis domestica
331  wgs_aafs_inv.dat        WGS - Drosophila pseudoobscura
332  wgs_aaft_fun.dat        WGS - Clavispora lusitaniae ATCC 42720
333  wgs_aafu_fun.dat        WGS - Chaetomium globosum CBS 148.51
334  wgs_aafv_pro.dat        WGS - Streptococcus pyogenes M49 591
335  wgs_aafw_fun.dat        WGS - Saccharomyces cerevisiae YJM789
336  wgs_aafx_env.dat        WGS - environmental sequence
337  wgs_aafy_env.dat        WGS - environmental sequence
338  wgs_aafz_env.dat        WGS - environmental sequence
339  wgs_aaga_env.dat        WGS - environmental sequence
340  wgs_aagb_pro.dat        WGS - Wolbachia(Drosophila ananassae)
341  wgs_aagc_pro.dat        WGS - Wolbachia(Drosophila simulans)
342  wgs_aagd_inv.dat        WGS - Caenorhabditis remanei
343  wgs_aage_inv.dat        WGS - Aedes aegypti
344  wgs_aagf_inv.dat        WGS - Tetrahymena thermophila SB210
345  wgs_aagh_inv.dat        WGS - Drosophila simulans
346  wgs_aagi_fun.dat        WGS - Phaeosphaeria nodorum SN15
347  wgs_aagj_inv.dat        WGS - Strongylocentrotus purpuratus
348  wgs_aagk_inv.dat        WGS - Theileria parva
349  wgs_aagl_mam.dat        WGS - Gorilla gorilla
350  wgs_aagm_mam.dat        WGS - Pongo pygmaeus
351  wgs_aagn_mam.dat        WGS - Macaca mulatta
352  wgs_aagp_pro.dat        WGS - Brevibacterium linens BL2
353  wgs_aagt_fun.dat        WGS - Sclerotinia sclerotiorum 1980
354  wgs_aagu_mam.dat        WGS - Loxodonta africana
355  wgs_aagv_mam.dat        WGS - Dasypus novemcinctus
356  wgs_aagw_mam.dat        WGS - Oryctolagus cuniculus
357  wgs_aagx_pro.dat        WGS - Mycoplasma genitalium G-37
358  wgs_aagy_pro.dat        WGS - Streptococcus pneumoniae TIGR4
359  wgs_aagz_inv.dat        WGS - Trypanosoma brucei
360  wgs_aaha_inv.dat        WGS - Trypanosoma brucei
361  wgs_aahb_inv.dat        WGS - Trypanosoma brucei
362  wgs_aahc_inv.dat        WGS - Trichomonas vaginalis
363  wgs_aahf_fun.dat        WGS - Aspergillus fumigatus Af293
364  wgs_aahj_pro.dat        WGS - Chlorobium limicola DSM 245
365  wgs_aahk_inv.dat        WGS - Trypanosoma cruzi
366  wgs_aahm_pro.dat        WGS - Burkholderia mallei 10229
367  wgs_aahn_pro.dat        WGS - Burkholderia mallei 10399
368  wgs_aaho_pro.dat        WGS - Burkholderia mallei GB8 horse 4
369  wgs_aahp_pro.dat        WGS - Burkholderia mallei NCTC 10247
370  wgs_aahq_pro.dat        WGS - Burkholderia mallei SAVP1
371  wgs_aahr_pro.dat        WGS - Burkholderia pseudomallei 1655
372  wgs_aahs_pro.dat        WGS - Burkholderia pseudomallei 1710a
373  wgs_aahu_pro.dat        WGS - Burkholderia pseudomallei 668
374  wgs_aahv_pro.dat        WGS - Burkholderia pseudomallei Pasteur
375  wgs_aahw_pro.dat        WGS - Burkholderia pseudomallei S13
376  wgs_aahx_rod.dat        WGS - Rattus norvegicus
377  wgs_aahy_mus.dat        WGS - Mus musculus
378  wgs_aaib_pro.dat        WGS - Chlorobium phaeobacteroides DSM 266
379  wgs_aaic_pro.dat        WGS - Chlorobium phaeobacteroides BS1
380  wgs_aaid_fun.dat        WGS - Botryotinia fuckeliana B05.10
381  wgs_aaif_pro.dat        WGS - Ehrlichia chaffeensis str. Sapulpa
382  wgs_aaih_fun.dat        WGS - Aspergillus flavus NRRL3357
383  wgs_aaii_pro.dat        WGS - Frankia sp. EAN1pec
384  wgs_aaij_pro.dat        WGS - Prosthecochloris aestuarii DSM 271
385  wgs_aaik_pro.dat        WGS - Pelodictyon phaeoclathratiforme BU-1
386  wgs_aail_fun.dat        WGS - Trichoderma reesei QM9414
387  wgs_aaim_fun.dat        WGS - Gibberella moniliformis 7600
388  wgs_aain_pro.dat        WGS - Shewanella amazonensis SB2B
389  wgs_aaio_pro.dat        WGS - Shewanella baltica OS155
390  wgs_aaiq_pro.dat        WGS - Burkholderia mallei FMH
391  wgs_aair_pro.dat        WGS - Burkholderia mallei JHU
392  wgs_aait_pro.dat        WGS - Paracoccus denitrificans PD1222
393  wgs_aaiw_fun.dat        WGS - Uncinocarpus reesii 1704
394  wgs_aaix_pro.dat        WGS - Mycobacterium tuberculosis F11
395  wgs_aaiy_mam.dat        WGS - Echinops telfairi
396  wgs_aaiz_inv.dat        WGS - Drosophila persimilis
397  wgs_aajb_pro.dat        WGS - Nocardioides sp. JS614
398  wgs_aajd_pro.dat        WGS - Prosthecochloris vibrioformis DSM 265
399  wgs_aajh_pro.dat        WGS - Pelobacter propionicus DSM 2379
400  wgs_aaji_fun.dat        WGS - Ajellomyces capsulatus NAm1
401  wgs_aajj_inv.dat        WGS - Tribolium castaneum
402  wgs_aajm_pro.dat        WGS - Bacillus thuringiensis ATCC 35646
403  wgs_aajn_fun.dat        WGS - Aspergillus terreus NIH2624
404  wgs_aajo_pro.dat        WGS - Streptococcus agalactiae 18RS21
405  wgs_aajp_pro.dat        WGS - Streptococcus agalactiae 515
406  wgs_aajq_pro.dat        WGS - Streptococcus agalactiae CJB111
407  wgs_aajr_pro.dat        WGS - Streptococcus agalactiae COH1
408  wgs_aajs_pro.dat        WGS - Streptococcus agalactiae H36B
409  wgs_aajt_pro.dat        WGS - Escherichia coli B7A
410  wgs_aaju_pro.dat        WGS - Escherichia coli F11
411  wgs_aajv_pro.dat        WGS - Escherichia coli E22
412  wgs_aajw_pro.dat        WGS - Escherichia coli E110019
413  wgs_aajx_pro.dat        WGS - Escherichia coli B171
414  wgs_aajy_pro.dat        WGS - Escherichia coli HS
415  wgs_aajz_pro.dat        WGS - Escherichia coli E24377A
416  wgs_aaka_pro.dat        WGS - Shigella boydii BS512
417  wgs_aakb_pro.dat        WGS - Escherichia coli 53638
418  wgs_aakc_pro.dat        WGS - Actinobacillus
succinogenes 130Z 419 wgs_aakd_fun.dat WGS - Aspergillus clavatus NRRL 1 420 wgs_aake_fun.dat WGS - Neosartorya fischeri NRRL 181 421 wgs_aakf_pro.dat WGS - Vibrio cholerae MO10 422 wgs_aakg_pro.dat WGS - Vibrio cholerae O395 423 wgs_aakh_pro.dat WGS - Vibrio cholerae RC385 424 wgs_aaki_pro.dat WGS - Vibrio cholerae V51 425 wgs_aakj_pro.dat WGS - Vibrio cholerae V52 426 wgs_aakk_pro.dat WGS - Vibrio sp. Ex25 427 wgs_aakl_pro.dat WGS - Ralstonia solanacearum UW551 428 wgs_aakm_inv.dat WGS - Plasmodium vivax 429 wgs_aakn_rod.dat WGS - Cavia porcellus 430 wgs_aako_inv.dat WGS - Drosophila sechellia 431 wgs_aakq_pro.dat WGS - Thermoanaerobacter ethanolicus ATCC 33223 432 wgs_aakr_pro.dat WGS - Mycobacterium tuberculosis C 433 wgs_aaks_pro.dat WGS - Yersinia pestis Angola 434 wgs_aakt_pro.dat WGS - Yersinia pseudotuberculosis IP 31758 435 wgs_aaku_pro.dat WGS - Alkaliphilus metalliredigenes QYMF 436 wgs_aakv_pro.dat WGS - Pseudomonas aeruginosa C3719 437 wgs_aakw_pro.dat WGS - Pseudomonas aeruginosa 2192 438 wgs_aakx_pro.dat WGS - Burkholderia cenocepacia PC184 439 wgs_aaky_pro.dat WGS - Burkholderia dolosa AUO158 440 wgs_aalb_pro.dat WGS - Shewanella putrefaciens CN-32 441 wgs_aalc_pro.dat WGS - Yersinia bercovieri ATCC 43970 442 wgs_aald_pro.dat WGS - Yersinia mollaretii ATCC 43969 443 wgs_aale_pro.dat WGS - Yersinia frederiksenii ATCC 33641 444 wgs_aalf_pro.dat WGS - Yersinia intermedia ATCC 29909 445 wgs_aalg_pro.dat WGS - Marinobacter aquaeolei VT8 446 wgs_aalj_pro.dat WGS - Bradyrhizobium sp. BTAi1 447 wgs_aall_pro.dat WGS - Bacillus cereus subsp. cytotoxis NVH 391-98 448 wgs_aalm_pro.dat WGS - Pseudomonas putida F1 449 wgs_aaln_pro.dat WGS - Shewanella sp. W3-18-1 450 wgs_aalo_pro.dat WGS - Clostridium beijerincki NCIMB 8052 451 wgs_aalp_pro.dat WGS - Prochlorococcus marinus str. MIT 9211 452 wgs_aals_pro.dat WGS - Shewanella sp. PV-4 453 wgs_aalt_mam.dat WGS - Sorex araneus 454 wgs_aalv_pro.dat WGS - Sulfitobacter sp. 
EE-36 455 wgs_aalw_pro.dat WGS - Caldicellulosiruptor saccharolyticus 456 wgs_aaly_pro.dat WGS - Roseovarius nubinhibens ISM 457 wgs_aalz_pro.dat WGS - Sulfitobacter sp. NAS-14.1 458 wgs_aama_pro.dat WGS - Burkholderia pseudomallei 1106a 459 wgs_aamb_pro.dat WGS - Burkholderia pseudomallei 1106b 460 wgs_aamd_pro.dat WGS - Stigmatella aurantiaca DW4/3-1 461 wgs_aame_pro.dat WGS - Rhodobacter sphaeroides ATCC 17025 462 wgs_aamf_pro.dat WGS - Rhodobacter sphaeroides ATCC 17029 463 wgs_aamg_env.dat WGS - uncultured human fecal virus 464 wgs_aamh_env.dat WGS - uncultured human fecal virus 465 wgs_aami_env.dat WGS - uncultured human fecal virus 466 wgs_aamj_pro.dat WGS - Shigella dysenteriae 1012 467 wgs_aamk_pro.dat WGS - Escherichia coli 101-1 468 wgs_aaml_pro.dat WGS - Clostridium difficile QCD-32g58 469 wgs_aamm_pro.dat WGS - Burkholderia pseudomallei 406e 470 wgs_aamn_pro.dat WGS - Janibacter sp. HTCC2649 471 wgs_aamo_pro.dat WGS - Oceanicola batsensis HTCC2597 472 wgs_aamp_pro.dat WGS - Croceibacter atlanticus HTCC2559 473 wgs_aamq_pro.dat WGS - Oceanicaulis alexandrii HTCC2633 474 wgs_aamr_pro.dat WGS - Vibrio splendidus 12B01 475 wgs_aams_pro.dat WGS - Loktanella vestfoldensis SKA53 476 wgs_aamt_pro.dat WGS - Rhodobacterales bacterium HTCC2654 477 wgs_aamu_pro.dat WGS - Parvularcula bermudensis HTCC2503 478 wgs_aamv_pro.dat WGS - Roseovarius sp. 217 479 wgs_aamw_pro.dat WGS - Erythrobacter sp. NAP1 480 wgs_aamx_pro.dat WGS - Idiomarina baltica OS145 481 wgs_aamy_pro.dat WGS - Nitrobacter sp. Nb-311A 482 wgs_aamz_pro.dat WGS - Cellulophaga sp. MED134 483 wgs_aana_pro.dat WGS - Tenacibaculum sp. MED152 484 wgs_aanb_pro.dat WGS - Roseobacter sp. MED193 485 wgs_aanc_pro.dat WGS - Flavobacterium sp. MED217 486 wgs_aand_pro.dat WGS - Vibrio sp. MED222 487 wgs_aane_pro.dat WGS - Marinomonas sp. 
MED121 488 wgs_aanf_pro.dat WGS - Bartonella bacilliformis KC583 489 wgs_aang_mam.dat WGS - Felis catus 490 wgs_aanh_vrt.dat WGS - Gasterosteus aculeatus 491 wgs_aani_inv.dat WGS - Drosophila virilis 492 wgs_aanj_pro.dat WGS - Campylobacter jejuni subsp. jejuni CF93-6 493 wgs_aank_pro.dat WGS - Campylobacter jejuni subsp. jejuni 260.94 494 wgs_aanl_pro.dat WGS - Candidatus Sulcia muelleri str. Hc 495 wgs_aanm_pro.dat WGS - Polaromonas naphthalenivorans CJ2 496 wgs_aann_mam.dat WGS - Erinaceus europaeus 497 wgs_aano_pro.dat WGS - Synechococcus sp. WH 5701 498 wgs_aanp_pro.dat WGS - Synechococcus sp. RS9917 499 wgs_aanq_pro.dat WGS - Campylobacter jejuni subsp. jejuni HB93-13 500 wgs_aans_inv.dat WGS - Plasmodium falciparum HB3 501 wgs_aant_pro.dat WGS - Campylobacter jejuni subsp. jejuni 84-25 502 wgs_aanu_mam.dat WGS - Macaca mulatta 503 wgs_aanv_inv.dat WGS - Entamoeba dispar SAW760 504 wgs_aanw_inv.dat WGS - Entamoeba invadens IP1 505 wgs_aanx_pro.dat WGS - Burkholderia mallei 2002721280 506 wgs_aany_pro.dat WGS - Campylobacter jejuni subsp. jejuni 81-176 507 wgs_aanz_pro.dat WGS - Blastopirellula marina DSM 3645 508 wgs_aaoa_pro.dat WGS - gamma proteobacterium KT 71 509 wgs_aaob_pro.dat WGS - marine actinobacterium PHSC20C1 510 wgs_aaoc_pro.dat WGS - Flavobacteriales bacterium HTCC2170 511 wgs_aaod_pro.dat WGS - Alteromonas macleodii 'Deep ecotype' 512 wgs_aaoe_pro.dat WGS - Reinekea sp. MED297 513 wgs_aaof_pro.dat WGS - Nitrococcus mobilis Nb-231 514 wgs_aaog_pro.dat WGS - Polaribacter irgensii 23-P 515 wgs_aaoh_pro.dat WGS - Pseudoalteromonas tunicata D2 516 wgs_aaoi_pro.dat WGS - Robiginitalea biformata HTCC2501 517 wgs_aaoj_pro.dat WGS - Vibrio angustum S14 518 wgs_aaok_pro.dat WGS - Synechococcus sp. WH 7805 519 wgs_aaom_pro.dat WGS - Dehalococcoides sp. 
BAV1 520 wgs_aaon_pro.dat WGS - Geobacter uraniumreducens Rf4 521 wgs_aaoo_pro.dat WGS - Acidiphilium cryptum JF-5 522 wgs_aaop_pro.dat WGS - Desulfotomaculum reducens MI-1 523 wgs_aaoq_pro.dat WGS - Halorhodospira halophila SL1 524 wgs_aaos_pro.dat Yersinia pestis biovar Orientalis str. IP275 525 wgs_aaot_pro.dat WGS - Oceanicola granulosus HTCC2516 526 wgs_aaou_pro.dat WGS - Photobacterium sp. SKA34 527 wgs_aaov_pro.dat WGS - Lactobacillus reuteri JCM 1112 528 wgs_aaow_pro.dat WGS - Oceanospirillum sp. MED92 529 wgs_aaox_pro.dat WGS - Bacillus sp. NRRL B-14911 530 wgs_aaoy_pro.dat WGS - Bacillus weihenstephanensis KBAB4 531 wgs_aaoz_pro.dat WGS - Halothermothrix orenii H 168 532 wgs_aapa_pro.dat WGS - Mycobacterium flavescens PYR-GCK 533 wgs_aapc_pro.dat WGS - Xanthobacter autotrophicus Py2 534 wgs_aapd_pro.dat WGS - Flavobacteria bacterium BBFL7 535 wgs_aape_mam.dat WGS - Myotis lucifugus 536 wgs_aapf_pro.dat WGS - Mycobacterium vanbaalenii PYR-1 537 wgs_aapg_pro.dat WGS - Psychromonas sp. CNPT3 538 wgs_aaph_pro.dat WGS - Photobacterium profundum 3TCK 539 wgs_aapi_pro.dat WGS - marine gamma proteobacterium HTCC2207 540 wgs_aapj_pro.dat WGS - Aurantimonas sp. SI85-9A1 541 wgs_aapk_pro.dat WGS - Staphylococcus aureus subsp. aureus JH1 542 wgs_aapl_pro.dat WGS - Staphylococcus aureus subsp. aureus JH9 543 wgs_aapm_pro.dat WGS - Flavobacterium johnsoniae UW101 544 wgs_aapn_mam.dat WGS - Ornithorhynchus anatinus 545 wgs_aapo_fun.dat WGS - Lodderomyces elongisporus NRRL YB-4239 546 wgs_aapp_inv.dat WGS - Drosophila ananassae 547 wgs_aapq_inv.dat WGS - Drosophila erecta 548 wgs_aapr_pro.dat WGS - Psychroflexus torquis ATCC 700755 549 wgs_aaps_pro.dat WGS - Vibrio alginolyticus 12G01 550 wgs_aapt_inv.dat WGS - Drosophila grimshawi 551 wgs_aapu_inv.dat WGS - Drosophila mojavensis 552 wgs_aapv_pro.dat WGS - Candidatus Pelagibacter ubique HTCC1002 553 wgs_aapx_pro.dat WGS - Psychrobacter sp. 
PRwf-1 554 wgs_aapy_mam.dat WGS - Tupaia belangeri 555 wgs_aapz_pro.dat WGS - Lactobacillus reuteri 100-23 556 wgs_aaqb_inv.dat WGS - Drosophila willistoni 557 wgs_aaqc_pro.dat WGS - Mycobacterium sp. JLS 558 wgs_aaqd_pro.dat WGS - Mycobacterium sp. KMS 559 wgs_aaqe_pro.dat WGS - Pseudomonas aeruginosa PA7 560 wgs_aaqf_pro.dat WGS - delta proteobacterium MLMS-1 561 wgs_aaqg_pro.dat WGS - Sphingomonas sp. SKA58 562 wgs_aaqh_pro.dat WGS - Oceanobacter sp. RED65 563 wgs_aaqi_pro.dat WGS - Coxiella burnetii Dugway 7E9-12 564 wgs_aaqj_pro.dat WGS - Rickettsiella grylli 565 wgs_aaqk_env.dat WGS - environmental sequence 566 wgs_aaql_env.dat WGS - environmental sequence 567 wgs_aaqm_inv.dat WGS - Toxoplasma gondii RH 568 wgs_aaqn_pro.dat WGS - Xanthomonas oryzae pv. oryzicola BLS256 569 wgs_aaqo_pro.dat WGS - Coxiella burnetii RSA 331 570 wgs_aaqp_pro.dat Wolbachia endosymbiont of Drosophila willistoni 571 wgs_aaqq_rod.dat WGS - Spermophilus tridecemlineatus 572 wgs_aaqr_mam.dat WGS - Otolemur garnettii 573 wgs_aaqs_pro.dat WGS - Psychromonas ingrahamii 37 574 wgs_aaqt_pro.dat WGS - Clostridium phytofermentans ISDg 575 wgs_aaqu_pro.dat WGS - Roseiflexus sp. RS-1 576 wgs_aaqv_pro.dat WGS - Clostridium sp. OhILAs 577 wgs_aaqw_pro.dat WGS - Pseudomonas aeruginosa PACS2 578 wgs_aaqx_pln.dat WGS - Phytophthora ramorum 579 wgs_aaqy_pln.dat WGS - Phytophthora sojae 580 wgs_aaqz_pro.dat WGS - Campylobacter concisus 13826 581 wgs_aara_pro.dat WGS - Campylobacter curvus 525.92 582 wgs_aarb_pro.dat WGS - Campylobacter jejuni subsp. doylei 269.97 583 wgs_aarc_pro.dat WGS - Rickettsia bellii OSU 85-389 584 wgs_aare_fun.dat WGS - Ascosphaera apis USDA-ARSEF 7405 585 wgs_aarf_pro.dat WGS - Paenibacillus larvae subsp. larvae 586 wgs_aarg_pro.dat WGS - Fusobacterium nucleatum subsp. 
polymorphum 587 wgs_aarh_pln.dat WGS - Populus trichocarpa 588 wgs_aari_pro.dat WGS - Listeria monocytogenes FSL F2-515 589 wgs_aarj_pro.dat WGS - Listeria monocytogenes FSL J1-194 590 wgs_aark_pro.dat WGS - Listeria monocytogenes FSL J1-175 591 wgs_aarl_pro.dat WGS - Listeria monocytogenes FSL J1-208 592 wgs_aarm_pro.dat WGS - Listeria monocytogenes FSL J2-003 593 wgs_aaro_pro.dat WGS - Listeria monocytogenes FSL J2-064 594 wgs_aarq_pro.dat WGS - Listeria monocytogenes FSL N3-165 595 wgs_aarr_pro.dat WGS - Listeria monocytogenes FSL R2-503 596 wgs_aaru_pro.dat WGS - Listeria monocytogenes F6900 597 wgs_aarw_pro.dat WGS - Listeria monocytogenes J0161 598 wgs_aarx_pro.dat WGS - Listeria monocytogenes J2818 599 wgs_aary_pro.dat WGS - Listeria monocytogenes LO28 600 wgs_aarz_pro.dat WGS - Listeria monocytogenes 10403S 601 wgs_aasa_pro.dat WGS - Mannheimia haemolytica PHL213 602 wgs_aasc_inv.dat WGS - Aplysia californica 603 wgs_aasd_pro.dat WGS - Acidovorax sp. JS42 604 wgs_aase_pro.dat WGS - Chlorobium ferrooxidans DSM 13031 605 wgs_aasg_pln.dat WGS - Ricinus communis 606 wgs_aash_pro.dat WGS - Geobacter sp. FRC-32 607 wgs_aasi_pro.dat WGS - Methanoculleus marisnigri JR1 608 wgs_aasj_pro.dat WGS - Thermofilum pendens Hrk 5 609 wgs_aasl_pro.dat WGS - Campylobacter jejuni subsp. jejuni 81-176 610 wgs_aasm_inv.dat WGS - Plasmodium falciparum Dd2 611 wgs_aasn_pro.dat WGS - Mycobacterium tuberculosis str. Haarlem 612 wgs_aaso_fun.dat WGS - Coccidioides immitis H538.4 613 wgs_aasp_pro.dat WGS - desc to add 614 wgs_aasq_pro.dat WGS - Verminephrobacter eiseniae EF01-2 615 wgs_aasr_inv.dat WGS - Drosophila simulans 616 wgs_aass_inv.dat WGS - Drosophila simulans 617 wgs_aast_inv.dat WGS - Drosophila simulans 618 wgs_aasu_inv.dat WGS - Drosophila simulans 619 wgs_aasv_inv.dat WGS - Drosophila simulans 620 wgs_aasw_inv.dat WGS - Drosophila simulans 621 wgs_aasx_pro.dat WGS - Acidovorax avenae subsp. 
citrulli AAC00-1 622 wgs_aasz_env.dat WGS - environmental sequence 623 wgs_aatg_pro.dat WGS - Sinorhizobium medicae WSM419 624 wgs_aath_pro.dat WGS - Caulobacter sp. K31 625 wgs_aati_pro.dat WGS - Herpetosiphon aurantiacus ATCC 23779 626 wgs_aatj_pro.dat WGS - Salinispora tropica CNB-440 627 wgs_aatk_pro.dat WGS - Shewanella baltica OS195 628 wgs_aatl_pro.dat WGS - Listeria monocytogenes HPB2262 629 wgs_aatm_fun.dat WGS - Schizosaccharomyces japonicus yFS275 630 wgs_aatn_env.dat WGS - environmental sequence 631 wgs_aato_env.dat WGS - environmental sequence 632 wgs_aatp_pro.dat WGS - Fulvimarina pelagi HTCC2506 633 wgs_aatq_pro.dat WGS - Roseovarius sp. HTCC2601 634 wgs_aatr_pro.dat WGS - alpha proteobacterium HTCC2255 635 wgs_aats_pro.dat WGS - Mariprofundus ferrooxydans PV-1 636 wgs_aatt_fun.dat WGS - Batrachochytrium dendrobatidis JEL423 637 wgs_aatu_pln.dat WGS - Phytophthora infestans T30-4 638 wgs_aatv_pro.dat WGS - Thermoanaerobacter ethanolicus X514 639 wgs_aatw_pro.dat WGS - Desulfovibrio vulgaris subsp. vulgaris DP4 640 wgs_aatx_fun.dat WGS - Coccidioides immitis RMSCC 2394 641 wgs_aaty_pro.dat WGS - Vibrio cholerae AM-19226 642 wgs_aatz_pro.dat WGS - Synechococcus sp. BL107 643 wgs_aaua_pro.dat WGS - Synechococcus sp. RS9916 644 wgs_aaub_pro.dat WGS - Yersinia pestis FV-1 645 wgs_aauc_pro.dat WGS - Polynucleobacter sp. 
QLW-P1DMWA-1 646 wgs_aaue_pro.dat WGS - Bacillus cereus AH820 647 wgs_aauf_pro.dat WGS - Bacillus cereus AH187 648 wgs_aaug_pro.dat WGS - Burkholderia phymatum STM815 649 wgs_aauh_pro.dat WGS - Burkholderia phytofirmans PsJN 650 wgs_aaui_pro.dat WGS - Chloroflexus aggregans DSM 9485 651 wgs_aauj_pro.dat WGS - Comamonas testosteroni KF-1 652 wgs_aauk_pro.dat WGS - Fervidobacterium nodosum Rt17-B1 653 wgs_aaul_pro.dat WGS - Pseudomonas mendocina ymp 654 wgs_aaum_pro.dat WGS - Roseiflexus castenholzii DSM 13941 655 wgs_aaun_pro.dat WGS - Serratia proteamaculans 568 656 wgs_aauo_pro.dat WGS - Shewanella woodyi ATCC 51908 657 wgs_aaup_pro.dat WGS - Coxiella burnetii 'MSU Goat Q177' 658 wgs_aaur_pro.dat WGS - Vibrio cholerae 1587 659 wgs_aaus_pro.dat WGS - Vibrio cholerae MAK 757 660 wgs_aauu_pro.dat WGS - Vibrio cholerae MZO-3 661 wgs_aauv_pro.dat WGS - Oenococcus oeni ATCC BAA-1163 662 wgs_aauw_pro.dat WGS - Stappia aggregata IAM 12614 663 wgs_aaux_pro.dat WGS - Methylophilales bacterium HTCC2181 664 wgs_baab_inv.dat WGS - Bombyx mori 665 wgs_baac_pro.dat WGS - Pelotomaculum thermopropionicum SI 666 wgs_baad_pro.dat WGS - Bifidobacterium adolescentis 667 wgs_baaf_vrt.dat WGS - Oryzias latipes 668 wgs_caaa_mus.dat WGS - Mus musculus 669 wgs_caab_vrt.dat WGS - Fugu rubripes 670 wgs_caac_inv.dat WGS - Caenorhabditis briggsae 671 wgs_caae_vrt.dat WGS - Tetraodon nigroviridis 672 wgs_caai_inv.dat WGS - Plasmodium berghei 673 wgs_caaj_inv.dat WGS - Plasmodium chabaudi 674 wgs_caak_vrt.dat WGS - Danio rerio 675 wgs_caal_inv.dat WGS - Paramecium tetraurelia 676 wgs_caam_env.dat WGS - environmental sequence 677 wgs_caan_env.dat WGS - Neanderthal fossil environmental sequences APPENDIX A DATABASE GROWTH TABLE The following table shows the growth of the EMBL Nucleotide Sequence Database at each release. 
Release  Month       Entries    Nucleotides
      1  06/1982         568         585433
      2  04/1983         811        1114447
      3  12/1983        1481        1654863
      4  08/1984        1698        2147205
      5  04/1985        2378        2874493
      6  08/1985        4835        4567592
      7  12/1985        5789        5622638
      8  04/1986        6395        6353040
      9  09/1986        7630        7813214
     10  12/1986        8817        9766948
     11  04/1987       11621       12189783
     12  07/1987       12706       13638061
     13  10/1987       14397       16023478
     14  01/1988       15344       17272160
     15  05/1988       17961       20318442
     16  08/1988       19592       22625941
     17  11/1988       20695       24211054
     18  02/1989       22938       27249830
     19  05/1989       24365       29066676
     20  08/1989       26223       31240948
     21  11/1989       28679       34748087
     22  02/1990       31508       38165786
     23  05/1990       34902       42923803
     24  08/1990       37784       47354438
     25  11/1990       41580       52900354
     26  02/1991       43745       55859549
     27  05/1991       46871       59915244
     28  09/1991       54558       70448052
     29  12/1991       57655       75400487
     30  03/1992       63378       83574342
     31  06/1992       72481       94390065
     32  09/1992       79377      101292310
     33  12/1992       89100      111413979
     34  03/1993       99591      121420828
     35  06/1993      108973      131880111
     36  09/1993      127933      145401156
     37  12/1993      146576      158171400
     38  03/1994      167777      177550115
     39  06/1994      182615      192195819
     40  09/1994      209352      211017104
     41  12/1994      230950      226259607
     42  03/1995      303206      262559786
     43  06/1995      420111      315840053
     44  09/1995      506190      363273777
     45  12/1995      622566      427620278
     46  03/1996      701246      473691480
     47  06/1996      827174      550739395
     48  09/1996      928067      608931850
     49  12/1996     1047263      696183789
     50  03/1997     1187455      789755858
     51  06/1997     1432941      931351601
     52  10/1997     1787004     1181167498
     53  12/1997     1917868     1281391651
     54  03/1998     2125225     1427634373
     55  06/1998     2330040     1607673907
     56  09/1998     2689618     1904091473
     57  12/1998     3046471     2164718256
     58  03/1999     3272064     2355200790
     59  06/1999     3952878     2924568545
     60  09/1999     4719266     3543553093
     61  12/1999     5303436     4508169737
     62  03/2000     5865742     6120908677
     63  06/2000     6760113     8255674441
     64  09/2000     8344436     9650223037
     65  12/2000     9549382    10710321435
     66  03/2001    11169673    11916112872
     67  06/2001    12044420    12821742622
     68  09/2001    12964797    13727100206
     69  12/2001    14366182    15383451165
     70  03/2002    15851373    17807926047
     71  06/2002    17226422    20020556107
     72  09/2002    18324246    23090186146
     73  12/2002    20857746    27903283528
     74  03/2003    23234788    30356786718
     75  06/2003    25214767    32195012823
     76  09/2003    27248475    33885908155
     77  12/2003    30351263    36042464651
     78  03/2004    32631252    37984728579
     79  06/2004    39214123    65185548741
     80  09/2004    42312264    70222432184
     81  12/2004    46105397    79271300840
     82  03/2005    49474402    85134714382
     83  06/2005    54491598    94996164558
     84  09/2005    58758902   107562580723
     85  12/2005    64739883   116106677726
     86  03/2006    69783593   126401347060
     87  06/2006    74034622   134602904495
     88  09/2006    80591891   146595277574
     89  12/2006    83666567   150163403742
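
As an illustration of how the growth table can be read, the following minimal Python sketch computes the fold change in nucleotide count between two releases and the implied average doubling time. It is not part of the EMBL distribution: the names ROWS, months_between and growth are hypothetical, only four rows are copied from the table above, and the doubling time assumes steady exponential growth over the chosen interval, which is a simplification.

```python
from math import log

# (release, month as "MM/YYYY", entries, nucleotides)
# A few rows copied from the growth table; extend as needed.
ROWS = [
    (1, "06/1982", 568, 585433),
    (45, "12/1995", 622566, 427620278),
    (85, "12/2005", 64739883, 116106677726),
    (89, "12/2006", 83666567, 150163403742),
]


def months_between(earlier, later):
    """Number of calendar months from one "MM/YYYY" string to another."""
    m1, y1 = (int(part) for part in earlier.split("/"))
    m2, y2 = (int(part) for part in later.split("/"))
    return (y2 - y1) * 12 + (m2 - m1)


def growth(earlier_row, later_row):
    """Return (fold_change, doubling_time_in_months) for nucleotide counts.

    Assumes exponential growth between the two releases (an assumption
    made for this sketch, not a statement from the release notes).
    """
    fold = later_row[3] / earlier_row[3]
    months = months_between(earlier_row[1], later_row[1])
    doubling = months * log(2) / log(fold)
    return fold, doubling


if __name__ == "__main__":
    by_release = {row[0]: row for row in ROWS}
    fold, doubling = growth(by_release[85], by_release[89])
    print(f"Release 85 to 89: x{fold:.2f}, doubling time about {doubling:.0f} months")
```

Comparing other pairs of releases only requires adding the corresponding rows to ROWS.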