UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2013_09 STATISTICS
1. INTRODUCTION
Release 2013_09 of 18-Sep-2013 of UniProtKB/TrEMBL contains 42821879 sequence entries,
comprising 13630914768 amino acids.
1396586 sequences have been added since release 2013_08, the sequence data of
7581 existing entries has been updated and the annotations of
10916723 entries have been revised. This represents an increase of 3% in the
number of entries.
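As a rough check of the 3% figure, the sketch below compares the number of added sequences with the size of the previous release. It assumes that the percentage is measured against release 2013_08 and that no entries were removed between the two releases, neither of which the text states. The variable names are illustrative only.

```python
# Figures quoted in the introduction.
total_entries = 42_821_879   # entries in release 2013_09
added_entries = 1_396_586    # sequences added since release 2013_08

# Assumption: no entries were deleted, so the previous release size is
# simply the current total minus the additions.
previous_entries = total_entries - added_entries

growth_percent = 100 * added_entries / previous_entries
print(f"Approximate growth: {growth_percent:.1f}%")
```

Under these assumptions the result rounds to the 3% quoted above.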
Number of fragments: 4420020
Protein existence (PE):             entries         %
1: Evidence at protein level          20622     0.05%
2: Evidence at transcript level      818805     1.91%
3: Inferred from homology           9893252    23.10%
4: Predicted                       32089200    74.94%
5: Uncertain                              0     0.00%
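The percentages in the protein existence table can be recomputed from the entry counts. The following is a minimal sketch, assuming each percentage is the count divided by the total number of entries and rounded to two decimals. The dictionary layout is illustrative and not part of any UniProt tool.

```python
# Entry counts per protein existence (PE) level, copied from the table.
pe_counts = {
    "1: Evidence at protein level": 20_622,
    "2: Evidence at transcript level": 818_805,
    "3: Inferred from homology": 9_893_252,
    "4: Predicted": 32_089_200,
    "5: Uncertain": 0,
}

# The five levels together should cover every entry in the release.
total = sum(pe_counts.values())

# Share of all entries at each PE level, rounded to two decimals.
pe_percent = {level: round(100 * n / total, 2) for level, n in pe_counts.items()}

for level, pct in pe_percent.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```

The five counts add up to the 42821879 entries reported for the release, so every entry carries exactly one PE level.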
The growth of the database is summarized below.
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/TrEMBL: 429833
The twenty most represented species account for 1892237 sequences, or 4.4% of the
total number of entries.
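The figure for the first twenty species can be reproduced from the first twenty rows of the table of the most represented species (section 2.2). This is a small sketch, with the counts copied by hand from that table and an illustrative list name.

```python
# Frequencies of the twenty most represented species (section 2.2, rows 1-20).
top_twenty_counts = [
    546713, 201102, 113507, 96854, 89025,
    73840, 70413, 69149, 60522, 60361,
    56803, 56145, 54890, 54131, 52260,
    50601, 49263, 48906, 44560, 43192,
]
total_entries = 42_821_879  # entries in release 2013_09

top_twenty_total = sum(top_twenty_counts)
top_twenty_share = 100 * top_twenty_total / total_entries

print(top_twenty_total, f"{top_twenty_share:.1f}%")
```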
2.1 Table of the frequency of occurrence of species
Species represented 1x:17720
2x:70277
3x:37887
4x:27067
5x:16361
6x:11384
7x: 8710
8x: 6853
9x: 5437
10x:10574
11- 20x:30839
21- 50x:10249
51-100x: 3983
>100x:13003
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 546713 Human immunodeficiency virus 1
2 201102 uncultured bacterium
3 113507 Homo sapiens (Human)
4 96854 Oryza sativa subsp. japonica (Rice)
5 89025 Hepatitis C virus
6 73840 Glycine max (Soybean) (Glycine hispida)
7 70413 Hordeum vulgare var. distichum (Two-rowed barley)
8 69149 Macaca mulatta (Rhesus macaque)
9 60522 Zea mays (Maize)
10 60361 Hepatitis B virus (HBV)
11 56803 Mus musculus (Mouse)
12 56145 Medicago truncatula (Barrel medic) (Medicago tribuloides)
13 54890 Solanum tuberosum (Potato)
14 54131 Vitis vinifera (Grape)
15 52260 Danio rerio (Zebrafish) (Brachydanio rerio)
16 50601 Trichomonas vaginalis
17 49263 Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
18 48906 Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
19 44560 Populus trichocarpa (Western balsam poplar)
20 43192 Callithrix jacchus (White-tufted-ear marmoset)
21 41214 Arabidopsis thaliana (Mouse-ear cress)
22 41204 Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
23 39850 Paramecium tetraurelia
24 39842 Oryza sativa subsp. indica (Rice)
25 39300 Setaria italica (Foxtail millet) (Panicum italicum)
26 38798 Mustela putorius furo (European domestic ferret) (Mustela furo)
27 38163 human gut metagenome
28 36691 Drosophila melanogaster (Fruit fly)
29 36522 Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
30 35920 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
31 35631 Ailuropoda melanoleuca (Giant panda)
32 35599 Emiliania huxleyi CCMP1516
33 35205 Acyrthosiphon pisum (Pea aphid)
34 35112 Simian immunodeficiency virus (SIV)
35 35066 Caenorhabditis japonica
36 34830 Physcomitrella patens subsp. patens (Moss)
37 34570 Thalassiosira oceanica (Marine diatom)
38 34369 Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
39 33845 Sorghum bicolor (Sorghum) (Sorghum vulgare)
40 33253 Selaginella moellendorffii (Spikemoss)
41 32767 Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
42 32342 Oryza brachyantha
43 32204 Sus scrofa (Pig)
44 32122 Caenorhabditis remanei (Caenorhabditis vulgaris)
45 32094 Oryza glaberrima (African rice)
46 31849 Pan troglodytes (Chimpanzee)
47 31386 Ricinus communis (Castor bean)
48 31207 Capitella teleta
49 30926 Daphnia pulex (Water flea)
50 30712 Caenorhabditis brenneri (Nematode worm)
51 30146 Brachypodium distachyon (Purple false brome) (Trachynia distachya)
52 29815 Amphimedon queenslandica (Sponge)
53 29451 Strongylocentrotus purpuratus (Purple sea urchin)
54 29318 Pristionchus pacificus (Parasitic nematode)
55 29183 Branchiostoma floridae (Florida lancelet) (Amphioxus)
56 29054 Oikopleura dioica (Tunicate)
57 28856 Escherichia coli
58 28835 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
59 28825 Capsella rubella
60 28614 Prunus persica (Peach) (Amygdalus persica)
61 28521 Canis familiaris (Dog) (Canis lupus familiaris)
62 28099 Gasterosteus aculeatus (Three-spined stickleback)
63 27753 Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
64 27504 Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
65 27460 Equus caballus (Horse)
66 27089 Gorilla gorilla gorilla (Lowland gorilla)
67 26827 Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
68 25970 Oryzias latipes (Medaka fish) (Japanese ricefish)
69 25797 Loxodonta africana (African elephant)
70 25721 Rattus norvegicus (Rat)
71 25721 Phytophthora sojae (strain P6497) (Soybean stem and root rot agent)
72 25655 Bos taurus (Bovine)
73 25100 Oryctolagus cuniculus (Rabbit)
74 24905 Nematostella vectensis (Starlet sea anemone)
75 24643 Tetrahymena thermophila (strain SB210)
76 24590 Guillardia theta CCMP2712
77 24374 Triticum urartu (Red wild einkorn) (Crithodium urartu)
78 24208 Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
79 23716 Ornithorhynchus anatinus (Duckbill platypus)
80 23565 Oxytricha trifallax
81 23502 Latimeria chalumnae (West Indian ocean coelacanth)
82 23115 Perkinsus marinus (strain ATCC 50983 / TXsc)
83 22751 Monodelphis domestica (Gray short-tailed opossum)
84 22562 Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
85 22525 Caenorhabditis elegans
86 22313 Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
87 22163 gut metagenome
88 21548 Heterocephalus glaber (Naked mole rat)
89 21346 Caenorhabditis briggsae
90 21311 Gallus gallus (Chicken)
91 21125 Ixodes scapularis (Black-legged tick) (Deer tick)
92 20940 Felis catus (Cat) (Felis silvestris catus)
93 20867 Myotis lucifugus (Little brown bat)
94 20838 Tupaia chinensis (Chinese tree shrew)
95 20760 Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
96 20512 Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
97 20133 Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
98 20114 Ciona savignyi (Pacific transparent sea squirt)
99 20073 Cavia porcellus (Guinea pig)
100 19985 Spermophilus tridecemlineatus (Thirteen-lined ground squirrel)
101 19816 Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
102 19684 Taeniopygia guttata (Zebra finch) (Poephila guttata)
103 19551 Anolis carolinensis (Green anole) (American chameleon)
104 19546 Pteropus alecto (Black flying fox)
105 19438 Wuchereria bancrofti
106 19336 Toxoplasma gondii
107 19200 Trypanosoma cruzi (strain CL Brener)
108 19057 Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
109 18949 Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
110 18855 Drosophila simulans (Fruit fly)
111 18771 mine drainage metagenome
112 18592 Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
113 18555 Bos grunniens mutus
114 18115 Atta cephalotes (Leafcutter ant)
115 18026 Anopheles gambiae (African malaria mosquito)
116 17839 Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver)
117 17784 Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
118 17599 Phytophthora infestans (strain T30-4) (Potato late blight fungus)
119 17520 Bombyx mori (Silk moth)
120 17408 Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
121 17301 Anas platyrhynchos (Domestic duck) (Anas boschas)
122 17282 Nasonia vitripennis (Parasitic wasp)
123 17047 Tribolium castaneum (Red flour beetle)
124 17040 Drosophila yakuba (Fruit fly)
125 16946 Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)
126 16917 Meleagris gallopavo (Common turkey)
127 16714 Drosophila persimilis (Fruit fly)
128 16698 Drosophila pseudoobscura pseudoobscura (Fruit fly)
129 16649 Plasmodium falciparum
130 16639 Fusarium oxysporum f. sp. lycopersici
131 16469 Hepatitis C virus subtype 1b
132 16426 Ectocarpus siliculosus (Brown alga)
133 16338 Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
134 16329 Danaus plexippus (Monarch butterfly)
135 16274 Trichinella spiralis (Trichina worm)
136 16237 Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7)
137 16188 Drosophila sechellia (Fruit fly)
138 16156 Schistosoma japonicum (Blood fluke)
139 16110 Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
140 15793 Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL)
141 15762 Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)
142 15716 Naegleria gruberi (Amoeba)
143 15653 Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI)
144 15568 Phytophthora ramorum (Sudden oak death agent)
145 15461 Myotis davidii (David's myotis)
146 15421 Drosophila willistoni (Fruit fly)
147 15371 Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus)
148 15354 Loa loa (Eye worm) (Filaria loa)
149 15345 Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
150 15225 Pythium ultimum
151 15177 Hepatitis C virus subtype 1a
152 15144 Drosophila ananassae (Fruit fly)
153 15057 Pararge aegeria (speckled wood butterfly)
154 15041 Harpegnathos saltator (Jerdon's jumping ant)
155 15040 Klebsiella pneumoniae
156 14942 Acanthamoeba castellanii str. Neff
157 14927 Drosophila erecta (Fruit fly)
158 14910 Dendroctonus ponderosae (mountain pine beetle)
159 14861 Chlamydomonas reinhardtii (Chlamydomonas smithii)
160 14801 Camponotus floridanus (Florida carpenter ant)
161 14792 Fusarium fujikuroi IMI 58289
162 14791 Drosophila mojavensis (Fruit fly)
163 14713 Plasmodium chabaudi
164 14704 Drosophila virilis (Fruit fly)
165 14652 Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
166 14610 Gaeumannomyces graminis var. tritici (strain R3-111a-1)
167 14592 uncultured archaeon
168 14419 Rabies virus
169 14417 Volvox carteri (Green alga)
170 14341 Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
171 14339 Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
172 14236 Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold)
173 14147 Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
174 13970 Acromyrmex echinatior (Panamanian leafcutter ant)
175 13923 Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent)
176 13876 Clonorchis sinensis (Chinese liver fluke)
177 13867 Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus)
178 13801 Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
179 13766 Aspergillus niger (strain CBS 513.88 / FGSC A1513)
180 13648 Moniliophthora perniciosa (strain FA553 / isolate CP02)
181 13588 Trypanosoma cruzi
182 13345 Aspergillus flavus
183 13329 Colletotrichum orbiculare
184 13267 Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)
185 13121 Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
186 13109 Petromyzon marinus (Sea lamprey)
187 13082 Glarea lozoyensis ATCC 20868
188 13062 Mycosphaerella fijiensis (strain CIRAD86) (Black leaf streak disease fungus)
189 13043 Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)
190 12983 Albugo laibachii Nc14
191 12962 Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006)
192 12950 Stigmatella aurantiaca (strain DW4/3-1)
193 12900 Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)
194 12856 Cochliobolus heterostrophus (strain C5 / ATCC 48332 / race O)
195 12846 Magnaporthe oryzae (strain Y34) (Rice blast fungus) (Pyricularia oryzae)
196 12754 Porcine reproductive and respiratory syndrome virus (PRRSV)
197 12722 Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
198 12711 Magnaporthe oryzae (strain P131) (Rice blast fungus) (Pyricularia oryzae)
199 12703 Cochliobolus heterostrophus (strain C4 / ATCC 48331 / race T)
200 12697 Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255)
201 12696 Trypanosoma congolense (strain IL3000)
202 12681 Schistosoma mansoni (Blood fluke)
203 12630 Xenopus laevis (African clawed frog)
204 12586 Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)
205 12464 Helicobacter pylori (Campylobacter pylori)
206 12447 Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
207 12440 Polysphondylium pallidum (Cellular slime mold)
208 12414 Mycosphaerella pini (strain NZE10 / CBS 128990) (Red band needle blight fungus)
209 12389 Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens)
210 12352 Dictyostelium purpureum (Slime mold)
211 12197 Thanatephorus cucumeris (strain AG1-IB / isolate 7/3/14)
212 12174 Cochliobolus sativus (strain ND90Pr / ATCC 201652)
213 12152 Dictyostelium fasciculatum (strain SH3) (Slime mold)
214 12143 Mucor circinelloides f. circinelloides (strain 1006PhL) (Mucormycosis agent)
215 12078 Ceriporiopsis subvermispora (strain B) (White-rot fungus)
216 12012 Apis mellifera (Honeybee)
217 11994 Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus)
218 11993 Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)
219 11941 Emericella nidulans
220 11815 Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
221 11780 Piriformospora indica (strain DSM 11827)
222 11752 Chondrocladia sp. SMF
2.3 Taxonomic distribution of the sequences
Kingdom      sequences  (% of the database)
Archaea         697897  (  2%)
Bacteria      32004412  ( 75%)
Eukaryota      8237075  ( 19%)
Viruses        1777493  (  4%)
Other           105001  ( <1%)
Within Eukaryota:
Category          sequences  (% of Eukaryota)  (% of the complete database)
Human                113547  (  1%)            ( <1%)
Other Mammalia       975279  ( 12%)            (  2%)
Other Vertebrata     856288  ( 10%)            (  2%)
Viridiplantae       1675936  ( 20%)            (  4%)
Fungi               2000482  ( 24%)            (  5%)
Insecta              861317  ( 10%)            (  2%)
Nematoda             253797  (  3%)            (  1%)
Other               1500429  ( 18%)            (  4%)
3. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
  From   To     Number      From   To     Number
     1-  50    1152890      1001-1100     233810
    51- 100    3819409      1101-1200     162232
   101- 150    4263111      1201-1300     116662
   151- 200    4132686      1301-1400      69927
   201- 250    4172983      1401-1500      57952
   251- 300    4045536      1501-1600      39581
   301- 350    3662444      1601-1700      28778
   351- 400    2738225      1701-1800      21648
   401- 450    2379339      1801-1900      17548
   451- 500    1948639      1901-2000      14786
   501- 550    1246753      2001-2100      11923
   551- 600     962056      2101-2200      12091
   601- 650     702351      2201-2300       9309
   651- 700     552722      2301-2400       7474
   701- 750     459727      2401-2500       6605
   751- 800     395879          >2500      50761
   801- 850     308222
   851- 900     275494
   901- 950     189152
   951-1000     133154
The average sequence length in UniProtKB/TrEMBL is 318 amino acids.
The shortest sequence is G0XMK1_9MYRT: 1 amino acid.
The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
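The average length follows from the totals given in the introduction. This is a minimal sketch, assuming the average is taken over all entries (fragments included), which the text does not state explicitly.

```python
# Totals quoted in the introduction.
total_amino_acids = 13_630_914_768
total_entries = 42_821_879

# Assumption: the mean is computed over every entry, fragments included.
average_length = total_amino_acids / total_entries
print(f"Average sequence length: {average_length:.0f} amino acids")
```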
4. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Line type / subtype                Total number  Number of entries  Average per entry
---------------------------------  ------------  -----------------  -----------------
References (RL) 50749561 1.19
Submitted to EMBL/GenBank/DDBJ 30016028 28188723 0.70
Journal 18960414 17938735 0.44
Submitted to other databases 1756014 1745086 0.04
Thesis 10355 10297 <0.01
Book citation 6749 6699 <0.01
Patent 1 1 <0.01
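The "average per entry" column appears to be the total number of lines divided by the total number of entries in the release. The sketch below reproduces the column for the reference types under that assumption. The rule that averages rounding to 0.00 are printed as "<0.01" is inferred from the tables, and `average_per_entry` is a hypothetical helper name.

```python
TOTAL_ENTRIES = 42_821_879  # entries in release 2013_09


def average_per_entry(line_count: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format lines-per-entry the way the tables appear to.

    Assumption: averages that would round to 0.00 are shown as "<0.01".
    """
    text = f"{line_count / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


# Total number of lines per reference type, copied from the table.
reference_lines = {
    "References (RL)": 50_749_561,
    "Submitted to EMBL/GenBank/DDBJ": 30_016_028,
    "Journal": 18_960_414,
    "Submitted to other databases": 1_756_014,
    "Thesis": 10_355,
    "Book citation": 6_749,
    "Patent": 1,
}

for name, count in reference_lines.items():
    print(f"{name:<34}{average_per_entry(count):>6}")
```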
Total number of distinct authors cited in UniProtKB/TrEMBL: 476409
Line type / subtype                Total number  Number of entries  Average per entry  Rank
---------------------------------  ------------  -----------------  -----------------  ----
Comments (CC) 58956525 1.38
CATALYTIC ACTIVITY 4641945 4194295 0.11 4
CAUTION 24680360 24660034 0.58 1
COFACTOR 1867584 1733466 0.04 8
DOMAIN 197544 189952 <0.01 9
ENZYME REGULATION 55788 55788 <0.01 11
FUNCTION 5209978 4929003 0.12 3
INTERACTION 1262 1262 <0.01 12
MISCELLANEOUS 125316 125120 <0.01 10
PATHWAY 2337778 2126819 0.05 7
SIMILARITY 13013817 11329028 0.30 2
SUBCELLULAR LOCATION 4114352 3963753 0.10 5
SUBUNIT 2710801 2686049 0.06 6
Total number of comment topics: 12
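The Rank column is consistent with ordering the topics by their total number of lines, largest first. The following sketch recomputes the ranks for the comment topics under that assumption. The dictionary and variable names are illustrative.

```python
# Total number of lines per comment topic, copied from the table.
comment_totals = {
    "CATALYTIC ACTIVITY": 4_641_945,
    "CAUTION": 24_680_360,
    "COFACTOR": 1_867_584,
    "DOMAIN": 197_544,
    "ENZYME REGULATION": 55_788,
    "FUNCTION": 5_209_978,
    "INTERACTION": 1_262,
    "MISCELLANEOUS": 125_316,
    "PATHWAY": 2_337_778,
    "SIMILARITY": 13_013_817,
    "SUBCELLULAR LOCATION": 4_114_352,
    "SUBUNIT": 2_710_801,
}

# Assumption: rank 1 goes to the topic with the most lines.
ordered = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ordered, start=1)}

for topic in comment_totals:
    print(f"{topic:<22}{comment_totals[topic]:>10}  rank {ranks[topic]}")
```

The topic totals also add up to the 58956525 comment lines reported in the Comments (CC) row.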
Line type / subtype                Total number  Number of entries  Average per entry  Rank
---------------------------------  ------------  -----------------  -----------------  ----
Features (FT) 24345276 0.57
ACT_SITE 1735575 1067948 0.04 5
BINDING 3656439 955064 0.09 2
CARBOHYD 353 138 <0.01 28
CHAIN 867813 708743 0.02 8
COILED 64458 35295 <0.01 17
COMPBIAS 10895 10895 <0.01 22
CROSSLNK 10106 6751 <0.01 23
DISULFID 85239 65565 <0.01 15
DNA_BIND 51908 47617 <0.01 19
DOMAIN 667148 517796 0.02 10
INIT_MET 12646 12646 <0.01 21
INTRAMEM 385 55 <0.01 27
LIPID 63866 31933 <0.01 18
METAL 3472915 896396 0.08 3
MOD_RES 287205 258431 0.01 13
MOTIF 195807 118437 <0.01 14
NON_STD 1855 1676 <0.01 25
NON_TER 6880692 4421808 0.16 1
NP_BIND 1312418 785644 0.03 6
PEPTIDE 34 34 <0.01 29
PROPEP 4604 4604 <0.01 24
REGION 1118492 617220 0.03 7
REPEAT 50440 12156 <0.01 20
SIGNAL 703463 700186 0.02 9
SITE 397122 231130 0.01 11
TOPO_DOM 296514 60297 0.01 12
TRANSIT 1440 1440 <0.01 26
TRANSMEM 2328592 407596 0.05 4
ZN_FING 66852 60240 <0.01 16
Total number of feature keys: 29
Line type / subtype                Total number  Number of entries  Average per entry  Rank  Category
---------------------------------  ------------  -----------------  -----------------  ----  -------------------------------------------
Cross-references (DR) 457803165 10.69
Allergome 3478 2845 <0.01 84 Protein family/group databases
ArachnoServer 66 66 <0.01 102 Organism-specific databases
ArrayExpress 185650 185650 <0.01 45 Gene expression databases
BRENDA 2642 2614 <0.01 86 Enzyme and pathway databases
Bgee 99589 99589 <0.01 51 Gene expression databases
BindingDB 5825 5825 <0.01 77 Other
BioCyc 5639949 5572624 0.13 18 Enzyme and pathway databases
CAZy 74011 69538 <0.01 55 Protein family/group databases
CGD 7033 7033 <0.01 76 Organism-specific databases
COMPLUYEAST-2DPAGE 4 4 <0.01 107 2D gel databases
CTD 351972 350648 0.01 38 Organism-specific databases
ChEMBL 606 606 <0.01 94 Other
ChiTaRS 65575 65575 <0.01 56 Other
ConoServer 160 160 <0.01 99 Organism-specific databases
DIP 2873 2868 <0.01 85 Protein-protein interaction databases
DNASU 42308 41974 <0.01 62 Protocols and materials databases
EMBL 46113664 41791436 1.08 3 Sequence databases
Ensembl 1014023 999501 0.02 29 Genome annotation databases
EnsemblBacteria 17859986 17586110 0.42 8 Genome annotation databases
EnsemblFungi 372634 370469 0.01 37 Genome annotation databases
EnsemblMetazoa 693448 677918 0.02 33 Genome annotation databases
EnsemblPlants 654086 620584 0.02 34 Genome annotation databases
EnsemblProtists 156283 153887 <0.01 47 Genome annotation databases
EuPathDB 147096 147094 <0.01 49 Organism-specific databases
EvolutionaryTrace 8045 8045 <0.01 74 Other
FlyBase 196093 194626 <0.01 43 Organism-specific databases
GO 73053695 23628003 1.71 2 Ontologies
Gene3D 18982525 14964234 0.44 7 Family and domain databases
GeneID 9877826 9621471 0.23 12 Genome annotation databases
GeneTree 900564 900506 0.02 31 Phylogenomic databases
Genevestigator 86308 86303 <0.01 52 Gene expression databases
GenoList 14732 14459 <0.01 71 Organism-specific databases
GenomeRNAi 19362 19362 <0.01 69 Other
Gramene 204041 204041 <0.01 42 Organism-specific databases
H-InvDB 611 464 <0.01 93 Organism-specific databases
HAMAP 4676484 4616658 0.11 20 Family and domain databases
HGNC 47308 47231 <0.01 59 Organism-specific databases
HOGENOM 3653902 3653857 0.09 23 Phylogenomic databases
HOVERGEN 305195 305184 0.01 39 Phylogenomic databases
IPI 279279 278387 0.01 40 Sequence databases
InParanoid 186428 186428 <0.01 44 Phylogenomic databases
IntAct 12340 12340 <0.01 72 Protein-protein interaction databases
InterPro 91953594 32259129 2.15 1 Family and domain databases
KEGG 8939612 8718349 0.21 14 Genome annotation databases
KO 3718574 3700849 0.09 22 Phylogenomic databases
LegioList 5138 5110 <0.01 79 Organism-specific databases
Leproma 1272 1270 <0.01 88 Organism-specific databases
MEROPS 138706 138705 <0.01 50 Protein family/group databases
MGI 52364 51864 <0.01 58 Organism-specific databases
MIM 4 4 <0.01 108 Organism-specific databases
MINT 10254 10253 <0.01 73 Protein-protein interaction databases
NextBio 208199 208187 <0.01 41 Other
OGP 3 3 <0.01 109 2D gel databases
OMA 4858165 4857948 0.11 19 Phylogenomic databases
OrthoDB 553137 553094 0.01 35 Phylogenomic databases
PANTHER 6134456 5761264 0.14 16 Family and domain databases
PATRIC 8281125 8281000 0.19 15 Genome annotation databases
PDB 20274 11224 <0.01 67 3D structure databases
PDBsum 19943 11001 <0.01 68 3D structure databases
PIR 172301 139477 <0.01 46 Sequence databases
PIRSF 3862148 3858028 0.09 21 Family and domain databases
PMAP-CutDB 209 209 <0.01 98 Other
PRIDE 931362 931362 0.02 30 Proteomic databases
PRINTS 6116639 5484225 0.14 17 Family and domain databases
PROSITE 20498431 13677251 0.48 5 Family and domain databases
PaxDb 29030 29028 <0.01 64 Proteomic databases
PeptideAtlas 129 129 <0.01 100 Proteomic databases
PeroxiBase 2595 2587 <0.01 87 Protein family/group databases
Pfam 41306136 30224533 0.96 4 Family and domain databases
PharmGKB 3572 3572 <0.01 83 Organism-specific databases
PhosSite 616 604 <0.01 92 PTM databases
PhosphoSite 1125 1125 <0.01 89 PTM databases
PhylomeDB 147513 147513 <0.01 48 Phylogenomic databases
PomBase 40 27 <0.01 103 Organism-specific databases
PptaseDB 36 35 <0.01 104 Protein family/group databases
ProDom 825962 796392 0.02 32 Family and domain databases
ProMEX 5387 5387 <0.01 78 Proteomic databases
ProtClustDB 2719510 2719499 0.06 26 Phylogenomic databases
ProteinModelPortal 10532143 10532143 0.25 9 3D structure databases
PseudoCAP 4533 4527 <0.01 80 Organism-specific databases
REBASE 40133 40127 <0.01 63 Protein family/group databases
REPRODUCTION-2DPAGE 66 65 <0.01 101 2D gel databases
RGD 21119 20290 <0.01 66 Organism-specific databases
Reactome 228 174 <0.01 97 Enzyme and pathway databases
RefSeq 9916615 9627663 0.23 11 Sequence databases
SABIO-RK 480 480 <0.01 95 Enzyme and pathway databases
SGD 11 11 <0.01 106 Organism-specific databases
SMART 9043980 6868802 0.21 13 Family and domain databases
SMR 2600631 2600631 0.06 27 3D structure databases
STRING 2903825 2903756 0.07 24 Protein-protein interaction databases
SUPFAM 19135150 15447562 0.45 6 Family and domain databases
SWISS-2DPAGE 28 28 <0.01 105 2D gel databases
SignaLink 4399 4397 <0.01 82 Enzyme and pathway databases
TAIR 15152 15079 <0.01 70 Organism-specific databases
TCDB 4463 4455 <0.01 81 Protein family/group databases
TIGRFAMs 10136144 9247943 0.24 10 Family and domain databases
TubercuList 1101 1100 <0.01 90 Organism-specific databases
UCSC 59397 59234 <0.01 57 Genome annotation databases
UniGene 551026 521416 0.01 36 Sequence databases
UniPathway 2272596 2114725 0.05 28 Enzyme and pathway databases
VectorBase 78249 77732 <0.01 53 Genome annotation databases
World-2DPAGE 673 668 <0.01 91 2D gel databases
WormBase 42521 42348 <0.01 61 Organism-specific databases
Xenbase 25592 25514 <0.01 65 Organism-specific databases
ZFIN 45721 45153 <0.01 60 Organism-specific databases
dictyBase 7996 7774 <0.01 75 Organism-specific databases
eggNOG 2768244 2768224 0.06 25 Phylogenomic databases
euHCVdb 75267 75264 <0.01 54 Organism-specific databases
mycoCLAP 422 422 <0.01 96 Protein family/group databases
Number of explicitly cross-referenced databases: 127
5. AMINO ACID COMPOSITION
5.1 Composition in percent for the complete database
Ala (A) 8.66 Gln (Q) 3.99 Leu (L) 9.95 Ser (S) 6.54
Arg (R) 5.36 Glu (E) 6.22 Lys (K) 5.32 Thr (T) 5.55
Asn (N) 4.11 Gly (G) 7.09 Met (M) 2.49 Trp (W) 1.29
Asp (D) 5.33 His (H) 2.19 Phe (F) 4.05 Tyr (Y) 3.07
Cys (C) 1.20 Ile (I) 6.09 Pro (P) 4.57 Val (V) 6.80
Asx (B) 0.00 Glx (Z) 0.00 Xaa (X) 0.02
Legend: gray = aliphatic, red = acidic, green = small hydroxy,
blue = basic, black = aromatic, white = amide, yellow = sulfur
5.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
Gln, Tyr, Met, His, Trp, Cys
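The ordering above can be derived directly from the composition table in section 5.1. This is a short sketch that sorts the twenty standard residues by their percentage, with the values copied from that table and illustrative variable names.

```python
# Percent composition of the twenty standard amino acids (section 5.1).
composition = {
    "Ala": 8.66, "Gln": 3.99, "Leu": 9.95, "Ser": 6.54,
    "Arg": 5.36, "Glu": 6.22, "Lys": 5.32, "Thr": 5.55,
    "Asn": 4.11, "Gly": 7.09, "Met": 2.49, "Trp": 1.29,
    "Asp": 5.33, "His": 2.19, "Phe": 4.05, "Tyr": 3.07,
    "Cys": 1.20, "Ile": 6.09, "Pro": 4.57, "Val": 6.80,
}

# Most frequent residue first; no two residues share the same percentage.
by_frequency = sorted(composition, key=composition.get, reverse=True)
print(", ".join(by_frequency))
```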
6. MISCELLANEOUS STATISTICS
Total number of entries encoded on a Mitochondrion: 643390
Total number of entries encoded on a Plasmid: 350611
Total number of entries encoded on a Plastid: 26952
Total number of entries encoded on a Plastid; Apicoplast: 750
Total number of entries encoded on a Plastid; Chloroplast: 236998
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1059