Current Release Statistics


         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2015_01 STATISTICS


1.  INTRODUCTION

UniProtKB/TrEMBL release 2015_01 of 07-Jan-2015 contains 89451166 sequence entries,
comprising 28391936394 amino acids.

Since release 2014_11, 909315 sequences have been added, the sequence data of
6347 existing entries have been updated, and the annotations of
51070108 entries have been revised. The added sequences represent an increase
of 1% in the number of entries.
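
The 1% figure can be reproduced from the counts above. This is a minimal sketch, assuming the increase is measured against the previous entry count and that no entries were deleted between releases (the report does not mention deletions); the variable names are illustrative only.

```python
# Figures quoted in the introduction.
entries_2015_01 = 89451166    # sequence entries in release 2015_01
added_since_2014_11 = 909315  # sequences added since release 2014_11

# Assumption: previous entry count = current count minus added sequences.
entries_before = entries_2015_01 - added_since_2014_11
growth_pct = 100 * added_since_2014_11 / entries_before

print(f"{growth_pct:.2f}%")
```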

Number of fragments: 6843159

Protein existence (PE):              entries      %
1: Evidence at protein level           44453     0.05%
2: Evidence at transcript level      1024441     1.15%
3: Inferred from homology           22185486    24.80%
4: Predicted                        66196786    74.00%
5: Uncertain                               0     0.00%
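
The percentages in the table above follow from the entry counts. A short sketch, assuming each percentage is the count divided by the sum of all five PE levels, rounded to two decimals:

```python
# Entry counts per protein existence (PE) level, copied from the table.
pe_counts = {
    "1: Evidence at protein level": 44453,
    "2: Evidence at transcript level": 1024441,
    "3: Inferred from homology": 22185486,
    "4: Predicted": 66196786,
    "5: Uncertain": 0,
}

# The five levels together cover every entry in the release.
total = sum(pe_counts.values())

# Percentage of all entries per level, rounded as in the table.
pe_percent = {label: round(100 * count / total, 2)
              for label, count in pe_counts.items()}

for label, pct in pe_percent.items():
    print(f"{label:<34}{pct:6.2f}%")
```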

The growth of the database is summarized below.
[Figure: growth of the UniProtKB/TrEMBL database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 532080

   The twenty most represented species account for 2602570 sequences, or 2.9% of the
   total number of entries.
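
The figure for the twenty most represented species can be checked against table 2.2. A minimal sketch, assuming the share is taken relative to the total entry count given in the introduction:

```python
# Frequencies of the 20 most represented species, copied from table 2.2
# (ranks 1 to 20, in order).
top20_frequencies = [
    614696, 352020, 240563, 121941, 103787,
    100521, 98416, 96629, 90837, 89256,
    84382, 74043, 73055, 70545, 69648,
    69573, 67671, 65421, 60710, 58856,
]

total_entries = 89451166  # from the introduction

top20_total = sum(top20_frequencies)
top20_share_pct = 100 * top20_total / total_entries

print(top20_total, f"{top20_share_pct:.1f}%")
```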


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:21189
                            2x:85352
                            3x:46141
                            4x:33082
                            5x:19307
                            6x:14126
                            7x:10089
                            8x: 8113
                            9x: 6478
                           10x:11467
                       11- 20x:41988
                       21- 50x:13524
                       51-100x: 5166
                         >100x:25349


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     614696  Human immunodeficiency virus 1
       2     352020  marine sediment metagenome
       3     240563  uncultured bacterium
       4     121941  Homo sapiens (Human)
       5     103787  Triticum aestivum (Wheat)
       6     100521  Brassica napus (Rape)
       7      98416  Hepatitis C virus
       8      96629  Oryza sativa subsp. japonica (Rice)
       9      90837  Hepatitis B virus (HBV)
      10      89256  Escherichia coli
      11      84382  Zea mays (Maize)
      12      74043  Glycine max (Soybean) (Glycine hispida)
      13      73055  mine drainage metagenome
      14      70545  Hordeum vulgare var. distichum (Domesticated barley)
      15      69648  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      16      69573  Macaca mulatta (Rhesus macaque)
      17      67671  Phytophthora parasitica (Potato buckeye rot agent)
      18      65421  Ancylostoma ceylanicum
      19      60710  human gut metagenome
      20      58856  Burkholderia pseudomallei (Pseudomonas pseudomallei)
      21      58708  Mus musculus (Mouse)
      22      55066  Solanum tuberosum (Potato)
      23      55042  Callithrix jacchus (White-tufted-ear marmoset)
      24      54260  Vitis vinifera (Grape)
      25      53424  Danio rerio (Zebrafish) (Brachydanio rerio)
      26      50656  Trichomonas vaginalis
      27      49743  Oncorhynchus mykiss (Rainbow trout) (Salmo gairdneri)
      28      49274  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      29      48917  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      30      48124  Vibrio parahaemolyticus
      31      47144  Populus trichocarpa (Western balsam poplar) 
      32      44386  Citrus sinensis (Sweet orange) (Citrus aurantium var. sinensis)
      33      44332  Eucalyptus grandis (Flooded gum)
      34      41211  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      35      40931  Theobroma cacao (Cacao) (Cocoa)
      36      39923  Reticulomyxa filosa
      37      39905  Oryza sativa subsp. indica (Rice)
      38      39848  Paramecium tetraurelia
      39      39456  Setaria italica (Foxtail millet) (Panicum italicum)
      40      39297  Simian immunodeficiency virus (SIV)
      41      39199  Arabidopsis thaliana (Mouse-ear cress)
      42      38814  Mustela putorius furo (European domestic ferret) (Mustela furo)
      43      37312  Acyrthosiphon pisum (Pea aphid)
      44      37294  Drosophila melanogaster (Fruit fly)
      45      36609  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      46      36038  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      47      35743  Klebsiella pneumoniae
      48      35673  Ailuropoda melanoleuca (Giant panda)
      49      35599  Emiliania huxleyi CCMP1516
      50      35327  Physcomitrella patens subsp. patens (Moss)
      51      35138  Caenorhabditis japonica
      52      34853  Pseudomonas aeruginosa
      53      34718  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      54      34570  Thalassiosira oceanica (Marine diatom)
      55      34566  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      56      33942  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      57      33752  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      58      33736  Staphylococcus aureus
      59      33261  Selaginella moellendorffii (Spikemoss)
      60      32888  Vibrio cholerae
      61      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      62      32567  Sus scrofa (Pig)
      63      32468  Phaseolus vulgaris (Kidney bean) (French bean)
      64      32342  Oryza brachyantha
      65      32206  Oryza glaberrima (African rice)
      66      32194  Caenorhabditis remanei (Caenorhabditis vulgaris)
      67      32101  Capitella teleta (Polychaete worm)
      68      32014  Anas platyrhynchos (Domestic duck) (Anas boschas)
      69      31899  Pan troglodytes (Chimpanzee)
      70      31463  Ricinus communis (Castor bean)
      71      31348  Citrus clementina
      72      30981  Daphnia pulex (Water flea)
      73      30713  Caenorhabditis brenneri (Nematode worm)
      74      30501  Poecilia formosa (Amazon molly) (Limia formosa)
      75      30239  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      76      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      77      29815  Amphimedon queenslandica (Sponge)
      78      29494  Strongylocentrotus purpuratus (Purple sea urchin)
      79      29334  Pristionchus pacificus (Parasitic nematode)
      80      29205  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      81      29083  Oikopleura dioica (Tunicate)
      82      28941  Erythranthe guttata (Yellow monkey flower) (Mimulus guttatus)
      83      28885  Capsella rubella
      84      28842  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      85      28702  Prunus persica (Peach) (Amygdalus persica)
      86      28669  Rhizophagus irregularis DAOM 197198w
      87      28382  Eutrema salsugineum (Saltwater cress) (Sisymbrium salsugineum)
      88      28197  Gasterosteus aculeatus (Three-spined stickleback)
      89      28034  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      90      27818  Canis familiaris (Dog) (Canis lupus familiaris)
      91      27568  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      92      27557  Equus caballus (Horse)
      93      27540  Jatropha curcas (Barbados nut)
      94      27454  Amborella trichopoda
      95      27100  Gorilla gorilla gorilla (Lowland gorilla)
      96      27017  Stegodyphus mimosarum
      97      26921  Tetrahymena thermophila (strain SB210)
      98      26861  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      99      26771  Morus notabilis
     100      26517  Phytophthora parasitica P1976
     101      26489  Phytophthora parasitica CJ01A1
     102      26477  Phytophthora parasitica P1569
     103      26452  Phytophthora parasitica P10297
     104      26438  Phytophthora parasitica (strain INRA-310)
     105      26428  Ovis aries (Sheep)
     106      26235  Rattus norvegicus (Rat)
     107      26073  Listeria monocytogenes
     108      26003  Oryzias latipes (Medaka fish) (Japanese ricefish)
     109      25869  Bos taurus (Bovine)
     110      25832  Loxodonta africana (African elephant)
     111      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
     112      25599  Coffea canephora (Robusta coffee)
     113      25025  Aphanomyces astaci
     114      24920  Nematostella vectensis (Starlet sea anemone)
     115      24590  Guillardia theta CCMP2712
     116      24476  Cucumis sativus (Cucumber)
     117      24375  Oxytricha trifallax
     118      24301  Tetraselmis sp. GSL018
     119      24167  Tolypothrix bouteillei Iicb1
     120      23809  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
     121      23743  Ornithorhynchus anatinus (Duckbill platypus)
     122      23687  Lottia gigantea (Giant owl limpet)
     123      23651  Dendroctonus ponderosae (Mountain pine beetle)
     124      23533  Bacillus subtilis
     125      23528  Caenorhabditis elegans
     126      23497  Latimeria chalumnae (West Indian ocean coelacanth)
     127      23382  Helobdella robusta (Californian leech)
     128      23373  Arabis alpina (Alpine rock-cress)
     129      23318  Fusarium oxysporum f. sp. melonis 26406
     130      23271  Fusarium oxysporum f. sp. conglutinans race 2 54008
     131      23263  Fusarium oxysporum f. sp. pisi HDV247
     132      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     133      22812  Monodelphis domestica (Gray short-tailed opossum)
     134      22754  Fusarium oxysporum f. sp. raphani 54005
     135      22569  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     136      22528  Lepisosteus oculatus (Spotted gar)
     137      22396  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     138      22248  Fusarium oxysporum f. sp. vasinfectum 25433
     139      22174  gut metagenome
     140      21972  Trichuris suis (pig whipworm)
     141      21957  Comamonas testosteroni (Pseudomonas testosteroni)
     142      21939  Oryctolagus cuniculus (Rabbit)
     143      21754  Haemonchus contortus (Barber pole worm)
     144      21689  Fusarium oxysporum f. sp. radicis-lycopersici 26381
     145      21661  Fusarium oxysporum Fo47
     146      21655  Gallus gallus (Chicken)
     147      21549  Fusarium oxysporum f. sp. lycopersici MN25
     148      21547  Heterocephalus glaber (Naked mole rat)
     149      21506  Burkholderia cepacia (Pseudomonas cepacia)
     150      21460  Gibberella zeae (Wheat head blight fungus) (Fusarium graminearum)
     151      21415  Papio anubis (Olive baboon)
     152      21398  Caenorhabditis briggsae
     153      21357  Galerina marginata CBS 339.88
     154      21301  Echinococcus granulosus (Hydatid tapeworm)
     155      21210  Ixodes scapularis (Black-legged tick) (Deer tick)
     156      21173  Myotis lucifugus (Little brown bat)
     157      21101  Felis catus (Cat) (Felis silvestris catus)
     158      20867  Tupaia chinensis (Chinese tree shrew)
     159      20808  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     160      20768  Stylonychia lemnae
     161      20767  Fusarium oxysporum FOSC 3-a
     162      20541  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     163      20465  Fukomys damarensis (Damaraland mole rat) (Cryptomys damarensis)
     164      20313  Helicobacter pylori (Campylobacter pylori)
     165      20267  Acinetobacter baumannii
     166      20168  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     167      20115  Ciona savignyi (Pacific transparent sea squirt)
     168      20106  Cavia porcellus (Guinea pig)
     169      20062  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     170      20052  Saprolegnia parasitica (strain CBS 223.65)
     171      20028  Camelus ferus (Wild Bactrian camel)
     172      19998  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     173      19945  Burkholderia cenocepacia
     174      19837  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     175      19807  Fusarium oxysporum f. sp. cubense tropical race 4 54006
     176      19705  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     177      19687  Mesorhizobium plurifarium
     178      19657  Bactrocera dorsalis (Oriental fruit fly) (Dacus dorsalis)
     179      19626  Anolis carolinensis (Green anole) (American chameleon)
     180      19619  Brugia malayi (Filarial nematode worm)
     181      19594  Aphanomyces invadans
     182      19562  Pteropus alecto (Black flying fox)
     183      19523  Wuchereria bancrofti
     184      19428  Anopheles sinensis
     185      19300  Myotis brandtii (Brandt's bat)
     186      19235  uncultured archaeon
     187      19200  Trypanosoma cruzi (strain CL Brener)
     188      19196  Necator americanus (Human hookworm)
     189      19142  Mycobacterium tuberculosis
     190      19112  Ixodes ricinus (Common tick)
     191      19064  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     192      19017  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     193      18924  Drosophila simulans (Fruit fly)
     194      18600  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     195      18574  Bos mutus
     196      18488  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     197      18485  Vibrio vulnificus
     198      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     199      18407  Plasmodium falciparum
     200      18373  Nonlabens ulvanivorans
     201      18294  Tetranychus urticae (Two-spotted spider mite)
     202      18125  Atta cephalotes (Leafcutter ant)
     203      18115  Hepatitis C virus subtype 1b
     204      18053  Anopheles gambiae (African malaria mosquito)
     205      18047  Saprolegnia diclina VS20
     206      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     207      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     208      17797  Bombyx mori (Silk moth)
     209      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     210      17683  Genlisea aurea
     211      17621  Bacillus cereus
     212      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     213      17589  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     214      17496  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     215      17399  Rhizobium radiobacter (Agrobacterium tumefaciens) (Agrobacterium radiobacter)
     216      17384  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     217      17289  Nasonia vitripennis (Parasitic wasp)
     218      17107  Drosophila yakuba (Fruit fly)
     219      17090  Tribolium castaneum (Red flour beetle)
     220      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     221      16933  Meleagris gallopavo (Common turkey)
     222      16723  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     223      16715  Drosophila persimilis (Fruit fly)
     224      16676  Enterobacter agglomerans (Erwinia herbicola) (Pantoea agglomerans)
     225      16637  Fusarium oxysporum f. sp. lycopersici  
     226      16620  Rhodnius prolixus (Triatomid bug)
     227      16534  Cerapachys biroi (Ant)
     228      16484  Botryobasidium botryosum FD-172 SS1
     229      16453  Apis mellifera (Honeybee)
     230      16447  Ectocarpus siliculosus (Brown alga)
     231      16437  Streptococcus mitis
     232      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     233      16372  Opisthorchis viverrini
     234      16341  Jaapia argillacea MUCL 33604
     235      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     236      16335  Burkholderia mallei (Pseudomonas mallei)
     237      16332  Danaus plexippus (Monarch butterfly)
     238      16282  Trichinella spiralis (Trichina worm)
     239      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     240      16226  Neovison vison (American mink) (Mustela vison)
     241      16223  Schistosoma japonicum (Blood fluke)
     242      16207  Ralstonia solanacearum (Pseudomonas solanacearum)
     243      16195  Streptomyces scabiei
     244      16185  Drosophila sechellia (Fruit fly)
     245      16164  Pectobacterium carotovorum subsp. brasiliense
     246      16149  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     247      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     248      16046  Cedecea neteri
     249      15933  Rabies virus
     250      15871  Bacillus mycoides


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          888257 (  1%)
    Bacteria       73062005 ( 82%)
    Eukaryota      12775496 ( 14%)
    Viruses         2171639 (  2%)
    Other            553768 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 121996 (  1%)           (  0%)
     Other Mammalia       1140200 (  9%)           (  1%)
     Other Vertebrata     1537814 ( 12%)           (  2%)
     Viridiplantae        2483042 ( 19%)           (  3%)
     Fungi                3441534 ( 27%)           (  4%)
     Insecta              1130250 (  9%)           (  1%)
     Nematoda              436275 (  3%)           (  0%)
     Other                2484385 ( 19%)           (  3%)
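
Both percentage columns of the Eukaryota table can be derived from the sequence counts. The sketch below assumes the first column is relative to the sum of the eight categories and the second to the total entry count from the introduction, each rounded to a whole percent:

```python
# Sequence counts per category, copied from the "Within Eukaryota" table.
eukaryota = {
    "Human": 121996,
    "Other Mammalia": 1140200,
    "Other Vertebrata": 1537814,
    "Viridiplantae": 2483042,
    "Fungi": 3441534,
    "Insecta": 1130250,
    "Nematoda": 436275,
    "Other": 2484385,
}

total_entries = 89451166                  # whole database
eukaryota_total = sum(eukaryota.values())  # all eukaryotic sequences

# (percent of Eukaryota, percent of the complete database) per category
shares = {
    name: (round(100 * n / eukaryota_total), round(100 * n / total_entries))
    for name, n in eukaryota.items()
}

for name, (of_euk, of_db) in shares.items():
    print(f"{name:<18}{of_euk:3d}% {of_db:3d}%")
```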



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 2015188             1001-1100   447450
                 51- 100 8101501             1101-1200   324210
                101- 150 9437038             1201-1300   231784
                151- 200 8911965             1301-1400   131937
                201- 250 9148354             1401-1500   115102
                251- 300 8971666             1501-1600    75380
                301- 350 8088314             1601-1700    58305
                351- 400 5991853             1701-1800    36663
                401- 450 5231649             1801-1900    31850
                451- 500 4230204             1901-2000    24644
                501- 550 2690685             2001-2100    24293
                551- 600 2046288             2101-2200    32372
                601- 650 1450022             2201-2300    18702
                651- 700 1162875             2301-2400    15459
                701- 750  900374             2401-2500    13676
                751- 800  769493             >2500        95029
                801- 850  599684
                851- 900  547902
                901- 950  375263
                951-1000  260833
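
The size table can be cross-checked against the introduction. A minimal sketch, assuming "excluding fragments" means the total entry count minus the number of fragments; the list names are illustrative:

```python
total_entries = 89451166  # from the introduction
fragments = 6843159       # "Number of fragments" in the introduction

# Bin counts copied from the table: lengths 1-1000 in steps of 50 ...
bins_1_to_1000 = [
    2015188, 8101501, 9437038, 8911965, 9148354,
    8971666, 8088314, 5991853, 5231649, 4230204,
    2690685, 2046288, 1450022, 1162875, 900374,
    769493, 599684, 547902, 375263, 260833,
]
# ... then lengths 1001-2500 in steps of 100, plus the >2500 bin.
bins_over_1000 = [
    447450, 324210, 231784, 131937, 115102,
    75380, 58305, 36663, 31850, 24644,
    24293, 32372, 18702, 15459, 13676,
    95029,
]

non_fragment_entries = total_entries - fragments
binned_sequences = sum(bins_1_to_1000) + sum(bins_over_1000)

print(non_fragment_entries, binned_sequences)
```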



   The average sequence length in UniProtKB/TrEMBL is 317 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
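
The average length is consistent with the totals given in the introduction. This sketch assumes the average is taken over all entries, fragments included, and rounded to a whole number of amino acids:

```python
total_amino_acids = 28391936394  # from the introduction
total_entries = 89451166         # from the introduction

# Assumption: mean length = total residues / total entries.
average_length = total_amino_acids / total_entries

print(round(average_length))
```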



4.  STATISTICS FOR SOME LINE TYPES

The following tables summarize the total number of selected UniProtKB/TrEMBL
line types, the number of entries with at least one such line, and the
average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                   102391932                1.14                                                    
   Submitted to EMBL/GenBank/DDBJ  71917478  68108214      0.80                                                    
   Journal                         28205775  26688495      0.32                                                    
   Submitted to other databases     2240413   2232687      0.03                                                    
   Thesis                             18904     18845     <0.01                                                    
   Book citation                       9361      9298     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 537698
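
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release (not by the number of entries carrying such a line). A sketch under that assumption; the helper `average_per_entry` and the 0.005 cut-off for printing `<0.01` are hypothetical, chosen to match the values shown:

```python
TOTAL_ENTRIES = 89451166  # from the introduction


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines-per-entry the way the tables appear to."""
    average = total_lines / total_entries
    # Assumption: non-zero values that would round to 0.00 print as "<0.01".
    if 0 < average < 0.005:
        return "<0.01"
    return f"{average:.2f}"


# Totals copied from the References (RL) table.
for label, total in [
    ("References (RL)", 102391932),
    ("Submitted to EMBL/GenBank/DDBJ", 71917478),
    ("Journal", 28205775),
    ("Submitted to other databases", 2240413),
    ("Thesis", 18904),
]:
    print(f"{label:<32}{average_per_entry(total):>6}")
```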


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     154140590                1.72                                                    
   CATALYTIC ACTIVITY              11280170  10383742      0.13     4                                              
   CAUTION                         62979682  62927285      0.70     1                                              
   COFACTOR                         7450856   4762434      0.08     6                                              
   DOMAIN                            593279    570575      0.01     9                                              
   ENZYME REGULATION                 218047    218047     <0.01    11                                              
   FUNCTION                        13223846  12549308      0.15     3                                              
   INTERACTION                         1795      1795     <0.01    12                                              
   MISCELLANEOUS                     342919    342678     <0.01    10                                              
   PATHWAY                          5139863   4615532      0.06     8                                              
   SIMILARITY                      35156520  27237032      0.39     2                                              
   SUBCELLULAR LOCATION            10543721  10179910      0.12     5                                              
   SUBUNIT                          7209892   7126020      0.08     7                                              

Total number of comment topics: 12
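
The "Rank" column appears to order subtypes by their total number, largest first. A short sketch that reproduces the ranks of the comment topics under that assumption:

```python
# Total number of lines per comment topic, copied from the table.
comment_totals = {
    "CATALYTIC ACTIVITY": 11280170,
    "CAUTION": 62979682,
    "COFACTOR": 7450856,
    "DOMAIN": 593279,
    "ENZYME REGULATION": 218047,
    "FUNCTION": 13223846,
    "INTERACTION": 1795,
    "MISCELLANEOUS": 342919,
    "PATHWAY": 5139863,
    "SIMILARITY": 35156520,
    "SUBCELLULAR LOCATION": 10543721,
    "SUBUNIT": 7209892,
}

# Rank 1 goes to the topic with the largest total.
ordered = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ordered, start=1)}

for topic in comment_totals:
    print(f"{topic:<22}{comment_totals[topic]:>10}{ranks[topic]:>4}")
```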


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      58609698                0.66                                                    
   ACT_SITE                         5057572   3159410      0.06     5                                              
   BINDING                         10954472   2805217      0.12     1                                              
   CARBOHYD                             312       130     <0.01    28                                              
   CHAIN                             907785    717261      0.01    10                                              
   COILED                            197285    112002     <0.01    16                                              
   COMPBIAS                           30612     30442     <0.01    21                                              
   CROSSLNK                           30470     21662     <0.01    22                                              
   DISULFID                          227672    169531     <0.01    15                                              
   DNA_BIND                          169137    158655     <0.01    17                                              
   DOMAIN                           2082668   1668338      0.02     8                                              
   INIT_MET                           30175     30175     <0.01    23                                              
   INTRAMEM                             406        58     <0.01    27                                              
   LIPID                             158436     79218     <0.01    19                                              
   METAL                            9981750   2627800      0.11     3                                              
   MOD_RES                           774306    717101      0.01    12                                              
   MOTIF                             609216    392529      0.01    14                                              
   NON_STD                             2081      1939     <0.01    26                                              
   NON_TER                         10265866   6847893      0.11     2                                              
   NP_BIND                          4134086   2453402      0.05     6                                              
   PEPTIDE                              130       130     <0.01    29                                              
   PROPEP                              9842      9842     <0.01    24                                              
   REGION                           3458509   1879006      0.04     7                                              
   REPEAT                            132993     30863     <0.01    20                                              
   SIGNAL                            859233    854394      0.01    11                                              
   SITE                             1466155    743040      0.02     9                                              
   TOPO_DOM                          702619    145927      0.01    13                                              
   TRANSIT                             2324      2312     <0.01    25                                              
   TRANSMEM                         6195948   1102240      0.07     4                                              
   ZN_FING                           167638    150115     <0.01    18                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             990742570               11.08                                                    
   Allergome                           3806      3142     <0.01    84   Protein family/group databases             
   ArachnoServer                         99        99     <0.01   104   Organism-specific databases                
   BRENDA                              2560      2533     <0.01    90   Enzyme and pathway databases               
   Bgee                               94363     94363     <0.01    52   Gene expression databases                  
   BindingDB                          90304     90304     <0.01    53   Chemistry                                  
   BioCyc                           5767238   5689821      0.06    23   Enzyme and pathway databases               
   CAZy                               73738     69290     <0.01    58   Protein family/group databases             
   CGD                                 6743      6743     <0.01    79   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   111   2D gel databases                           
   CTD                               467204    465976      0.01    40   Organism-specific databases                
   ChEMBL                               784       784     <0.01    95   Chemistry                                  
   ChiTaRS                            87421     87261     <0.01    54   Other                                      
   ConoServer                           159       159     <0.01   101   Organism-specific databases                
   DIP                                 3132      3127     <0.01    87   Protein-protein interaction databases      
   DNASU                              41829     41503     <0.01    65   Protocols and materials databases          
   DrugBank                             145        57     <0.01   102   Chemistry                                  
   EMBL                            96290820  88238352      1.08     3   Sequence databases                         
   Ensembl                          1164118   1148619      0.01    32   Genome annotation databases                
   EnsemblBacteria                 40546483  39897969      0.45     8   Genome annotation databases                
   EnsemblFungi                      467819    465297      0.01    39   Genome annotation databases                
   EnsemblMetazoa                    917474    901190      0.01    34   Genome annotation databases                
   EnsemblPlants                     845191    804530      0.01    36   Genome annotation databases                
   EnsemblProtists                   191003    188509     <0.01    49   Genome annotation databases                
   EuPathDB                          161147    161146     <0.01    51   Organism-specific databases                
   EvolutionaryTrace                   7861      7861     <0.01    78   Other                                      
   ExpressionAtlas                   244882    244882     <0.01    43   Gene expression databases                  
   FlyBase                           199513    198038     <0.01    47   Organism-specific databases                
   GO                             160330172  56064150      1.79     2   Ontologies                                 
   Gene3D                          54849804  42798349      0.61     5   Family and domain databases                
   GeneID                          11577618  11298732      0.13    17   Genome annotation databases                
   GeneTree                         1117226   1117185      0.01    33   Phylogenomic databases                     
   Genevestigator                     81851     81847     <0.01    55   Gene expression databases                  
   GenoList                           14727     14454     <0.01    75   Organism-specific databases                
   GenomeRNAi                         23400     23400     <0.01    71   Other                                      
   Gramene                           194393    194393     <0.01    48   Organism-specific databases                
   GuidetoPHARMACOLOGY                   21        21     <0.01   108   Chemistry                                  
   H-InvDB                              595       448     <0.01    97   Organism-specific databases                
   HAMAP                           12212463  12040050      0.14    16   Family and domain databases                
   HGNC                               47036     46959     <0.01    63   Organism-specific databases                
   HOGENOM                          3641163   3641117      0.04    26   Phylogenomic databases                     
   HOVERGEN                          302244    302235     <0.01    42   Phylogenomic databases                     
   InParanoid                       2681089   2681089      0.03    29   Phylogenomic databases                     
   IntAct                             20064     20064     <0.01    74   Protein-protein interaction databases      
   InterPro                       212169560  72081434      2.37     1   Family and domain databases                
   KEGG                            10729218  10502638      0.12    18   Genome annotation databases                
   KO                               4582245   4559154      0.05    25   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    81   Organism-specific databases                
   Leproma                             1272      1270     <0.01    91   Organism-specific databases                
   MEROPS                            225740    225739     <0.01    44   Protein family/group databases             
   MGI                                53354     53016     <0.01    60   Organism-specific databases                
   MIM                                    4         4     <0.01   112   Organism-specific databases                
   MINT                               10088     10087     <0.01    76   Protein-protein interaction databases      
   MaxQB                               2566      2566     <0.01    89   Proteomic databases                        
   MoonProt                               8         8     <0.01   109   Protein family/group databases             
   NextBio                           200330    200330     <0.01    46   Other                                      
   OGP                                    3         3     <0.01   113   2D gel databases                           
   OMA                              8492691   8492691      0.09    21   Phylogenomic databases                     
   OrthoDB                          5177843   5177840      0.06    24   Phylogenomic databases                     
   PANTHER                         12312745  11900230      0.14    15   Family and domain databases                
   PATRIC                           8243980   8243783      0.09    22   Genome annotation databases                
   PDB                                25559     13582     <0.01    68   3D structure databases                     
   PDBsum                             25342     13454     <0.01    69   3D structure databases                     
   PIR                               171014    138183     <0.01    50   Sequence databases                         
   PIRSF                            9599893   9522616      0.11    19   Family and domain databases                
   PMAP-CutDB                           199       199     <0.01   100   Other                                      
   PRIDE                             914970    914970      0.01    35   Proteomic databases                        
   PRINTS                          12699303  11425154      0.14    14   Family and domain databases                
   PRO                                26813     26812     <0.01    67   Other                                      
   PROSITE                         44232611  29962255      0.49     7   Family and domain databases                
   PaxDb                              28286     28284     <0.01    66   Proteomic databases                        
   PeptideAtlas                         127       127     <0.01   103   Proteomic databases                        
   PeroxiBase                          2583      2575     <0.01    88   Protein family/group databases             
   Pfam                            92203619  67193379      1.03     4   Family and domain databases                
   PharmGKB                            3196      3196     <0.01    86   Organism-specific databases                
   PhosSite                             888       876     <0.01    94   PTM databases                              
   PhosphoSite                         1078      1078     <0.01    92   PTM databases                              
   PhylomeDB                         445354    445354     <0.01    41   Phylogenomic databases                     
   PomBase                                2         2     <0.01   114   Organism-specific databases                
   PptaseDB                              38        36     <0.01   106   Protein family/group databases             
   ProDom                           1740641   1698908      0.02    30   Family and domain databases                
   ProMEX                              3505      3505     <0.01    85   Proteomic databases                        
   ProteinModelPortal              21913822  21913822      0.24    10   3D structure databases                     
   Proteomes                       18735959  18501974      0.21    12   Other                                      
   PseudoCAP                           4496      4490     <0.01    82   Organism-specific databases                
   REBASE                             48307     48307     <0.01    61   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   105   2D gel databases                           
   RGD                                21960     20835     <0.01    72   Organism-specific databases                
   Reactome                          210372     73550     <0.01    45   Enzyme and pathway databases               
   RefSeq                          17663787  14244462      0.20    13   Sequence databases                         
   SABIO-RK                             507       507     <0.01    98   Enzyme and pathway databases               
   SGD                                    7         7     <0.01   110   Organism-specific databases                
   SMART                           18983008  14490310      0.21    11   Family and domain databases                
   SMR                              8624781   8624781      0.10    20   3D structure databases                     
   STRING                           3127546   3127372      0.03    27   Protein-protein interaction databases      
   SUPFAM                          51991254  41862892      0.58     6   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   107   2D gel databases                           
   SignaLink                           4106      4101     <0.01    83   Enzyme and pathway databases               
   TAIR                               21090     20972     <0.01    73   Organism-specific databases                
   TCDB                                6312      6303     <0.01    80   Protein family/group databases             
   TIGRFAMs                        24746439  22581404      0.28     9   Family and domain databases                
   TreeFam                           587469    587467      0.01    37   Phylogenomic databases                     
   TubercuList                         1072      1062     <0.01    93   Organism-specific databases                
   UCSC                               56340     56136     <0.01    59   Genome annotation databases                
   UniGene                           558011    520015      0.01    38   Sequence databases                         
   UniPathway                       1306606   1123188      0.01    31   Enzyme and pathway databases               
   VectorBase                         78241     77724     <0.01    56   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    96   2D gel databases                           
   WormBase                           43350     43019     <0.01    64   Organism-specific databases                
   Xenbase                            25019     24961     <0.01    70   Organism-specific databases                
   ZFIN                               47400     47319     <0.01    62   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    77   Organism-specific databases                
   eggNOG                           2749360   2749325      0.03    28   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    57   Organism-specific databases                
   mycoCLAP                             410       410     <0.01    99   Protein family/group databases             

Number of explicitly cross-referenced databases: 135
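
The rows of the table above can be read mechanically. The sketch below is
illustrative only and is not part of the release tooling. It assumes, from the
figures themselves, that the second column is the number of cross-references,
the third the number of entries carrying them, and the fourth the number of
cross-references divided by the total number of entries in the release
(89451166, from the introduction), shown as "<0.01" when it rounds to 0.00.
The names ROW, parse_row and format_ratio are hypothetical.

```python
import re

# Total number of entries in release 2015_01 (from the introduction).
TOTAL_ENTRIES = 89451166

# One table row: name, two integer counts, a ratio (possibly "<0.01"),
# a rank, and a free-text category.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<xrefs>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<ratio><?\d+\.\d+)\s+(?P<rank>\d+)\s+(?P<category>.+?)\s*$"
)


def parse_row(line):
    """Return one table row as a dict, or None if the line is not a row."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "xrefs": int(m["xrefs"]),
        "entries": int(m["entries"]),
        "ratio": m["ratio"],
        "rank": int(m["rank"]),
        "category": m["category"],
    }


def format_ratio(xrefs, total=TOTAL_ENTRIES):
    """Cross-references per entry, formatted as the table appears to do."""
    ratio = xrefs / total
    return "<0.01" if ratio < 0.005 else f"{ratio:.2f}"


# Three rows copied from the table above.
sample = [
    "   HAMAP                           12212463  12040050      0.14    16   Family and domain databases",
    "   InterPro                       212169560  72081434      2.37     1   Family and domain databases",
    "   MEROPS                            225740    225739     <0.01    44   Protein family/group databases",
]

rows = [parse_row(line) for line in sample]
for row in rows:
    # Print the name, the recomputed ratio and the ratio given in the table.
    print(row["name"], format_ratio(row["xrefs"]), row["ratio"])
```

For these three rows the recomputed ratio agrees with the printed one, which
supports the assumed meaning of the fourth column without confirming it.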


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.94   Gln (Q) 3.99   Leu (L) 9.94   Ser (S) 6.37
   Arg (R) 5.37   Glu (E) 6.08   Lys (K) 5.20   Thr (T) 5.56
   Asn (N) 4.12   Gly (G) 7.21   Met (M) 2.47   Trp (W) 1.26
   Asp (D) 5.43   His (H) 2.21   Phe (F) 3.98   Tyr (Y) 3.05
   Cys (C) 1.10   Ile (I) 6.16   Pro (P) 4.52   Val (V) 6.93

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.01

   [Figure: amino acid composition of the complete database]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Ile, Glu, Thr, Asp, Arg, Lys, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Trp, Cys
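
The ordering in 5.2 follows directly from the percentages in 5.1. As a minimal
sketch (the variable names are illustrative), sorting the twenty standard
residues by descending percentage reproduces the list:

```python
# Percent composition of the twenty standard amino acids, copied from 5.1.
composition = {
    "Ala": 8.94, "Gln": 3.99, "Leu": 9.94, "Ser": 6.37,
    "Arg": 5.37, "Glu": 6.08, "Lys": 5.20, "Thr": 5.56,
    "Asn": 4.12, "Gly": 7.21, "Met": 2.47, "Trp": 1.26,
    "Asp": 5.43, "His": 2.21, "Phe": 3.98, "Tyr": 3.05,
    "Cys": 1.10, "Ile": 6.16, "Pro": 4.52, "Val": 6.93,
}

# Sort residue names by descending percentage.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))

# The rounded percentages need not add up to exactly 100.
total = sum(composition.values())
print(f"sum of listed percentages: {total:.2f}")
```

Because every value is rounded to two decimals, the listed percentages sum to
slightly less than 100.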



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 813562
Total number of entries encoded on a Plasmid: 479842
Total number of entries encoded on a Plastid: 39814
Total number of entries encoded on a Plastid; Apicoplast: 
Total number of entries encoded on a Plastid; Chloroplast: 63
Total number of entries encoded on a Plastid; Cyanelle: 
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: