Current Release Statistics


         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2015_03 STATISTICS


1.  INTRODUCTION

Release 2015_03 of 04-Mar-2015 of UniProtKB/TrEMBL contains 92124243 sequence entries,
comprising 29223634881 amino acids.

Since release 2015_02, 1368401 sequences have been added, the sequence data of
5063 existing entries has been updated, and the annotations of
20973274 entries have been revised. This represents an increase of 1%.

Number of fragments: 7021669

Protein existence (PE):              entries      %
1: Evidence at protein level           62058     0.07%
2: Evidence at transcript level      1030699     1.12%
3: Inferred from homology           22704365    24.65%
4: Predicted                        68327121    74.17%
5: Uncertain                               0     0.00%
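
The percentage column above can be reproduced from the entry counts. The
following is a minimal Python sketch, not part of the release notes: the
names pe_counts and pe_percentages are illustrative, and the counts are
copied from the table.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 62058,
    "2: Evidence at transcript level": 1030699,
    "3: Inferred from homology": 22704365,
    "4: Predicted": 68327121,
    "5: Uncertain": 0,
}


def pe_percentages(counts):
    """Return each level's share of all entries, in percent (2 decimals)."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 2) for level, n in counts.items()}


# The five PE levels partition the release, so their sum is the entry count.
total_entries = sum(pe_counts.values())
percentages = pe_percentages(pe_counts)

for level, pct in percentages.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```

Summing the counts also gives a quick consistency check against the total
number of entries stated in the introduction.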

The growth of the database is summarized below.
[Figure: growth of the database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 543753

   The twenty most represented species account for 2878236 sequences, or 3.1% of
   the total number of entries.
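
   The figure for the twenty most represented species can be rederived from
   the first twenty rows of table 2.2. This is a minimal Python sketch with
   illustrative variable names; the frequencies and the total entry count are
   copied from this document.

```python
# Frequencies of the 20 most represented species (rows 1-20 of table 2.2).
top20 = [
    620109, 352020, 259919, 244215, 152527,
    124783, 100566, 100405, 99058, 96567,
    92744, 84634, 74050, 73055, 70545,
    69648, 69589, 67671, 65421, 60710,
]

total_entries = 92124243  # entries in release 2015_03, from the introduction

top20_sequences = sum(top20)
top20_share = 100 * top20_sequences / total_entries  # in percent

print(f"{top20_sequences} sequences, {top20_share:.1f}% of all entries")
```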


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:216933
                            2x:87015
                            3x:47192
                            4x:33727
                            5x:19716
                            6x:14462
                            7x:10395
                            8x: 8288
                            9x: 6634
                           10x:11616
                       11- 20x:42731
                       21- 50x:13862
                       51-100x: 5326
                         >100x:25856


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     620109  Human immunodeficiency virus 1
       2     352020  marine sediment metagenome
       3     259919  Arundo donax (Giant reed) (Donax arundinaceus)
       4     244215  uncultured bacterium
       5     152527  Escherichia coli
       6     124783  Homo sapiens (Human)
       7     100566  Brassica napus (Rape)
       8     100405  Hepatitis C virus
       9      99058  Triticum aestivum (Wheat)
      10      96567  Oryza sativa subsp. japonica (Rice)
      11      92744  Hepatitis B virus (HBV)
      12      84634  Zea mays (Maize)
      13      74050  Glycine max (Soybean) (Glycine hispida)
      14      73055  mine drainage metagenome
      15      70545  Hordeum vulgare var. distichum (Domesticated barley)
      16      69648  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      17      69589  Macaca mulatta (Rhesus macaque)
      18      67671  Phytophthora parasitica (Potato buckeye rot agent)
      19      65421  Ancylostoma ceylanicum
      20      60710  human gut metagenome
      21      60520  Burkholderia pseudomallei (Pseudomonas pseudomallei)
      22      59339  Mus musculus (Mouse)
      23      58869  Vibrio parahaemolyticus
      24      55070  Solanum tuberosum (Potato)
      25      55045  Callithrix jacchus (White-tufted-ear marmoset)
      26      54260  Vitis vinifera (Grape)
      27      53422  Danio rerio (Zebrafish) (Brachydanio rerio)
      28      51249  Klebsiella pneumoniae
      29      50656  Trichomonas vaginalis
      30      50142  Glycine soja (Wild soybean)
      31      49755  Oncorhynchus mykiss (Rainbow trout) (Salmo gairdneri)
      32      49281  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      33      48934  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      34      47498  Enterobacter cloacae
      35      47214  Populus trichocarpa (Western balsam poplar) 
      36      44386  Citrus sinensis (Sweet orange) (Citrus aurantium var. sinensis)
      37      44332  Eucalyptus grandis (Flooded gum)
      38      41211  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      39      40931  Theobroma cacao (Cacao) (Cocoa)
      40      40510  Simian immunodeficiency virus (SIV)
      41      39923  Reticulomyxa filosa
      42      39900  Oryza sativa subsp. indica (Rice)
      43      39848  Paramecium tetraurelia
      44      39766  Pseudomonas aeruginosa
      45      39459  Setaria italica (Foxtail millet) (Panicum italicum)
      46      38816  Mustela putorius furo (European domestic ferret) (Mustela furo)
      47      38804  Arabidopsis thaliana (Mouse-ear cress)
      48      37321  Acyrthosiphon pisum (Pea aphid)
      49      37297  Drosophila melanogaster (Fruit fly)
      50      36609  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      51      36558  Bacillus subtilis
      52      36034  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      53      35673  Ailuropoda melanoleuca (Giant panda)
      54      35599  Emiliania huxleyi CCMP1516
      55      35327  Physcomitrella patens subsp. patens (Moss)
      56      35138  Caenorhabditis japonica
      57      34721  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      58      34575  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      59      34570  Thalassiosira oceanica (Marine diatom)
      60      34022  Staphylococcus aureus
      61      33952  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      62      33767  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      63      33290  Vibrio cholerae
      64      33261  Selaginella moellendorffii (Spikemoss)
      65      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      66      32632  Sus scrofa (Pig)
      67      32473  Phaseolus vulgaris (Kidney bean) (French bean)
      68      32342  Oryza brachyantha
      69      32206  Oryza glaberrima (African rice)
      70      32194  Caenorhabditis remanei (Caenorhabditis vulgaris)
      71      32101  Capitella teleta (Polychaete worm)
      72      32017  Anas platyrhynchos (Mallard) (Anas boschas)
      73      31919  Helicobacter pylori (Campylobacter pylori)
      74      31904  Pan troglodytes (Chimpanzee)
      75      31805  Lygus hesperus (Western plant bug)
      76      31680  Gossypium arboreum (Tree cotton) (Gossypium nanking)
      77      31463  Ricinus communis (Castor bean)
      78      31348  Citrus clementina
      79      30981  Daphnia pulex (Water flea)
      80      30713  Caenorhabditis brenneri (Nematode worm)
      81      30712  Acinetobacter baumannii
      82      30501  Poecilia formosa (Amazon molly) (Limia formosa)
      83      30240  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      84      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      85      29815  Amphimedon queenslandica (Sponge)
      86      29518  Strongylocentrotus purpuratus (Purple sea urchin)
      87      29334  Pristionchus pacificus (Parasitic nematode)
      88      29205  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      89      29083  Oikopleura dioica (Tunicate)
      90      28942  Erythranthe guttata (Yellow monkey flower) (Mimulus guttatus)
      91      28885  Capsella rubella
      92      28846  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      93      28797  Salmonella enterica I
      94      28711  Prunus persica (Peach) (Amygdalus persica)
      95      28669  Rhizophagus irregularis DAOM 197198w
      96      28382  Eutrema salsugineum (Saltwater cress) (Sisymbrium salsugineum)
      97      28207  Gasterosteus aculeatus (Three-spined stickleback)
      98      28059  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      99      27848  Canis familiaris (Dog) (Canis lupus familiaris)
     100      27699  Equus caballus (Horse)
     101      27574  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
     102      27568  Jatropha curcas (Barbados nut)
     103      27454  Amborella trichopoda
     104      27340  Listeria monocytogenes
     105      27102  Gorilla gorilla gorilla (Western lowland gorilla)
     106      27017  Stegodyphus mimosarum
     107      26921  Tetrahymena thermophila (strain SB210)
     108      26862  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
     109      26771  Morus notabilis
     110      26517  Phytophthora parasitica P1976
     111      26489  Phytophthora parasitica CJ01A1
     112      26483  Ovis aries (Sheep)
     113      26477  Phytophthora parasitica P1569
     114      26452  Phytophthora parasitica P10297
     115      26438  Phytophthora parasitica (strain INRA-310)
     116      26349  Burkholderia cepacia (Pseudomonas cepacia)
     117      26247  Rattus norvegicus (Rat)
     118      26013  Oryzias latipes (Medaka fish) (Japanese ricefish)
     119      25882  Bos taurus (Bovine)
     120      25832  Loxodonta africana (African elephant)
     121      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
     122      25599  Coffea canephora (Robusta coffee)
     123      25496  Serratia marcescens
     124      25222  Oesophagostomum dentatum (Nodular worm)
     125      25025  Aphanomyces astaci
     126      24922  Nematostella vectensis (Starlet sea anemone)
     127      24778  Trichuris suis (pig whipworm)
     128      24590  Guillardia theta CCMP2712
     129      24480  Cucumis sativus (Cucumber)
     130      24375  Oxytricha trifallax
     131      24301  Tetraselmis sp. GSL018
     132      24167  Tolypothrix bouteillei VB521301
     133      23824  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
     134      23780  Bacillus cereus
     135      23743  Ornithorhynchus anatinus (Duckbill platypus)
     136      23687  Lottia gigantea (Giant owl limpet)
     137      23651  Dendroctonus ponderosae (Mountain pine beetle)
     138      23497  Latimeria chalumnae (West Indian ocean coelacanth)
     139      23492  Caenorhabditis elegans
     140      23382  Helobdella robusta (Californian leech)
     141      23373  Arabis alpina (Alpine rock-cress)
     142      23318  Fusarium oxysporum f. sp. melonis 26406
     143      23271  Fusarium oxysporum f. sp. conglutinans race 2 54008
     144      23263  Fusarium oxysporum f. sp. pisi HDV247
     145      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     146      22820  Monodelphis domestica (Gray short-tailed opossum)
     147      22754  Fusarium oxysporum f. sp. raphani 54005
     148      22570  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     149      22529  Lepisosteus oculatus (Spotted gar)
     150      22413  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     151      22248  Fusarium oxysporum f. sp. vasinfectum 25433
     152      22174  gut metagenome
     153      22046  Comamonas testosteroni (Pseudomonas testosteroni)
     154      21955  Oryctolagus cuniculus (Rabbit)
     155      21759  Haemonchus contortus (Barber pole worm)
     156      21689  Fusarium oxysporum f. sp. radicis-lycopersici 26381
     157      21678  Gallus gallus (Chicken)
     158      21661  Fusarium oxysporum Fo47
     159      21640  Vibrio vulnificus
     160      21625  Heterocephalus glaber (Naked mole rat)
     161      21549  Fusarium oxysporum f. sp. lycopersici MN25
     162      21484  Gibberella zeae (Wheat head blight fungus) (Fusarium graminearum)
     163      21421  Papio anubis (Olive baboon)
     164      21398  Caenorhabditis briggsae
     165      21383  Aeromonas hydrophila
     166      21357  Galerina marginata CBS 339.88
     167      21333  Echinococcus granulosus (Hydatid tapeworm)
     168      21210  Ixodes scapularis (Black-legged tick) (Deer tick)
     169      21200  Myotis lucifugus (Little brown bat)
     170      21121  Felis catus (Cat) (Felis silvestris catus)
     171      20867  Tupaia chinensis (Chinese tree shrew)
     172      20822  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     173      20768  Stylonychia lemnae
     174      20767  Fusarium oxysporum FOSC 3-a
     175      20737  Pectobacterium carotovorum subsp. brasiliense
     176      20541  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     177      20465  Fukomys damarensis (Damaraland mole rat) (Cryptomys damarensis)
     178      20258  Mycobacterium tuberculosis
     179      20168  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     180      20116  Ciona savignyi (Pacific transparent sea squirt)
     181      20104  Cavia porcellus (Guinea pig)
     182      20090  uncultured archaeon
     183      20062  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     184      20052  Saprolegnia parasitica (strain CBS 223.65)
     185      20028  Camelus ferus (Wild Bactrian camel)
     186      19998  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     187      19945  Burkholderia cenocepacia
     188      19837  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     189      19807  Fusarium oxysporum f. sp. cubense tropical race 4 54006
     190      19733  Rhizopus microsporus
     191      19708  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     192      19705  Pseudomonas putida (Arthrobacter siderocapsulatus)
     193      19687  Mesorhizobium plurifarium
     194      19673  Bactrocera dorsalis (Oriental fruit fly) (Dacus dorsalis)
     195      19654  Plasmodium falciparum
     196      19630  Anolis carolinensis (Green anole) (American chameleon)
     197      19618  Brugia malayi (Filarial nematode worm)
     198      19594  Aphanomyces invadans
     199      19562  Pteropus alecto (Black flying fox)
     200      19523  Wuchereria bancrofti
     201      19436  Anopheles sinensis
     202      19300  Myotis brandtii (Brandt's bat)
     203      19200  Trypanosoma cruzi (strain CL Brener)
     204      19196  Necator americanus (Human hookworm)
     205      19112  Ixodes ricinus (Common tick)
     206      19064  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     207      19032  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     208      18966  Drosophila simulans (Fruit fly)
     209      18947  Xanthomonas axonopodis pv. phaseoli
     210      18600  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     211      18582  Toxocara canis (Canine roundworm)
     212      18575  Bos mutus
     213      18488  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     214      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     215      18373  Nonlabens ulvanivorans
     216      18297  Tetranychus urticae (Two-spotted spider mite)
     217      18143  Hepatitis C virus subtype 1b
     218      18125  Atta cephalotes (Leafcutter ant)
     219      18055  Anopheles gambiae (African malaria mosquito)
     220      18047  Saprolegnia diclina VS20
     221      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     222      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     223      17807  Bombyx mori (Silk moth)
     224      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     225      17683  Genlisea aurea
     226      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     227      17589  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     228      17490  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     229      17480  Pectobacterium carotovorum subsp. carotovorum 
     230      17420  Rhizobium radiobacter (Agrobacterium tumefaciens) (Agrobacterium radiobacter)
     231      17386  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     232      17346  Nasonia vitripennis (Parasitic wasp)
     233      17108  Tribolium castaneum (Red flour beetle)
     234      17107  Drosophila yakuba (Fruit fly)
     235      17013  Bradyrhizobium japonicum
     236      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     237      16943  Meleagris gallopavo (Common turkey)
     238      16843  Burkholderia mallei (Pseudomonas mallei)
     239      16723  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     240      16715  Drosophila persimilis (Fruit fly)
     241      16622  Rhodnius prolixus (Triatomid bug)
     242      16534  Cerapachys biroi (Ant)
     243      16491  Apis mellifera (Honeybee)
     244      16484  Botryobasidium botryosum FD-172 SS1
     245      16447  Ectocarpus siliculosus (Brown alga)
     246      16437  Streptococcus mitis
     247      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     248      16373  Opisthorchis viverrini
     249      16341  Jaapia argillacea MUCL 33604
     250      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)


   
   2.3  Taxonomic distribution of the sequences

[Figure: taxonomic distribution of the sequences]

   Kingdom        sequences (% of the database)
    Archaea          907701 (  1%)
    Bacteria       74927307 ( 81%)
    Eukaryota      13524655 ( 15%)
    Viruses         2210016 (  2%)
    Other            554563 ( <1%)



   Within Eukaryota:

[Figure: distribution of the sequences within Eukaryota]

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 124838 (  1%)           (  0%)
     Other Mammalia       1144771 (  8%)           (  1%)
     Other Vertebrata     1547253 ( 11%)           (  2%)
     Viridiplantae        2836128 ( 21%)           (  3%)
     Fungi                3699811 ( 27%)           (  4%)
     Insecta              1190582 (  9%)           (  1%)
     Nematoda              482960 (  4%)           (  1%)
     Other                2498312 ( 18%)           (  3%)
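
   Both percentage columns of the Eukaryota table follow from the sequence
   counts: one is relative to the Eukaryota total, the other to the complete
   database. The sketch below is illustrative only (the names eukaryota and
   rows are not from the release notes) and assumes the percentages are
   rounded to the nearest integer, which is consistent with the values shown.

```python
total_entries = 92124243  # all UniProtKB/TrEMBL entries in this release

# Sequence counts per category, copied from the table above.
eukaryota = {
    "Human": 124838,
    "Other Mammalia": 1144771,
    "Other Vertebrata": 1547253,
    "Viridiplantae": 2836128,
    "Fungi": 3699811,
    "Insecta": 1190582,
    "Nematoda": 482960,
    "Other": 2498312,
}

# The categories partition Eukaryota, so their sum is the Eukaryota total.
eukaryota_total = sum(eukaryota.values())

# For each category: (% of Eukaryota, % of the complete database).
rows = {
    name: (round(100 * n / eukaryota_total), round(100 * n / total_entries))
    for name, n in eukaryota.items()
}

for name, (pct_euk, pct_db) in rows.items():
    print(f"{name:<18}{eukaryota[name]:>10} ({pct_euk:>3}%) ({pct_db:>3}%)")
```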



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 2181591             1001-1100   462371
                 51- 100 8385659             1101-1200   334379
                101- 150 9698601             1201-1300   238483
                151- 200 9153610             1301-1400   136888
                201- 250 9393613             1401-1500   118596
                251- 300 9215663             1501-1600    77944
                301- 350 8313014             1601-1700    59938
                351- 400 6164659             1701-1800    37835
                401- 450 5379069             1801-1900    32747
                451- 500 4349506             1901-2000    25468
                501- 550 2769806             2001-2100    24962
                551- 600 2108245             2101-2200    32919
                601- 650 1495272             2201-2300    19216
                651- 700 1198600             2301-2400    15843
                701- 750  928688             2401-2500    14036
                751- 800  794182             >2500        98160
                801- 850  619051
                851- 900  566960
                901- 950  387125
                951-1000  269875

[Figure: distribution of the sequences by size]


   The average sequence length in UniProtKB/TrEMBL is   317 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
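
   Two consistency checks can be run on the numbers in this section. Because
   the size table excludes fragments, its bins should sum to the number of
   entries minus the number of fragments. The average length is assumed here
   to be the total number of amino acids divided by the total number of
   entries, which agrees with the stated value. This is a minimal Python
   sketch with illustrative names, using only figures from this document.

```python
total_entries = 92124243         # from the introduction
fragments = 7021669              # from the introduction
total_amino_acids = 29223634881  # from the introduction

# Bin counts from the size table, in order of increasing length:
# 1-50, 51-100, ..., 2401-2500, >2500.
size_bins = [
    2181591, 8385659, 9698601, 9153610, 9393613, 9215663,
    8313014, 6164659, 5379069, 4349506, 2769806, 2108245,
    1495272, 1198600, 928688, 794182, 619051, 566960,
    387125, 269875,
    462371, 334379, 238483, 136888, 118596, 77944,
    59938, 37835, 32747, 25468, 24962, 32919,
    19216, 15843, 14036, 98160,
]

non_fragment_entries = sum(size_bins)
average_length = total_amino_acids / total_entries

print(non_fragment_entries == total_entries - fragments)
print(f"average length: {average_length:.0f} amino acids")
```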



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                   105422721                1.14                                                    
   Submitted to EMBL/GenBank/DDBJ  74391024  70276758      0.81                                                    
   Journal                         28841008  27307873      0.31                                                    
   Submitted to other databases     2162349   2153785      0.02                                                    
   Thesis                             18978     18919     <0.01                                                    
   Book citation                       9361      9298     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 545691
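
The "Average per entry" column in these tables is consistent with dividing
the total number of lines by the total number of entries in the release
(92124243), not by the number of entries that carry such a line. The sketch
below illustrates this for the reference subtypes. The function name
average_per_entry is hypothetical, and the "<0.01" rule (applied when the
value would round to 0.00) is an assumption inferred from the table.

```python
total_entries = 92124243  # entries in release 2015_03

# Total number of RL lines per subtype, copied from the table above.
reference_lines = {
    "Submitted to EMBL/GenBank/DDBJ": 74391024,
    "Journal": 28841008,
    "Submitted to other databases": 2162349,
    "Thesis": 18978,
    "Book citation": 9361,
    "Patent": 1,
}


def average_per_entry(n_lines, n_entries=total_entries):
    """Format lines-per-entry as in the tables (assumed '<0.01' rule)."""
    formatted = f"{n_lines / n_entries:.2f}"
    if formatted == "0.00" and n_lines > 0:
        return "<0.01"
    return formatted


all_references = sum(reference_lines.values())
print("References (RL)", all_references, average_per_entry(all_references))
for subtype, n in reference_lines.items():
    print(f"   {subtype:<32}{n:>10}  {average_per_entry(n):>6}")
```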


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     155146654                1.68                                                    
   CATALYTIC ACTIVITY              11594150  10654780      0.13     4                                              
   CAUTION                         64659261  64598813      0.70     1                                              
   COFACTOR                         6500602   4960605      0.07     7                                              
   DOMAIN                            614809    591589      0.01     9                                              
   ENZYME REGULATION                 221637    221637     <0.01    11                                              
   FUNCTION                        13464311  12849233      0.15     3                                              
   INTERACTION                         1803      1803     <0.01    12                                              
   MISCELLANEOUS                     352168    349723     <0.01    10                                              
   PATHWAY                          6182436   5128993      0.07     8                                              
   SIMILARITY                      33447071  27756186      0.36     2                                              
   SUBCELLULAR LOCATION            10710986  10399095      0.12     5                                              
   SUBUNIT                          7397420   7314837      0.08     6                                              

Total number of comment topics: 12
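
The Rank column appears to order the subtypes by their total number of lines,
with rank 1 being the most frequent. A minimal Python sketch of that ordering
for the comment topics follows; the variable names are illustrative, and the
ranking rule is an inference from the table rather than a documented
definition.

```python
# Total number of CC lines per topic, copied from the table above.
comment_totals = {
    "CATALYTIC ACTIVITY": 11594150,
    "CAUTION": 64659261,
    "COFACTOR": 6500602,
    "DOMAIN": 614809,
    "ENZYME REGULATION": 221637,
    "FUNCTION": 13464311,
    "INTERACTION": 1803,
    "MISCELLANEOUS": 352168,
    "PATHWAY": 6182436,
    "SIMILARITY": 33447071,
    "SUBCELLULAR LOCATION": 10710986,
    "SUBUNIT": 7397420,
}

# Sort topics by descending total and number them from 1.
ranked_topics = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: rank for rank, topic in enumerate(ranked_topics, start=1)}

for topic in sorted(comment_totals):
    print(f"   {topic:<24}{comment_totals[topic]:>10}  rank {ranks[topic]}")
```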


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      62905093                0.68                                                    
   ACT_SITE                         5598642   3504537      0.06     5                                              
   BINDING                         11853765   3043221      0.13     1                                              
   CARBOHYD                             416       157     <0.01    27                                              
   CHAIN                             943947    750435      0.01    10                                              
   COILED                            201367    114373     <0.01    16                                              
   COMPBIAS                           31155     30984     <0.01    22                                              
   CROSSLNK                           31312     22264     <0.01    21                                              
   DISULFID                          254263    190823     <0.01    15                                              
   DNA_BIND                          171835    161233     <0.01    18                                              
   DOMAIN                           2199056   1728519      0.02     8                                              
   INIT_MET                           30902     30902     <0.01    23                                              
   INTRAMEM                             413        59     <0.01    28                                              
   LIPID                             164238     82119     <0.01    19                                              
   METAL                           10666676   2814218      0.12     2                                              
   MOD_RES                           824671    766804      0.01    12                                              
   MOTIF                             633506    406775      0.01    14                                              
   NON_STD                             2121      1978     <0.01    26                                              
   NON_TER                         10517909   7024642      0.11     3                                              
   NP_BIND                          4373608   2558154      0.05     6                                              
   PEPTIDE                              137       137     <0.01    29                                              
   PROPEP                             12236     12236     <0.01    24                                              
   REGION                           3680728   2004600      0.04     7                                              
   REPEAT                            136513     31651     <0.01    20                                              
   SIGNAL                            897768    891819      0.01    11                                              
   SITE                             1528883    776658      0.02     9                                              
   TOPO_DOM                          706606    146160      0.01    13                                              
   TRANSIT                             2417      2405     <0.01    25                                              
   TRANSMEM                         7263466   1288482      0.08     4                                              
   ZN_FING                           176537    158654     <0.01    17                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             1041689367               11.31                                                    
   Allergome                           3824      3147     <0.01    84   Protein family/group databases             
   ArachnoServer                         99        99     <0.01   103   Organism-specific databases                
   BRENDA                              2555      2529     <0.01    90   Enzyme and pathway databases               
   Bgee                               94222     94222     <0.01    52   Gene expression databases                  
   BindingDB                          89443     89443     <0.01    53   Chemistry                                  
   BioCyc                           5767036   5689632      0.06    23   Enzyme and pathway databases               
   CAZy                               73692     69246     <0.01    58   Protein family/group databases             
   CGD                                 6735      6735     <0.01    79   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   109   2D gel databases                           
   CTD                               467193    465965      0.01    39   Organism-specific databases                
   ChEMBL                               783       783     <0.01    94   Chemistry                                  
   ChiTaRS                            87252     87092     <0.01    54   Other                                      
   ConoServer                           159       159     <0.01   100   Organism-specific databases                
   DIP                                 3178      3173     <0.01    87   Protein-protein interaction databases      
   DNASU                              41813     41487     <0.01    65   Protocols and materials databases          
   DrugBank                             145        57     <0.01   101   Chemistry                                  
   EMBL                           100380391  90947165      1.09     3   Sequence databases                         
   Ensembl                          1164978   1149456      0.01    32   Genome annotation databases                
   EnsemblBacteria                 68295399  67076216      0.74     5   Genome annotation databases                
   EnsemblFungi                      400562    399495     <0.01    41   Genome annotation databases                
   EnsemblMetazoa                    922526    903941      0.01    34   Genome annotation databases                
   EnsemblPlants                     922118    880726      0.01    35   Genome annotation databases                
   EnsemblProtists                   186149    183681     <0.01    49   Genome annotation databases                
   EuPathDB                          157060    157059     <0.01    51   Organism-specific databases                
   EvolutionaryTrace                   7849      7849     <0.01    78   Other                                      
   ExpressionAtlas                   258442    258442     <0.01    43   Gene expression databases                  
   FlyBase                           199511    198036     <0.01    47   Organism-specific databases                
   GO                             164200738  56661479      1.78     2   Ontologies                                 
   Gene3D                          54791209  42751356      0.59     6   Family and domain databases                
   GeneID                          12234881  11941271      0.13    16   Genome annotation databases                
   GeneTree                         1125216   1125178      0.01    33   Phylogenomic databases                     
   Genevestigator                     81479     81475     <0.01    55   Gene expression databases                  
   GenoList                           14727     14454     <0.01    75   Organism-specific databases                
   GenomeRNAi                         23374     23374     <0.01    71   Other                                      
   Gramene                           194196    194196     <0.01    48   Organism-specific databases                
   GuidetoPHARMACOLOGY                   21        21     <0.01   106   Chemistry                                  
   H-InvDB                              593       446     <0.01    96   Organism-specific databases                
   HAMAP                           12204196  12031921      0.13    17   Family and domain databases                
   HGNC                               47179     47097     <0.01    63   Organism-specific databases                
   HOGENOM                          3640631   3640585      0.04    27   Phylogenomic databases                     
   HOVERGEN                          302087    302078     <0.01    42   Phylogenomic databases                     
   InParanoid                       2640967   2640967      0.03    30   Phylogenomic databases                     
   IntAct                             20539     20539     <0.01    74   Protein-protein interaction databases      
   InterPro                       211949334  72002144      2.30     1   Family and domain databases                
   KEGG                            10756381  10527742      0.12    18   Genome annotation databases                
   KO                               4613765   4590629      0.05    26   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    81   Organism-specific databases                
   Leproma                             1272      1270     <0.01    91   Organism-specific databases                
   MEROPS                            235558    235557     <0.01    44   Protein family/group databases             
   MGI                                53938     53579     <0.01    59   Organism-specific databases                
   MIM                                    4         4     <0.01   110   Organism-specific databases                
   MINT                               10074     10073     <0.01    76   Protein-protein interaction databases      
   MaxQB                               2863      2863     <0.01    88   Proteomic databases                        
   MoonProt                               7         7     <0.01   108   Protein family/group databases             
   NextBio                           199926    199924     <0.01    46   Other                                      
   OGP                                    3         3     <0.01   111   2D gel databases                           
   OMA                              8472356   8472354      0.09    21   Phylogenomic databases                     
   OrthoDB                          5140232   5140229      0.06    25   Phylogenomic databases                     
   PANTHER                         12298032  11885978      0.13    15   Family and domain databases                
   PATRIC                           8243868   8243671      0.09    22   Genome annotation databases                
   PDB                                25820     13828     <0.01    68   3D structure databases                     
   PDBsum                             25132     13388     <0.01    69   3D structure databases                     
   PIR                               170835    138012     <0.01    50   Sequence databases                         
   PIRSF                            9592127   9514910      0.10    19   Family and domain databases                
   PMAP-CutDB                           199       199     <0.01    99   Other                                      
   PRIDE                             912871    912871      0.01    36   Proteomic databases                        
   PRINTS                          12685762  11412997      0.14    14   Family and domain databases                
   PRO                                26773     26772     <0.01    67   Other                                      
   PROSITE                         44182499  29929011      0.48     8   Family and domain databases                
   PaxDb                              28220     28218     <0.01    66   Proteomic databases                        
   PeptideAtlas                         127       127     <0.01   102   Proteomic databases                        
   PeroxiBase                          2582      2574     <0.01    89   Protein family/group databases             
   Pfam                            92109244  67122781      1.00     4   Family and domain databases                
   PharmGKB                            3185      3185     <0.01    86   Organism-specific databases                
   PhosphoSite                         1078      1078     <0.01    92   PTM databases                              
   PhylomeDB                         434481    434481     <0.01    40   Phylogenomic databases                     
   PomBase                                2         2     <0.01   112   Organism-specific databases                
   ProDom                           1739487   1697757      0.02    31   Family and domain databases                
   ProMEX                              3447      3447     <0.01    85   Proteomic databases                        
   ProteinModelPortal              30517640  30517640      0.33     9   3D structure databases                     
   Proteomes                       19251171  18985671      0.21    12   Other                                      
   PseudoCAP                           4493      4487     <0.01    82   Organism-specific databases                
   REBASE                             48293     48285     <0.01    60   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   104   2D gel databases                           
   RGD                                22313     21193     <0.01    72   Organism-specific databases                
   Reactome                          210246     73480     <0.01    45   Enzyme and pathway databases               
   RefSeq                          19506578  16084896      0.21    11   Sequence databases                         
   SABIO-RK                             501       501     <0.01    97   Enzyme and pathway databases               
   SGD                                    7         7     <0.01   107   Organism-specific databases                
   SMART                           18959595  14472117      0.21    13   Family and domain databases                
   SMR                              8569252   8569252      0.09    20   3D structure databases                     
   STRING                           3125426   3125252      0.03    28   Protein-protein interaction databases      
   SUPFAM                          51933758  41815732      0.56     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   SignaLink                           4091      4086     <0.01    83   Enzyme and pathway databases               
   TAIR                               20821     20704     <0.01    73   Organism-specific databases                
   TCDB                                6379      6370     <0.01    80   Protein family/group databases             
   TIGRFAMs                        24729214  22565549      0.27    10   Family and domain databases                
   TreeFam                           587405    587403      0.01    37   Phylogenomic databases                     
   TubercuList                         1072      1062     <0.01    93   Organism-specific databases                
   UCSC                               48194     48031     <0.01    61   Genome annotation databases                
   UniGene                           556492    518303      0.01    38   Sequence databases                         
   UniPathway                       5580533   5112332      0.06    24   Enzyme and pathway databases               
   VectorBase                         78240     77723     <0.01    56   Genome annotation databases                
   World-2DPAGE                         670       665     <0.01    95   2D gel databases                           
   WormBase                           43320     43196     <0.01    64   Organism-specific databases                
   Xenbase                            25018     24960     <0.01    70   Organism-specific databases                
   ZFIN                               47564     47459     <0.01    62   Organism-specific databases                
   dictyBase                           7993      7771     <0.01    77   Organism-specific databases                
   eggNOG                           2745536   2745501      0.03    29   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    57   Organism-specific databases                
   mycoCLAP                             419       419     <0.01    98   Protein family/group databases             

Number of explicitly cross-referenced databases: 133
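
The fourth column of the table above appears to be the number of cross-references divided by the total number of entries in the release (92124243, from the introduction), rounded to two decimals and shown as "<0.01" when it rounds to zero. This reading is inferred from the figures themselves, not stated in this section, so the following is only a minimal sketch of a consistency check. The column names and the helpers parse_row and format_ratio are hypothetical.

```python
TOTAL_ENTRIES = 92124243  # entry count given in the introduction of this release

# A few rows copied verbatim from the table above.
SAMPLE = """\
   HAMAP                           12204196  12031921      0.13    17   Family and domain databases
   InterPro                       211949334  72002144      2.30     1   Family and domain databases
   Pfam                            92109244  67122781      1.00     4   Family and domain databases
   PRIDE                             912871    912871      0.01    36   Proteomic databases
   PhylomeDB                         434481    434481     <0.01    40   Phylogenomic databases
"""

def parse_row(line):
    """Split one table row into fields (assumed column meanings)."""
    # The category may contain spaces, so split at most five times.
    name, xrefs, entries, ratio, rank, category = line.split(None, 5)
    return {
        "name": name,
        "xrefs": int(xrefs),      # assumed: number of cross-references
        "entries": int(entries),  # assumed: number of distinct entries
        "ratio": ratio,           # assumed: cross-references per entry
        "rank": int(rank),
        "category": category.strip(),
    }

def format_ratio(xrefs, total=TOTAL_ENTRIES):
    """Round to two decimals; values that round to zero print as '<0.01'."""
    text = f"{xrefs / total:.2f}"
    return "<0.01" if text == "0.00" else text

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
for row in rows:
    status = "ok" if format_ratio(row["xrefs"]) == row["ratio"] else "MISMATCH"
    print(f'{row["name"]:<12}{row["ratio"]:>7}  {status}')
```

Under these assumptions the recomputed value agrees with the printed one for each sampled row, including the boundary cases PRIDE (0.01) and PhylomeDB (<0.01).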


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.96   Gln (Q) 3.98   Leu (L) 9.94   Ser (S) 6.38
   Arg (R) 5.38   Glu (E) 6.07   Lys (K) 5.19   Thr (T) 5.55
   Asn (N) 4.11   Gly (G) 7.21   Met (M) 2.47   Trp (W) 1.26
   Asp (D) 5.43   His (H) 2.21   Phe (F) 3.98   Tyr (Y) 3.05
   Cys (C) 1.10   Ile (I) 6.15   Pro (P) 4.52   Val (V) 6.93

   Asx (B) 0      Glx (Z) 0      Xaa (X) 0.01

   [Figure: amino acid composition of the complete database]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Ile, Glu, Thr, Asp, Arg, Lys, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Trp, Cys
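
The ordering in 5.2 can be reproduced from the percentages in 5.1 by sorting in descending order. The sketch below is an illustration only: the names composition and rank_by_frequency are hypothetical, and the tie between Gln and Phe (both 3.98) is broken here by their order of appearance in the 5.1 table, which is an assumption about how the published list was built.

```python
# Percentages copied from section 5.1, read row by row.
composition = {
    "Ala": 8.96, "Gln": 3.98, "Leu": 9.94, "Ser": 6.38,
    "Arg": 5.38, "Glu": 6.07, "Lys": 5.19, "Thr": 5.55,
    "Asn": 4.11, "Gly": 7.21, "Met": 2.47, "Trp": 1.26,
    "Asp": 5.43, "His": 2.21, "Phe": 3.98, "Tyr": 3.05,
    "Cys": 1.10, "Ile": 6.15, "Pro": 4.52, "Val": 6.93,
}

def rank_by_frequency(comp):
    """Return residue names from most to least frequent."""
    # sorted() is stable, even with reverse=True, so residues with equal
    # percentages keep the order in which they appear in the dictionary.
    ordered = sorted(comp.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered]

ranking = rank_by_frequency(composition)
print(", ".join(ranking))
```

With the values above, the printed list matches the classification given in 5.2.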



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 835340
Total number of entries encoded on a Plasmid: 498700
Total number of entries encoded on a Plastid: 41615
Total number of entries encoded on a Plastid; Apicoplast: 
Total number of entries encoded on a Plastid; Chloroplast: 63
Total number of entries encoded on a Plastid; Cyanelle: 
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: