Current Release Statistics


         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2018_11 STATISTICS


1.  INTRODUCTION

Release 2018_11 of 05-Dec-2018 of UniProtKB/TrEMBL contains 137213158 sequence entries,
comprising 46053137524 amino acids.

Since release 2018_10, 4203204 sequences have been added, the sequence data of
53796 existing entries has been updated, and the annotations of
46854386 entries have been revised. The added sequences represent an increase
of 3% in the number of entries.
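The 3% figure can be reproduced from the counts quoted above. The following is a minimal sketch, assuming the increase is measured against the size of the previous release and that this size equals the current total minus the added sequences (deleted entries are ignored); the variable names are illustrative only.

```python
# Figures quoted in the introduction above.
total_entries = 137_213_158   # entries in release 2018_11
added_entries = 4_203_204     # sequences added since release 2018_10

# Assumption: previous release size = current total minus added sequences.
# Entries deleted between releases are not accounted for here.
previous_entries = total_entries - added_entries

# Relative growth in percent.
growth_percent = 100 * added_entries / previous_entries

print(f"previous release (estimated): {previous_entries}")
print(f"growth: {growth_percent:.2f}%")
```

Under these assumptions the result rounds to the 3% stated in the text.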

Number of fragments: 13302908

Protein existence (PE):              entries      %
1: Evidence at protein level          145836     0.11%
2: Evidence at transcript level      1208846     0.88%
3: Inferred from homology           34608287    25.22%
4: Predicted                       101250189    73.79%
5: Uncertain                               0     0.00%
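The percentage column of the protein existence table is each count divided by the total number of entries. A short sketch that recomputes it from the values above (the dictionary and variable names are assumptions made for illustration):

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 145_836,
    "2: Evidence at transcript level": 1_208_846,
    "3: Inferred from homology": 34_608_287,
    "4: Predicted": 101_250_189,
    "5: Uncertain": 0,
}
total_entries = 137_213_158  # total entries in release 2018_11

# Share of each PE level as a percentage of all entries, rounded to 2 places.
pe_shares = {
    level: round(100 * count / total_entries, 2)
    for level, count in pe_counts.items()
}

for level, count in pe_counts.items():
    print(f"{level:<34}{count:>12}{pe_shares[level]:>9.2f}%")
```

The five counts also add up to the total number of entries, so every entry is assigned exactly one PE level in this table.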

The growth of the database is summarized below.
[Figure: growth of the database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 1034102

   The first twenty species represent 14810351 sequences: 10.8% of the
   total number of entries.


   2.1  Table of the frequency of occurrence of species

        Species represented 1x: 549927
                            2x: 123716
                            3x:  65377
                            4x:  46748
                            5x:  28567
                            6x:  20546
                            7x:  15407
                            8x:  12102
                            9x:   9917
                           10x:  15375
                       11- 20x:  78637
                       21- 50x:  22461
                       51-100x:  12748
                         >100x:  32574
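As a consistency check, the bins of the table above should account for every species in the release. A minimal sketch, with the counts copied from the table and names chosen only for illustration:

```python
# Number of species per occurrence bin, copied from table 2.1 above.
species_by_occurrence = {
    "1x": 549_927,
    "2x": 123_716,
    "3x": 65_377,
    "4x": 46_748,
    "5x": 28_567,
    "6x": 20_546,
    "7x": 15_407,
    "8x": 12_102,
    "9x": 9_917,
    "10x": 15_375,
    "11-20x": 78_637,
    "21-50x": 22_461,
    "51-100x": 12_748,
    ">100x": 32_574,
}
total_species = 1_034_102  # total species quoted at the start of section 2

# Sum over all bins and compare with the quoted species total.
bin_total = sum(species_by_occurrence.values())
print(bin_total, bin_total == total_species)
```

The bin sum equals the species total quoted at the start of this section.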


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1    1113124  Chernetidae sp. UAIC


   2.3  Taxonomic distribution of the sequences

[Figure: taxonomic distribution of the sequences]

   Kingdom        sequences (% of the database)
    Archaea         2960583 (  2%)
    Bacteria       96630254 ( 70%)
    Eukaryota      32245725 ( 24%)
    Viruses         3769429 (  3%)
    Other           1607167 (  1%)



   Within Eukaryota:

[Figure: distribution of the sequences within Eukaryota]

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 157261 (  0%)           (  0%)
     Other Mammalia       2526010 (  8%)           (  2%)
     Other Vertebrata     3360550 ( 10%)           (  2%)
     Viridiplantae        6873986 ( 21%)           (  5%)
     Fungi                9111998 ( 28%)           (  7%)
     Insecta              3389036 ( 11%)           (  2%)
     Nematoda             1570413 (  5%)           (  1%)
     Other                5256471 ( 16%)           (  4%)



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  1843418             1001-1100    938727
                 51- 100 11455644             1101-1200    650980
                101- 150 13754000             1201-1300    436865
                151- 200 13278826             1301-1400    296222
                201- 250 13184149             1401-1500    231285
                251- 300 13117616             1501-1600    169242
                301- 350 11921878             1601-1700    126839
                351- 400  9247347             1701-1800     97435
                401- 450  7864911             1801-1900     84557
                451- 500  6312270             1901-2000     70323
                501- 550  4381186             2001-2100     56642
                551- 600  3329539             2101-2200     54168
                601- 650  2452729             2201-2300     42141
                651- 700  1920748             2301-2400     34652
                701- 750  1632237             2401-2500     29676
                751- 800  1396120             >2500        225831
                801- 850  1084252
                851- 900   932321
                901- 950   710570
                951-1000   544904

[Figure: distribution of the sequences by size]


   The average sequence length in UniProtKB/TrEMBL is   335 amino acids.

   The shortest sequence is     C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  A0A316Q3J5_9FIRM: 74488 amino acids.
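The average length can be cross-checked against the totals given in the introduction. This is a sketch only: it assumes the average is taken over all entries (fragments included) and that the published value is truncated to a whole number, neither of which is stated in this report.

```python
# Totals quoted in the introduction of this report.
total_amino_acids = 46_053_137_524
total_entries = 137_213_158

# Mean sequence length over all entries (assumption: fragments included).
average_length = total_amino_acids / total_entries

# Assumption: the report truncates rather than rounds the mean.
truncated_average = int(average_length)

print(f"mean length: {average_length:.2f} amino acids")
print(f"truncated:   {truncated_average} amino acids")
```

The quotient is a little above 335, so it agrees with the quoted 335 amino acids only under the truncation assumption.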



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                   161810858                1.18                                                    
   Submitted to EMBL/GenBank/DDBJ 104908005  95396525      0.76                                                    
   Journal                         50214933  47560661      0.37                                                    
   Submitted to other databases     6661013   6632930      0.05                                                    
   Thesis                             15531     15471     <0.01                                                    
   Book citation                      11375     11310     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 709243
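The "Average per entry" column in the reference table above appears to be the total number of lines divided by the total number of entries in the database, not by the number of entries that carry such a line. The sketch below reproduces the column under that assumption; the `format_average` helper and the rule for printing `<0.01` are inferred from the table and are not documented behavior.

```python
TOTAL_ENTRIES = 137_213_158  # entries in release 2018_11


def format_average(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format lines-per-entry the way the tables appear to (assumed rule)."""
    average = total_lines / total_entries
    # Assumption: values that would round to 0.00 are shown as "<0.01".
    if average < 0.005:
        return "<0.01"
    return f"{average:.2f}"


# Total line counts copied from the reference table above.
reference_totals = {
    "References (RL)": 161_810_858,
    "Submitted to EMBL/GenBank/DDBJ": 104_908_005,
    "Journal": 50_214_933,
    "Submitted to other databases": 6_661_013,
    "Thesis": 15_531,
}

for line_type, total in reference_totals.items():
    print(f"{line_type:<34}{total:>10}  {format_average(total):>6}")
```

The same division appears to apply to the comment, feature and cross-reference tables that follow.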


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     193654267                1.41                                                    
   ACTIVITY REGULATION               408461    408459     <0.01    11                                              
   CATALYTIC ACTIVITY              16082589  14647033      0.12     5                                              
   CAUTION                         78800880  77049242      0.57     1                                              
   COFACTOR                         7910770   7214615      0.06     8                                              
   DOMAIN                           1343836   1031323      0.01     9                                              
   FUNCTION                        19095777  18074599      0.14     3                                              
   INTERACTION                         2850      2850     <0.01    12                                              
   MISCELLANEOUS                     786229    710171      0.01    10                                              
   PATHWAY                          8214529   7397484      0.06     7                                              
   SIMILARITY                      34863090  34391348      0.25     2                                              
   SUBCELLULAR LOCATION            16212261  16088761      0.12     4                                              
   SUBUNIT                          9932995   9809400      0.07     6                                              

Total number of comment topics: 12


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                     351836341                2.56                                                    
   ACT_SITE                         8860094   5363990      0.06     9                                              
   BINDING                         18713713   4754626      0.14     5                                              
   CARBOHYD                           21521     20373     <0.01    23                                              
   CHAIN                           10049399  10036537      0.07     7                                              
   COILED                          19915043  13342991      0.15     3                                              
   COMPBIAS                            4990      4990     <0.01    26                                              
   CROSSLNK                           38874     36339     <0.01    22                                              
   DISULFID                         2180615    583186      0.02    15                                              
   DNA_BIND                         1095050   1077548      0.01    17                                              
   DOMAIN                          97419253  70665612      0.71     2                                              
   INIT_MET                           59033     59033     <0.01    21                                              
   INTRAMEM                            1250      1028     <0.01    27                                              
   LIPID                             372887    214215     <0.01    19                                              
   METAL                           14738851   3908729      0.11     6                                              
   MOD_RES                          2770429   2428222      0.02    13                                              
   MOTIF                            1708125   1157874      0.01    16                                              
   NON_STD                             6800      6571     <0.01    25                                              
   NON_TER                         19527862  13356905      0.14     4                                              
   NP_BIND                          7674428   4892398      0.06    10                                              
   PEPTIDE                              753       484     <0.01    28                                              
   PROPEP                             19224     19224     <0.01    24                                              
   REGION                           6169431   3178574      0.04    11                                              
   REPEAT                           5221581   1249056      0.04    12                                              
   SIGNAL                          10022868  10022858      0.07     8                                              
   SITE                             2361511   1434562      0.02    14                                              
   TOPO_DOM                          343243    160243     <0.01    20                                              
   TRANSIT                              140       140     <0.01    29                                              
   TRANSMEM                       122071316  26797565      0.89     1                                              
   ZN_FING                           468057    370429     <0.01    18                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             1603054374               11.68                                                    
   Allergome                           3937      3171     <0.01    89   Protein family/group databases             
   ArachnoServer                        200       200     <0.01   113   Organism-specific databases                
   Araport                            15175     15109     <0.01    80   Organism-specific databases                
   BRENDA                              9542      9252     <0.01    83   Enzyme and pathway databases               
   Bgee                              529572    529380     <0.01    48   Gene expression databases                  
   BindingDB                            231       231     <0.01   112   Chemistry                                  
   BioCyc                           6330281   6306058      0.05    28   Enzyme and pathway databases               
   CAZy                              128848    120585     <0.01    58   Protein family/group databases             
   CDD                             24948747  21900591      0.18    14   Family and domain databases                
   CGD                                20795     20729     <0.01    78   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   129   2D gel databases                           
   CORUM                                259       259     <0.01   111   Protein-protein interaction databases      
   CTD                              1214072   1212186      0.01    39   Organism-specific databases                
   CarbonylDB                           264       264     <0.01   110   PTM databases                              
   ChEMBL                               964       964     <0.01   103   Chemistry                                  
   ChiTaRS                           131314    131313     <0.01    57   Other                                      
   CollecTF                             198       198     <0.01   114   Gene expression databases                  
   ComplexPortal                        183       134     <0.01   115   Protein-protein interaction databases      
   ConoServer                           158       158     <0.01   116   Organism-specific databases                
   DIP                                 3200      3199     <0.01    91   Protein-protein interaction databases      
   DNASU                              41260     40821     <0.01    70   Protocols and materials databases          
   DisProt                               96        96     <0.01   120   3D structure databases                     
   DrugBank                             769       461     <0.01   104   Chemistry                                  
   ELM                                  100       100     <0.01   119   Protein-protein interaction databases      
   EMBL                           150411730 132651462      1.10     3   Sequence databases                         
   EPD                                13908     13908     <0.01    81   Proteomic databases                        
   ESTHER                             76102     75804     <0.01    63   Protein family/group databases             
   Ensembl                          2344895   2274491      0.02    33   Genome annotation databases                
   EnsemblBacteria                 38705733  36458425      0.28    10   Genome annotation databases                
   EnsemblFungi                     5918878   5777821      0.04    29   Genome annotation databases                
   EnsemblMetazoa                   1150750   1109253      0.01    40   Genome annotation databases                
   EnsemblPlants                    2400957   2205758      0.02    31   Genome annotation databases                
   EnsemblProtists                  1865160   1757010      0.01    37   Genome annotation databases                
   EuPathDB                          675787    675190     <0.01    44   Organism-specific databases                
   EvolutionaryTrace                   5930      5930     <0.01    86   Other                                      
   ExpressionAtlas                   597563    597560     <0.01    45   Gene expression databases                  
   FlyBase                           215849    214493     <0.01    55   Organism-specific databases                
   GO                             236365893  88075500      1.72     2   Ontologies                                 
   Gene3D                          60218333  50063452      0.44     8   Family and domain databases                
   GeneCards                           1273      1257     <0.01   100   Organism-specific databases                
   GeneDB                            114674    112894     <0.01    60   Genome annotation databases                
   GeneID                          10838882  10732844      0.08    22   Genome annotation databases                
   GeneTree                         2214118   2213798      0.02    35   Phylogenomic databases                     
   Genevisible                        15825     15818     <0.01    79   Gene expression databases                  
   GenomeRNAi                         32231     32231     <0.01    74   Other                                      
   GlyConnect                            55        55     <0.01   124   PTM databases                              
   Gramene                          2395450   2201603      0.02    32   Genome annotation databases                
   GuidetoPHARMACOLOGY                    4         4     <0.01   131   Chemistry                                  
   H-InvDB                              587       440     <0.01   106   Organism-specific databases                
   HAMAP                           15277103  15101905      0.11    18   Family and domain databases                
   HGNC                               52801     52692     <0.01    68   Organism-specific databases                
   HOGENOM                          2993558   2993476      0.02    30   Phylogenomic databases                     
   HOVERGEN                          300293    300280     <0.01    53   Phylogenomic databases                     
   InParanoid                       2275237   2275237      0.02    34   Phylogenomic databases                     
   IntAct                             26322     25640     <0.01    76   Protein-protein interaction databases      
   InterPro                       353564868 107575389      2.58     1   Family and domain databases                
   KEGG                            16573243  16157537      0.12    17   Genome annotation databases                
   KO                               7336424   7306677      0.05    24   Phylogenomic databases                     
   LegioList                           2496      2483     <0.01    96   Organism-specific databases                
   Leproma                             1271      1269     <0.01   101   Organism-specific databases                
   MEROPS                            238781    238779     <0.01    54   Protein family/group databases             
   MGI                                62478     62050     <0.01    65   Organism-specific databases                
   MIM                                    4         4     <0.01   130   Organism-specific databases                
   MINT                                2719      2719     <0.01    94   Protein-protein interaction databases      
   MalaCards                             10        10     <0.01   126   Organism-specific databases                
   MaxQB                              40469     40469     <0.01    71   Proteomic databases                        
   MoonDB                                 1         1     <0.01   134   Protein family/group databases             
   MoonProt                              62        62     <0.01   122   Protein family/group databases             
   OGP                                    3         3     <0.01   132   2D gel databases                           
   OMA                              6820446   6820427      0.05    26   Phylogenomic databases                     
   OpenTargets                        50720     50670     <0.01    69   Organism-specific databases                
   OrthoDB                         14038137  14038011      0.10    19   Phylogenomic databases                     
   PANTHER                         32310892  31179264      0.24    12   Family and domain databases                
   PATRIC                          16982090  16971259      0.12    16   Genome annotation databases                
   PDB                                39163     18913     <0.01    72   3D structure databases                     
   PIR                               162587    130350     <0.01    56   Sequence databases                         
   PIRSF                           12042138  11938180      0.09    21   Family and domain databases                
   PMAP-CutDB                           130       130     <0.01   118   Other                                      
   PRIDE                             352325    352325     <0.01    50   Proteomic databases                        
   PRINTS                          17858211  16152529      0.13    15   Family and domain databases                
   PRO                                 2255      2255     <0.01    97   Other                                      
   PROSITE                         68473637  45810114      0.50     7   Family and domain databases                
   PaxDb                             324430    324430     <0.01    51   Proteomic databases                        
   PeptideAtlas                      128042    128042     <0.01    59   Proteomic databases                        
   PeroxiBase                          2610      2594     <0.01    95   Protein family/group databases             
   Pfam                           134946820  98199886      0.98     4   Family and domain databases                
   PharmGKB                            3132      3132     <0.01    92   Organism-specific databases                
   PhosphoSitePlus                     2234      2234     <0.01    98   PTM databases                              
   PhylomeDB                         461089    461089     <0.01    49   Phylogenomic databases                     
   PomBase                                2         2     <0.01   133   Organism-specific databases                
   ProDom                           1893761   1819926      0.01    36   Family and domain databases                
   ProMEX                              2798      2798     <0.01    93   Proteomic databases                        
   ProteinModelPortal               7072662   7072662      0.05    25   3D structure databases                     
   Proteomes                      109228195 102985100      0.80     5   Other                                      
   PseudoCAP                           4447      4443     <0.01    88   Organism-specific databases                
   REBASE                             30940     30923     <0.01    75   Protein family/group databases             
   REPRODUCTION-2DPAGE                   62        61     <0.01   123   2D gel databases                           
   RGD                                21602     20703     <0.01    77   Organism-specific databases                
   Reactome                          307547    107592     <0.01    52   Enzyme and pathway databases               
   RefSeq                          45991141  44865210      0.34     9   Sequence databases                         
   SABIO-RK                             620       620     <0.01   105   Enzyme and pathway databases               
   SFLD                              869497    677484      0.01    42   Family and domain databases                
   SGD                                    7         7     <0.01   128   Organism-specific databases                
   SIGNOR                                 7         7     <0.01   127   Enzyme and pathway databases               
   SMART                           32422961  24626147      0.24    11   Family and domain databases                
   SMR                              1427915   1427915      0.01    38   3D structure databases                     
   STRING                           6361934   6361673      0.05    27   Protein-protein interaction databases      
   SUPFAM                          89927703  71223139      0.66     6   Family and domain databases                
   SWISS-2DPAGE                           1         1     <0.01   135   2D gel databases                           
   SignaLink                           3790      3790     <0.01    90   Enzyme and pathway databases               
   SwissLipids                           81        81     <0.01   121   Chemistry                                  
   SwissPalm                           2187      2187     <0.01    99   PTM databases                              
   TAIR                               11848     11787     <0.01    82   Organism-specific databases                
   TCDB                                8227      8215     <0.01    84   Protein family/group databases             
   TIGRFAMs                        28926719  26608372      0.21    13   Family and domain databases                
   TopDownProteomics                    278       278     <0.01   109   Proteomic databases                        
   TreeFam                           548757    548721     <0.01    47   Phylogenomic databases                     
   TubercuList                          999       998     <0.01   102   Organism-specific databases                
   UCSC                               92941     92733     <0.01    61   Genome annotation databases                
   UniCarbKB                             17        17     <0.01   125   PTM databases                              
   UniGene                           894424    758668      0.01    41   Sequence databases                         
   UniLectin                            148       148     <0.01   117   Protein family/group databases             
   UniPathway                       7985650   7372783      0.06    23   Enzyme and pathway databases               
   VGNC                               83109     83109     <0.01    62   Organism-specific databases                
   VectorBase                        580372    561656     <0.01    46   Genome annotation databases                
   WBParaSite                        854104    845698      0.01    43   Genome annotation databases                
   World-2DPAGE                         315       310     <0.01   108   2D gel databases                           
   WormBase                           55949     55565     <0.01    66   Organism-specific databases                
   Xenbase                            34609     34523     <0.01    73   Organism-specific databases                
   ZFIN                               54447     54130     <0.01    67   Organism-specific databases                
   dictyBase                           7978      7756     <0.01    85   Organism-specific databases                
   eggNOG                          13552826   6793586      0.10    20   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    64   Organism-specific databases                
   iPTMnet                             5116      5116     <0.01    87   PTM databases                              
   mycoCLAP                             447       447     <0.01   107   Protein family/group databases             

Number of explicitly cross-referenced databases: 156


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 9.15   Gln (Q) 3.77   Leu (L) 9.89   Ser (S) 6.65
   Arg (R) 5.73   Glu (E) 6.17   Lys (K) 4.96   Thr (T) 5.55
   Asn (N) 3.86   Gly (G) 7.31   Met (M) 2.38   Trp (W) 1.30
   Asp (D) 5.47   His (H) 2.18   Phe (F) 3.93   Tyr (Y) 2.92
   Cys (C) 1.19   Ile (I) 5.71   Pro (P) 4.84   Val (V) 6.90

   Asx (B) 0      Glx (Z) 0      Xaa (X) 0.04

[Figure: amino acid composition]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Arg, Ile, Thr, Asp, Lys, Pro, Phe, Asn,
   Gln, Tyr, Met, His, Trp, Cys
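The ordering in 5.2 follows directly from the percentages in 5.1. A small sketch that sorts the twenty standard amino acids by descending frequency (the dictionary name is illustrative; the ambiguity codes B, Z and X are left out):

```python
# Composition in percent, copied from table 5.1 above.
composition = {
    "Ala": 9.15, "Gln": 3.77, "Leu": 9.89, "Ser": 6.65,
    "Arg": 5.73, "Glu": 6.17, "Lys": 4.96, "Thr": 5.55,
    "Asn": 3.86, "Gly": 7.31, "Met": 2.38, "Trp": 1.30,
    "Asp": 5.47, "His": 2.18, "Phe": 3.93, "Tyr": 2.92,
    "Cys": 1.19, "Ile": 5.71, "Pro": 4.84, "Val": 6.90,
}

# Sort residue names from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranking))
```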



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 1971984
Total number of entries encoded on a Plasmid: 965646
Total number of entries encoded on a Plastid: 128615
Total number of entries encoded on a Plastid; Apicoplast: 30
Total number of entries encoded on a Plastid; Chloroplast: 63
Total number of entries encoded on a Plastid; Cyanelle: 
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: