Current Release Statistics


         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2019_01 STATISTICS


1.  INTRODUCTION

Release 2019_01 of 16-Jan-2019 of UniProtKB/TrEMBL contains 139694261 sequence entries,
comprising 46816904446 amino acids.

4171399 sequences have been added since release 2018_11, the sequence data of
1018 existing entries has been updated and the annotations of
59725137 entries have been revised. The added sequences represent an increase
of 3% in the number of entries.
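
The growth figure can be rechecked from the counts quoted above. The
following is a minimal Python sketch, assuming the increase is measured as
added sequences relative to the entry count before the additions; the report
does not spell out the formula, and the function name is illustrative.

```python
def percent_increase(current_total: int, added: int) -> float:
    """Growth in percent, relative to the entry count before the additions."""
    previous_total = current_total - added
    return 100.0 * added / previous_total

# Counts quoted in the introduction for release 2019_01.
growth = percent_increase(139694261, 4171399)
print(f"{growth:.1f}%")
```

Rounded to a whole number, the result agrees with the 3% stated in the text.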

Number of fragments: 13892854

Protein existence (PE):              entries      %
1: Evidence at protein level          145905     0.10%
2: Evidence at transcript level      1209443     0.87%
3: Inferred from homology           35253269    25.24%
4: Predicted                       103085644    73.79%
5: Uncertain                               0     0.00%
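
The percentage column can be reproduced from the entry counts. The sketch
below is illustrative only: it assumes each percentage is the PE count divided
by the total number of entries, shown with two decimals, and the variable
names are not part of the release files.

```python
TOTAL_ENTRIES = 139694261  # entries in release 2019_01

# Entry counts per protein existence level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 145905,
    "2: Evidence at transcript level": 1209443,
    "3: Inferred from homology": 35253269,
    "4: Predicted": 103085644,
    "5: Uncertain": 0,
}

# Share of the database for each level, formatted like the table.
pe_percent = {level: f"{100 * count / TOTAL_ENTRIES:.2f}%"
              for level, count in pe_counts.items()}

for level, share in pe_percent.items():
    print(f"{level:34s}{share:>8s}")
```

The five counts also add up to the total number of entries, so every entry
carries exactly one PE level.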

The growth of the database is summarized below.
[Figure: growth of the database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 1040048

   The first twenty species represent 14953595 sequences: 10.7% of the
   total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x: 555756
                            2x: 123697
                            3x:  65389
                            4x:  46740
                            5x:  28575
                            6x:  20547
                            7x:  15407
                            8x:  12109
                            9x:   9914
                           10x:  15378
                       11- 20x:  78650
                       21- 50x:  22464
                       51-100x:  12765
                         >100x:  32657
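
   The bins of table 2.1 should account for every species exactly once. A
   minimal Python sketch of that consistency check follows; the dictionary
   name is illustrative and the counts are copied from the table above.

```python
# Number of species per occurrence bin (table 2.1).
species_by_frequency = {
    "1x": 555756, "2x": 123697, "3x": 65389, "4x": 46740, "5x": 28575,
    "6x": 20547, "7x": 15407, "8x": 12109, "9x": 9914, "10x": 15378,
    "11-20x": 78650, "21-50x": 22464, "51-100x": 12765, ">100x": 32657,
}

total_species = sum(species_by_frequency.values())
# Fraction of species that occur only once in the release.
singleton_share = species_by_frequency["1x"] / total_species

print(total_species)
print(f"{100 * singleton_share:.1f}% of species are represented once")
```

   The sum matches the total number of species reported at the start of
   this section.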


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1    1113124  Chernetidae sp. UAIC


   2.3  Taxonomic distribution of the sequences

   [Figure: taxonomic distribution of the sequences]

   Kingdom        sequences (% of the database)
    Archaea         2957862 (  2%)
    Bacteria       99087603 ( 71%)
    Eukaryota      32270263 ( 23%)
    Viruses         3771366 (  3%)
    Other           1607167 (  1%)



   Within Eukaryota:

   [Figure: distribution of the sequences within Eukaryota]

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 149070 ( <1%)           ( <1%)
     Other Mammalia       2526020 (  8%)           (  2%)
     Other Vertebrata     3360714 ( 10%)           (  2%)
     Viridiplantae        6792370 ( 21%)           (  5%)
     Fungi                9208492 ( 29%)           (  7%)
     Insecta              3396196 ( 11%)           (  2%)
     Nematoda             1570408 (  5%)           (  1%)
     Other                5266993 ( 16%)           (  4%)



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  1847178             1001-1100    950581
                 51- 100 11611947             1101-1200    657726
                101- 150 13963034             1201-1300    440443
                151- 200 13494379             1301-1400    298489
                201- 250 13399572             1401-1500    232964
                251- 300 13341173             1501-1600    170531
                301- 350 12121300             1601-1700    127569
                351- 400  9402190             1701-1800     97821
                401- 450  7996663             1801-1900     84868
                451- 500  6410491             1901-2000     70674
                501- 550  4444593             2001-2100     56979
                551- 600  3376925             2101-2200     54408
                601- 650  2488164             2201-2300     42319
                651- 700  1947175             2301-2400     34845
                701- 750  1653931             2401-2500     29798
                751- 800  1413653             >2500        226408
                801- 850  1098888
                851- 900   943123
                901- 950   719228
                951-1000   551377

   [Figure: distribution of the sequences by size]


   The average sequence length in UniProtKB/TrEMBL is   335 amino acids.

   The shortest sequence is     C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  A0A316Q3J5_9FIRM: 74488 amino acids.
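
   The average length can be rechecked from the totals given in the
   introduction. This is a minimal Python sketch, assuming the average is
   taken over all entries (fragments included), which the report does not
   state explicitly; the variable names are illustrative.

```python
total_amino_acids = 46816904446  # amino acids in release 2019_01
total_entries = 139694261        # sequence entries in release 2019_01

# Mean sequence length, in amino acids per entry.
average_length = total_amino_acids / total_entries
print(round(average_length))
```

   Under that assumption the result rounds to the 335 amino acids quoted
   above.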



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                   164821363                1.18                                                    
   Submitted to EMBL/GenBank/DDBJ 105686490  96228716      0.76                                                    
   Journal                         52445966  49711801      0.38                                                    
   Submitted to other databases     6661779   6633639      0.05                                                    
   Thesis                             15530     15470     <0.01                                                    
   Book citation                      11597     11532     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 709523
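
The "Average per entry" column appears to be the total number of lines
divided by the total number of entries in the release. The Python sketch
below reproduces a few rows under that assumption; the "<0.01" rule (nonzero
values that would round to 0.00) is inferred from the figures rather than
documented, and the function name is hypothetical.

```python
TOTAL_ENTRIES = 139694261  # entries in release 2019_01 (section 1)

def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format lines-per-entry the way the tables do (assumed convention)."""
    formatted = f"{total_lines / total_entries:.2f}"
    # Assumption: nonzero values that round to 0.00 are shown as "<0.01".
    if formatted == "0.00" and total_lines > 0:
        return "<0.01"
    return formatted

# A few rows copied from the references table above.
rows = [
    ("References (RL)", 164821363),
    ("Submitted to EMBL/GenBank/DDBJ", 105686490),
    ("Journal", 52445966),
    ("Thesis", 15530),
]
for label, total in rows:
    print(f"{label:32s}{average_per_entry(total):>6s}")
```

The same helper can be applied to the comment, feature and cross-reference
tables in this section, which use the same column.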


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     198762385                1.42                                                    
   ACTIVITY REGULATION               421717    413311     <0.01    11                                              
   CATALYTIC ACTIVITY              16802730  14953289      0.12     4                                              
   CAUTION                         81468766  79687575      0.58     1                                              
   COFACTOR                         8029010   7321164      0.06     8                                              
   DOMAIN                           1359450   1044331      0.01     9                                              
   FUNCTION                        19398357  18386285      0.14     3                                              
   INTERACTION                         2844      2844     <0.01    12                                              
   MISCELLANEOUS                     794290    717732      0.01    10                                              
   PATHWAY                          8386726   7559161      0.06     7                                              
   SIMILARITY                      35494267  35016218      0.25     2                                              
   SUBCELLULAR LOCATION            16533394  16408678      0.12     5                                              
   SUBUNIT                         10070834   9946043      0.07     6                                              

Total number of comment topics: 12


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                     359822382                2.58                                                    
   ACT_SITE                         9062027   5488337      0.06     9                                              
   BINDING                         19191997   4882958      0.14     5                                              
   CARBOHYD                           21573     20423     <0.01    23                                              
   CHAIN                           10256807  10243243      0.07     7                                              
   COILED                          20471047  13659643      0.15     3                                              
   COMPBIAS                            5067      5067     <0.01    26                                              
   CROSSLNK                           39681     37113     <0.01    22                                              
   DISULFID                         2262561    604116      0.02    15                                              
   DNA_BIND                         1116388   1098068      0.01    17                                              
   DOMAIN                          99615682  72118868      0.71     2                                              
   INIT_MET                           59565     59565     <0.01    21                                              
   INTRAMEM                            1300      1060     <0.01    27                                              
   LIPID                             375529    215983     <0.01    19                                              
   METAL                           15222911   4016339      0.11     6                                              
   MOD_RES                          2824442   2476773      0.02    13                                              
   MOTIF                            1726499   1170894      0.01    16                                              
   NON_STD                             7159      6931     <0.01    25                                              
   NON_TER                         20194449  13947134      0.14     4                                              
   NP_BIND                          7873143   5003564      0.06    10                                              
   PEPTIDE                              756       487     <0.01    28                                              
   PROPEP                             19514     19514     <0.01    24                                              
   REGION                           6269332   3232946      0.04    11                                              
   REPEAT                           5392898   1288619      0.04    12                                              
   SIGNAL                          10228844  10228834      0.07     8                                              
   SITE                             2403797   1460811      0.02    14                                              
   TOPO_DOM                          344615    161143     <0.01    20                                              
   TRANSIT                              140       140     <0.01    29                                              
   TRANSMEM                       124352189  27315157      0.89     1                                              
   ZN_FING                           482470    380854     <0.01    18                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             1628697687               11.66                                                    
   Allergome                           3899      3142     <0.01    91   Protein family/group databases             
   ArachnoServer                        200       200     <0.01   116   Organism-specific databases                
   Araport                            15170     15104     <0.01    82   Organism-specific databases                
   BRENDA                              9531      9241     <0.01    85   Enzyme and pathway databases               
   Bgee                              529539    529339     <0.01    48   Gene expression databases                  
   BindingDB                            268       268     <0.01   113   Chemistry                                  
   BioCyc                           6316684   6292473      0.05    28   Enzyme and pathway databases               
   BioMuta                             1066      1065     <0.01   104   Polymorphism and mutation databases        
   CAZy                              128839    120576     <0.01    57   Protein family/group databases             
   CDD                             25416906  22307993      0.18    14   Family and domain databases                
   CGD                                20795     20729     <0.01    80   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   134   2D gel databases                           
   CORUM                                259       259     <0.01   115   Protein-protein interaction databases      
   CTD                              1225664   1223778      0.01    39   Organism-specific databases                
   CarbonylDB                           265       265     <0.01   114   PTM databases                              
   ChEMBL                               962       962     <0.01   106   Chemistry                                  
   ChiTaRS                           131301    131300     <0.01    56   Other                                      
   CollecTF                             195       195     <0.01   117   Gene expression databases                  
   ComplexPortal                        183       134     <0.01   118   Protein-protein interaction databases      
   ConoServer                           158       158     <0.01   119   Organism-specific databases                
   DIP                                 3197      3196     <0.01    93   Protein-protein interaction databases      
   DNASU                              41252     40813     <0.01    72   Protocols and materials databases          
   DisProt                               96        96     <0.01   123   3D structure databases                     
   DrugBank                             769       461     <0.01   107   Chemistry                                  
   ELM                                   99        99     <0.01   122   Protein-protein interaction databases      
   EMBL                           152994275 135132202      1.10     3   Sequence databases                         
   EPD                                14515     14515     <0.01    83   Proteomic databases                        
   ESTHER                             75858     75560     <0.01    63   Protein family/group databases             
   Ensembl                          2344161   2273809      0.02    33   Genome annotation databases                
   EnsemblBacteria                 38456892  36217118      0.28    10   Genome annotation databases                
   EnsemblFungi                     5918844   5777787      0.04    29   Genome annotation databases                
   EnsemblMetazoa                   1150829   1109328      0.01    40   Genome annotation databases                
   EnsemblPlants                    2400913   2205721      0.02    32   Genome annotation databases                
   EnsemblProtists                  1865160   1757010      0.01    37   Genome annotation databases                
   EuPathDB                          675762    675166     <0.01    44   Organism-specific databases                
   EvolutionaryTrace                   5926      5926     <0.01    88   Other                                      
   ExpressionAtlas                   597494    597490     <0.01    45   Gene expression databases                  
   FlyBase                            95673     95256     <0.01    60   Organism-specific databases                
   GO                             240866999  89768893      1.72     2   Ontologies                                 
   Gene3D                          61591967  51150270      0.44     8   Family and domain databases                
   GeneCards                           1265      1250     <0.01   103   Organism-specific databases                
   GeneDB                            114674    112894     <0.01    59   Genome annotation databases                
   GeneID                          10918747  10815154      0.08    22   Genome annotation databases                
   GeneTree                         2214072   2213752      0.02    35   Phylogenomic databases                     
   Genevisible                        15816     15809     <0.01    81   Gene expression databases                  
   GenomeRNAi                         32221     32221     <0.01    77   Other                                      
   GlyConnect                            19        19     <0.01   127   PTM databases                              
   Gramene                          2405704   2209146      0.02    31   Genome annotation databases                
   GuidetoPHARMACOLOGY                    4         4     <0.01   132   Chemistry                                  
   H-InvDB                              587       440     <0.01   109   Organism-specific databases                
   HAMAP                           15489827  15311885      0.11    19   Family and domain databases                
   HGNC                               52806     52698     <0.01    68   Organism-specific databases                
   HOGENOM                          2988291   2988209      0.02    30   Phylogenomic databases                     
   HOVERGEN                          300263    300250     <0.01    53   Phylogenomic databases                     
   InParanoid                       2275219   2275219      0.02    34   Phylogenomic databases                     
   IntAct                             26986     26361     <0.01    78   Protein-protein interaction databases      
   InterPro                       360854944 109606142      2.58     1   Family and domain databases                
   KEGG                            16691799  16272761      0.12    18   Genome annotation databases                
   KO                               7385310   7355506      0.05    24   Phylogenomic databases                     
   LegioList                           2496      2483     <0.01    98   Organism-specific databases                
   Leproma                             1271      1269     <0.01   102   Organism-specific databases                
   MEROPS                            238274    238272     <0.01    54   Protein family/group databases             
   MGI                                62491     62060     <0.01    65   Organism-specific databases                
   MIM                                    4         4     <0.01   133   Organism-specific databases                
   MINT                                2995      2995     <0.01    95   Protein-protein interaction databases      
   MalaCards                             10        10     <0.01   129   Organism-specific databases                
   MaxQB                              42728     42728     <0.01    70   Proteomic databases                        
   MoonDB                                 1         1     <0.01   138   Protein family/group databases             
   MoonProt                              62        62     <0.01   125   Protein family/group databases             
   OGP                                    3         3     <0.01   135   2D gel databases                           
   OMA                              6817448   6817429      0.05    26   Phylogenomic databases                     
   OpenTargets                        50712     50664     <0.01    69   Organism-specific databases                
   OrthoDB                         19481168  19481162      0.14    15   Phylogenomic databases                     
   PANTHER                         33067805  31902964      0.24    12   Family and domain databases                
   PATRIC                          16854607  16845513      0.12    17   Genome annotation databases                
   PDB                                39678     19039     <0.01    73   3D structure databases                     
   PDBsum                             39094     18666     <0.01    74   3D structure databases                     
   PIR                               162575    130338     <0.01    55   Sequence databases                         
   PIRSF                           12242867  12137028      0.09    21   Family and domain databases                
   PMAP-CutDB                           130       130     <0.01   121   Other                                      
   PRIDE                             347089    347089     <0.01    50   Proteomic databases                        
   PRINTS                          18280057  16522635      0.13    16   Family and domain databases                
   PRO                                 2255      2255     <0.01   100   Other                                      
   PROSITE                         70041950  46800713      0.50     7   Family and domain databases                
   PaxDb                             324398    324398     <0.01    51   Proteomic databases                        
   PeptideAtlas                      128020    128020     <0.01    58   Proteomic databases                        
   PeroxiBase                          2609      2593     <0.01    97   Protein family/group databases             
   Pfam                           137614381 100099857      0.99     4   Family and domain databases                
   PharmGKB                            3131      3131     <0.01    94   Organism-specific databases                
   PhosphoSitePlus                     2231      2231     <0.01   101   PTM databases                              
   PhylomeDB                         459734    459734     <0.01    49   Phylogenomic databases                     
   PomBase                                2         2     <0.01   136   Organism-specific databases                
   ProDom                           1912931   1838946      0.01    36   Family and domain databases                
   ProMEX                              2798      2798     <0.01    96   Proteomic databases                        
   ProteinModelPortal               7039830   7039830      0.05    25   3D structure databases                     
   Proteomes                      104164036  98036274      0.75     5   Other                                      
   PseudoCAP                           4447      4443     <0.01    90   Organism-specific databases                
   REBASE                             33390     33354     <0.01    76   Protein family/group databases             
   REPRODUCTION-2DPAGE                   62        61     <0.01   126   2D gel databases                           
   RGD                                21601     20702     <0.01    79   Organism-specific databases                
   Reactome                          307491    107578     <0.01    52   Enzyme and pathway databases               
   RefSeq                          46271731  45126549      0.33     9   Sequence databases                         
   SABIO-RK                             613       613     <0.01   108   Enzyme and pathway databases               
   SFLD                              882728    687747      0.01    42   Family and domain databases                
   SGD                                    7         7     <0.01   130   Organism-specific databases                
   SIGNOR                                 7         7     <0.01   131   Enzyme and pathway databases               
   SMART                           33257577  25221323      0.24    11   Family and domain databases                
   SMR                              1399697   1399697      0.01    38   3D structure databases                     
   STRING                           6345261   6345000      0.05    27   Protein-protein interaction databases      
   SUPFAM                          91663372  72587816      0.66     6   Family and domain databases                
   SWISS-2DPAGE                           1         1     <0.01   137   2D gel databases                           
   SignaLink                           3790      3790     <0.01    92   Enzyme and pathway databases               
   SwissLipids                           82        82     <0.01   124   Chemistry                                  
   SwissPalm                           2258      2258     <0.01    99   PTM databases                              
   TAIR                               11843     11782     <0.01    84   Organism-specific databases                
   TCDB                                8273      8261     <0.01    86   Protein family/group databases             
   TIGRFAMs                        29360283  27006494      0.21    13   Family and domain databases                
   TopDownProteomics                    278       278     <0.01   112   Proteomic databases                        
   TreeFam                           548741    548705     <0.01    47   Phylogenomic databases                     
   TubercuList                          999       998     <0.01   105   Organism-specific databases                
   UCSC                               92908     92700     <0.01    61   Genome annotation databases                
   UniCarbKB                             17        17     <0.01   128   PTM databases                              
   UniGene                           912743    775536      0.01    41   Sequence databases                         
   UniLectin                            151       151     <0.01   120   Protein family/group databases             
   UniPathway                       8154769   7533968      0.06    23   Enzyme and pathway databases               
   VGNC                               83541     83541     <0.01    62   Organism-specific databases                
   VectorBase                        580932    562180     <0.01    46   Genome annotation databases                
   WBParaSite                        854104    845698      0.01    43   Genome annotation databases                
   World-2DPAGE                         315       310     <0.01   111   2D gel databases                           
   WormBase                           55945     55561     <0.01    66   Organism-specific databases                
   Xenbase                            34655     34561     <0.01    75   Organism-specific databases                
   ZFIN                               54602     54126     <0.01    67   Organism-specific databases                
   dictyBase                           7978      7756     <0.01    87   Organism-specific databases                
   eggNOG                          13514577   6774264      0.10    20   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    64   Organism-specific databases                
   iPTMnet                             5112      5112     <0.01    89   PTM databases                              
   jPOST                              42098     42098     <0.01    71   Proteomic databases                        
   mycoCLAP                             447       447     <0.01   110   Protein family/group databases             

Number of explicitly cross-referenced databases: 157


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 9.15   Gln (Q) 3.76   Leu (L) 9.89   Ser (S) 6.64
   Arg (R) 5.73   Glu (E) 6.18   Lys (K) 4.96   Thr (T) 5.55
   Asn (N) 3.86   Gly (G) 7.32   Met (M) 2.38   Trp (W) 1.30
   Asp (D) 5.48   His (H) 2.18   Phe (F) 3.93   Tyr (Y) 2.92
   Cys (C) 1.19   Ile (I) 5.71   Pro (P) 4.84   Val (V) 6.90

   Asx (B) 0      Glx (Z) 0      Xaa (X) 0.04

   [Figure: amino acid composition of the complete database]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Arg, Ile, Thr, Asp, Lys, Pro, Phe, Asn,
   Gln, Tyr, Met, His, Trp, Cys
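
   The ordering in 5.2 follows directly from the percentages in 5.1. A
   minimal Python sketch of the derivation is given below; the dictionary and
   variable names are illustrative, and the values are copied from table 5.1.

```python
# Composition in percent for the complete database (table 5.1).
composition = {
    "Ala": 9.15, "Gln": 3.76, "Leu": 9.89, "Ser": 6.64,
    "Arg": 5.73, "Glu": 6.18, "Lys": 4.96, "Thr": 5.55,
    "Asn": 3.86, "Gly": 7.32, "Met": 2.38, "Trp": 1.30,
    "Asp": 5.48, "His": 2.18, "Phe": 3.93, "Tyr": 2.92,
    "Cys": 1.19, "Ile": 5.71, "Pro": 4.84, "Val": 6.90,
}

# Sort residues from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```

   No two residues share the same percentage in this release, so the order
   is unambiguous.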



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 1979089
Total number of entries encoded on a Plasmid: 969376
Total number of entries encoded on a Plastid: 128616
Total number of entries encoded on a Plastid; Apicoplast: 30
Total number of entries encoded on a Plastid; Chloroplast: 63
Total number of entries encoded on a Plastid; Cyanelle: 
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: