
Protein Data Bank in Europe Group - Statistics



Database size

In scientific terms relevant to the database, the PDB contains more than 40,000 entries, and the PDBe relational databases are derived from the PDB. The PDBe actually consists of two separate databases:

  • the deposition database is highly normalised, with thousands of relationships linking some 325 tables; the deposition database is the definitive archive for all structural data at PDBe
  • the search database is a simpler but larger denormalised database, which contains a large amount of additional derived data, with data items duplicated and aggregated into 170 much wider tables, making it more amenable to searching and retrieval of data
  • a third, intermediate database is used to transform the data from the deposition database to the search database and to calculate and add the derived data
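
The step from a normalised deposition schema to wide, denormalised search tables can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for illustration; the table names, column names and the 'XXXX' entry are hypothetical and are not taken from the PDBe schema.

```python
import sqlite3

def build_search_table(conn):
    """Flatten two normalised tables into one wide, denormalised table."""
    cur = conn.cursor()
    # Normalised, deposition-style tables (hypothetical names and columns).
    cur.execute("CREATE TABLE entry (entry_id TEXT PRIMARY KEY, title TEXT)")
    cur.execute(
        "CREATE TABLE chain ("
        "entry_id TEXT REFERENCES entry(entry_id), "
        "chain_id TEXT, n_residues INTEGER)"
    )
    cur.execute("INSERT INTO entry VALUES (?, ?)", ("XXXX", "example entry"))
    cur.executemany(
        "INSERT INTO chain VALUES (?, ?, ?)",
        [("XXXX", "A", 120), ("XXXX", "B", 80)],
    )
    # Denormalised, search-style table: the entry title is duplicated on
    # every chain row, and a derived per-entry residue total is added.
    cur.execute(
        "CREATE TABLE search_chain AS "
        "SELECT c.entry_id, e.title, c.chain_id, c.n_residues, "
        "       (SELECT SUM(c2.n_residues) FROM chain c2 "
        "        WHERE c2.entry_id = c.entry_id) AS entry_total_residues "
        "FROM chain c JOIN entry e ON e.entry_id = c.entry_id"
    )
    return cur.execute(
        "SELECT entry_id, title, chain_id, n_residues, entry_total_residues "
        "FROM search_chain ORDER BY chain_id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
rows = build_search_table(conn)
for row in rows:
    print(row)
```

A search against search_chain needs no joins, at the cost of storing the title and the derived total once per chain; this is the trade-off the list above describes.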

Database size

In gross (computer storage) terms, the PDBe databases take around 4 Tbytes of disk storage. This figure includes duplication for redundancy, which gives high availability: fail-over between 2 live databases serving multiple search interfaces.
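
Fail-over between two live copies can be sketched in a few lines. This is only an illustration of the general idea, not the PDBe implementation; the function name and the stand-in query callables are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def query_with_failover(backends: Sequence[Callable[[], T]]) -> T:
    """Try each live database copy in turn; return the first successful result."""
    last_error = None
    for run_query in backends:
        try:
            return run_query()
        except ConnectionError as err:
            last_error = err  # this copy is unavailable, try the next one
    raise RuntimeError("no database copy is available") from last_error

# Hypothetical stand-ins for a query against each of the two live copies.
def primary_copy():
    raise ConnectionError("primary copy is down")

def secondary_copy():
    return "result from secondary copy"

answer = query_with_failover([primary_copy, secondary_copy])
print(answer)
```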

Growth graph

Complexity

We have 20 databases on 8 servers that carry out:
  • Depositions of protein structures
  • Loading of data from PDB files (~370 Tables)
  • Transformation of loaded data to a data warehouse structure
  • Search services on the web (~170 Tables)
  • Development and testing of database systems

Data acquisition rates

The PDBe shares the deposition and processing of new PDB entries with the RCSB and the PDBj – the deposition rates are shown in the Table below.
Year    Total    Deposited to               Processed by
                 RCSB     PDBj    EBI       RCSB     PDBj    EBI
2000    2983     2445     10      528       2294     161     528
2001    3286     2673     118     495       2407     384     495
2002    3563     2769     289     505       2401     657     505
2003    4830     3488     673     669       3135     1026    669
2004    5508     3769     900     812       3085     1613    812
2005    6678     4507     1166    1005      3563     2110    1005
2006    7281     5145     1052    1084      4252     1945    1084
TOTAL   34129    24823    4208    5098      21139    7892    5098
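
If each entry is deposited at exactly one site and processed at exactly one site (an assumption, not something this page states), the three "deposited to" counts and the three "processed by" counts should each add up to the yearly total. A minimal sketch of that check, with the figures transcribed from the table above:

```python
# (year, total, deposited to RCSB/PDBj/EBI, processed by RCSB/PDBj/EBI),
# transcribed from the deposition table on this page.
ROWS = [
    (2000, 2983, (2445, 10, 528), (2294, 161, 528)),
    (2001, 3286, (2673, 118, 495), (2407, 384, 495)),
    (2002, 3563, (2769, 289, 505), (2401, 657, 505)),
    (2003, 4830, (3488, 673, 669), (3135, 1026, 669)),
    (2004, 5508, (3769, 900, 812), (3085, 1613, 812)),
    (2005, 6678, (4507, 1166, 1005), (3563, 2110, 1005)),
    (2006, 7281, (5145, 1052, 1084), (4252, 1945, 1084)),
]

def unreconciled_years(rows):
    """Return the years whose per-site counts do not sum to the yearly total."""
    bad = []
    for year, total, deposited, processed in rows:
        if sum(deposited) != total or sum(processed) != total:
            bad.append(year)
    return bad

print(unreconciled_years(ROWS))
```

Any year the function returns is worth checking against the original records before the figures are reused.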

Usage Statistics

Web access
Year    Requests      Distinct Files    Unique Hosts¥        Total (Tb)
2006    34,764,054    4,471,304         154,800 (434,682)    2.86
2005    14,003,538    1,884,257         133,867 (336,189)    2.42
2004    14,041,990    1,529,826         178,188 (321,494)    1.29
2003    7,300,199     210,165           139,380 (212,196)    0.61
2002    2,447,463     106,373           53,966 (77,729)      0.18
¥ Web access is through 4 servers. The number listed is the maximum number of hosts for a single server; the number in parentheses is the total number of hosts over the 4 servers.

FTP Access (GigaBytes)

Year    PQS        Sifts    PDBeChem    EM      Other    PDB        Total (Gb)
2006    2,629.8    5.1      7.2         41.2    142.8    1,487.2    4,313.3
2005    1,753.8    1.4      -           42.8    167.8    1,507.0    3,473.8
2004    318.3      0.9      -           11.6    96.9     1,379.3    1,808.0
2003    1.4        -        -           10.2    66.5     277.8      355.9

Number of remote in-house copies of the PDBe Search Database

We have provided 16 copies of the database schema to European Institutes. These include: Sanger Centre; MRC-Dunn Human Nutrition Unit; Pfizer; UCL; University of Dundee; Centre For Biomolecular Informatics, Georgetown University, USA; EBI, Thornton Group; KAIST, Korea; Equipe de Génomique Structurale, University of Basel; TU Dresden, Germany; Birkbeck; SIB - Swiss Institute of Bioinformatics; Humboldt Universitaet zu Berlin and inforsense.com.


Document maintained by: Gaurav Sahni