
PQS Protein Quaternary Structure Query Form at the EBI


General Information


The PQS server lets you search the list of likely quaternary structures generated at the EBI. The system is an SQL front end to a database of characteristics of the quaternary structure files.

A document describing the procedure used to generate this list of files is also available.

Please note that this is a test version and only part of the PDB has been loaded. Structures that belong to the virus family, that are nucleic acid complexes, or for which the asymmetric unit is considered to be the complete molecule have yet to be loaded.

The following PDB entries are not loaded into the PQS database and are available only through a simple idcode search on the quaternary structure file server.


1bbs 1cde 1cyc 1hrb 1lbt 1xys 2cha 2lgs 1bcf 1gto 1xim 2xim 3pcn 3pca 3pcf 3pch 3pci 3pcj 3pck 3pcl 3pcm 3xim 1bcf 2lgs 123d 1ruo 1wio 1wip 3tra

The full list of PDB entries is still available on a per IDcode basis from the quaternary structure file server.

What is a Chain
Equivalent Protein chains are recognised if:
  1. The number of protein residues is greater than 14.
  2. The alignment score from an alignment of the sequences derived from ATOM C-alpha records (using the Needleman & Wunsch method with an identity matrix) is greater than 2*(NumResidues_in_Chain - X), where X allows for 1 residue missing in density per 50 residues.

    NOTE: For all information in the PQS data set, only ATOM/HETATM, CRYST1 and SCALE records are read from the PDB entries. The superposition and alignment of identified protein chains fail in a few cases where the atoms given show different gaps for the equivalent chains, for example:

    1hyh  should be a HOMO complex but the alignment based on ATOM records is
    (in part)
       A    ANLEAHGNIVINDWAALADADVVISTLGNIKLQQFAELKFTSSMVQSVGT
       C    ANLEAHGNIVINDWAALADADVVISTLGGDR---FAELKFTSSMVQSVGT
    and a similar case is encountered for entry 1ncf
       A    LFQCFNCSLCLNGTVHLSCQEKQNTVCTCH-GFREC-------
       B    LFQCFNCSLCLNGTVHLSCQEKQNTVCTCHAGFFLEEVSCSN
    

    In these 2 examples an annotation of HETERO complex is automatically generated.

    A problem is also encountered where the alignment requires ATOM records with zero occupancy to be considered. In these cases the derived stereochemistry (salt bridges etc.) and ASA will also be based on some ATOM records that have zero occupancy, and will therefore be of little value.

  3. The aligned C-alpha coordinates are then superimposed. If the RMS for the superposition is less than 3.1 Angstroms, the chains are accepted as not only chemically equivalent but also symmetrically equivalent (either via crystallographic symmetry or via proper non-crystallographic symmetry operations). The superposition method uses the algorithm given in Hendrickson, Acta Cryst (1979) A35, 158-163. The spherical polar angles Phi, Psi and Chi are also generated from the rotation matrix R. The polar angles are as defined in Rossmann and Blow (1962) Acta Cryst 15, 24-31.
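
The sequence part of the test above can be sketched as follows. This is only an illustration and not the PQS code: the function names, the match score of 2 with zero mismatch and gap penalties, and the rounding of X up to a whole residue are all assumptions chosen so that a perfect alignment of N residues scores 2*N.

```python
import math


def nw_identity_score(seq_a, seq_b, match=2, mismatch=0, gap=0):
    """Global (Needleman & Wunsch) alignment score with an identity matrix.

    The match, mismatch and gap values are assumptions, not PQS settings.
    """
    # prev holds the best scores for the previous row of the DP table.
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, res_a in enumerate(seq_a, start=1):
        curr = [i * gap]
        for j, res_b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if res_a == res_b else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def chemically_equivalent(seq_a, seq_b, min_residues=14):
    """Apply conditions 1 and 2 to two chain sequences (hypothetical helper)."""
    n_res = len(seq_a)
    if n_res <= min_residues:
        return False
    # Assumption: X is 1 missing residue per started block of 50 residues.
    allowed_missing = math.ceil(n_res / 50)
    return nw_identity_score(seq_a, seq_b) > 2 * (n_res - allowed_missing)
```

Under these assumptions the 1hyh fragments quoted above are not recognised as equivalent, because three residues are missing from chain C while only one is allowed for a 50-residue chain.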

Output Files
For all searches you will be given a pointer to an additional file that contains all the chain generation information for each bio-assembly.

If you explicitly request that the query contains at least one saltbridge, a pointer to a file containing details on the geometry of the saltbridges will be generated.

If you explicitly request that the query contains at least one di-sulphide, a pointer to a file containing details on the geometry of the di-sulphides will be generated.

If you explicitly request that the query contains at least one buried side chain, a pointer to a file containing the list of buried side chains will be generated.

If there is more than one independently determined copy of the bio-molecule, a URL to some details on each copy is given. These values come from PROCHECK.


Sample Searches
Sample SEARCH 1

1. select keyword "lysozyme"
2. select quaternary type "DIMERIC"
3. select at least one di-sulphide
4. select delta-ASA > 800 Ang**2
This gives the result:


pdb  num  SpGrp  delta  num  num    num  num  num   num per  delta    type
id  biol  name    ASA   S-S SaltB   bur chain res   het  ASA   sole
139l  1   P3221   866.3  2     2    0    2    324   20  10.2  -12.1  DIMERIC

with the extra URL hyperlinks to:

(i)  Chain Information for each structure in the hit list 
PQS details Quaternary_Chain_Structure_and_Symmetry 
 Chain details::-
  
Chain  Chain  Num    Sole   ASA Symmetry
num.   name   res    Energy     chain
For Molecule:: 139l_0
  1     A  162     -159.6    8511.6   X,  Y,  Z
  2     B  162     -159.6    8511.6   X-Y,  -Y,  -Z+1/3
 


(ii)  InterChain Di-sulphide Details for each structure in the hit list 
PQS details InterChain_Disulphides_Formed_Upon_Complex_Formation 
 DiSulphide details::-
  dihedrals defined as 
  N...Ca...Cb...Sg    Chi-1 CYS-1  
  N...Ca...Cb...Sg    Chi-1 CYS-2  
  Ca...Cb...Sg...Sg   Chi-2 CYS-1/CYS-2 
  Ca...Cb...Sg...Sg   Chi-2 CYS-2/CYS-1 
  Cb...Sg...Sg...Cb   Chi-3 CYS-1/CYS-2 
  
ID   Cys  Chain  Cys  Chain   SG-SG  CB..CB  CA..CA  Chi1    Chi2    Chi3
     seq    seq     Ang     Ang     Ang    (deg)   (deg)   (deg)
For Molecule:: 139l_0
 1     68   A 93   B 2.044    6.28    3.97 -177.6  -71.3  -86.5
             -149.0 -103.3

For Molecule:: 139l_0
 2     93   A 68   B 2.044    6.28    3.97 -149.0 -103.3  -86.5
             -177.6  -71.3

Sample SEARCH 2

1. select keywords e.g. "xylose and isomerase"
2. select authors  e.g. "henrick and blow"
3. select quaternary type e.g. "tetrameric"
4. select Delta-ASA e.g. > 1500 Ang**2 (ASA per chain lost upon complex formation)
5. order results with descending delta-asa

This gives the results


pdb  num  SpGrp  delta  num  num    num  num  num   num per  delta    type
id  biol  name    ASA   SS SaltB   bur chain res   het  ASA   sole
1xlb  1   P3121  7800.0  0    18    8    4   1572   4   40.6  -165.8  TETRAMER
1xla  1   P3121  7780.8  0    15    8    4   1572   0   40.3  -173.5  TETRAMER
5xia  1   P3121  7777.2  0    15    6    4   1572  44   40.6  -169.4  TETRAMER
4xia  1   P3121  7770.9  0    19    8    4   1572  52   40.8  -170.2  TETRAMER
1xlg  1   P3121  7762.5  0    16    7    4   1572  48   40.9  -182.5  TETRAMER
1xlh  1   P3121  7685.9  0    17    7    4   1572   4   40.2  -177.8  TETRAMER
1xll  1   P3121  7679.9  0    18    8    4   1572   8   40.1  -169.4  TETRAMER
1xlj  1   P3121  7677.6  0    19    8    4   1572  44   40.3  -169.0  TETRAMER
1xlc  1   P3121  7663.9  0    18    9    4   1572  44   40.5  -179.8  TETRAMER
1xlf  1   P3121  7663.7  0    17    7    4   1572  60   40.3  -169.4  TETRAMER
1xle  1   P3121  7631.0  0    19    7    4   1572   8   40.1  -166.4  TETRAMER
1xld  1   P3121  7621.4  0    19    7    4   1572  48   40.4  -166.5  TETRAMER
1xlk  1   P3121  7619.9  0    18    7    4   1572   8   40.0  -167.1  TETRAMER
1xli  1   P3121  7599.0  0    18    7    4   1572  56   40.2  -163.9  TETRAMER

If, however, you were to add the condition


6. select at least one di-sulphide per quaternary structure

then NO hits will be found for this set of conditions.

In the returned line the PDB idcode is a URL to a summary page for each entry.


PDB Code
You may enter the 4 character PDB idcode in this field. If a full, correct PDB idcode is entered, all other search fields are ignored.
You may enter the PDB idcode with wildcard characters, provided they are restricted to the forms .abc or 1ab. (i.e. only the first or the last character may be wild). In this case, if other fields are set, all fields will be used in the search.
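
The idcode rules above could be checked with a small helper like the one below. It is a sketch only: the function name, the returned labels and the lower-casing of the input are assumptions, and "." is taken as the wildcard character from the .abc and 1ab. examples.

```python
import re


def classify_idcode(text):
    """Return 'exact', 'wildcard' or 'invalid' for a PDB idcode field value."""
    code = text.strip().lower()  # assumption: case is not significant
    if re.fullmatch(r"[a-z0-9]{4}", code):
        return "exact"      # all other search fields would be ignored
    if re.fullmatch(r"\.[a-z0-9]{3}|[a-z0-9]{3}\.", code):
        return "wildcard"   # combined with any other fields that are set
    return "invalid"        # e.g. a wildcard in the middle of the code
```

For example, `classify_idcode("1xl.")` is treated as a wildcard search, while `classify_idcode("1.lb")` is rejected.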


Keywords
You may enter any character string here that is a word found in the HEADER, TITLE, COMPND or SOURCE records of a released PDB entry file. (However, only a limited number of words are stored in the database tables.)
No wildcard characters are allowed.
In addition, only one logical operation is allowed, i.e. you may input only one of the three possible combinations of two keyword strings:
xylose and isomerase
xylose or isomerase
xylose not isomerase
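
The restriction to a single logical operation can be illustrated with a minimal parser. The function name and the returned tuple layout are hypothetical, and how the server itself interprets the field is not documented here.

```python
def parse_two_term_query(text):
    """Split a keyword field into (first, operator, second).

    Accepts a single word, or two words joined by exactly one of
    'and', 'or', 'not'. Anything else raises ValueError.
    """
    words = text.lower().split()
    if len(words) == 1:
        return (words[0], None, None)
    if len(words) == 3 and words[1] in ("and", "or", "not"):
        return (words[0], words[1], words[2])
    raise ValueError("only one logical operation on two words is allowed")
```

With this sketch, "xylose and isomerase" parses to `("xylose", "and", "isomerase")`, whereas a chained query such as "xylose and isomerase or mutase" is rejected.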


Authors
You may enter any author's name here as a character string, provided it is an author found in the JRNL record of a released PDB entry file.
No wildcard characters are allowed, and do not enter initials for other names.
In addition, only one logical operation is allowed, i.e. you may input only one of the three possible combinations of two author names:
henrick and blow
henrick or blow
henrick not blow


Quaternary Type
If you want to limit your search to one type of quaternary structure, choose here the particular option you require. The default is no condition for quaternary structure type.
Possible options are:
MONOMERIC     DIMERIC   TRIMERIC TETRAMERIC     PENTAMERIC 
HEXAMERIC     HEPTAMERIC     OCTAMERIC     NONAMERIC DECAMERIC 
UNDECAMERIC   DODECAMERIC    TRIDECAMERIC  TETRADECAMERIC PENTADECAMERIC 
HEXADECAMERIC HEPTADECAMERIC OCTADECAMERIC NONADECAMERIC  EICOSAMERIC 
21MERIC  22MERIC   23MERIC  24MERIC   25MERIC 
26MERIC  27MERIC   28MERIC  29MERIC   30MERIC

Homo or Hetero
If you want to limit your search to either HOMO or HETERO quaternary structures, choose the option you require here. The default is no condition on whether the quaternary structure is HOMO or HETERO.

NOTE: The criterion for setting a structure as HOMO or HETERO is not limited to the sequence homology of the constituent chains. In the PQS data set a stereochemical criterion is also applied. After an automatic alignment based only on the ATOM records in the PDB files, chains that are equal by sequence have their equivalent C-alpha atoms superimposed and the RMS is derived. Normally the superposition RMS based on C-alpha atoms should be less than 0.3 Angstroms; here a cutoff of 3.1 Angstroms is applied, and chains of equal sequence that have an RMS greater than this value are set to be HETERO. Examples of this type of complex are given below:


 ID     RMS     ID     RMS     ID     RMS
1obp   4.15    1dcl  12.74    1mch  12.91
1spi   5.33    1iph   4.44    1mci  12.74
1tlf   5.12    1lbh   4.28    1mcj  12.76
2mcg  12.73    1lbi   4.93    1mck  12.68
3bjl  14.47    1mcb  12.73    1mcl  12.58
3mcg   9.14    1mcc  12.56    1mcn  12.65
3mcg   9.14    1mcd  12.79    1mcq  12.60
4bjl  11.58    1mce  12.73    1mcr  12.59
8cat  12.08    1mcf  12.80    1mcs  12.70
1av1   4.88    1ant   9.05    1bjm  14.44
1dap   4.19
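
The HOMO/HETERO decision described in the note can be summarised in a few lines. This is a sketch with a hypothetical function name; the sequence comparison and the RMS are assumed to have been computed elsewhere.

```python
def homo_or_hetero(sequences_equal, ca_rms=None, rms_cutoff=3.1):
    """Label an assembly from the chain comparison results.

    sequences_equal: True if the chains are equal by sequence alignment.
    ca_rms: C-alpha superposition RMS in Angstroms (unused if sequences differ).
    """
    if not sequences_equal:
        return "HETERO"
    # Chains of equal sequence are still HETERO if they superimpose poorly.
    if ca_rms is not None and ca_rms > rms_cutoff:
        return "HETERO"
    return "HOMO"
```

For instance, 1obp in the table above has chains of equal sequence but an RMS of 4.15 Angstroms, so `homo_or_hetero(True, 4.15)` gives "HETERO".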

Number Independent Molecules
If you want to search for entries that have 1 or more independent copies of the quaternary structure, choose here the particular option you require. The default is no condition.


Mean Delta ASA per Chain
This field is to limit the search based on Accessible Surface Area.
The Lee and Richards method is used for accessibility calculations. This method loops over each atom in a list and finds the surface area, in square Angstroms, that is accessible to a probe sphere of a specified radius. A probe radius of 1.40 Angstroms and a Zslice of 0.05 are used here.
Atoms in the generated quaternary structure files are treated as follows in this calculation:
  1. Water (HOH) atoms are rejected.
  2. Hydrogen and deuterium atoms are rejected.
  3. Atoms with zero occupancy are rejected.
  4. Only the first atom of any altloc set is used (all other HETATM records are used, including ligands and DNA/RNA).
The absolute summed accessible surface area is calculated for:
  1. All acceptable atoms in the oligomer.
  2. All acceptable atoms of the first protein chain of an equivalent set of chains.
  3. The difference between the oligomer ASA divided by the number of chains and the ASA of an isolated chain is then given. A value of at least 400 Ang**2 may indicate that a true oligomer exists. Values below this cutoff may indicate that the PQS file describes a significant crystal packing arrangement rather than an oligomeric assembly.
The default is not to search on this condition. You may choose various limits to the mean loss of Accessible Surface Area per chain upon assembly formation compared to the isolated chains.
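
The arithmetic behind the delta-ASA value can be written out as below. The function names are hypothetical, and the sign convention (a positive value means surface lost on assembly) is inferred from the sample output rather than stated explicitly.

```python
def mean_delta_asa_per_chain(oligomer_asa, n_chains, isolated_chain_asa):
    """ASA lost per chain on assembly, in Ang**2."""
    return isolated_chain_asa - oligomer_asa / n_chains


def percent_asa_lost(delta_asa, isolated_chain_asa):
    """The same loss expressed as a percentage of the isolated chain ASA."""
    return 100.0 * delta_asa / isolated_chain_asa


def may_be_true_oligomer(delta_asa, cutoff=400.0):
    """Apply the 400 Ang**2 guideline quoted above."""
    return delta_asa >= cutoff
```

As a check against Sample SEARCH 1, entry 139l has two chains with an isolated-chain ASA of 8511.6. An oligomer ASA of 15290.6 (a value back-calculated here for illustration, not taken from the server) reproduces the listed delta-ASA of 866.3 and the listed 10.2 percent.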

Details on what to expect as a known dimer are given by Sue Jones at UCL.

NOTE: For entries that contain only C-alpha coordinates, neither contact information nor a delta-ASA is derived, so in general these entries are left as the deposited set of coordinates.


Delta Solvation Energy of Folding
A solvation energy of folding calculation is carried out for the complete quaternary structure generated and for each chain that makes up the assembly. A delta solvation energy is then calculated.
see D.Eisenberg and A.D. McLachlan, Nature 319, 199-203 (1986)
and L. Chiche, L.M. Gregoret, F.E. Cohen and P.A. Kollman, Proc. Natl. Acad. Sci. USA 87, 3240-3243 (1990)
A positive value for this delta solvation energy of folding may indicate an error in the generation of the quaternary structure. A more negative value may simply indicate a more hydrophobic character of the protein-protein interfaces in the assembly.
You may choose to ignore this condition (the default), or to search for only positive values or for various sets of negative values.


Number of SaltBridges
Inter-chain salt bridges are searched for with an N...O distance cutoff of less than 3.35 Angstroms, provided that the angle is within acceptable limits. The program calculates the hydrogen position for those target nitrogen atoms where the hydrogen position is unambiguous (i.e. excluding NZ on Lys and the N terminus). The angle O...H...N is then calculated. For source...oxygen hydrogen bonds, the angle source...O--bonded carbon is calculated. The limits on these two angles are 115 and 85 degrees respectively. (The method used is from Tadeusz Skarzynski's CCP4 program contact.f.)
You may ignore this condition (the default) or search for interfaces with no saltbridges (perhaps just hydrophobic), or interfaces that have at least 1 inter-chain saltbridge.
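
A geometric test along these lines might look like the sketch below. It is not the contact.f code: the function names are hypothetical, and treating 115 and 85 degrees as minimum allowed angles is an assumption, since the text gives the limits without stating their direction.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at the middle point b."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def is_salt_bridge_candidate(n, o, h=None, c=None,
                             max_dist=3.35, min_ohn=115.0, min_noc=85.0):
    """Check the N...O distance and, where atoms are supplied, both angles.

    n, o: nitrogen and oxygen coordinates; h: calculated hydrogen position
    (None where ambiguous); c: the carbon bonded to the oxygen.
    """
    if math.dist(n, o) >= max_dist:
        return False
    if h is not None and angle_deg(o, h, n) < min_ohn:   # assumed minimum
        return False
    if c is not None and angle_deg(n, o, c) < min_noc:   # assumed minimum
        return False
    return True
```

Passing `h=None` skips the O...H...N test, which mirrors the cases (NZ on Lys, N terminus) where no hydrogen position is calculated.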


Number of Di-Sulphides
Inter-Chain Di-sulphides are searched for in the analysis of likely generated quaternary structures.
You may ignore this condition (the default) or search for interfaces with no di-sulphides, or interfaces that have at least 1 inter-chain di-sulphide.
Some details of known di-sulphide geometries are listed below.
 From Kossiakoff subtilisin paper
 Overall C-alpha ... C-alpha distances (in Angstroms) for 39 di-sulphides are:
 Left-Handed  Disulphides = 5.88 +-0.49
 Right-Handed Disulphides = 5.07 +-0.73

 From Dijkstra average C-beta ... C-beta is 3.83 +- 0.18 A,  
 RANGE 3.45 to 4.50A  95 percent of C-alpha ... C-alpha are 
 within 4.4 to 6.8 A

 Disulphide dihedral Angles (Jane Richardson -Standard)
  Left-handed has:
   Chi-1(1)  = -60, Chi-2(1) = -90, Chi-3(1) = -90
   Chi-2(2)  = -90, Chi-1(2) = -60
  Right-handed has:
   Chi-1(1)  = -60, Chi-2(1) = +120, Chi-3(1) = +90
   Chi-2(2)  = -50, Chi-1(2) = -60
  Both have an average C-alpha ... C-alpha distance of 5.0A

  But there are outliers with  -, +, - dihedral Angles and
  C-alpha ... C-alpha of 4.0 to 4.5A in immunoglobulin Beta barrels 
  Long C-alpha ... C-alpha of  6.6 to 7.4A and  Chi-2(1) 
        and Chi-2(2) of 180deg

 From Kossiakoff:
  Left-handed Disulphides
   Chi-1    Chi-2  Chi-3     Chi-2"    Chi-1"
   -60      -60   -85.1(8.7)  -60 -60
   180      180     180 180
   180      180     180 -60
   -60      -60     -90 180
   -76      -90    -148 -85
  Right-handed Disulphides:
   Chi-1    Chi-2  Chi-3     Chi-2"    Chi-1"
    -60     -60     99 (11)   60   180
    -60     -60     74    60
    -60     -78    180    60
    -60     180     43   180
    -60     -60     84   -77
    -60     -83   -120   -60
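
The Chi angles tabulated above are ordinary four-atom dihedrals, so they can be computed from coordinates as sketched below. The function names are hypothetical, and classifying handedness from the sign of Chi-3 alone is a simplification based on the standard values quoted above (about -90 for left-handed, about +90 for right-handed).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def disulphide_handedness(chi3):
    """Label a di-sulphide from the sign of Chi-3 (Cb-Sg-Sg-Cb)."""
    return "left-handed" if chi3 < 0 else "right-handed"
```

Chi-3 would be obtained as `dihedral_deg(cb1, sg1, sg2, cb2)`; the 139l di-sulphide in Sample SEARCH 1, with Chi-3 of -86.5 degrees, would be labelled left-handed.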

Number Residues Buried upon Assembly formation
You may choose to limit your search on the conditions that the likely Quaternary structures have residues that are buried upon assembly formation.
Residues are listed where they show the most significant change in relative accessible surface area going from an isolated chain to the oligomeric state.
A true oligomer would be expected to have at least some residues of this type. The cutoff used to include residues here is that the isolated chain has a relative ASA of at least 60% (exposed) and the same residue has in the oligomeric assembly a relative ASA of less than 3% (exposed).
You may choose no condition (default), structures with no buried sidechains, or only those structures that have buried side chains.
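
The 60% and 3% cutoffs can be applied as a simple filter. This is an illustrative sketch: the function name and the dictionary layout (residue label mapped to relative ASA in percent) are assumptions, and the residue labels in the usage note are made up.

```python
def buried_on_assembly(isolated_rel_asa, oligomer_rel_asa,
                       exposed_min=60.0, buried_max=3.0):
    """List residues exposed in the isolated chain but buried in the oligomer.

    Both arguments map a residue label to its relative ASA in percent.
    """
    buried = []
    for residue, isolated in isolated_rel_asa.items():
        in_oligomer = oligomer_rel_asa.get(residue)
        if in_oligomer is None:
            continue  # residue not present in the oligomer data
        if isolated >= exposed_min and in_oligomer < buried_max:
            buried.append(residue)
    return buried
```

For example, a residue going from 72% to 1.5% relative ASA is reported, while one going from 65% to 30%, or from 20% to 0.5%, is not.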


Total Number of Residues
You may choose to limit your search based on the overall total number of protein residues in the generated likely quaternary structures.


SpaceGroup
You may choose to search quaternary structures that have a particular spacegroup.
NOTE: The spacegroup should be entered without spaces, i.e. enter P212121 not P21 21 21
The default is to search ignoring spacegroup field.
Current known SpaceGroups are:
A2       B2       C121     C1211    C2       C21      C222     C2221    C4212
F222     F23      F422     F432     H3       H32      I121     I2       I212121
I213     I222     I23      I4       I41      I4122    I4132    I422     I432
P1       P1121    P1211    P2       P21      P21212   P212121  P21221   P213
P22121   P2221    P3       P31      P3112    P3121    P32      P321     P3212
P3221    P4       P41      P41212   P4122    P4132    P42      P4212    P422
P42212   P4222    P43      P432     P43212   P4322    P4332    P6       P61
P6122    P622     P6222    P63      P6322    P64      P6422    P65      P6522
R3       R32
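
If spacegroup names are being prepared programmatically before entry, the spaces can be stripped with a one-line helper. The function name is hypothetical, and upper-casing the result is an assumption based on the list above.

```python
def normalize_spacegroup(text):
    """Remove all whitespace and upper-case a spacegroup name."""
    return "".join(text.split()).upper()
```

For example, `normalize_spacegroup("P21 21 21")` gives "P212121", the form expected by this field.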

Assembly Formula as Chains
For HETERO complexes you may choose to enter a chain formula of the type
A2B1, for example, for a hetero-trimeric complex containing two chemically distinct chains,
or A3B3C3, for example, for a nine-chain hetero complex containing three chemically distinct chains.
Do not enter the square brackets.
Current known formulas are:
[A10]                                  [A3]			 
[A11]				       [A3B1]			 
[A11B1]				       [A3B1C1]		 
[A12]				       [A3B1C1D1]		 
[A12B12]			       [A3B2C1]		 
[A12B12C12]			       [A3B2C1D1]		 
[A12B2]				       [A3B2C2D1E1]		 
[A14]				       [A3B3]			 
[A14B14]			       [A3B3C1]		 
[A16]				       [A3B3C1D1E1]		 
[A16B8]				       [A3B3C1D1E1F1G1H1]	 
[A22]				       [A3B3C2D2]		 
[A24]				       [A3B3C3]		 
[A2]				       [A3B3C3D3]		 
[A2B1]				       [A3B4C1]		 
[A2B1C1]			       [A3B6]			 
[A2B1C1D1]			       [A3B6C1D1E1]		 
[A2B1C1D1E1]			       [A4]			 
[A2B1C1D1E1F1G1]		       [A4B1]			 
[A2B2]				       [A4B1C1]		 
[A2B2C1]			       [A4B1C1D1E1]		 
[A2B2C1D1]			       [A4B2]			 
[A2B2C1D1E1F1]			       [A4B2C1D1]		 
[A2B2C2]			       [A4B2C2]		 
[A2B2C2D1]			       [A4B3C1]		 
[A2B2C2D2]			       [A4B4]			 
[A2B2C2D2E2]			       [A4B4C4]		 
[A2B2C2D2E2F1G1]		       [A4B4C4D4E4F4]		 
[A2B2C2D2E2F2G2H1I1J1K1L1M1]	       [A4B8]                   
[A2B2C2D2E2F2G2H2I2J2]                 [H]			     
[A2B2C2D2E2F2G2H2I2J2K2]	       [HH]			     
[A2B2C2D2E2F2G2H2I2J2K2L1]	       [HHH]			     
[A2B2C2D2E2F2G2H2I2J2K2L2M2]	       [HHHH]			     
[A2B2C2D2E2F2G2H2I2J2K2L2M2N2]	       [HHHHH]			     
[A2B2C4]			       [HHHHHH]		     
[A2B3]				       [HHHHHHHH]		     
[A2B4]				       [HHHHHHHHHH]		     
[A2B4C1D1]			       [HHNN]			     
[A2nuc]				       [HN]			     
[A5]				       [HNH]			     
[A5B1]				       [HNN]			     
[A5B1C1]			       [HNP]			     
[A6]				       [HP]			     
[A6B1C1]			       [HPH]			     
[A6B1C1D1E1F1G1]		       [HPN]			     
[A6B3C1D1E1]			       [HPNN]			     
[A6B6]				       [HPP]			     
[A6B6C2]			       [HPPPNN]		     
[A6B6C6D4E2]			       [N]			     
[A6B9C1D1E1]			       [NH]			     
[A7]				       [NHHPPPPPHHHHHHPHHPPH]	     
[A7B7]				       [NHNN]			     
[A8]				       [NHNP]			     
[A8B1C1D1E1F1G1H1I1]		       [NHP]			     
[A8B4]				       [NHPPPPPPPPPPPPPPPPPPPP]     
[A8B8]				       [NN]			     
[A9]				       [NNN]			     
[A9B9]				       [NNNN]			     
[AB]				       [NNNNN]			     
[ABC]				       [NNNNNN]		     
[ABCD]				       [NNNNNNNN]		     
[ABCDE]				       [NNP]			     
[ABCDEF]			       [NP]			     
[ABCDEFG]			       [NPN]			     
[ABCDEFGH]			       [NPNN]                       
[ABCDEFGHI]                            [PP]				     
[ABCDEFGHIJ]                           [PPH]				     
[ABCDEFGHIJK]			       [PPNN]				     
[ABCDEFGHIJKL]			       [PPNNNN]			     
[ABCDEFGHIJKLM]			       [PPP]				     
[AWATER]			       [PPPHH]				     
[Awater]			       [PPPN]				     
[P]				       [PPPP]				     
[PH]				       [PPPPP]				     
[PHH]				       [PPPPPP]			     
[PHN]				       [PPPPPPH]			     
[PHNP]				       [PPPPPPP]			     
[PN]				       [PPPPPPPHPPPNNH]		     
[PNHN]				       [PPPPPPPP]			     
[PNN]				       [PPPPPPPPPP]			     
[PNNH]				       [PPPPPPPPPPP]			     
[PNPN]				       [PPPPPPPPPPPP]			     
				       [PPPPPPPPPPPPPP]		     
				       [PPPPPPPPPPPPPPPPPP]		     
				       [PPPPPPPPPPPPPPPPPPPP]		     
				       [PPPPPPPPPPPPPPPPPPPPP]		     
				       [PPPPPPPPPPPPPPPPPPPPPP]	     
				       [PPPPPPPPPPPPPPPPPPPPPPPP]	     
				       [PPPPPPPPPPPPPPPPPPPPPPPPP]	     
				       [PPPPPPPPPPPPPPPPPPPPPPPPPPPP]	     
				       [PPPPPPPPPPPPPPPPPPPPPPPPPPPPP]	     
				       [PPPPPPPPPPPPPPPPPPPPPPPPPPPPPP]     
				       [PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP]   
				       [VIRUS]				     
				       [Virus]                              
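
A chain formula of the A2B1 type can be expanded into per-chain counts as sketched below. The function name is hypothetical, and the sketch only handles the letter-plus-count style; it makes no attempt to interpret entries such as A2nuc, Awater, VIRUS or the H/N/P strings in the list above.

```python
import re

_FORMULA = re.compile(r"(?:[A-Z][0-9]*)+")
_TERM = re.compile(r"([A-Z])([0-9]*)")


def parse_chain_formula(formula):
    """Map each chain letter to its copy number, e.g. 'A2B1' -> {'A': 2, 'B': 1}.

    A letter without a number counts as one copy. Square brackets are rejected.
    """
    if not _FORMULA.fullmatch(formula):
        raise ValueError("expected a formula such as A2B1, without brackets")
    counts = {}
    for letter, number in _TERM.findall(formula):
        counts[letter] = counts.get(letter, 0) + (int(number) if number else 1)
    return counts
```

Summing the values gives the total number of chains: 3 for A2B1 and 9 for A3B3C3.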

Mean Percent ASA per Chain
You may choose to search on the mean percent accessible surface area per chain that is lost upon complex formation relative to the isolated chains. The default is to ignore this field.
WARNING: The database is preliminary and some entries that have been loaded show a very small (or even negative) delta-ASA. This is due to a problem in considering the relative ASA for sugars, large hetgroups and nucleic acids; very low values are therefore most likely to be wrong at the moment.


Order output by
You may choose to order your hit list by the selected column.
The default is to sort on PDB idcode.


Complex Searches
Please note that a simple query that returns a large number of hits, e.g. all DIMERIC complexes, will take some time to return the statistics per molecule. Very simple queries therefore have the statistics part turned off. If you want the details of possible di-sulphides, saltbridges, surface buried residues, and properties of the chains contributing to the assembly, force the use of statistics with the last check box below. All queries that have at least 2 conditions, e.g. author and dimeric, have statistics turned on automatically.
