PROCOGNATE



Download

PROCOGNATE Flat File

From this page you can download the flat files for the SCOP, CATH and Pfam cognate ligand domain mappings.


Version 1.6

Download SCOP flat file v1.6 (28 MB)

Download CATH flat file v1.6 (29 MB)

Download Pfam flat file v1.6 (41 MB)


Version 1.5 Jun 2008

Download SCOP flat file v1.5 (24 MB)

Download CATH flat file v1.5 (25 MB)

Download Pfam flat file v1.5 (32 MB)


Version 1.4 Dec 2007

Download SCOP flat file v1.4 (17 MB)

Download CATH flat file v1.4 (25 MB)

Download Pfam flat file v1.4 (30 MB)


Version 1.3 May 2007

Download SCOP flat file v1.3 (16 MB)

Download CATH flat file v1.3 (17 MB)

Download Pfam flat file v1.3 (24 MB)


Version 1.1 August 2006

Download SCOP flat file v1.1 (16 MB)

Download CATH flat file v1.1 (16 MB)


Version 1.0 (dataset used in the JMB paper):

Download SCOP flat file v1.0 (13 MB)

Download CATH flat file v1.0 (14 MB)



Notes

The flat file is tab-delimited: fields are separated by tab characters (\t) and each row is a new record. A structure with no cognate ligands in our assignment has no entry in the flat file. Each line of the flat file is one combination of a SCOP/CATH/Pfam domain, a PDB ligand and an assigned alternative ligand. PDB ligands are given as three-letter PDB HET codes and the potential cognate ligands as KEGG comp_ids. Where a structure is associated with more than one EC number or KEGG reaction, the mapping may contain several alternative potential cognate ligands, corresponding to the different reactions catalyzed by the enzyme.
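Because one domain and PDB ligand can appear on several lines (one per potential cognate), it is often convenient to collect all KEGG comp_ids for each domain and ligand. The sketch below uses Python's standard csv module; the helper names and the choice of PDBcode, ligand_chain, pdb_ligand and ligand_pdb_seq as the key for a ligand instance are assumptions made for illustration, not part of the PROCOGNATE distribution.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Columns assumed to identify one PDB ligand instance (names from the column list).
LIGAND_KEY = ("PDBcode", "ligand_chain", "pdb_ligand", "ligand_pdb_seq")


def read_flat_file(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-delimited lines whose first line holds the column headings."""
    # QUOTE_NONE: treat quote characters as ordinary text in a plain tab file.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def cognates_by_domain_ligand(
    rows: Iterable[Dict[str, str]], domain_col: str
) -> Dict[Tuple[str, ...], Set[str]]:
    """Group every KEGG_comp_id under its (domain, PDB ligand) pair.

    domain_col is the domain identifier column of the file being read,
    e.g. "SCOP_sunid", "CATH_id" or "Domain_id".
    """
    grouped: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for row in rows:
        key = (row[domain_col],) + tuple(row[col] for col in LIGAND_KEY)
        grouped[key].add(row["KEGG_comp_id"])
    return dict(grouped)
```

A downloaded file can be passed directly, e.g. `with open(path, newline="") as fh: rows = read_flat_file(fh)`, where `path` is wherever the file was saved, followed by `cognates_by_domain_ligand(rows, "SCOP_sunid")` for the SCOP file.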



Domain - Ligand Assignment

During creation of the mapping, ligand binding is assigned to particular domains. If any one domain makes 75% or more of the contacts to a ligand, the binding of that ligand is assigned to that domain only, and only this domain is recorded as binding the ligand in the flat file. If no single domain makes at least 75% of the contacts to a ligand, the binding of that ligand is considered shared and all contacting domains are listed in the flat file.
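The 75% rule can be sketched as a small function. This is an illustration of the rule as stated, not the code used to build the mapping; in particular it assumes the total is simply the sum of the per-domain contact counts, whereas the original pipeline may count contacts from the whole assembly differently. Since 75% is more than half, at most one domain can pass the threshold, so the loop order does not matter.

```python
from typing import Dict, List, Tuple

# Threshold from the text: one domain needs at least 75% of a ligand's contacts.
SOLE_BINDER_FRACTION = 0.75


def assign_ligand(contacts_per_domain: Dict[str, int]) -> Tuple[List[str], bool]:
    """Return (domains recorded as binding the ligand, shared flag).

    contacts_per_domain maps a hypothetical domain label to the number of
    contacts that domain makes to one ligand.
    """
    total = sum(contacts_per_domain.values())
    if total == 0:
        return [], False  # no contacts, nothing to assign
    for domain, count in contacts_per_domain.items():
        if count >= SOLE_BINDER_FRACTION * total:
            return [domain], False  # binding assigned to this domain only
    # No single domain reaches 75%: binding is shared by all contacting domains.
    return [d for d, c in contacts_per_domain.items() if c > 0], True
```

For example, a ligand with 9 contacts from one domain and 3 from another is assigned to the first domain alone (9 of 12 is exactly 75%), while a 5/5 split is recorded as shared.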


Cut-offs

Two cut-offs were used in generating this data, and a ligand must satisfy both to be considered for the mapping. First, every potential cognate ligand must have a graph matching score greater than 0.5, which ensures good chemical similarity. Second, every PDB ligand must have at least 3 contacts made to it by the whole structural assembly, which ensures that the ligand is not simply sitting on the outside of the assembly.
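The two cut-offs translate into a simple row filter. The released files already have these thresholds applied, so a filter like this is mainly useful for imposing stricter values. The sketch assumes that the cdk_graph column holds the graph matching score and that total_count is an adequate stand-in for the number of contacts made by the whole assembly; the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator

MIN_GRAPH_SCORE = 0.5      # graph matching score must be strictly greater than this
MIN_ASSEMBLY_CONTACTS = 3  # assembly must make at least this many contacts


def passes_cutoffs(graph_score: float, assembly_contacts: int,
                   min_graph_score: float = MIN_GRAPH_SCORE,
                   min_contacts: int = MIN_ASSEMBLY_CONTACTS) -> bool:
    """Apply both cut-offs: score > min_graph_score and contacts >= min_contacts."""
    return graph_score > min_graph_score and assembly_contacts >= min_contacts


def filter_rows(rows: Iterable[Dict[str, str]],
                min_graph_score: float = MIN_GRAPH_SCORE,
                min_contacts: int = MIN_ASSEMBLY_CONTACTS) -> Iterator[Dict[str, str]]:
    """Yield rows whose cdk_graph and total_count values pass the cut-offs."""
    for row in rows:
        score = float(row["cdk_graph"])
        contacts = int(row["total_count"])  # assumed to approximate assembly contacts
        if passes_cutoffs(score, contacts, min_graph_score, min_contacts):
            yield row
```

Calling `filter_rows(rows, min_graph_score=0.8)` would, for instance, keep only the closer chemical matches.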


Column headings

The column headings are given on the first line of each flat file and are described in turn below.

PDBcode - This is the PDB code of the structure.

MSD_assembly_serial - This number can be used in conjunction with the PDB code for finding the same assembly in the MSDSD.

MSD_assembly_id - This id can be used for finding the assembly in the MSDSD.

EC - This is the enzyme commission number for the reaction catalyzed by the enzyme.

KEGG_reaction_id - This is the KEGG reaction id for this EC number. Note that there is often more than one KEGG reaction per EC number.

prot_chain - This is the PDB chain code of the contacting domain.


SCOP specific:

SCOP_sunid - The SCOP domain identifier.

SCOP_sccs - SCOP code describing hierarchy: class, fold, superfamily, and family.


CATH specific:

CATH_id - The CATH domain identifier.

CATH_code - CATH code describing hierarchy: class, architecture, topology, homologous superfamily, and sequence family (S-level).


Pfam specific:

Domain_id - Unique id for each domain, composed of the MSD chain_id, the Pfam accession number and the domain start and end points (PDB residue numbers).

Pfam_acc - The Pfam accession number.



shared - Y or N indicates if the binding of the ligand is shared (see Domain - Ligand Assignment above).

ligand_chain - This is the PDB chain code of the ligand (note some ligands do not have a chain code).

pdb_ligand - This is the PDB three letter HET code.

ligand_pdb_seq - This is the PDB residue number of the ligand.

KEGG_comp_id - This is the KEGG id of the potential cognate ligand from the KEGG LIGAND database.

cdk_graph - This is the graph matching score of chemical similarity between the PDB ligands and the assigned potential cognate ligands.  A score of 1 indicates an exact match.

domain_count - This is the number of contacting residues the domain makes to the ligand.

total_count - This is the total number of contacting residues made to the ligand by all domains and chains of the current quaternary structural assembly.
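Once a row has been read, the columns described above can be converted into typed values. The sketch below covers the SCOP and CATH files; the dataclass, its field names and the `domain_fraction` helper are hypothetical. It assumes SCOP_sccs and CATH_code are dot-separated (as in a SCOP sccs such as c.55.1.1), that an empty ligand_chain means the ligand has no chain code, and that numeric columns contain plain numbers. Note that domain_fraction is based on contacting residues, while the 75% assignment rule is stated in terms of contacts, so the two need not agree exactly.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Assignment:
    """One flat-file line with typed values (subset of the columns)."""
    pdb_code: str
    ec: str
    hierarchy: Tuple[str, ...]   # SCOP_sccs or CATH_code split on "."
    shared: bool                 # True when the shared column is "Y"
    ligand_chain: Optional[str]  # None when the ligand has no chain code
    pdb_ligand: str
    kegg_comp_id: str
    cdk_graph: float
    domain_count: int
    total_count: int

    @property
    def domain_fraction(self) -> float:
        """Share of the ligand's contacting residues contributed by this domain."""
        return self.domain_count / self.total_count if self.total_count else 0.0


def to_assignment(row: Dict[str, str], code_col: str) -> Assignment:
    """Convert a raw row; code_col is "SCOP_sccs" or "CATH_code"."""
    chain = (row.get("ligand_chain") or "").strip()
    return Assignment(
        pdb_code=row["PDBcode"],
        ec=row["EC"],
        hierarchy=tuple(row[code_col].split(".")),
        shared=row["shared"].strip().upper() == "Y",
        ligand_chain=chain or None,
        pdb_ligand=row["pdb_ligand"],
        kegg_comp_id=row["KEGG_comp_id"],
        cdk_graph=float(row["cdk_graph"]),
        domain_count=int(row["domain_count"]),
        total_count=int(row["total_count"]),
    )
```

For a SCOP row, `to_assignment(row, "SCOP_sccs").hierarchy[:3]` gives the class, fold and superfamily, which makes it easy to group cognate ligands at any level of the hierarchy.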