PDBe database documentation

Entities of External database links

The entities that belong to a specific mart
Entity Details: A=Number of attributes of the Entity
R=Number of relations of the Entity
T=Name of the database table
I=Approximation of the number of instances of the entity

External database links Cross-references at the entry, chain or residue level to external databases such as the Swiss-Prot protein sequence database (http://www.ebi.ac.uk/swissprot/), the EC enzyme database (http://ca.expasy.org/enzyme/), the GO gene ontology database (http://www.geneontology.org/), the InterPro database of protein families, domains and functional sites (http://www.ebi.ac.uk/interpro/), the Pfam database of protein families and hidden Markov models (http://pfam.wustl.edu/), the SCOP database for structural classification of proteins (http://scop.mrc-lmb.cam.ac.uk/scop/) and the PubMed database, which provides access to millions of MEDLINE citations (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed)
Swiss-Prot link per Chain A:7, R:4, T:SWP_CHAIN, I:0	Links to Swiss-prot protein sequence database (http://www.ebi.ac.uk/swissprot/) on the chain level
EC Enzyme Database A:6, R:2, T:EC_MAPPING, I:11600	Links to the EC enzyme database (http://ca.expasy.org/enzyme/) and Swiss-Prot on the molecule level
PubMed Citation Database A:4, R:1, T:PUBMEDLIST, I:30800	Links to the PubMed database that provides access to millions of MEDLINE citations (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed) on the entry level
SCOP per chain A:6, R:2, T:SCOP_INT, I:8800000	Links to SCOP database (http://nar.oupjournals.org/cgi/content/full/30/1/264) for structural classification of proteins (http://scop.mrc-lmb.cam.ac.uk/scop/) on the chain level
CATH per chain A:6, R:2, T:CATH_INT, I:9040000	Links to the CATH Protein Structure Classification database (http://www.biochem.ucl.ac.uk/bsm/cath/) on a detailed chain level
PFAM per chain A:5, R:2, T:PFAM_INT, I:0	Links to the PFAM database of protein families and profile hidden markov models (http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D138) , (http://nar.oupjournals.org/cgi/content/full/30/1/276?ijkey=wfgjAWVRY.wto&keytype=ref&siteid=nar) on the chain level (http://www.sanger.ac.uk/Software/Pfam/help/index.shtml)
GO Gene Ontology A:5, R:1, T:GO_SWP_MAPPING, I:311000	Links to the GO gene ontology database (http://www.geneontology.org/) and Interpro database on the entry level
Interpro Protein Family A:5, R:1, T:INT_SWP_MAPPING, I:262000	Links to the Interpro database of protein families, domains and functional sites (http://www.ebi.ac.uk/interpro/) on the chain level
Representative Entries A:5, R:1, T:REPRESENTATIVE, I:7620	Grouping of entries into sets of representative entries for various purposes (e.g. SCOP, DALI). This is very convenient for research or statistical work because it removes the bias caused by the many structures of the same or similar proteins in the PDB
Related Entry A:7, R:2, T:RELATED_ENTRY, I:27200	Entries that are related in some way (type of relation)
Swiss-Prot link per Entry A:4, R:3, T:SP_MAP_INT, I:0	Links to Swiss-prot protein sequence database (http://www.ebi.ac.uk/swissprot/) on the entry level
Swiss-Prot description A:2, R:3, T:SP_DESCRIPTION, I:0	Descriptions provided by Swissprot database
Swiss-Prot keyword A:2, R:3, T:SP_KEYWORD, I:0	Keywords referenced in the Swissprot database
Swiss-Prot Protein Knowledgebase A:28, R:7, T:SWISS_PROT_MAPPING, I:9950000	Links to Swiss-prot protein sequence database (http://www.ebi.ac.uk/swissprot/) on the very detailed residue level
SCOP Structural Classification of Proteins A:31, R:5, T:SCOP_MAPPING, I:8800000	Links to SCOP database (http://nar.oupjournals.org/cgi/content/full/30/1/264) for structural classification of proteins (http://scop.mrc-lmb.cam.ac.uk/scop/) on the residue level
CATH Protein Structure Classification A:29, R:5, T:CATH_MAPPING, I:9040000	Links to the CATH Protein Structure Classification database (http://www.biochem.ucl.ac.uk/bsm/cath/) on a detailed residue level
PFAM per Residue A:30, R:5, T:PFAM_MAPPING, I:0	Links to the PFAM database of protein families and profile hidden markov models (http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D138) , (http://nar.oupjournals.org/cgi/content/full/30/1/276?ijkey=wfgjAWVRY.wto&keytype=ref&siteid=nar) on the detailed residue level (http://www.sanger.ac.uk/Software/Pfam/help/index.shtml)
PFAM Protein Family A:9, R:1, T:PFAM_SWP_MAPPING, I:262000	Links to the Pfam database of protein families and hidden markov models (http://nar.oupjournals.org/cgi/content/full/30/1/276?ijkey=wfgjAWVRY.wto&keytype=ref&siteid=nar) on the chain level
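
The entity lines above all follow the pattern described in the Entity Details legend (name, then A, R, T and I, then a description). As a minimal sketch, assuming the lines are available as plain text in exactly this layout, they could be read into a small record. The names `EntitySummary` and `parse_entity_summary` are illustrative and not part of the PDBe documentation.

```python
import re
from typing import NamedTuple


class EntitySummary(NamedTuple):
    name: str          # entity name, e.g. "EC Enzyme Database"
    attributes: int    # A = number of attributes
    relations: int     # R = number of relations
    table: str         # T = name of the database table
    instances: int     # I = approximate number of instances
    description: str   # free text after the details


# Assumed layout: "<name> A:<n>, R:<n>, T:<table>, I:<n><whitespace><description>"
_ENTITY_LINE = re.compile(
    r"^(?P<name>.+?) A:(?P<a>\d+), R:(?P<r>\d+), T:(?P<t>\w+), I:(?P<i>\d+)\s*(?P<desc>.*)$"
)


def parse_entity_summary(line: str) -> EntitySummary:
    """Parse one entity line; raise ValueError if it does not match the layout."""
    match = _ENTITY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an entity summary line: {line!r}")
    return EntitySummary(
        name=match["name"],
        attributes=int(match["a"]),
        relations=int(match["r"]),
        table=match["t"],
        instances=int(match["i"]),
        description=match["desc"],
    )
```

For example, parsing the EC Enzyme Database line gives the table name EC_MAPPING and an instance count of 11600. Entities with an instance count of 0 (such as SWP_CHAIN) can then be filtered out before querying.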

Attributes/Relations of External database links Entities

The attributes and relations that belong to a specific entity
Attribute Details: Type of the attribute: String, Integer, Number, Date or Unknown
C=Name of the database column
S=Maximum size of the attribute
A=Actual average size used for the attribute
Naming: The attribute is part of the name of an instance
Reference: The attribute is part of the reference key of an instance
Hidden: The attribute is not supposed to be visible or used in queries
Summary: The attribute is supposed to be used in summary reports (lists) for the entity

Relation Details: Cardinality of the relation, either "Optional, Many" or "Many"
Reverse relation=Reverse relation of the entity that the relation refers to
Reverse entity=Entity with which this relation establishes an association
The relation is the containment relation of the entity
The relation is associated with an external entity from a different mart
Hidden: The relation is not supposed to be visible or used in queries

Swiss-Prot link per Chain Links to Swiss-prot protein sequence database (http://www.ebi.ac.uk/swissprot/) on the chain level Reference attributes:Chain Id,SwissProt accession - Naming attributes:Accession Code,Chain Code,SwissProt accession
Chain Id C:CHAIN_ID, S:10, A:0.0	The database identifier of the Chain
SwissProt accession C:SP_PRIMARY_ID, S:15, A:0.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
Swissprot entry C:SP_SECONDARY_ID, S:255, A:0.0	Swissprot entry name: The first item on the ID line is the entry name of the sequence. This name is a useful means of identifying a sequence (http://ca.expasy.org/sprot/userman.html#ID_line)
Accession Code C:ACCESSION_CODE, S:8, A:0.0	The PDB accession code of the entry
Chain Code C:CHAIN_CODE, S:8, A:0.0	The standard code of the chain that uniquely identifies it in the assembly. It is an extension of the PDB chain Id. Where symmetry operations have been applied to a chain, the resulting chains are named with a numeric suffix, e.g. A, A1, A2, A3 ... The chain id specified in the PDB file is also ignored for waters and bound molecules; their codes are derived from the name of the chain that they are bound to. Finally, there are no "null" chain codes: where no id was specified in the PDB file, arbitrary chain codes are assigned (e.g. A, B)
Chain PDB Code C:CHAIN_PDB_CODE, S:1, A:0.0	The original code of the chain as found in the PDB. The PDB chain code is problematic because it is not used consistently in the PDB. Firstly, it is often null when there is a single chain in the entry. Additionally, the same chain code is very often used for both a polymer chain and a molecule bound to it. So the PDB chain code is often not a distinct identifier for a chain. For this reason the chain code was introduced, which is consistent and uniform. The purpose of the chain code is to uniquely identify a chain in an assembly. So where chain A is used 4 times in an assembly, the generated chains will have chain codes A, A1, A2, A3. However, the chain that has been marked as non-symmetric valid (the one that should be used to extract the original asymmetric PDB data) keeps the original PDB code (if it is correct), e.g. A. Where a chain in the PDB did not have a chain code, the first unused letter is reserved (e.g. A). When 2 different chains (e.g. a polymer chain and a bound molecule chain) share the same PDB code, the chain code of the bound molecule is consistently derived from the chain code of the polymer chain
Entry Id C:ENTRY_ID, S:10, A:0.0	The database identifier of the Entry
has related	Reverse relation:for of Reverse entity:Swiss-Prot description - Relation attributes:SwissProt accession
has related	Reverse relation:for of Reverse entity:Swiss-Prot keyword - Relation attributes:SwissProt accession
for related	Reverse relation:has of Reverse entity:Chain - Relation attributes:Chain Id
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id
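
The chain code rules described for this entity (numeric suffixes for symmetry copies, and a reserved unused letter when the PDB file gives no chain id) can be illustrated with a short sketch. This is an approximation written for this document, not the actual PDBe procedure: the function name `assign_chain_codes` is hypothetical, it treats every chain without an id as the same chain, and it does not model bound molecules, waters or the non-symmetric valid flag.

```python
from string import ascii_uppercase
from typing import Dict, List, Optional


def assign_chain_codes(pdb_codes: List[Optional[str]]) -> List[str]:
    """Derive unique chain codes for the chains of one assembly.

    pdb_codes holds the original one-letter PDB chain ids in assembly order;
    a repeated id stands for a symmetry copy, None for a missing id.
    """
    used = {code for code in pdb_codes if code}
    # Reserve the first letter that no chain in the PDB file already uses.
    fallback = next((c for c in ascii_uppercase if c not in used), None)

    copies_seen: Dict[str, int] = {}
    result: List[str] = []
    for code in pdb_codes:
        base = code or fallback
        if base is None:
            raise ValueError("no unused letter left for a chain without an id")
        n = copies_seen.get(base, 0)
        # First occurrence keeps the plain code; copies get a numeric suffix.
        result.append(base if n == 0 else f"{base}{n}")
        copies_seen[base] = n + 1
    return result
```

Under these assumptions, four copies of chain A yield A, A1, A2, A3, which matches the example given in the Chain PDB Code description.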

EC Enzyme Database Links to the EC enzyme database (http://ca.expasy.org/enzyme/) and Swiss-Prot on the molecule level Reference attributes:Molecule Id,SwissProt accession,EC Number - Naming attributes:Accession Code,EC Number
Accession Code C:ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the entry
EC Number C:EC_NUMBER, S:12, A:8.0	EC Number of the Enzyme Nomenclature (http://www.chem.qmul.ac.uk/iubmb/enzyme/)
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
Molecule Id C:MOLECULE_ID, S:10, A:4.0	The database identifier of the Molecule
SwissProt accession C:SP_PRIMARY_ID, S:15, A:6.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
Swissprot entry C:SP_SECONDARY_ID, S:255, A:9.0	Swissprot entry name: The first item on the ID line is the entry name of the sequence. This name is a useful means of identifying a sequence (http://ca.expasy.org/sprot/userman.html#ID_line)
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id
for related	Reverse relation:has of Reverse entity:Molecule - Relation attributes:Molecule Id

PubMed Citation Database Links to the PubMed database that provides access to millions of MEDLINE citations (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed) on the entry level Reference attributes:Entry Id,Ordinal - Naming attributes:Accession Code,Ordinal,PubMed Id
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
Ordinal C:ORDINAL, S:3, A:2.0	The location in the PDB data to which the PubMed entry corresponds
Accession Code C:ACCESSION_CODE, S:4, A:4.0	The PDB accession code of the entry
PubMed Id C:PUBMEDID, S:10, A:5.0	The PubMed Unique Identifier (PMID): a unique number assigned to each PubMed citation (http://www.ncbi.nlm.nih.gov/entrez/query/static/help/pmhelp.html#MEDLINEDisplayFormat)
refers related	Reverse relation:referred of Reverse entity:Entry - Relation attributes:Entry Id,Ordinal

SCOP per chain Links to SCOP database (http://nar.oupjournals.org/cgi/content/full/30/1/264) for structural classification of proteins (http://scop.mrc-lmb.cam.ac.uk/scop/) on the chain level Reference attributes:Scop sunid,Chain Id - Naming attributes:Accession Code,Chain Code,Scop sunid,Sccs
Accession Code C:ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the entry
Chain Code C:CHAIN_CODE, S:8, A:2.0	The standard code of the chain that uniquely identifies it in the assembly. It is an extension of the PDB chain Id. Where symmetry operations have been applied to a chain, the resulting chains are named with a numeric suffix, e.g. A, A1, A2, A3 ... The chain id specified in the PDB file is also ignored for waters and bound molecules; their codes are derived from the name of the chain that they are bound to. Finally, there are no "null" chain codes: where no id was specified in the PDB file, arbitrary chain codes are assigned (e.g. A, B)
Chain Id C:CHAIN_ID, S:10, A:4.0	The database identifier of the Chain
Entry Id C:ENTRY_ID, S:0, A:4.0	The database identifier of the Entry
Sccs C:SCCS, S:20, A:8.0	Scop family identifier
Scop sunid C:SUNID, S:0, A:4.0	A number which uniquely identifies each entry in the SCOP hierarchy, including leaves and entries corresponding to the protein level (http://scop.mrc-lmb.cam.ac.uk/scop/release-notes.html#sunid)
for related	Reverse relation:has of Reverse entity:Chain - Relation attributes:Chain Id
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id

CATH per chain Links to the CATH Protein Structure Classification database (http://www.biochem.ucl.ac.uk/bsm/cath/) on a detailed chain level Reference attributes:CATH domain name,Chain Id - Naming attributes:Accession Code,Chain Code,CATH superfamily code
Chain Id C:CHAIN_ID, S:10, A:5.0	The database identifier of the Chain
Chain Code C:CHAIN_CODE, S:24, A:2.0	The standard code of the chain that uniquely identifies it in the assembly. It is an extension of the PDB chain Id. Where symmetry operations have been applied to a chain, the resulting chains are named with a numeric suffix, e.g. A, A1, A2, A3 ... The chain id specified in the PDB file is also ignored for waters and bound molecules; their codes are derived from the name of the chain that they are bound to. Finally, there are no "null" chain codes: where no id was specified in the PDB file, arbitrary chain codes are assigned (e.g. A, B)
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
Accession Code C:ACCESSION_CODE, S:24, A:4.0	The PDB accession code of the entry
CATH domain name C:CATH_ID, S:18, A:6.0	CATH domain names MUST be SIX characters (e.g. 1cuk03). CHARACTERS 1-4: PDB code: the first 4 characters determine the PDB code, e.g. 1cuk. CHARACTER 5: chain character: this determines which PDB chain is represented. A chain character of zero ('0') indicates that the PDB file has no chain field. CHARACTER 6: domain number: a domain number of ZERO ('0') indicates that the domain is a whole PDB chain. (http://www.biochem.ucl.ac.uk/bsm/cath/formats/CathList.html)
CATH superfamily code C:CATHCODE, S:45, A:11.0	CATH superfamily code that provides information about the CATH hierarchy (http://www.biochem.ucl.ac.uk/bsm/cath/cath_info.html). The hierarchy is built up from the following levels: - Architecture, A-level: This describes the overall shape of the domain structure as determined by the orientations of the secondary structures but ignores the connectivity between the secondary structures. - Topology (Fold family), T-level: Structures are grouped into fold families at this level depending on both the overall shape and connectivity of the secondary structures. - Homologous Superfamily, H-level: This level groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous. - Sequence families, S-level: Structures within each H-level are further clustered on sequence identity. (http://www.biochem.ucl.ac.uk/bsm/cath/formats/CathDomainDescriptionFile.html)
for related	Reverse relation:has of Reverse entity:Chain - Relation attributes:Chain Id
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id
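
The six-character layout of the CATH domain name described above can be decoded mechanically. The following is a minimal sketch based only on that description; the names `CathDomain` and `parse_cath_domain_name` are illustrative and do not come from CATH or PDBe.

```python
from typing import NamedTuple, Optional


class CathDomain(NamedTuple):
    pdb_code: str          # characters 1-4
    chain: Optional[str]   # character 5, None when the PDB file has no chain field
    domain_number: int     # character 6
    whole_chain: bool      # True when the domain number is 0


def parse_cath_domain_name(name: str) -> CathDomain:
    """Split a six-character CATH domain name such as '1cuk03' into its parts."""
    if len(name) != 6:
        raise ValueError(f"CATH domain names must be six characters: {name!r}")
    pdb_code, chain_char, domain_char = name[:4], name[4], name[5]
    if domain_char not in "0123456789":
        raise ValueError(f"domain number must be a digit: {name!r}")
    # A chain character of '0' means the PDB file has no chain field.
    chain = None if chain_char == "0" else chain_char
    number = int(domain_char)
    return CathDomain(pdb_code, chain, number, whole_chain=(number == 0))
```

For the documented example 1cuk03, this yields PDB code 1cuk, no chain field, and domain number 3. Note that the CATH_ID column allows up to 18 characters, so values stored in the table may not always fit this six-character form; the sketch rejects such values rather than guessing.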

PFAM per chain Links to the PFAM database of protein families and profile hidden markov models (http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D138) , (http://nar.oupjournals.org/cgi/content/full/30/1/276?ijkey=wfgjAWVRY.wto&keytype=ref&siteid=nar) on the chain level (http://www.sanger.ac.uk/Software/Pfam/help/index.shtml) Reference attributes:Pfam Id,Chain Id - Naming attributes:Accession Code,Chain Code,Pfam Id
Accession Code C:ACCESSION_CODE, S:24, A:0.0	The PDB accession code of the entry
Entry Id C:ENTRY_ID, S:10, A:0.0	The database identifier of the Entry
Chain Id C:CHAIN_ID, S:10, A:0.0	The database identifier of the Chain
Chain Code C:CHAIN_CODE, S:24, A:0.0	The standard code of the chain that uniquely identifies it in the assembly. It is an extension of the PDB chain Id. Where symmetry operations have been applied to a chain, the resulting chains are named with a numeric suffix, e.g. A, A1, A2, A3 ... The chain id specified in the PDB file is also ignored for waters and bound molecules; their codes are derived from the name of the chain that they are bound to. Finally, there are no "null" chain codes: where no id was specified in the PDB file, arbitrary chain codes are assigned (e.g. A, B)
Pfam Id C:PFAM_ID, S:30, A:0.0	The PFAM accession number
for related	Reverse relation:has of Reverse entity:Chain - Relation attributes:Chain Id
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id

GO Gene Ontology Links to the GO gene ontology database (http://www.geneontology.org/) and Interpro database on the entry level Reference attributes:Entry Id,SwissProt accession,GO Id - Naming attributes:Accession Code,GO Id
Accession Code C:ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the entry
GO Id C:GO_ID, S:10, A:10.0	The database identifier of the Go
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
SwissProt accession C:SP_PRIMARY_ID, S:15, A:6.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
Swissprot entry C:SP_SECONDARY_ID, S:255, A:10.0	Swissprot entry name: The first item on the ID line is the entry name of the sequence. This name is a useful means of identifying a sequence (http://ca.expasy.org/sprot/userman.html#ID_line)
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id

Interpro Protein Family Links to the Interpro database of protein families, domains and functional sites (http://www.ebi.ac.uk/interpro/) on the chain level Reference attributes:Entry Id,SwissProt accession,InterPro Id - Naming attributes:Accession Code,InterPro Id
Accession Code C:ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the entry
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
InterPro Id C:INTERPRO_ID, S:9, A:9.0	Interpro entry accession number (http://www.ebi.ac.uk/interpro/tutorial.html#N556)
SwissProt accession C:SP_PRIMARY_ID, S:15, A:6.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
Swissprot entry C:SP_SECONDARY_ID, S:255, A:10.0	Swissprot entry name: The first item on the ID line is the entry name of the sequence. This name is a useful means of identifying a sequence (http://ca.expasy.org/sprot/userman.html#ID_line)
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:

Representative Entries Grouping of entries into sets of representative entries for various purposes (e.g. SCOP, DALI). This is very convenient for research or statistical work because it removes the bias caused by the many structures of the same or similar proteins in the PDB Reference attributes:Entry Id,Source - Naming attributes:Source,Accession Code
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
Source C:SOURCE, S:10, A:4.0	The source of the set of representative entries (e.g. SCOP, DALI). It also serves as the identifier and discriminator of representative sets
Accession Code C:ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the entry
Details C:DETAILS, S:100, A:1.0
Release Status C:REL_STATUS, S:1, A:1.0	A flag that specifies if the current entry is publicly released or not
has related	Reverse relation:in of Reverse entity:Entry - Relation attributes:Entry Id

Related Entry Entries that are related in some way (type of relation) Reference attributes:Related Entry Id - Naming attributes:Accession Code,Related Accession Code
Related Entry Id C:RELATED_ENTRY_ID, S:10, A:4.0	The database identifier of the Related Entry
Accession Code C:ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the entry
Related Accession Code C:RELATED_ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the related entry
Dre Id C:DRE_ID, S:10, A:4.0	The database identifier of the Dre
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
Relation Type C:RELATION_TYPE, S:80, A:7.0	The type and source of the relation with the related entry (e.g. PDB remark 900)
Relationship Details C:RELATIONSHIP_DETAILS, S:255, A:59.0	Textual information about the relation
of related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id
related related	Reverse relation:related of Reverse entity:Entry - Relation attributes:Related Entry Id

Swiss-Prot link per Entry Links to Swiss-prot protein sequence database (http://www.ebi.ac.uk/swissprot/) on the entry level Reference attributes:SwissProt accession,Entry Id - Naming attributes:Accession Code,SwissProt accession
SwissProt accession C:SP_PRIMARY_ID, S:15, A:0.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
Swissprot entry C:SP_SECONDARY_ID, S:255, A:0.0	Swissprot entry name: The first item on the ID line is the entry name of the sequence. This name is a useful means of identifying a sequence (http://ca.expasy.org/sprot/userman.html#ID_line)
Accession Code C:ACCESSION_CODE, S:8, A:0.0	The PDB accession code of the entry
Entry Id C:ENTRY_ID, S:10, A:0.0	The database identifier of the Entry
has related	Reverse relation:for of Reverse entity:Swiss-Prot description - Relation attributes:SwissProt accession
has related	Reverse relation:for of Reverse entity:Swiss-Prot keyword - Relation attributes:SwissProt accession
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id

Swiss-Prot description Descriptions provided by Swissprot database Reference attributes:SwissProt accession,Description text - Naming attributes:SwissProt accession,Description text
SwissProt accession C:PRIMARYACC#, S:15, A:0.0	The accession number (AC) associated with a swissprot entry
Description text C:TEXT, S:4000, A:0.0
for related	Reverse relation:has of Reverse entity:Swiss-Prot Protein Knowledgebase - Relation attributes:SwissProt accession
for related	Reverse relation:has of Reverse entity:Swiss-Prot link per Chain - Relation attributes:SwissProt accession
for related	Reverse relation:has of Reverse entity:Swiss-Prot link per Entry - Relation attributes:SwissProt accession

Swiss-Prot keyword Keywords referenced in the Swissprot database Reference attributes:SwissProt accession,Keyword - Naming attributes:SwissProt accession,Keyword
SwissProt accession C:PRIMARYACC#, S:15, A:0.0	The accession number (AC) associated with a swissprot entry
Keyword C:KEYWORD, S:80, A:0.0
for related	Reverse relation:has of Reverse entity:Swiss-Prot Protein Knowledgebase - Relation attributes:SwissProt accession
for related	Reverse relation:has of Reverse entity:Swiss-Prot link per Entry - Relation attributes:SwissProt accession
for related	Reverse relation:has of Reverse entity:Swiss-Prot link per Chain - Relation attributes:SwissProt accession

Swiss-Prot Protein Knowledgebase Links to Swiss-prot protein sequence database (http://www.ebi.ac.uk/swissprot/) on the very detailed residue level Reference attributes:Swp Mapping Id - Naming attributes:Accession Code,Assembly Serial,Chain Code,Residue Serial,SwissProt accession
Swp Mapping Id C:SWP_MAPPING_ID, S:0, A:5.0	The database identifier of the Swp Mapping
Accession Code C:ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the entry
Assembly Serial C:ASSEMBLY_SERIAL, S:38, A:2.0	The serial identifier of the assembly in the entry
Chain Code C:CHAIN_CODE, S:8, A:2.0	The standard code of the chain that uniquely identifies it in the assembly. It is an extension of the PDB chain Id. Where symmetry operations have been applied to a chain, the resulting chains are named with a numeric suffix, e.g. A, A1, A2, A3 ... The chain id specified in the PDB file is also ignored for waters and bound molecules; their codes are derived from the name of the chain that they are bound to. Finally, there are no "null" chain codes: where no id was specified in the PDB file, arbitrary chain codes are assigned (e.g. A, B)
Residue Serial C:RESIDUE_SERIAL, S:38, A:3.0	Serial number of the residue in the chain. Starts with 1 for the first residue (N-terminal or 5'-terminal) in the chain, and increases by 1 with each position along the chain uniquely identifying the residue in the chain.
SwissProt accession C:SP_PRIMARY_ID, S:15, A:7.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
Assembly Id C:ASSEMBLY_ID, S:10, A:4.0	The database identifier of the Assembly
PDB Export Chain Code C:CHAIN_CODE_1_LETTER, S:1, A:1.0	This is an additional 1 letter code that uniquely identifies it in the assembly. It is arbitrary and its purpose is to be able to export files in PDB format
Chain Id C:CHAIN_ID, S:10, A:4.0	The database identifier of the Chain
Chain PDB Code C:CHAIN_PDB_CODE, S:1, A:1.0	The original code of the chain as found in the PDB. The PDB chain code is problematic because it is not used consistently in the PDB. Firstly, it is often null when there is a single chain in the entry. Additionally, the same chain code is very often used for both a polymer chain and a molecule bound to it. So the PDB chain code is often not a distinct identifier for a chain. For this reason the chain code was introduced, which is consistent and uniform. The purpose of the chain code is to uniquely identify a chain in an assembly. So where chain A is used 4 times in an assembly, the generated chains will have chain codes A, A1, A2, A3. However, the chain that has been marked as non-symmetric valid (the one that should be used to extract the original asymmetric PDB data) keeps the original PDB code (if it is correct), e.g. A. Where a chain in the PDB did not have a chain code, the first unused letter is reserved (e.g. A). When 2 different chains (e.g. a polymer chain and a bound molecule chain) share the same PDB code, the chain code of the bound molecule is consistently derived from the chain code of the polymer chain
Ligand code C:CHEM_COMP_CODE, S:12, A:7.0	The standard extended molecule code of the amino acid or ligand. It is composed of the PDB 3 letter code with an optional topological indicator appended after an underscore
Cif Serial C:CIF_SERIAL, S:8, A:1.0	(obsolete)
Code 1 Letter C:CODE_1_LETTER, S:5, A:1.0	One code letter for the residue as specified in the PDB sequence and structure.
Conflict Type C:CONFLICT_TYPE, S:255, A:2.0	It marks and classifies conflicts between the PDBe (PDB) and Swiss-prot (for future use)
Segment location C:DSC_TYPE, S:3, A:3.0	Provides information about the location of the residue on the segment mapped to Swiss-prot (e.g. beginning, end)
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
Molecule Id C:MOLECULE_ID, S:10, A:4.0	The database identifier of the Molecule
Ncbi Tax Id C:NCBI_TAX_ID, S:15, A:4.0	The NCBI taxonomy identifier (taxid) that points to a node of the taxonomy tree
Valid In Assymetric Unit C:NON_ASSEMBLY_VALID, S:1, A:1.0	This item is to be used not only in an assembly context, but also to represent the original asymmetric unit
Not Observed C:NOT_OBSERVED, S:1, A:1.0	The residue's coordinates are not available because the residue was not observed in the experiment data. There are no coordinates for any of its atoms.
Residue Id C:RESIDUE_ID, S:0, A:5.0	The database identifier of the Residue
Residue PDB Code C:RESIDUE_PDB_CODE, S:3, A:3.0	The code of the residue or ligand as defined in the PDB. The reference ligand (chem comp) code should be used instead, since where the two differ, an error in the original PDB data was identified during clean-up. Common cases are: 1) The chemical structure implied by the PDB coordinates is entirely unrelated to the ligand with this code in the chemical dictionary, which indicates a serious error in the PDB data. 2) The structure in the PDB is a structurally modified version of the ligand with this code, for example an extra atom was introduced (e.g. modified amino acids). In these cases a new ligand is defined in the chemical dictionary and is assigned to this residue or bound molecule. 3) The PDB coordinates imply a different stereoisomer. A new ligand is introduced in the chemical dictionary for the new stereoisomer.
Residue PDB Insert Code C:RESIDUE_PDB_INSERT_CODE, S:1, A:1.0	The insertion code of the residue, as was originally found in the PDB. The residue serial should be used instead since the PDB SEQ and INSERT CODE are not consistently and uniformly used in PDB
Residue PDB Seq C:RESIDUE_PDB_SEQ, S:4, A:3.0	The sequence of the residue, as was originally found in the PDB (has to be used together with insert code).
Residue Type C:RESIDUE_TYPE, S:1, A:1.0	The type of the component: R:residue, B:bound molecule, W:water. This normally has to correspond to the type of the chain to which the residue belongs
Swissprot entry C:SP_SECONDARY_ID, S:255, A:10.0	Swissprot entry name: The first item on the ID line is the entry name of the sequence. This name is a useful means of identifying a sequence (http://ca.expasy.org/sprot/userman.html#ID_line)
Swissprot residue serial C:SP_SERIAL, S:10, A:3.0	The serial identifier of the residue in the swiss-prot sequence. This does not correspond to the PDB residue serial because often only partial fragments of the actual sequence are involved and observed in the PDB experiment
SwissProt 1 Letter Code C:SP_1_LETTER_CODE, S:1, A:1.0	This is the 1 letter code of the amino acid as specified in the Swiss-prot database. It can differ from the code in the PDB in the very few cases where, for various reasons, the protein sequence and the actual structure observed in the experiment deviate.
has related	Reverse relation:for of Reverse entity:Swiss-Prot description - Relation attributes:SwissProt accession
for related	Reverse relation:has of Reverse entity:Assembly - Relation attributes:Assembly Id
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id
for related	Reverse relation:has of Reverse entity:Residue - Relation attributes:Residue Id
for related	Reverse relation:has of Reverse entity:Molecule - Relation attributes:Molecule Id
for related	Reverse relation:has of Reverse entity:Chain - Relation attributes:Chain Id
has related	Reverse relation:for of Reverse entity:Swiss-Prot keyword - Relation attributes:SwissProt accession
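
Because the Residue Serial and the Swissprot residue serial do not correspond, a common use of this entity is to build a lookup from one numbering to the other and to flag residues whose 1 letter codes differ. The sketch below assumes the rows have already been fetched from SWISS_PROT_MAPPING as dictionaries keyed by the column names listed above; how they are fetched is not shown, the function name `map_residues` is hypothetical, and the sample values are invented for illustration.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def map_residues(
    rows: Iterable[Mapping[str, Any]]
) -> Tuple[Dict[int, int], List[int]]:
    """Return (PDB residue serial -> Swiss-Prot serial, serials with differing codes).

    Rows are assumed to belong to a single chain of a single assembly.
    """
    serial_map: Dict[int, int] = {}
    mismatches: List[int] = []
    for row in rows:
        pdb_serial = int(row["RESIDUE_SERIAL"])
        serial_map[pdb_serial] = int(row["SP_SERIAL"])
        # The PDB and Swiss-Prot 1 letter codes can differ in a few cases.
        if row["CODE_1_LETTER"] != row["SP_1_LETTER_CODE"]:
            mismatches.append(pdb_serial)
    return serial_map, mismatches


# Invented sample rows: the PDB chain starts at Swiss-Prot position 25.
sample_rows = [
    {"RESIDUE_SERIAL": 1, "SP_SERIAL": 25, "CODE_1_LETTER": "M", "SP_1_LETTER_CODE": "M"},
    {"RESIDUE_SERIAL": 2, "SP_SERIAL": 26, "CODE_1_LETTER": "A", "SP_1_LETTER_CODE": "G"},
]
serial_map, mismatches = map_residues(sample_rows)
```

Keeping the mismatches separate from the lookup makes it possible to review deviating residues without affecting the numbering conversion.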

SCOP Structural Classification of Proteins Links to SCOP database (http://nar.oupjournals.org/cgi/content/full/30/1/264) for structural classification of proteins (http://scop.mrc-lmb.cam.ac.uk/scop/) on the residue level Reference attributes:Scop Mapping Id - Naming attributes:Accession Code,Assembly Serial,Chain Code,Residue Serial,Sccs,Scop sunid
Scop Mapping Id C:SCOP_MAPPING_ID, S:0, A:5.0	The database identifier of the Scop Mapping
Accession Code C:ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the entry
Assembly Serial C:ASSEMBLY_SERIAL, S:38, A:2.0	The serial identifier of the assembly in the entry
Chain Code C:CHAIN_CODE, S:8, A:2.0	The standard code of the chain that uniquely identifies it in the assembly. It is an extension of the PDB chain Id. Where symmetry operations have been applied to a chain, the resulting chains are named with a numeric suffix, e.g. A, A1, A2, A3 ... The chain id specified in the PDB file is also ignored for waters and bound molecules; their codes are derived from the name of the chain that they are bound to. Finally, there are no "null" chain codes: where no id was specified in the PDB file, arbitrary chain codes are assigned (e.g. A, B)
Residue Serial C:RESIDUE_SERIAL, S:38, A:3.0	Serial number of the residue in the chain. Starts with 1 for the first residue (N-terminal or 5'-terminal) in the chain, and increases by 1 with each position along the chain uniquely identifying the residue in the chain.
Scop Id C:SCOP_ID, S:8, A:7.0	The old scop identifier (sid) (http://scop.bic.nus.edu.sg/release-notes.html)
Assembly Id C:ASSEMBLY_ID, S:10, A:4.0	The database identifier of the Assembly
Begin residue C:BEG_RES, S:8, A:2.0	The PDB seq of the first residue of the chain segment that belongs in the same mapping with this SCOP domain
Chain C:CHAIN, S:15, A:1.0	The chain identifier as in the original SCOP data. It should be the same as CHAIN_PDB_CODE, except where this particular entry has been cleaned up
PDB Export Chain Code C:CHAIN_CODE_1_LETTER, S:1, A:1.0	This is an additional 1 letter code that uniquely identifies it in the assembly. It is arbitrary and its purpose is to be able to export files in PDB format
Chain Id C:CHAIN_ID, S:10, A:4.0	The database identifier of the Chain
Chain Msd Code C:CHAIN_MSD_CODE, S:8, A:5.0	An internal longer code for a chain (defined by MSD) that includes the type of the chain (protein, bound molecule etc). It does not identify uniquely a chain in an assembly; the chain code has to be used instead
Chain PDB Code C:CHAIN_PDB_CODE, S:1, A:1.0	The original code of the chain as found in the PDB. This code is not used consistently in the PDB. Firstly, it is often null when there is a single chain in the entry. Additionally, the same chain code is very often used both for a polymer chain and for a molecule bound to it. So the PDB chain code is often not a distinct identifier for a chain. For this reason the chain code was introduced, which is consistent and uniform. The purpose of the chain code is to uniquely identify a chain in an assembly. So where chain A is used 4 times in an assembly, the generated chains will have chain codes A, A1, A2, A3. However, the chain that has been marked as non-symmetric valid (the one that should be used to extract the original asymmetric PDB data) keeps the original PDB code (if it is correct), e.g. A. Where a chain in the PDB did not have a chain code, the first unused letter is reserved (e.g. A). When 2 different chains (e.g. a polymer chain and a bound molecule chain) share the same PDB code, the chain code of the bound molecule is consistently derived from the chain code of the polymer chain
Ligand code C:CHEM_COMP_CODE, S:12, A:7.0	The standard extended molecule code of the amino acid or ligand. It is composed of the PDB 3 letter code with an optional topological indicator appended after an underscore
Code 1 Letter C:CODE_1_LETTER, S:5, A:1.0	One code letter for the residue as specified in the PDB sequence and structure.
3 Letter Code C:CODE_3_LETTER, S:3, A:3.0	This attribute provides a code from the chem comp dictionary for standard residues. This attribute must be the same for small molecules that represent our variations on topology/chemistry for a polymer component e.g. All ALA's should have a code_3_letter of ALA. All adenosine nucleotides should have a 3 letter code of A, except for those that have a topology of 'free'. This code is now obsolete and the Comp Code should be used instead in most cases
End Residue C:END_RES, S:8, A:2.0	The PDB seq of the last residue of the chain segment that belongs in the same mapping with this SCOP domain
Entry Id C:ENTRY_ID, S:0, A:4.0	The database identifier of the Entry
Ncbi Tax Id C:NCBI_TAX_ID, S:15, A:4.0	The NCBI taxonomy identifier (taxid) that points to a node of the taxonomy tree
Valid In Assymetric Unit C:NON_ASSEMBLY_VALID, S:1, A:1.0	This item is to be used not only in an assembly context, but also to represent the original asymmetric unit
Order In C:ORDER_IN, S:0, A:2.0	The serial of the mapping to a SCOP domain when 2 different segments of the same chain map to the same SCOP domain
PDB Insert Code C:PDB_INSERT_CODE, S:1, A:1.0	The insertion code of the residue, as was originally found in the PDB.
Residue Id C:RESIDUE_ID, S:0, A:5.0	The database identifier of the Residue
Residue PDB Code C:RESIDUE_PDB_CODE, S:3, A:3.0	The code of the residue or ligand as defined in the PDB. The reference ligand (chem comp) code should be used instead, since where the two differ there was an error in the original PDB data that was identified during clean up. Common cases are: 1) The chemical structure implied by the PDB coordinates is entirely unrelated to the ligand with this code in the chemical dictionary, i.e. the PDB data is seriously inconsistent. 2) The structure in the PDB is a structurally modified version of the ligand with this code, for example an extra atom was introduced (e.g. modified amino acids). In these cases a new ligand is defined in the chemical dictionary and is assigned to this residue or bound molecule. 3) The PDB coordinates imply a different stereoisomer. A new ligand is introduced in the chemical dictionary for the new stereoisomer.
Residue PDB Seq C:RESIDUE_PDB_SEQ, S:4, A:3.0	The sequence number of the residue, as originally found in the PDB (has to be used together with the insert code).
Sccs C:SCCS, S:20, A:8.0	SCOP family identifier: the SCOP concise classification string (sccs), which encodes class, fold, superfamily and family
Swissprot entry C:SP_SECONDARY_ID, S:1, A:1.0	Swissprot entry name: The first item on the ID line is the entry name of the sequence. This name is a useful means of identifying a sequence (http://ca.expasy.org/sprot/userman.html#ID_line)
SwissProt accession C:SP_PRIMARY_ID, S:15, A:6.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
Swissprot residue serial C:SP_SERIAL, S:0, A:3.0	The serial identifier of the residue in the swiss-prot sequence. This does not correspond to the PDB residue serial because often only partial fragments of the actual sequence are involved and observed in the PDB experiment
Scop sunid C:SUNID, S:0, A:4.0	A number which uniquely identifies each entry in the SCOP hierarchy, including leaves and entries corresponding to the protein level (http://scop.mrc-lmb.cam.ac.uk/scop/release-notes.html#sunid)
SwissProt 1 Letter Code C:SP_1_LETTER_CODE, S:0, A:0.0	The 1 letter code of the amino acid as specified in the Swiss-prot database. In a very few cases this differs from the code in the PDB, because the protein sequence and the structure actually observed in the experiment deviate.
of organism related	Reverse relation:belong of Reverse entity:NCBI Taxonomy - Relation attributes:Ncbi Tax Id
for related	Reverse relation:has of Reverse entity:Chain - Relation attributes:Chain Id
for related	Reverse relation:has of Reverse entity:Residue - Relation attributes:Residue Id
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id
for related	Reverse relation:has of Reverse entity:Assembly - Relation attributes:Assembly Id
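
The chain code naming rule described under Chain Code and Chain PDB Code above (chain A used 4 times in an assembly gives A, A1, A2, A3) can be sketched as follows. This is only an illustration of the naming pattern; the function name assembly_chain_codes is hypothetical and is not part of the database, and the sketch does not cover waters, bound molecules or chains without a PDB code.

```python
def assembly_chain_codes(pdb_chain_code: str, copies: int) -> list[str]:
    """Return the chain codes for `copies` symmetry copies of one PDB chain.

    Hypothetical helper: the first copy keeps the original PDB code and each
    further copy gets a numeric suffix (1, 2, ...), as in A, A1, A2, A3.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return [pdb_chain_code] + [f"{pdb_chain_code}{i}" for i in range(1, copies)]


if __name__ == "__main__":
    print(assembly_chain_codes("A", 4))  # ['A', 'A1', 'A2', 'A3']
```

Because each generated code is unique within the assembly, Chain Code (rather than Chain PDB Code) is the safer column to join or group on.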

CATH Protein Structure Classification Links to the CATH Protein Structure Classification database (http://www.biochem.ucl.ac.uk/bsm/cath/) on a detailed residue level Reference attributes:Cath Mapping Id - Naming attributes:Accession Code,Assembly Serial,Chain Code,Residue Serial,Ligand code,CATH domain name
Cath Mapping Id C:CATH_MAPPING_ID, S:0, A:5.0	The database identifier of the CATH mapping
Assembly Id C:ASSEMBLY_ID, S:10, A:4.0	The database identifier of the Assembly
Assembly Serial C:ASSEMBLY_SERIAL, S:38, A:2.0	The serial identifier of the assembly in the entry
Residue Id C:RESIDUE_ID, S:0, A:5.0	The database identifier of the Residue
Ncbi Tax Id C:NCBI_TAX_ID, S:15, A:4.0	The NCBI taxonomy identifier (taxid) that points to a node of the taxonomy tree
Chain Id C:CHAIN_ID, S:10, A:5.0	The database identifier of the Chain
Chain Code C:CHAIN_CODE, S:24, A:2.0	The standard code of the chain that uniquely identifies it in the assembly. It is an extension of the PDB chain Id. Where symmetry operations have been applied to a chain, the resulting chains are named with a numeric suffix, e.g. A, A1, A2, A3 ... The chain id specified in the PDB file is ignored for waters and bound molecules; their codes are derived from the name of the chain that they are bound to. There are no "null" chain codes: where no id was specified in the PDB file, arbitrary chain codes are assigned (e.g. A, B)
Chain Msd Code C:CHAIN_MSD_CODE, S:24, A:5.0	An internal longer code for a chain (defined by MSD) that includes the type of the chain (protein, bound molecule etc). It does not identify uniquely a chain in an assembly; the chain code has to be used instead
Chain PDB Code C:CHAIN_PDB_CODE, S:3, A:1.0	The original code of the chain as found in the PDB. This code is not used consistently in the PDB. Firstly, it is often null when there is a single chain in the entry. Additionally, the same chain code is very often used both for a polymer chain and for a molecule bound to it. So the PDB chain code is often not a distinct identifier for a chain. For this reason the chain code was introduced, which is consistent and uniform. The purpose of the chain code is to uniquely identify a chain in an assembly. So where chain A is used 4 times in an assembly, the generated chains will have chain codes A, A1, A2, A3. However, the chain that has been marked as non-symmetric valid (the one that should be used to extract the original asymmetric PDB data) keeps the original PDB code (if it is correct), e.g. A. Where a chain in the PDB did not have a chain code, the first unused letter is reserved (e.g. A). When 2 different chains (e.g. a polymer chain and a bound molecule chain) share the same PDB code, the chain code of the bound molecule is consistently derived from the chain code of the polymer chain
PDB Export Chain Code C:CHAIN_CODE_1_LETTER, S:3, A:1.0	This is an additional 1 letter code that uniquely identifies it in the assembly. It is arbitrary and its purpose is to be able to export files in PDB format
Valid In Assymetric Unit C:NON_ASSEMBLY_VALID, S:1, A:1.0	This item is to be used not only in an assembly context, but also to represent the original asymmetric unit
3 Letter Code C:CODE_3_LETTER, S:9, A:3.0	This attribute provides a code from the chem comp dictionary for standard residues. This attribute must be the same for small molecules that represent our variations on topology/chemistry for a polymer component e.g. All ALA's should have a code_3_letter of ALA. All adenosine nucleotides should have a 3 letter code of A, except for those that have a topology of 'free'. This code is now obsolete and the Comp Code should be used instead in most cases
Code 1 Letter C:CODE_1_LETTER, S:15, A:1.0	One code letter for the residue as specified in the PDB sequence and structure.
Ligand code C:CHEM_COMP_CODE, S:36, A:7.0	The standard extended molecule code of the amino acid or ligand. It is composed of the PDB 3 letter code with an optional topological indicator appended after an underscore
Residue PDB Code C:RESIDUE_PDB_CODE, S:9, A:3.0	The code of the residue or ligand as defined in the PDB. The reference ligand (chem comp) code should be used instead, since where the two differ there was an error in the original PDB data that was identified during clean up. Common cases are: 1) The chemical structure implied by the PDB coordinates is entirely unrelated to the ligand with this code in the chemical dictionary, i.e. the PDB data is seriously inconsistent. 2) The structure in the PDB is a structurally modified version of the ligand with this code, for example an extra atom was introduced (e.g. modified amino acids). In these cases a new ligand is defined in the chemical dictionary and is assigned to this residue or bound molecule. 3) The PDB coordinates imply a different stereoisomer. A new ligand is introduced in the chemical dictionary for the new stereoisomer.
Residue PDB Seq C:RESIDUE_PDB_SEQ, S:4, A:3.0	The sequence number of the residue, as originally found in the PDB (has to be used together with the insert code).
PDB Insert Code C:PDB_INSERT_CODE, S:3, A:1.0	The insertion code of the residue, as was originally found in the PDB.
Residue Serial C:RESIDUE_SERIAL, S:5, A:3.0	Serial number of the residue in the chain. Starts with 1 for the first residue (N-terminal or 5'-terminal) in the chain, and increases by 1 with each position along the chain uniquely identifying the residue in the chain.
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
Accession Code C:ACCESSION_CODE, S:24, A:4.0	The PDB accession code of the entry
SwissProt accession C:SP_PRIMARY_ID, S:45, A:6.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
Swissprot entry C:SP_SECONDARY_ID, S:765, A:9.0	Swissprot entry name: The first item on the ID line is the entry name of the sequence. This name is a useful means of identifying a sequence (http://ca.expasy.org/sprot/userman.html#ID_line)
SwissProt 1 Letter Code C:SP_1_LETTER_CODE, S:3, A:1.0	The 1 letter code of the amino acid as specified in the Swiss-prot database. In a very few cases this differs from the code in the PDB, because the protein sequence and the structure actually observed in the experiment deviate.
Swissprot residue serial C:SP_SERIAL, S:10, A:3.0	The serial identifier of the residue in the swiss-prot sequence. This does not correspond to the PDB residue serial because often only partial fragments of the actual sequence are involved and observed in the PDB experiment
CATH domain name C:CATH_ID, S:18, A:6.0	CATH domain names must be six characters long (e.g. 1cuk03). Characters 1-4, PDB code: the first 4 characters give the PDB code, e.g. 1cuk. Character 5, chain character: this determines which PDB chain is represented. A chain character of zero ('0') indicates that the PDB file has no chain field. Character 6, domain number: a domain number of zero ('0') indicates that the domain is a whole PDB chain. (http://www.biochem.ucl.ac.uk/bsm/cath/formats/CathList.html)
CATH superfamily code C:CATHCODE, S:45, A:11.0	CATH superfamily code that provides information about the CATH hierarchy (http://www.biochem.ucl.ac.uk/bsm/cath/cath_info.html). The hierarchy is built up from the following levels - Architecture, A-level: This describes the overall shape of the domain structure as determined by the orientations of the secondary structures but ignores the connectivity between the secondary structures. - Topology (Fold family), T-level: Structures are grouped into fold families at this level depending on both the overall shape and connectivity of the secondary structures. - Homologous Superfamily, H-level: This level groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous. - Sequence families, S-level: Structures within each H-level are further clustered on sequence identity. (http://www.biochem.ucl.ac.uk/bsm/cath/formats/CathDomainDescriptionFile.html)
Order In C:ORDER_IN, S:0, A:2.0	The serial of the mapping to a cath domain when 2 different segments of the same chain map to the same CATH domain
Begin residue C:BEG_RES, S:18, A:2.0	The PDB seq of the first residue of the chain segment that belongs in the same mapping with this CATH domain
End Residue C:END_RES, S:18, A:3.0	The PDB seq of the last residue of the chain segment that belongs in the same mapping with this CATH domain
for related	Reverse relation:has of Reverse entity:Assembly - Relation attributes:Assembly Id
for related	Reverse relation:has of Reverse entity:Chain - Relation attributes:Chain Id
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id
for related	Reverse relation:has of Reverse entity:Residue - Relation attributes:Residue Id
of organism related	Reverse relation:belong of Reverse entity:NCBI Taxonomy - Relation attributes:Ncbi Tax Id
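
The six-character layout of the CATH domain name (CATH_ID) described above can be decoded with a few string slices. The following is a minimal sketch, assuming the value has exactly the layout given in the attribute description; the names CathDomainName and parse_cath_domain_name are hypothetical and not part of the database or of CATH.

```python
from typing import NamedTuple, Optional


class CathDomainName(NamedTuple):
    """Decoded parts of a six-character CATH domain name (hypothetical type)."""
    pdb_code: str
    chain: Optional[str]   # None when the PDB file has no chain field ('0')
    domain_number: int
    whole_chain: bool      # True when the domain number is 0


def parse_cath_domain_name(name: str) -> CathDomainName:
    """Split a CATH domain name such as '1cuk03' into its three parts."""
    if len(name) != 6:
        raise ValueError(f"CATH domain name must be 6 characters: {name!r}")
    pdb_code, chain_char, domain_char = name[:4], name[4], name[5]
    if domain_char not in "0123456789":
        raise ValueError(f"domain number must be a digit: {name!r}")
    # A chain character of '0' means the PDB file has no chain field.
    chain = None if chain_char == "0" else chain_char
    return CathDomainName(pdb_code, chain, int(domain_char), domain_char == "0")


if __name__ == "__main__":
    print(parse_cath_domain_name("1cuk03"))
```

For the example from the description, 1cuk03, this gives PDB code 1cuk, no chain field, and domain number 3.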

PFAM per Residue Links to the PFAM database of protein families and profile hidden markov models (http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D138) , (http://nar.oupjournals.org/cgi/content/full/30/1/276?ijkey=wfgjAWVRY.wto&keytype=ref&siteid=nar) on the detailed residue level (http://www.sanger.ac.uk/Software/Pfam/help/index.shtml) Reference attributes:Pfam Mapping Id - Naming attributes:Accession Code,Assembly Serial,Chain Code,Residue Serial,Ligand code,Pfam Id
Pfam Mapping Id C:PFAM_MAPPING_ID, S:0, A:0.0	Database identifier of the PFAM mapping
Accession Code C:ACCESSION_CODE, S:24, A:0.0	The PDB accession code of the entry
Entry Id C:ENTRY_ID, S:10, A:0.0	The database identifier of the Entry
Assembly Id C:ASSEMBLY_ID, S:10, A:0.0	The database identifier of the Assembly
Assembly Serial C:ASSEMBLY_SERIAL, S:38, A:0.0	The serial identifier of the assembly in the entry
Residue Id C:RESIDUE_ID, S:0, A:0.0	The database identifier of the Residue
Ncbi Tax Id C:NCBI_TAX_ID, S:15, A:0.0	The NCBI taxonomy identifier (taxid) that points to a node of the taxonomy tree
Chain Id C:CHAIN_ID, S:10, A:0.0	The database identifier of the Chain
Chain Code C:CHAIN_CODE, S:24, A:0.0	The standard code of the chain that uniquely identifies it in the assembly. It is an extension of the PDB chain Id. Where symmetry operations have been applied to a chain, the resulting chains are named with a numeric suffix, e.g. A, A1, A2, A3 ... The chain id specified in the PDB file is ignored for waters and bound molecules; their codes are derived from the name of the chain that they are bound to. There are no "null" chain codes: where no id was specified in the PDB file, arbitrary chain codes are assigned (e.g. A, B)
Chain Msd Code C:CHAIN_MSD_CODE, S:24, A:0.0	An internal longer code for a chain (defined by MSD) that includes the type of the chain (protein, bound molecule etc). It does not identify uniquely a chain in an assembly; the chain code has to be used instead
Chain PDB Code C:CHAIN_PDB_CODE, S:3, A:0.0	The original code of the chain as found in the PDB. This code is not used consistently in the PDB. Firstly, it is often null when there is a single chain in the entry. Additionally, the same chain code is very often used both for a polymer chain and for a molecule bound to it. So the PDB chain code is often not a distinct identifier for a chain. For this reason the chain code was introduced, which is consistent and uniform. The purpose of the chain code is to uniquely identify a chain in an assembly. So where chain A is used 4 times in an assembly, the generated chains will have chain codes A, A1, A2, A3. However, the chain that has been marked as non-symmetric valid (the one that should be used to extract the original asymmetric PDB data) keeps the original PDB code (if it is correct), e.g. A. Where a chain in the PDB did not have a chain code, the first unused letter is reserved (e.g. A). When 2 different chains (e.g. a polymer chain and a bound molecule chain) share the same PDB code, the chain code of the bound molecule is consistently derived from the chain code of the polymer chain
PDB Export Chain Code C:CHAIN_CODE_1_LETTER, S:3, A:0.0	This is an additional 1 letter code that uniquely identifies it in the assembly. It is arbitrary and its purpose is to be able to export files in PDB format
Valid In Assymetric Unit C:NON_ASSEMBLY_VALID, S:1, A:0.0	This item is to be used not only in an assembly context, but also to represent the original asymmetric unit
3 Letter Code C:CODE_3_LETTER, S:9, A:0.0	This attribute provides a code from the chem comp dictionary for standard residues. This attribute must be the same for small molecules that represent our variations on topology/chemistry for a polymer component e.g. All ALA's should have a code_3_letter of ALA. All adenosine nucleotides should have a 3 letter code of A, except for those that have a topology of 'free'. This code is now obsolete and the Comp Code should be used instead in most cases
Code 1 Letter C:CODE_1_LETTER, S:15, A:0.0	One code letter for the residue as specified in the PDB sequence and structure.
Ligand code C:CHEM_COMP_CODE, S:36, A:0.0	The standard extended molecule code of the amino acid or ligand. It is composed of the PDB 3 letter code with an optional topological indicator appended after an underscore
Residue PDB Code C:RESIDUE_PDB_CODE, S:9, A:0.0	The code of the residue or ligand as defined in the PDB. The reference ligand (chem comp) code should be used instead, since where the two differ there was an error in the original PDB data that was identified during clean up. Common cases are: 1) The chemical structure implied by the PDB coordinates is entirely unrelated to the ligand with this code in the chemical dictionary, i.e. the PDB data is seriously inconsistent. 2) The structure in the PDB is a structurally modified version of the ligand with this code, for example an extra atom was introduced (e.g. modified amino acids). In these cases a new ligand is defined in the chemical dictionary and is assigned to this residue or bound molecule. 3) The PDB coordinates imply a different stereoisomer. A new ligand is introduced in the chemical dictionary for the new stereoisomer.
Residue PDB Seq C:RESIDUE_PDB_SEQ, S:4, A:0.0	The sequence number of the residue, as originally found in the PDB (has to be used together with the insert code).
PDB Insert Code C:PDB_INSERT_CODE, S:3, A:0.0	The insertion code of the residue, as was originally found in the PDB.
Residue Serial C:RESIDUE_SERIAL, S:5, A:0.0	Serial number of the residue in the chain. Starts with 1 for the first residue (N-terminal or 5'-terminal) in the chain, and increases by 1 with each position along the chain uniquely identifying the residue in the chain.
Chain C:CHAIN, S:3, A:0.0	The chain identifier as in the original Pfam data. It should be the same as CHAIN_PDB_CODE, except where this particular entry has been cleaned up
Serial C:SERIAL, S:5, A:0.0	(obsolete) same as the residue serial
PDB 3 letter code C:PDB_ID, S:9, A:0.0	This is the 3 letter code of the amino acid as specified in the PDB. This can differ from the 3 letter code in the PDBe in very rare cases, mainly when some cleanup was involved
Pdb Seq C:PDB_SEQ, S:123, A:0.0	The sequence number of the residue, as originally found in the PDB (has to be used together with the insert code).
SwissProt accession C:SP_PRIMARY, S:45, A:0.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
SwissProt 1 Letter Code C:SP_COMPONENT, S:3, A:0.0	The 1 letter code of the amino acid as specified in the Swiss-prot database. In a very few cases this differs from the code in the PDB, because the protein sequence and the structure actually observed in the experiment deviate.
Swissprot residue serial C:SP_SERIAL, S:10, A:0.0	The serial identifier of the residue in the swiss-prot sequence. This does not correspond to the PDB residue serial because often only partial fragments of the actual sequence are involved and observed in the PDB experiment
Pfam Id C:PFAM_ID, S:30, A:0.0	The PFAM accession number
Starting Residue C:SP_RES_FROM, S:5, A:0.0	The Swissprot residue serial of the beginning of the sequence segment that maps to the same PFAM family
Ending Residue C:SP_RES_TO, S:5, A:0.0	The Swissprot residue serial of the end of the sequence segment that maps to the same PFAM family
of organism related	Reverse relation:belong of Reverse entity:NCBI Taxonomy - Relation attributes:Ncbi Tax Id
for related	Reverse relation:has of Reverse entity:Residue - Relation attributes:Residue Id
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id
for related	Reverse relation:has of Reverse entity:Assembly - Relation attributes:Assembly Id
for related	Reverse relation:has of Reverse entity:Chain - Relation attributes:Chain Id
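
Since this entity holds one row per residue, a common first step is to collapse the rows into one residue range per entry, chain and Pfam family. The sketch below assumes the rows have already been fetched as dictionaries keyed by the column names listed above (ACCESSION_CODE, CHAIN_CODE, PFAM_ID, RESIDUE_SERIAL); the function name and the sample values are made up for illustration and are not real database content.

```python
from collections import defaultdict


def summarise_pfam_rows(rows):
    """Collapse per-residue Pfam rows into (first, last) residue serials.

    Hypothetical helper: groups rows by entry, chain code and Pfam id and
    returns the smallest and largest RESIDUE_SERIAL seen in each group.
    """
    serials = defaultdict(list)
    for row in rows:
        key = (row["ACCESSION_CODE"], row["CHAIN_CODE"], row["PFAM_ID"])
        serials[key].append(row["RESIDUE_SERIAL"])
    return {key: (min(values), max(values)) for key, values in serials.items()}


if __name__ == "__main__":
    # Made-up sample rows, not taken from the database.
    sample = [
        {"ACCESSION_CODE": "1abc", "CHAIN_CODE": "A", "PFAM_ID": "PF00001", "RESIDUE_SERIAL": 5},
        {"ACCESSION_CODE": "1abc", "CHAIN_CODE": "A", "PFAM_ID": "PF00001", "RESIDUE_SERIAL": 6},
        {"ACCESSION_CODE": "1abc", "CHAIN_CODE": "A", "PFAM_ID": "PF00001", "RESIDUE_SERIAL": 7},
        {"ACCESSION_CODE": "1abc", "CHAIN_CODE": "A1", "PFAM_ID": "PF00001", "RESIDUE_SERIAL": 5},
    ]
    for key, span in summarise_pfam_rows(sample).items():
        print(key, span)
```

Grouping on Chain Code rather than Chain PDB Code keeps symmetry copies such as A and A1 apart. Note that the minimum and maximum hide any gaps, so a family that maps to two separate segments of one chain would need a finer grouping.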

PFAM Protein Family Links to the Pfam database of protein families and hidden markov models (http://nar.oupjournals.org/cgi/content/full/30/1/276?ijkey=wfgjAWVRY.wto&keytype=ref&siteid=nar) on the chain level Reference attributes:Entry Id,Chain Code,SwissProt accession,Ending Residue,PFAM Id - Naming attributes:Accession Code,Chain Code,PFAM Id
Accession Code C:ACCESSION_CODE, S:8, A:4.0	The PDB accession code of the entry
Chain Code C:CHAIN_CODE, S:8, A:2.0	The standard code of the chain that uniquely identifies it in the assembly. It is an extension of the PDB chain Id. Where symmetry operations have been applied to a chain, the resulting chains are named with a numeric suffix, e.g. A, A1, A2, A3 ... The chain id specified in the PDB file is ignored for waters and bound molecules; their codes are derived from the name of the chain that they are bound to. There are no "null" chain codes: where no id was specified in the PDB file, arbitrary chain codes are assigned (e.g. A, B)
PFAM Id C:PFAM_ID, S:10, A:7.0	The database identifier of the Pfam
Chain PDB Code C:CHAIN_PDB_CODE, S:1, A:1.0	The original code of the chain as found in the PDB. This code is not used consistently in the PDB. Firstly, it is often null when there is a single chain in the entry. Additionally, the same chain code is very often used both for a polymer chain and for a molecule bound to it. So the PDB chain code is often not a distinct identifier for a chain. For this reason the chain code was introduced, which is consistent and uniform. The purpose of the chain code is to uniquely identify a chain in an assembly. So where chain A is used 4 times in an assembly, the generated chains will have chain codes A, A1, A2, A3. However, the chain that has been marked as non-symmetric valid (the one that should be used to extract the original asymmetric PDB data) keeps the original PDB code (if it is correct), e.g. A. Where a chain in the PDB did not have a chain code, the first unused letter is reserved (e.g. A). When 2 different chains (e.g. a polymer chain and a bound molecule chain) share the same PDB code, the chain code of the bound molecule is consistently derived from the chain code of the polymer chain
Entry Id C:ENTRY_ID, S:10, A:4.0	The database identifier of the Entry
Ending Residue C:FROM_SP_SERIAL, S:5, A:3.0	The Swissprot residue serial of the beginning of the sequence segment that maps to the Interpro IPR
SwissProt accession C:SP_PRIMARY_ID, S:15, A:6.0	The accession number (AC) associated with a swissprot entry (http://ca.expasy.org/sprot/userman.html#AC_line)
Swissprot entry C:SP_SECONDARY_ID, S:255, A:10.0	Swissprot entry name: The first item on the ID line is the entry name of the sequence. This name is a useful means of identifying a sequence (http://ca.expasy.org/sprot/userman.html#ID_line)
Starting Residue C:TO_SP_SERIAL, S:5, A:4.0	The Swissprot residue serial of the beginning of the sequence segment that maps to the Interpro IPR
for related	Reverse relation:has of Reverse entity:Entry - Relation attributes:Entry Id