
Annotating the Overall Reaction in MACiE

Overall annotation in MACiE has two components:

  1. Overall annotation in ISIS/Base
  2. Overall annotation using the annotation script

Annotation in ISIS/Base

Screen shot of top overall reaction in ISIS/Base
Screen shot of bottom overall reaction in ISIS/Base
Figure 1: Screen Shot of Overall Reaction in ISIS/Base

A number of fields of the overall annotation must be filled in within ISIS/Base. These are:

  • ID. This is automatically assigned by ISIS, and is the unique identifier from which the MACiE IDs are created. Please note that because this is automatically assigned, if an overall entry is deleted, this number cannot be reused.
  • Name. This is the name of the enzyme. In all cases it should be the accepted (recommended) name as assigned by the Enzyme Commission. The accepted names can be found on the IntEnz website under the entry for the specific EC number. In cases where the accepted name contains a Greek letter, this letter should be spelt out, e.g. β becomes beta. The accepted name should be used even if it is not quite the correct one for the MACiE entry; e.g. for EC 3.1.26.4 (M0163) the accepted name is calf thymus ribonuclease H, whereas the MACiE entry is for E. coli. Alternative names will be picked up by the name query on the MACiE homepage.
  • EC Number. This is the hierarchical four-number code, with the numbers separated by periods, which classifies the enzyme function according to the enzyme's overall reaction scheme and is assigned to the enzyme by the Enzyme Commission. The first number of the EC code, which can take values of 1 to 6, represents the class of the enzyme (1 = oxidoreductase, 2 = transferase, 3 = hydrolase, 4 = lyase, 5 = isomerase and 6 = ligase). The second number represents the sub-class, the exact meaning of which depends on the class; the third number represents the sub-subclass; and the fourth number is the serial number, which essentially defines the substrate specificity of the enzyme. PDB files often include the EC number in their annotation, and the literature references may also include it. However, this number should always be confirmed with IntEnz, as EC numbers are not completely static and are frequently reviewed.
  • PDB Code. This is the representative PDB code chosen for the entry. Wherever possible, it should be the wild-type enzyme (no mutations) and should contain any essential metal ions. Ideally it should also include any cofactors, and it should be the best-resolution structure available that fulfils the first two criteria.
  • Unique Identifier. This is the unique identifier for the overall reaction. This identifier is made up from the ID assigned by ISIS, e.g. 1, and enough zeros to make the number four digits in length, e.g. 0001. Finally the text string ".ov" is added to signify that the reaction is the overall reaction, e.g. 0001.ov. This unique identifier will be used to create the file name for the overall reaction .rxn file.
  • Overall Reaction. This is the chemical representation of the overall transformation taking place. This should mirror the reaction found in the Enzyme Commission description of the EC number. It is created in ISIS/Draw as described previously (Using ISIS). Because some overall reactions can contain multiple instances of the same molecule, which is not always well handled by ISIS/Base, the overall reaction file that is used in the final version of MACiE is created automatically during the conversion process from the substrate and product lists.
  • Each Reference used needs to be included in the relevant field of the overall annotation. The references consist of three fields:
    • Reference number. This should be a unique number from 1 to n
    • Reference details. This is the list of authors, journal, volume, page numbers, year and title. Author names are separated by a comma (e.g. G. Yang, R. Liu, K. L. Taylor). Each field is separated by a semi-colon. Page numbers should be complete (e.g. 10879-10885) and the journal name should be the standard abbreviated name of the journal, e.g. Proc. Natl Acad. Sci. USA, or J. Mol. Biol. A complete example of a reference might be G. Yang, R. Liu, K. L. Taylor, H. Xiang, J. Price, D. Dunaway-Mariano; Biochemistry; 35; 10879-10885; 1996; Identification of active site residues essential to 4-chlorobenzoyl-coenzyme A dehalogenase catalysis by chemical modification and site directed mutagenesis
    • Medline Id. This is the PubMed unique identifier (PMID) associated with the reference as cited by Medline. If no PMID is available, this field should be left blank.
  • Substrate molecules. This is a complete list of the substrates involved in the overall reaction and consists of four fields:
    • Number, a unique number from 1 to n
    • Substrate Identifiers. The KEGG code and ChEBI identifier of the substrate, separated by a comma (e.g. for water the identifier should be "C00001,CHEBI:15377"). However, the substrate does not always have a representative in KEGG. If the substrate is a protein, it is sometimes possible to assign it a PDB code, in which case the PDB code should be used, but more frequently an X number should be assigned. Before creating a new X number, check the current list. Please note that it is important to use the ChEBI identifier that has the correct protonation state of the substrate molecule. If the correct protonation state is not currently in ChEBI, then only the KEGG identifier should be listed.
    • Structure. The chemical structure of the substrate.
    • Name. The name is the standard chemical name of the substrate, usually the same as cited in KEGG or ChEBI. Where there is more than one instance of the molecule in the overall reaction, this field also includes a count, which is given in the format (count=2), where the number represents the number of times the molecule occurs.
  • Product molecules have the same fields as the substrate molecules.
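Several of the field formats described above are mechanical enough to sketch in code. The Python below is a minimal illustration only and is not part of ISIS/Base or the MACiE tooling. The function names (ec_class, overall_unique_id, format_reference) are hypothetical, and the single space after each separator in the reference string is an assumption taken from the worked example in the Reference details field.

```python
import re

# Standard Enzyme Commission class names for the first EC digit (1 to 6).
EC_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "ligase",
}


def ec_class(ec_number: str) -> str:
    """Return the class name for a complete four-part EC number."""
    if not re.fullmatch(r"[0-9]+(\.[0-9]+){3}", ec_number):
        raise ValueError(f"not a four-part EC number: {ec_number!r}")
    first = int(ec_number.split(".")[0])
    if first not in EC_CLASSES:
        raise ValueError(f"unknown EC class: {first}")
    return EC_CLASSES[first]


def overall_unique_id(isis_id: int) -> str:
    """Zero-pad the ISIS ID to four digits and append '.ov'."""
    return f"{isis_id:04d}.ov"


def format_reference(authors, journal, volume, pages, year, title) -> str:
    """Join authors with commas and the six fields with semi-colons."""
    fields = [", ".join(authors), journal, str(volume), pages, str(year), title]
    return "; ".join(fields)
```

For example, overall_unique_id(1) gives "0001.ov", and ec_class("3.1.26.4") gives "hydrolase". A check like this does not replace confirming the EC number with IntEnz.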

Figure 2, below, shows an example of the overall annotation that is entered using ISIS.

Example of top annotation done in ISIS
Example of bottom annotation done in ISIS
Figure 2: Example of annotation done in ISIS/Base

Once this annotation has been completed, the remainder should be done using the annotation script, available to registered developers.

Annotation using the Script

This script is only available to registered developers of MACiE and is password protected. Below is a screen shot of the overall annotation script. If you would like to become a registered developer, please email me with details on how you would like to contribute.

Screen shot of overall annotation script
Figure 3: Screen Shot of Overall Annotation Script

The overall annotation basically consists of four sections:

1) Annotation Details

These have nothing to do with the reaction annotation, but state the date the entry was last updated, and the people responsible for adding the reaction. The two fields here are:

Date last updated. This should be in the format ddmmyyyy, e.g. 05122007 for the 5th December 2007.
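As a small illustration, the eight-digit day-month-year string shown in the example (05122007) can be produced with Python's standard datetime module. The helper name date_last_updated is hypothetical and is not part of the annotation script.

```python
from datetime import date


def date_last_updated(d: date) -> str:
    """Format a date as day, month, four-digit year with no separators."""
    return d.strftime("%d%m%Y")


# 5 December 2007, the example used in the field description.
print(date_last_updated(date(2007, 12, 5)))
```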

Entry created by. This should be the initials of the person annotating the reaction. Current initials used for annotators are: GLH (Gemma Holliday), GJB (Gail Bartlett), DEA (Daniel Almonacid), AM (Andre Minoche), JDF (Julia Fischer), JAR (Judith Reeks).

2) Overall Reaction Identifiers

These are the identifiers that link the current MACiE entry to other databases. The identifiers included are:

MACiE ID. This is the MACiE id of the current entry. It should always take the form M\d{4}, e.g. M0001
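The pattern M\d{4} above can be checked directly with a regular expression. The following is a minimal sketch; is_macie_id is a hypothetical helper name, and restricting \d to ASCII digits is an assumption about what the script accepts.

```python
import re

# "M" followed by exactly four digits; re.ASCII keeps \d to 0-9 only.
MACIE_ID = re.compile(r"M\d{4}", re.ASCII)


def is_macie_id(value: str) -> bool:
    """Return True if the whole string has the form M followed by four digits."""
    return MACIE_ID.fullmatch(value) is not None
```

Using fullmatch rather than match means that strings with trailing characters, such as M00011, are rejected.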

CATH code(s). This links MACiE to the CATH database and allows us to access information on structural evolution of the enzyme. This is auto-generated from the primary PDB code for the MACiE entry. CATH codes are split into catalytic domains (those domains which furnish at least one catalytic residue) and "other" domains.

KEGG Reaction Identifiers. This should be the reaction identifier in KEGG for the enzyme's overall reaction. It will not always be possible to find an exact correspondence, in which case the field should be left blank.

EzCatDB code. This links MACiE entries to Nozomi Nagano's database (EzCatDB).

SFLD identifier. This links MACiE to the Structure-Function Linkage Database (SFLD). To find the relevant ID in the SFLD, first search for the reaction using the EC number. This presents a list of all the families that perform that EC number. Select a family; some members of this family may well have crystal structures associated with them. Clicking on the number of associated crystal structures gives a list of PDB codes, from which the relevant code can be selected, giving the selected structure's database ID.

Species Name. There are two fields associated with species name, the scientific name and the common name.

UniProt code. This links us to the UniProt database, which gives information on the protein sequences which may be split into Catalytic and non-catalytic chains.

It should be noted that it is not always possible to fill in all the identifier fields, e.g. there may be only a single UniProt identifier and no non-catalytic UniProt identifiers.

3) Overall Reaction Chemistry

This section describes the chemistry occurring in the overall reaction. It should just be a reflection of the changes between the reactants and products of the overall reaction and should not take into account the mechanism involved in the individual steps of the reaction.

The Reactive Centres should be annotated. These are defined as any atom at which a reaction occurs, such that a bond is formed, cleaved or altered in order, or any atom at which the oxidation state changes. They should be listed as if a unique ID were assignable to each one, so there may be multiple occurrences of each atom type. Unfortunately, unique IDs are not assignable in ISIS; however, we hope eventually to automate this particular annotation, which will allow unique IDs to become assignable.

All the bonds annotated in MACiE are entered as if a unique Id were assignable to the bond, allowing for multiple occurrences of the same bond type.

The Bonds Involved are those bonds whose change cannot be described by bond formation, cleavage or change in order. An example of a bond involved would be one in which the stereochemistry is changed. They are written as a normal bond, e.g. N-H for a single bond, C=O for a double bond, C#N for a triple bond.

Bonds Formed are those bonds that are formed during the course of the overall reaction. Bonds Cleaved are those bonds that are broken during the course of the overall reaction. They are written as a normal bond, e.g. N-H for a single bond, C=O for a double bond, C#N for a triple bond. Note: a multiple bond may be formed or cleaved in the overall reaction. However, the bond should also be included in the bonds changed in order.

Bonds changed in order are those bonds that change in order during the course of a reaction. They are written in a different format to the other bonds involved in the overall reaction, with the format: starting bond, starting order, final bond, final order. E.g. a C=O bond going to a C-O bond would be entered as: C=O,2,C-O,1.
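The two bond notations can be sketched as follows. This is an illustrative Python sketch, not the annotation script itself; the helper names bond and bond_order_change are hypothetical. The symbols for single, double and triple bonds (-, = and #) are those used in the examples above.

```python
# Bond symbols as used in the annotation: single, double, triple.
BOND_SYMBOLS = {1: "-", 2: "=", 3: "#"}


def bond(atom_a: str, atom_b: str, order: int) -> str:
    """Write a bond in the normal notation, e.g. C=O for a double bond."""
    return f"{atom_a}{BOND_SYMBOLS[order]}{atom_b}"


def bond_order_change(atom_a: str, atom_b: str,
                      start_order: int, final_order: int) -> str:
    """Write: starting bond, starting order, final bond, final order."""
    if start_order == final_order:
        raise ValueError("bond order does not change")
    start = bond(atom_a, atom_b, start_order)
    final = bond(atom_a, atom_b, final_order)
    return f"{start},{start_order},{final},{final_order}"
```

For a C=O bond going to a C-O bond, bond_order_change("C", "O", 2, 1) gives "C=O,2,C-O,1", matching the example above.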

Cofactors are small molecules, other than the standard amino acids, that assist an enzyme in catalysis. To exclude allosteric regulators, we require a cofactor to be present in the active site. Cofactors can be inorganic molecules (e.g. metal ions) or organic molecules (e.g. PLP), which may sometimes be complexes with metal ions (e.g. heme). These should be annotated in the following way: first, the name of the cofactor should be included, followed by the HET group name from the PDB file. Where there is no HET group name, the word "none" should be used. Then comes the number associated with the HET group (analogous to the residue number), followed by the chain identifier. Again, if these values are not available, the word "none" should be used. All these values should be separated by commas, e.g. FAD,FAD,450,A or, when there is no information available, FAD,none,none,none. If there are multiple cofactors of the same type with no PDB information, please distinguish them by assigning an arbitrary series of numbers (e.g. 1 and 2).
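A cofactor entry of this form can be assembled as in the following Python sketch. The function cofactor_entry and its parameter names are hypothetical; the only behaviour taken from the text is the comma-separated order (name, HET group name, HET group number, chain) and the use of the word "none" for missing values.

```python
from typing import Optional


def cofactor_entry(name: str,
                   het_name: Optional[str] = None,
                   het_number: Optional[int] = None,
                   chain: Optional[str] = None) -> str:
    """Build 'name,HET name,HET number,chain', using 'none' for gaps."""
    fields = [name, het_name, het_number, chain]
    return ",".join("none" if f is None else str(f) for f in fields)
```

With full PDB information, cofactor_entry("FAD", "FAD", 450, "A") gives "FAD,FAD,450,A"; with none, cofactor_entry("FAD") gives "FAD,none,none,none".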

Amino acid residues involved in the reaction should also be annotated here. They should be entered in the format of the three letter code, PDB residue number, followed by chain identifier, e.g. Asp126A.
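The residue format can be checked with a simple pattern, as in this Python sketch. The name parse_residue is hypothetical, and the pattern assumes a capitalised three-letter code, an integer residue number and a single-letter chain identifier; PDB insertion codes and multi-character chain identifiers are not handled.

```python
import re

# Three-letter code, residue number, one-letter chain, e.g. Asp126A.
RESIDUE = re.compile(r"([A-Z][a-z]{2})(-?\d+)([A-Za-z])", re.ASCII)


def parse_residue(entry: str):
    """Split an entry such as 'Asp126A' into (code, number, chain)."""
    match = RESIDUE.fullmatch(entry)
    if match is None:
        raise ValueError(f"not in the form Asp126A: {entry!r}")
    code, number, chain = match.groups()
    return code, int(number), chain
```

For the example above, parse_residue("Asp126A") returns ("Asp", 126, "A").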

Finally, the question Does the enzyme return to its native state is posed. There are three options available here: Yes, No and the default value of a blank field. This question is a control, and is used to determine the number of reactions in which further evidence is needed in order to return the enzyme to a state in which it is able to undergo another reaction. However, if the enzyme does not have any amino acid residues that change in their bonding or electronic configurations, then the enzyme has not changed, and so cannot return, but it is not a No return, and so this field should be left blank. A special case of return is that of the racemases, which may act upon either enantiomer to produce the other enantiomer. In these cases the reaction is often symmetrical, and so the enzyme is considered to return at the end of a single cycle, even if the end point is not identical to the original starting point.

4) Other Overall Reaction Information

Evidence for the mechanism should also be included. This currently takes one or more of the following values, which are a subset of those also allowed for amino acid residue function:

  • Chemical modification of the enzyme - This includes such studies where the enzyme has been chemically modified in some way, for example the chemical trapping of an intermediate or chemical denaturing of an amino acid residue. It does not include the mutagenesis of amino acid residues.
  • Theoretical studies of the enzyme - This includes any study that has performed computational (theoretical) studies on the mechanism of the enzyme.
  • Conservation of residues - Conservation of amino acid residues involved between a family of proteins.
  • Presence of a covalently bound intermediate - This includes any mechanism that utilises the formation of an enzyme-substrate (or cofactor) covalent bond, and has had a crystal structure solved for that complex.
  • Kinetic studies - Kinetic studies, which will include kinetic isotope effect studies, rate studies, etc. that have been performed in order to determine the mechanism.
  • Modelling studies of species within the active site
  • Mutagenesis of catalytic residues - Where mutagenesis has been used to determine the importance of a given catalytic residue, or group of residues.
  • pH studies
  • Similarity to a homologue with a known mechanism - Quite often mechanisms are derived based upon the fact that the enzyme is similar to another for which the mechanism has already been determined.
  • Spectroscopic studies - Spectroscopic studies, which include NMR, EPR, etc.
  • 3-dimensional structure - Where the 3-dimensional crystal structure has been used to infer the mechanism.

The evidence terms have been designed to be somewhat analogous to those used by the CSA (although a little more generic) and more can be added if needed.

Because the EC number for an enzyme may change over time, it is important to include Previous EC Numbers in the annotation, especially as the PDB very often only lists the original EC number associated with the crystal structure, and not the current one.

The biological unit (either according to the authors or the computational methods of the PDBe) should be noted. If the biological unit is a homodimer, then it should be annotated as homodimeric. Currently, this is a free-text box.

The location of the active site should also be annotated. This can be within a single domain, at the interface between two domains on a single chain, at the interface between two (or more) chains or, more rarely, the reaction can occur across multiple active sites.

Finally, Overall Reaction Comments may be added to the overall annotation. These are comments that are more appropriate to the entire reaction than a specific reaction step and may include information on alternative mechanisms that have been suggested, or simply more detailed information on the protein structure itself.

Once all this data has been input, the script should be updated (using the Update button), at which point it will be possible to add in the evidence for the presence and function of the amino acid residues and cofactors. The evidence for the amino acid residues and the cofactors is essentially the same as for the overall reaction, with an extra value:

  • Cofactor is required for the mechanism to occur - This is primarily for cofactor species, as the presence of the cofactor is often determined, initially, by the requirement for the cofactor to be present. It should not, as a general rule, be used as evidence for the amino acid residues (this is the equivalent of mutagenesis).

Overall Evidence for Cofactors Overall Evidence for Amino Acid Residues
Figure 4: Overall Evidence for Cofactors (left) and Amino Acid Residues (right)

Completing the Overall Reaction Annotation

When all the annotation has been completed, the Write to File box should be checked and the Update button pressed. This produces a text string in the correct format for copying and pasting into the annotation field of the overall reaction in ISIS/Base.

Example of completed overall reaction annotation
Figure 5: An example of the completed overall reaction annotation in ISIS/Base
