A Glossary of terms used in MEROPS
The following is a list of loose definitions of many of the important terms that are used in the MEROPS database. In addition, terms for the different kinds of peptidase activity are to be found under "Classification - by reaction catalysed".
An aminopeptidase liberates a single amino acid residue from the unblocked N-terminus of its substrate: Xaa↓peptide (or Xaa↓(Xaa)n). Examples are aminopeptidase N (M01.001) and aminopeptidase C (C01.086). Aminopeptidases form sub-subclass EC 3.4.11 in the NC-IUBMB scheme.
A carboxypeptidase releases a single residue from the unblocked C-terminus of its substrate: peptide↓Xaa (or (Xaa)n↓Xaa). Examples are carboxypeptidase A1 (M14.001) and carboxypeptidase Y (S10.001). Carboxypeptidases form sub-subclasses EC 3.4.16-18 in the NC-IUBMB scheme, being divided by catalytic type.
The following is a list of abbreviations for the names of chemical compounds that may be used without further definition in MEROPS.
| Abbreviation | Meaning |
|---|---|
| DAN | diazoacetylnorleucine methyl ester |
| FPLC | fast protein liquid chromatography |
| H-kininogen | high molecular mass kininogen |
| HPLC | high-performance liquid chromatography |
| MHC | major histocompatibility complex |
| NMR | nuclear magnetic resonance |
| PAGE | polyacrylamide gel electrophoresis |
| PCR | polymerase chain reaction |
| SDS | sodium dodecyl sulfate |
The catalytic type of a peptidase relates to the chemical groups responsible for its catalysis of peptide bond hydrolysis. The six specific catalytic types that are recognised are the serine, threonine, cysteine, aspartic, glutamic and metallo- peptidases. In peptidases of serine, threonine and cysteine type, the catalytic nucleophile is the reactive group of an amino acid side chain, either a hydroxyl group (serine and threonine peptidases) or a sulfhydryl group (cysteine peptidases). In aspartic and metallo- peptidases, the nucleophile is commonly an activated water molecule. In aspartic peptidases, the water molecule is directly bound by the side chains of aspartic residues. In metallopeptidases, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. The metal may be zinc, cobalt or manganese, and a single metal ion is usually bound by three amino acid ligands. The glutamic peptidases (all in the small family G1) were recognised only in 2004, and much remains to be learned about their catalytic mechanisms, but they seem to employ a Glu/Gln catalytic dyad. Just a few peptidases are still of unknown catalytic type. The initial letters S, T, C, A, G, M and U are used in forming the names of the clans and families of peptidases in the MEROPS system.
In a clan we aim to include all the modern-day peptidases that have arisen from a single evolutionary origin of peptidases, although they commonly have diverged so far that they now belong in more than one family. The homology of peptidases in different families in a clan is most clearly shown by their similar protein folds. The significance of the similarity can often be quantified by use of the DALI program (Holm & Sander, 1997), and MEROPS provides a table of results obtained in this way. When structures are not available, the order of catalytic-site residues in the polypeptide chain and sequence motifs around them may provide less direct evidence of homology at the clan level. Each clan is identified with two letters, the first of which represents the catalytic type of the families included in the clan. The letter "P" is used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine. Some families cannot yet be assigned to any clan, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan; for example, clan MA contains subclan MA(E), the gluzincins, and subclan MA(M), the metzincins. (See also the use of clans in the MEROPS classification.)
The families of proteins that inhibit peptidases are assigned to clans in similar ways to the families of peptidases. The series of identifiers IA - IZ has not proved quite sufficient, and the additional series JA - JZ is beginning to be used.
At least 12 of the families of peptidase inhibitors contain what we term compound inhibitors (families I1, I2, I3, I8, I12, I15, I17, I19, I20, I25, I27 and I31). The compound inhibitors are proteins that contain multiple inhibitor units. The number of inhibitor units ranges from 2 to 15 (Rawlings et al., 2004). The identifier for each of these compound inhibitors starts with the letter "L" followed by the name of the family to which the inhibitor units belong, a hyphen, and a serial number. For example, ovomucoid contains three inhibitor units, I01.001, I01.002 and I01.003, and the whole protein has the identifier LI01-001. The summary page for the compound inhibitor LI01-001 contains a diagram that shows how the individual units are arranged. A few compound inhibitors are known that contain units from more than one family of inhibitors. These have identifiers that start "LI90", and an example is chelonianin (LI90-003).
Compound or complex peptidase
The MEROPS classification of peptidases is a classification of peptidase units, and the great majority of proteins with peptidase activity contain only a single peptidase unit. But occasionally it happens that a single protein molecule contains several peptidase units. Such a molecule clearly requires special treatment because no single location in the classification is right for it. We term such a peptidase a "compound peptidase". There are also multi-subunit peptidase molecules that contain more than one peptidase unit in separate polypeptide chains; these we term "complex peptidases". We use a special type of identifier starting in "X" for the compound and complex peptidases. In addition, a conventional MEROPS identifier is assigned to each of the individual peptidase units. For example, the somatic form of peptidyl-dipeptidase A (angiotensin-converting enzyme) is XM02-001, and its two peptidase units are M02.001 and M02.004. There is a summary page for XM02-001 in addition to the standard pages for M02.001 and M02.004. To see which other compound peptidases are recognized by MEROPS, look under "X" in the alphabetical index of identifiers in the side-bar menu.
A dipeptidase hydrolyses a dipeptide, and typically requires that both termini be free: Xaa↓Yaa. Examples are dipeptidase A (C69.001) and membrane dipeptidase (M19.001). Dipeptidases form sub-subclass EC 3.4.13 in the NC-IUBMB scheme.
A dipeptidyl-peptidase releases an N-terminal dipeptide from its substrate: dipeptide↓peptide (i.e. (Xaa)2↓(Xaa)n), and that being the case, the term dipeptidyl-peptidase (short for 'dipeptidyl-peptide hydrolase') is clearly appropriate. These enzymes are sometimes erroneously called aminopeptidases or dipeptidases. Examples are dipeptidyl-peptidase I (C01.070) and dipeptidyl-peptidase III (M49.001). Dipeptidyl-peptidases, together with tripeptidyl-peptidases, form sub-subclass EC 3.4.14 in the NC-IUBMB scheme.
An endopeptidase hydrolyses internal, alpha-peptide bonds in a polypeptide chain, tending to act away from the N-terminus or C-terminus. Examples of endopeptidases are chymotrypsin (S01.001), pepsin (A01.001) and papain (C01.001). A very few endopeptidases act a fixed distance from one terminus of the substrate, an example being mitochondrial intermediate peptidase (M03.006). Some endopeptidases act only on substrates smaller than proteins, and these are termed oligopeptidases. An example of an oligopeptidase is thimet oligopeptidase (M03.001). Endopeptidases initiate the digestion of food proteins, generating new N- and C-termini that are substrates for the exopeptidases that complete the process. Endopeptidases also process proteins by limited proteolysis. Examples are the removal of signal peptides from secreted proteins (e.g. signal peptidase I, S26.001) and the maturation of precursor proteins (e.g. enteropeptidase, S01.156; furin, S08.071). In the nomenclature of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) endopeptidases are allocated to sub-subclasses EC 3.4.21, EC 3.4.22, EC 3.4.23, EC 3.4.24 and EC 3.4.25 for serine-, cysteine-, aspartic-, metallo- and threonine-type endopeptidases, respectively (Enzyme Nomenclature).
The exopeptidases require a free N-terminal amino group, a free C-terminal carboxyl group, or both, and hydrolyse a bond not more than three residues from the terminus. The exopeptidases are further divided into aminopeptidases, carboxypeptidases, dipeptidyl-peptidases, peptidyl-dipeptidases, tripeptidyl-peptidases and dipeptidases.
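The positional definitions of the exopeptidase kinds given in this glossary can be gathered into a small lookup. The following Python sketch is only an illustration: the function name and arguments are hypothetical, it considers the position of the scissile bond alone (not whether the termini are free or blocked), and it treats every cleavage of a dipeptide as dipeptidase activity, which is a simplification.

```python
def exopeptidase_kinds(length, n_before):
    """Return the exopeptidase kinds consistent with a cleavage position.

    length   -- number of residues in the substrate
    n_before -- number of residues on the N-terminal side of the scissile bond
    """
    if length < 2 or not 0 < n_before < length:
        raise ValueError("the scissile bond must lie between two residues")
    n_after = length - n_before
    if length == 2:
        # Simplification: any hydrolysis of a dipeptide is called dipeptidase.
        return ["dipeptidase"]
    from_n_terminus = {1: "aminopeptidase",
                       2: "dipeptidyl-peptidase",
                       3: "tripeptidyl-peptidase"}
    from_c_terminus = {1: "carboxypeptidase",
                       2: "peptidyl-dipeptidase"}
    kinds = []
    if n_before in from_n_terminus:
        kinds.append(from_n_terminus[n_before])
    if n_after in from_c_terminus:
        kinds.append(from_c_terminus[n_after])
    return kinds  # an empty list means no exopeptidase pattern fits


print(exopeptidase_kinds(10, 2))
print(exopeptidase_kinds(3, 1))
```

A short substrate can fit more than one pattern (a tripeptide cleaved after its first residue fits both aminopeptidase and peptidyl-dipeptidase), so position alone does not identify the enzyme.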
The term family is used to describe a group of peptidases or peptidase inhibitors each of which can be proved to be homologous to the type example. The homology is shown by a significant similarity in amino acid sequence either to the type example itself, or to another protein that has been shown to be homologous to the type example, and thus a member of the family. The relationship must exist in the peptidase unit at least. A family can contain a single peptidase if no homologues are known, and a single gene product such as a virus polyprotein can contain more than one peptidase each assigned to a different family. Each family is identified by a letter representing the catalytic type of the peptidases it contains together with a unique number.
Some families are divided into subfamilies because there is evidence of a very ancient divergence within the family. Typically, the divergence corresponds to more than 150 accepted point mutations per 100 amino acid residues, which would represent an event 2,500 million years ago for a family with a typical evolutionary rate of 0.6 substitutions per amino acid site per 1,000 million years. A putative protein sequence that is very divergent from known peptidases in the family does not normally found a new subfamily but is described as "unassigned". (See also the use of families in the MEROPS classification.)
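The figure of 2,500 million years follows from dividing the divergence by the rate. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and the plain division is an assumption about how the figure was obtained, not a statement of the method used by MEROPS.

```python
def divergence_time_myr(pam_per_100_residues, subs_per_site_per_1000_myr):
    """Estimate the time since divergence, in millions of years."""
    # 150 accepted point mutations per 100 residues = 1.5 substitutions per site.
    substitutions_per_site = pam_per_100_residues / 100.0
    # Dividing by the rate gives units of 1,000 million years; convert to Myr.
    return substitutions_per_site / subs_per_site_per_1000_myr * 1000.0


# The glossary's figures: 150 PAM at 0.6 substitutions per site per 1,000 Myr.
print(round(divergence_time_myr(150, 0.6)))  # prints 2500
```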
Many metallopeptidases bind a tetrahedrally coordinated atom of zinc, or rarely another metal, by use of the sequence motif: His-Glu-Xaa-Xaa-His (‘HEXXH’). In this, the two His residues are ligands of the metal atom, and the Glu residue has a catalytic role. A third ligand of the zinc atom is a residue of Glu, Asp or His towards the C-terminus. The fourth ligand is a molecule of water that becomes activated and mediates the nucleophilic attack on the scissile peptide bond. However, it is important to recognise that the pentapeptide HEXXH also occurs in many proteins that are not peptidases. A longer consensus sequence that is more reliable in detecting metallopeptidases was described by Jongeneel et al. (1989). Even this does not recognise peptidases of family M2, however, and it was further refined by Rawlings & Barrett (1995). The refined sequence is: Xaa-Xbb-Xcc-His-Glu-Xbb-Xbb-His-Xbb-Xdd, in which Xaa is hydrophobic or Thr, Xbb is an uncharged residue, Xcc is any amino acid except Pro, and Xdd is hydrophobic. In the folded metallopeptidases, the HEXXH motif occurs on an alpha-helix, as does the third ligand, and a turn is required between the two helices to bring the ligands together. The families of peptidases that contain the HEXXH motif are assigned to two clans in MEROPS: MA and MM.
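Both the simple HEXXH motif and the refined consensus can be expressed as regular expressions. The Python sketch below is illustrative only: the glossary does not list which residues count as "hydrophobic" or "uncharged", so the residue classes here are assumptions (in particular, His is treated as charged), and the test sequences are invented.

```python
import re

# Assumed residue classes (one-letter codes); not defined in the glossary.
HYDROPHOBIC = "AVLIMFWY"
UNCHARGED = "ACFGILMNPQSTVWY"    # all residues except D, E, K, R and H
NOT_PRO = "ACDEFGHIKLMNQRSTVWY"  # all residues except P

HEXXH = re.compile(r"HE[A-Z]{2}H")
REFINED = re.compile(
    f"[{HYDROPHOBIC}T]"    # Xaa: hydrophobic or Thr
    f"[{UNCHARGED}]"       # Xbb: uncharged
    f"[{NOT_PRO}]"         # Xcc: anything except Pro
    "HE"                   # His-Glu
    f"[{UNCHARGED}]{{2}}"  # Xbb-Xbb
    "H"                    # His
    f"[{UNCHARGED}]"       # Xbb
    f"[{HYDROPHOBIC}]"     # Xdd: hydrophobic
)


def find_motifs(pattern, sequence):
    """Return (0-based start, matched text) for non-overlapping matches."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


# Invented sequences: the first satisfies both patterns, the second only HEXXH.
print(find_motifs(REFINED, "GGVVAHELTHAVGG"))
print(find_motifs(HEXXH, "KKDHEKKHDE"), find_motifs(REFINED, "KKDHEKKHDE"))
```

As the entry notes, a hit for HEXXH alone is weak evidence, which is why the second sequence matches the short motif but not the refined consensus.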
A single peptidase (or inhibitor) is expected to be encoded in the genomes of many organisms, so there is a population of species variants of it. So that we can summarise the properties of the peptidase, we find it helpful to recognise a single representative form, and by analogy with the taxonomy of organisms, this is called the holotype. It is analogous to the type peptidase or type inhibitor at the family and clan levels of the classification.
Homology between two protein sequences is shown by statistically significant similarity between the amino acid sequences. When this was tested with random shuffles of the sequences in the RDF program of Lipman & Pearson (1985), we took a value of six standard deviation units as that above which the similarity could be regarded as being significant. Alternatively an e-value of 0.01 or less in BLAST against the non-redundant database was acceptable.
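The two acceptance criteria can be written as a simple check. In this Python sketch the alignment scores themselves are taken as given (computing them is outside its scope), the function names are hypothetical, and the use of the sample standard deviation of the shuffled scores is an assumption rather than a description of the RDF program.

```python
from statistics import mean, stdev


def shuffle_z_score(observed_score, shuffled_scores):
    """Express a score in standard deviation units above the shuffled mean."""
    return (observed_score - mean(shuffled_scores)) / stdev(shuffled_scores)


def is_significant(z_score=None, e_value=None, z_cutoff=6.0, e_cutoff=0.01):
    """Apply the glossary's thresholds: z above 6, or E-value of 0.01 or less."""
    if z_score is not None and z_score > z_cutoff:
        return True
    if e_value is not None and e_value <= e_cutoff:
        return True
    return False


# Invented numbers: shuffled scores with mean 50 and standard deviation 10.
z = shuffle_z_score(120, [40, 50, 60])
print(z, is_significant(z_score=z))
```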
Homology may be established by indirect or transitive relationships. Thus, if peptidase B is directly homologous to the type example of a family, and peptidase C is directly homologous to peptidase B, then peptidase C is homologous to the type example and is a member of the family.
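Transitive membership of this kind amounts to finding everything reachable from the type example through a chain of direct homologies. The Python sketch below shows one way to do that with a breadth-first search; the function name and the peptidase labels are hypothetical.

```python
from collections import deque


def family_members(type_example, direct_pairs):
    """Return all proteins linked to the type example by chains of homology.

    direct_pairs is an iterable of (a, b) pairs that are directly homologous.
    """
    neighbours = {}
    for a, b in direct_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    members = {type_example}
    queue = deque([type_example])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in members:
                members.add(other)
                queue.append(other)
    return members


# C reaches the type example only through B; D and E are unrelated to it.
pairs = [("type", "B"), ("B", "C"), ("D", "E")]
print(sorted(family_members("type", pairs)))
```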
Homologous peptidase units are taken to show that the peptidases have common ancestry and have resulted from divergent evolution (Reeck et al., 1987).
Proteins that are homologous but exist in the same organism are defined as paralogues. Equivalent homologous proteins in different organisms are described as orthologues (Dayhoff et al., 1978; Tatusov et al., 1997).
Identifier: MEROPS ID
Each peptidase is given a unique identifier known as a MEROPS ID. The identifier consists of the family identifier (padded to three characters), a dot, and a three-digit number, e.g. S01.001. Peptidases from different organisms are assigned to a single ID when the available data indicate that they are equivalent. Special forms of MEROPS ID are used for uncharacterized peptidases from model organisms, unassigned peptidases, non-peptidase homologues, pseudogenes and unsequenced peptidases.
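The layout of the identifier lends itself to a small formatter and parser. The Python sketch below is a minimal illustration with hypothetical function names. It draws on the special forms described elsewhere in this glossary (a letter A to D, a "P", or a nine immediately after the dot) and makes two assumptions: that the last two characters are always digits, and that any identifier with a nine after the dot is a non-peptidase homologue, although the glossary states this only for human and mouse. The unsequenced-peptidase codes and other special forms are not handled.

```python
import re

# Type letter, two-digit family number, dot, one character, two digits.
_ID = re.compile(r"([A-Z])(\d{2})\.([0-9A-DP])(\d{2})")


def make_merops_id(family, serial):
    """Build an identifier such as S01.001 from family 'S1' and serial 1."""
    return f"{family[0]}{int(family[1:]):02d}.{serial:03d}"


def describe_merops_id(merops_id):
    """Return the family name and the kind of identifier."""
    match = _ID.fullmatch(merops_id)
    if match is None:
        raise ValueError(f"not a recognised identifier: {merops_id!r}")
    letter, number, first, _ = match.groups()
    if first in "ABCD":
        kind = "uncharacterized peptidase from a model organism"
    elif first == "P":
        kind = "pseudogene"
    elif first == "9":
        kind = "non-peptidase homologue"  # assumed to hold for any species
    else:
        kind = "standard"
    return {"family": f"{letter}{int(number)}", "kind": kind}


print(make_merops_id("S1", 1))
print(describe_merops_id("A01.P01"))
```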
Much like a peptidase unit, an inhibitor unit is the minimal part of the structure of an inhibitor that can be expected to express activity.
An isopeptidase is an enzyme that hydrolyses an isopeptide bond, that is to say, an amide bond between amino acid residues in which the carboxyl group, the amino group, or both are not in the alpha position (as they are in a eupeptide bond). The non-alpha carboxyl and amino groups of amino acids include the beta-carboxyl group of aspartic acid, the gamma-carboxyl group of glutamate, the epsilon-amino group of lysine, and the carboxyl group and amino groups of beta-alanine.
Laskowski mechanism of peptidase inhibition
The most fully studied of the mechanisms whereby protein inhibitors of peptidases achieve the inhibition was termed the 'standard mechanism' in the key review of Laskowski & Kato (1980). In view of the huge contribution that the late Michael Laskowski, Jr. made to the elucidation of this mechanism during three decades of meticulous research, MEROPS suggests that it is appropriate to adopt the new name 'Laskowski mechanism' for what has been known as the standard mechanism. The Laskowski mechanism inhibitors interact with the peptidase in a substrate-like way. To quote from Laskowski & Kato (1980): "Inhibitors obeying this mechanism are highly specific, limited proteolysis substrates for their target enzymes. On the surface of each inhibitor molecule lies at least one (more in multi-headed inhibitors) peptide bond called the reactive site, which interacts with the active site of the cognate enzyme. The value of kcat/Km for the hydrolysis of this peptide bond by the cognate enzyme at neutral pH is very high, 10^4 - 10^6 M^-1 s^-1, compared to a typical value for normal substrates of about 10^3 M^-1 s^-1. However, for inhibitors, the values of both kcat and Km are both many orders of magnitude lower than the values for normal substrates. At typically used concentrations and neutral pH, therefore, their hydrolysis is extremely slow, and the system behaves as if it were a simple equilibrium between the enzyme and free inhibitor on the one hand and the complex on the other. The equilibrium constant for the association is extremely high (in the range of 10^7 - 10^13 M^-1). An additional property of the inhibitory reactive sites is that their hydrolysis does not proceed to virtual completion. Instead, at neutral pH, the equilibrium constant between modified inhibitor (reactive site peptide bond hydrolyzed) and virgin inhibitor (reactive site peptide bond intact) is near unity. Since the same stable complex is formed between the enzyme and either virgin or modified inhibitor, both are thermodynamically equally strong inhibitors of the cognate enzyme." Recommended reviews include Bode & Huber (1991), Laskowski & Qasim (2000) and Krowarsch et al. (2003).
The Met-turn is a structural characteristic of a subset of metallopeptidases that was first recognised by Bode et al. (1993) in the crystallographic structures of peptidases that are now assigned to families M10 and M12 in MEROPS. It is a turn, located towards the C-terminus from the HEXXH motif, which contains a highly conserved methionine residue. The turn provides a hydrophobic environment for the zinc ion and the three ligating histidine residues at the catalytic centres of the enzymes. It commonly also contains a tyrosine residue that is important for catalysis. The Met-turn is a defining characteristic of the peptidases that are assigned to subclan MA(M) of clan MA in the MEROPS classification, termed the "metzincins" by Bode et al. (1993).
Special forms of MEROPS identifiers, where the character after the dot is 'A', 'B', 'C' or 'D', are used for uncharacterized peptidases from the following model organisms: zebrafish (Danio rerio), the fruit fly Drosophila melanogaster, the nematode Caenorhabditis elegans, mouse-ear cress (Arabidopsis thaliana), baker's yeast (Saccharomyces cerevisiae), the fission yeast Schizosaccharomyces pombe, the malaria parasite Plasmodium falciparum, the slime mould Dictyostelium discoideum, the Gram-negative bacterium Escherichia coli, the Gram-positive bacterium Bacillus subtilis and the archaean Pyrococcus furiosus.
A protein of known sequence that can be placed in a peptidase family but lacks one or more of the expected catalytic residues is described as a non-peptidase homologue. Some non-peptidase homologues are enzymes of other kinds. An example is dienelactone hydrolase (Cheah et al., 1993), a homologue of family S9 that has the catalytic serine replaced by cysteine. In order to classify every human and mouse peptidase homologue, we have used some special MEROPS identifiers for non-peptidase homologues in these species. These all have a nine as the first digit after the dot. An example is haptoglobin-1 (S01.972).
The omega-peptidases form the second group of peptidases that have no requirement for a free N-terminus or C-terminus in the substrate. Despite their lack of requirement for a charged terminal group, they often act close to one terminus or the other, and are thus totally distinct from endopeptidases. Some hydrolyse peptide bonds that are not alpha-bonds; that is, they are isopeptide bonds, in which one or both of the amino and carboxyl groups are not directly attached to the alpha-carbon of the parent amino acid. The omega-peptidases are a varied assortment of enzymes, including ubiquitinyl hydrolases (e.g. ubiquitinyl hydrolase-L3, C12.003), pyroglutamyl peptidases (C15.001, C15.010, M01.008) and gamma-glutamyl hydrolase (C26.001). The omega-peptidases are placed in sub-subclass EC 3.4.19 by NC-IUBMB.
Peptidase is the term recommended in the Enzyme Nomenclature of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB) for any protein that causes the hydrolysis of peptide bonds. It is thus applicable to the endopeptidases that act on the internal bonds in proteins and large polypeptides as well as to the oligopeptidases and exopeptidases that act primarily on smaller substrates.
Peptidase is the most correct scientific term for the proteolytic enzymes that are colloquially called proteases or proteinases. Amongst the reasons that this is the word that is recommended by IUBMB and also by the Human Gene Nomenclature Committee, as well as MEROPS, is the fact that it is the word that already forms the root of the names of the many different sub-types of peptidases: aminopeptidase, carboxypeptidase, and so on, and thus leads to a very rational and intuitive system of terminology.
A few peptidases are not catalysts in the strict sense because the peptide bond hydrolysed is in the peptidase itself, and the peptidase is inactivated by the cleavage.
Any one peptidase is expected to occur in many species of organisms, and the species variants are orthologues. Criteria we use to recognize the species variants of a single peptidase are as follows:
- They have similar properties as enzymes, showing the same types and specificities of catalytic activity, pH optima and sensitivity to inhibitors. Where biochemical data are available, there are no differences in the protein sequences that would be predicted to result in differences in specificity.
- They have similar amino acid sequences throughout the length of the polypeptide encoded by the open reading frame.
- An evolutionary tree for the peptidase units shows that the protein sequences have diverged at the same time as the organisms in which they occur. An earlier divergence would imply that they are separate enzymes and not orthologues.
A single peptidase may include products of the allelic variants of a single gene and variants resulting from post-translational modification, and it may be expressed in different tissues or different stages of an organism's development. For each peptidase a single representative form termed the holotype is recognised.
The peptidase inhibitors that are classified in MEROPS are naturally-occurring proteins or polypeptides down to about 20 amino acids in length. Inhibitory peptides, synthetic substrates and other small molecule inhibitors are listed alphabetically, and summaries have been written for many of them.
The peptidase unit is that part of the protein sequence that is directly responsible for peptidase activity, as far as it is known to MEROPS. In the simplest case, this is that part of the sequence that aligns with the smallest mature peptidase molecule in the family. Many peptidases and their precursors are chimeric proteins containing non-peptidase domains at the N- or C-terminus, or even inserted into the middle of the peptidase unit. In some families even the smallest mature peptidase can be seen to be a multidomain protein by the presence of a segment that is homologous to a known non-peptidase domain found in other proteins. Such a domain is excluded from the peptidase unit. Since it is the case that for most peptidases the limits of the peptidase unit are inferred indirectly from a multiple sequence alignment, they can be refined from time to time as new data become available.
A peptidyl-dipeptidase releases a dipeptide from the C-terminus of its substrate: peptide↓dipeptide, and this explains the name. An example is peptidyl-dipeptidase A (XM02-001). Peptidyl-dipeptidases form sub-subclass EC 3.4.15 in the NC-IUBMB scheme.
A peptidase homologue from human or mouse that has been identified in the literature as a pseudogene is assigned a special MEROPS identifier in which the first character after the dot is a "P". An example is the napsin B pseudogene (A01.P01).
A peptide bond that is hydrolysed by a peptidase may be termed the scissile bond, and marked with a symbol such as ↓.
Small molecule inhibitor
A small molecule inhibitor (SMI) is an inhibitor that is not a protein but is a peptide or synthetic inhibitor. SMIs include laboratory reagents used in the characterization of peptidases, and drugs such as the inhibitors of the retropepsin of HIV.
The specificity of a peptidase for cleavage of a peptide bond with particular amino acids in nearby positions is described in terminology based on that originally created by Schechter & Berger (1967) to describe the specificity of papain.
Crystallographic structures of peptidases show that the active site is commonly located in a groove on the surface of the molecule between adjacent structural domains, and the substrate specificity is dictated by the properties of binding sites arranged along the groove on one or both sides of the catalytic site that is responsible for hydrolysis of the scissile bond. Accordingly, the specificity of a peptidase is described by use of a conceptual model in which each specificity subsite is able to accommodate the sidechain of a single amino acid residue. The sites are numbered from the catalytic site, S1, S2...Sn towards the N-terminus of the substrate, and S1', S2'...Sn' towards the C-terminus. The residues they accommodate are numbered P1, P2...Pn, and P1', P2'...Pn', respectively, as follows:
Substrate: - P3 - P2 - P1 ↓ P1' - P2' - P3' -
Enzyme:    - S3 - S2 - S1 * S1' - S2' - S3' -
In this representation the catalytic site of the enzyme is marked * and the peptide bond cleaved (the scissile bond) is indicated by the symbol ↓. For greater typographical convenience the original system is usually modified in that the subsite numbers are not subscripted, and they are followed rather than preceded by the prime characters.
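The numbering scheme is easy to apply mechanically. The Python sketch below labels each residue of a substrate with its P or P' position given the location of the scissile bond; the function name and the example peptide are hypothetical.

```python
def label_positions(peptide, n_nonprimed):
    """Label residues relative to the scissile bond.

    peptide     -- substrate sequence, written N-terminus to C-terminus
    n_nonprimed -- number of residues on the N-terminal side of the bond
    """
    if not 0 < n_nonprimed < len(peptide):
        raise ValueError("the scissile bond must lie between two residues")
    labels = []
    for index, residue in enumerate(peptide):
        if index < n_nonprimed:
            # Non-primed side: numbers increase towards the N-terminus.
            labels.append((f"P{n_nonprimed - index}", residue))
        else:
            # Primed side: numbers increase towards the C-terminus.
            labels.append((f"P{index - n_nonprimed + 1}'", residue))
    return labels


# Invented hexapeptide cleaved between its fourth and fifth residues.
print(label_positions("ALKFGS", 4))
```

The residue labelled P1 is the one that occupies subsite S1, P1' occupies S1', and so on along the groove.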
A tripeptidyl-peptidase releases a tripeptide from the N-terminus of its substrate: tripeptide↓peptide (i.e. (Xaa)3↓(Xaa)n), and again, this explains the name. Examples are tripeptidyl-peptidase I (S53.003) and tripeptidyl-peptidase II (S08.090). Tripeptidyl-peptidases, together with dipeptidyl-peptidases, form sub-subclass EC 3.4.14 in the NC-IUBMB scheme.
Type peptidase, type inhibitor
A type peptidase is nominated for each family and subfamily. All peptidases that are homologous to the type peptidase are members of that family. Similarly, a type inhibitor is nominated for each inhibitor family.
The term unassigned peptidase is used for a protein of known sequence that can be placed in a peptidase family, and can be seen to contain all the catalytic residues that are expected in the family, but which is not close in sequence to any holotype. A new MEROPS identifier is typically created for such a peptidase when it is functionally characterised.
There are also some peptidase homologues that possess all the expected active site residues and/or metal ligands and yet are not known to cleave peptide bonds, but do catalyse other reactions. An example is acetylornithine deacetylase, a non-peptidase homologue of family M20. It is worth noting, however, that another family M20 homologue, succinyl-diaminopimelate desuccinylase (M20.010), was also thought to be a non-peptidase homologue until it was shown to act as a peptidase in the presence of manganese (Broder & Miller, 2003).
There are some peptidases that we have to treat as unsequenced peptidases because the available amino acid sequence data are insufficient to allow us to assign the peptidase to a family. In order to be able to present data for these peptidases we have created a series of special MEROPS identifiers in which the family name part of the identifier is replaced by a code that indicates only the catalytic type and the kind of peptidase activity. The first character of this shows the catalytic type as in a family identifier, the second character is always 9, and the third is a letter that indicates the kind of peptidase activity: 'A' for aminopeptidase, 'B' for dipeptidase, 'C' for dipeptidyl-peptidase, 'D' for peptidyl-dipeptidase, 'E' for carboxypeptidase, 'F' for omega peptidase and 'G' for endopeptidase. An example would be the MEROPS ID M9A.007 for aminopeptidase W. As soon as fuller sequence data appear for an unsequenced peptidase we assign it a normal MEROPS ID.
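The three-character code can be decoded with two lookup tables. The Python sketch below is a minimal illustration with hypothetical names; the catalytic-type letters are those listed in the entry on catalytic type, and the activity letters are those given above.

```python
import re

CATALYTIC_TYPE = {
    "S": "serine", "T": "threonine", "C": "cysteine", "A": "aspartic",
    "G": "glutamic", "M": "metallo", "U": "unknown",
}
ACTIVITY = {
    "A": "aminopeptidase", "B": "dipeptidase", "C": "dipeptidyl-peptidase",
    "D": "peptidyl-dipeptidase", "E": "carboxypeptidase",
    "F": "omega peptidase", "G": "endopeptidase",
}


def decode_unsequenced_id(merops_id):
    """Split an identifier such as M9A.007 into type, activity and serial."""
    match = re.fullmatch(r"([STCAGMU])9([A-G])\.(\d{3})", merops_id)
    if match is None:
        raise ValueError(f"not an unsequenced-peptidase identifier: {merops_id!r}")
    type_letter, activity_letter, serial = match.groups()
    return CATALYTIC_TYPE[type_letter], ACTIVITY[activity_letter], int(serial)


print(decode_unsequenced_id("M9A.007"))
```

Because a normal family name is padded with a zero (M9 becomes M09), a code such as M9A cannot be confused with a family identifier.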