Classification: three orthogonal approaches

A landmark in the development of any field of study is the appearance of a sound system of nomenclature and classification for the objects with which it deals. The introduction of the Linnaean system for naming and classifying organisms in the eighteenth century and the invention of a system of nomenclature for enzymes in the 1950s were such key events, and their value has been obvious. Both nomenclature and classification are vitally important for information handling. They allow people to communicate efficiently, knowing that they are talking about the same thing, and to store and retrieve information unambiguously. A good system also serves to highlight important questions and thus prompts new discoveries. Three useful methods of grouping peptidases are currently in use: by the chemical mechanism of catalysis, by the kind of reaction catalysed, and by molecular structure and homology.

Each of these is now described in a little more detail.

Proteolytic enzymes grouped by the chemical mechanism of catalysis

In 1960 the seminal paper of Hartley (Hartley, B.S. Proteolytic enzymes. Annu Rev Biochem 29, 45-72, 1960) initiated a sequence of developments that has given the peptidase community the very useful concept of catalytic type. Proteolytic enzymes can now be described as being of serine, cysteine, threonine, aspartic, glutamic, asparagine or metallo catalytic type, and only a few remain of unknown catalytic type. Such assignments are widely used. For example, the names of clans and families in the MEROPS database are built on the letters S, C, T, A, G, M, N and U, which stand for the catalytic types (plus P, which stands for peptidases with protein nucleophiles of mixed catalytic type). The system of catalytic types has great strengths, but it also has limitations that need to be recognised. It is a strength that every serine peptidase contains a serine residue that acts as the nucleophile at the heart of the catalytic site, and as a result many are affected by generic inhibitors of serine peptidases. But the serine peptidases include many very different molecular structures and catalytic mechanisms. Moreover, they are by no means all homologues of each other, so an expression like "the serine peptidase family" has little meaning.
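
Because the leading letter of a MEROPS family or clan name encodes the catalytic type, the type can be read straight off the name. The sketch below is a minimal illustration of that convention using only the letters listed above; the function name and the exact wording of the labels are hypothetical, not part of MEROPS.

```python
# Catalytic-type letters as listed in the text. 'P' is used for clans whose
# peptidases have protein nucleophiles of mixed catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic",
    "M": "metallo",
    "N": "asparagine",
    "P": "mixed (protein nucleophile)",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}


def catalytic_type(name: str) -> str:
    """Hypothetical helper: return the catalytic type encoded by the first
    letter of a family or clan name such as 'S1', 'C14' or 'PA'."""
    if not name:
        raise ValueError("empty name")
    letter = name[0].upper()
    try:
        return CATALYTIC_TYPES[letter]
    except KeyError:
        raise ValueError(f"unknown catalytic-type letter in {name!r}") from None


print(catalytic_type("S1"))   # serine
print(catalytic_type("C14"))  # cysteine
print(catalytic_type("PA"))   # mixed (protein nucleophile)
```

Note that the letter says nothing about homology: two families that both start with "S" need not be related, which is exactly the limitation described above.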

Proteolytic enzymes grouped by the kinds of reaction they catalyse

In a sense, all peptidases catalyse the same reaction: hydrolysis of a peptide bond. But none of them will hydrolyse all peptide bonds, because each is selective for the bonds it will hydrolyse. One form of this selectivity is a preference for peptide bonds in a particular position in the polypeptide chain of the substrate. On this basis, peptidases can be classified into groups such as endopeptidases, omega-peptidases, exopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases, tripeptidyl-peptidases, peptidyl-dipeptidases and dipeptidases, all of which are described more fully in the Glossary. Since the EC List of the Nomenclature Committee of IUBMB classifies enzymes by the reaction catalysed, these groupings are important there, and they provide an essential part of the description of the activity of any peptidase.
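
The positional groupings can be made concrete with a toy function that labels a single observed cleavage by where the scissile bond lies in the substrate. This is a simplified sketch under stated assumptions, not the EC or MEROPS procedure: real assignments reflect an enzyme's characteristic activity over many substrates, omega-peptidases (which act on unusual terminal residues) are not modelled, and on very short substrates several labels can fit, so the first matching rule in a fixed order wins. The function name is hypothetical.

```python
def position_class(length: int, bond: int) -> str:
    """Hypothetical helper: label one cleavage by the position of the bond.

    `length` is the number of residues in the substrate; `bond` is the
    1-based index of the hydrolysed bond, so bond k joins residues k and k+1.
    Rules are tried in a fixed order; on short substrates the first match wins.
    """
    if length < 2 or not 1 <= bond <= length - 1:
        raise ValueError("bond must lie inside the peptide")
    if length == 2:
        return "dipeptidase"            # the substrate is itself a dipeptide
    if bond == 1:
        return "aminopeptidase"         # releases one N-terminal residue
    if bond == length - 1:
        return "carboxypeptidase"       # releases one C-terminal residue
    if bond == 2:
        return "dipeptidyl-peptidase"   # releases an N-terminal dipeptide
    if bond == length - 2:
        return "peptidyl-dipeptidase"   # releases a C-terminal dipeptide
    if bond == 3:
        return "tripeptidyl-peptidase"  # releases an N-terminal tripeptide
    return "endopeptidase"              # an internal bond


print(position_class(10, 5))  # endopeptidase
```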

Proteolytic enzymes grouped by molecular structure and homology

The classification of peptidases by molecular structure and homology is the newest of the three methods, because it depends on amino acid sequences and three-dimensional structures being available in quantities that were reached only in the early 1990s. In 1993, Rawlings & Barrett described a system in which individual peptidases were assigned to families, and the families were grouped in clans (Rawlings, N.D. & Barrett, A.J. Evolutionary families of peptidases. Biochem J 290, 205-218, 1993). This scheme was developed to provide the structure of the MEROPS database, and has been extended to include the proteins that inhibit peptidases (Rawlings, N.D. et al. Evolutionary families of peptidase inhibitors. Biochem J 378, 705-716, 2004). The description below relates specifically to the way the classification of individual peptidases and inhibitors by molecular structure and homology is implemented in the MEROPS database.

Peptidases

Each peptidase protein is assigned to a peptidase species (see below), and each species is given a MEROPS identifier that acts much like an accession number. The identifier is formed by concatenating the three-character identifier of the family to which the peptidase belongs, a point, and a three-figure number. For example, the identifier of chymotrypsin, the first peptidase in family S1, is S01.001. A peptidase is considered to merit an identifier when knowledge of it includes one or more amino acid sequences and information about substrate specificity or biological function. A satisfactory name is also very helpful. Over 4400 individual peptidases and over 700 inhibitors were recognised in Release 9.9 of the MEROPS database.
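
The identifier format lends itself to a small build-and-parse sketch. It assumes, following the description above, that the family identifier is one letter plus a two-digit zero-padded family number (so family S1 becomes S01), which limits it to family numbers below 100; subfamilies are not handled. The function names are hypothetical.

```python
import re

# Assumed format: one letter plus two digits (the family), a point, and a
# three-figure species number, e.g. S01.001.
MEROPS_ID = re.compile(r"^([A-Z])(\d{2})\.(\d{3})$")


def family_identifier(family: str) -> str:
    """Hypothetical: turn a family name such as 'S1' or 'M12' into its
    three-character identifier ('S01', 'M12'). Assumes numbers below 100."""
    letter, number = family[0].upper(), int(family[1:])
    if not 0 < number < 100:
        raise ValueError(f"family number out of range: {family!r}")
    return f"{letter}{number:02d}"


def merops_id(family: str, species_number: int) -> str:
    """Hypothetical: concatenate family identifier, a point, and three figures."""
    if not 0 < species_number < 1000:
        raise ValueError("species number must fit in three figures")
    return f"{family_identifier(family)}.{species_number:03d}"


def parse_merops_id(identifier: str) -> tuple[str, int]:
    """Hypothetical: split an identifier into (family name, species number)."""
    match = MEROPS_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a MEROPS identifier: {identifier!r}")
    letter, family_number, species_number = match.groups()
    return f"{letter}{int(family_number)}", int(species_number)


print(merops_id("S1", 1))          # S01.001 (chymotrypsin)
print(parse_merops_id("M12.005"))  # ('M12', 5)
```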

Each sequence is given a unique MERNUM (for example, MER123456). These numbers are for convenience only, are not necessarily stable and should not be cited. Instead, please cite the relevant UniProt or ProtID accession.

Families

The families of peptidases are constructed by comparisons of amino acid sequences. Every member of a MEROPS family shows a statistically significant relationship in amino acid sequence to at least one other member of the family. Moreover, the relationship must exist in the part of the molecule, termed the 'peptidase unit', that is most directly responsible for catalytic activity. This requirement is necessary because some peptidases are chimeric proteins. For example, procollagen C-peptidase (M12.005) contains a catalytic domain related to that of astacin, but also contains segments that are clearly homologous to non-catalytic parts of the complement components C1r and C1s, which are in the chymotrypsin family. Procollagen C-peptidase is therefore placed in the family of astacin (M12), and not that of chymotrypsin (S1).

At the time of writing, there are nearly 250 families of peptidases in MEROPS Release 9.9. A few of the families contain two or more rather distinct groups of peptidases, as shown by a deep divergence in the dendrogram; for these, subfamilies are recognised. The naming of the families follows the system introduced by Rawlings & Barrett (Rawlings, N.D. & Barrett, A.J. Evolutionary families of peptidases. Biochem J 290, 205-218, 1993), in which each family is named with a letter denoting the catalytic type (S, C, T, A, G, M, N or U, for serine, cysteine, threonine, aspartic, glutamic, metallo-, asparagine or unknown), followed by an arbitrarily assigned number. For example, the caspase family of cysteine peptidases is C14. When a family disappears, usually because it has been merged with another, its name is not re-used. For this reason, there are gaps in the numerical sequence of families; these gaps have no current significance. (See also the definition of family.)
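
The rule that a retired family name is never re-used is what produces the gaps in the numbering. The class below is a hypothetical bookkeeping sketch of that rule for a single catalytic type; it is an illustration only and does not describe how MEROPS actually stores its families.

```python
class FamilyRegistry:
    """Hypothetical bookkeeping for family names of one catalytic type.

    Numbers are assigned in order, and a retired name (for example, after a
    merger) is never handed out again, which leaves gaps in the series.
    """

    def __init__(self, letter: str) -> None:
        self.letter = letter
        self.next_number = 1
        self.active: set[str] = set()
        self.retired: set[str] = set()

    def new_family(self) -> str:
        name = f"{self.letter}{self.next_number}"
        self.next_number += 1  # the counter only moves forward
        self.active.add(name)
        return name

    def merge(self, absorbed: str, into: str) -> None:
        """Fold `absorbed` into `into`; the absorbed name is retired for good."""
        if absorbed == into:
            raise ValueError("cannot merge a family into itself")
        if absorbed not in self.active or into not in self.active:
            raise KeyError("both families must be active")
        self.active.remove(absorbed)
        self.retired.add(absorbed)


registry = FamilyRegistry("C")
first, second, third = (registry.new_family() for _ in range(3))
registry.merge(second, into=first)
print(registry.new_family())    # C4, not C2
print(sorted(registry.active))  # ['C1', 'C3', 'C4']
```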

A type example is set up for each family (or subfamily). It is usually the peptidase that has been most studied biochemically, and all sequences in the family must be directly or indirectly related to the type example sequence. A family is built by submitting the sequence of the type example to a blastp search of the non-redundant protein sequence database at NCBI. Any sequence returned by the search is included in the family provided that the expect value is less than 0.001 and the matched region includes the peptidase unit of the type example. More distantly related sequences can be added to the family by repeating the search with the sequences of other members of the family, or by performing PSI-BLAST or HMMER searches. HMMER searches are increasingly used because of the limit on the number of hits that BLAST searches can return.
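
The procedure amounts to collecting every sequence reachable from the type example through a chain of accepted hits. The sketch below assumes a generic `search` callable standing in for blastp, PSI-BLAST or HMMER, and reduces the peptidase-unit check to a single boolean on each hit; the `Hit` type, the `build_family` name and the toy hit table are all hypothetical.

```python
from collections import deque
from typing import Callable, NamedTuple


class Hit(NamedTuple):
    accession: str
    evalue: float
    covers_peptidase_unit: bool  # assumed flag: matched region spans the unit


E_VALUE_CUTOFF = 0.001  # threshold quoted in the text


def build_family(type_example: str, search: Callable[[str], list[Hit]]) -> set[str]:
    """Hypothetical: breadth-first collection of sequences linked directly or
    indirectly to the type example; each new member is reused as a query."""
    members = {type_example}
    queue = deque([type_example])
    while queue:
        query = queue.popleft()
        for hit in search(query):
            accepted = hit.evalue < E_VALUE_CUTOFF and hit.covers_peptidase_unit
            if accepted and hit.accession not in members:
                members.add(hit.accession)
                queue.append(hit.accession)
    return members


# Toy results standing in for real searches: B fails the E-value test and
# C does not cover the peptidase unit, so neither is admitted.
FAKE_HITS = {
    "TYPE": [Hit("A", 1e-40, True), Hit("B", 0.5, True), Hit("C", 1e-9, False)],
    "A": [Hit("D", 1e-5, True)],
    "D": [Hit("TYPE", 1e-30, True)],
}

family = build_family("TYPE", lambda q: FAKE_HITS.get(q, []))
print(sorted(family))  # ['A', 'D', 'TYPE']
```

D is admitted only through A, which mirrors the "directly or indirectly related" requirement.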

Clans

Although the families are the largest groupings of peptidases that can be proven rigorously to be homologous, there are persuasive lines of evidence that many of the families do share common ancestry with others. That is to say, there are sets of families in which all of the proteins have diverged from a single ancestral protein, but they have diverged so far that their relationship can no longer be proved by comparison of the primary structures. The term "clan" is used to describe such a group of families (Rawlings, N.D. & Barrett, A.J. Evolutionary families of peptidases. Biochem J 290, 205-218, 1993). The best kind of evidence to support the formation of a clan is similarity in three-dimensional structures when the data are available, but the arrangement of catalytic residues in the polypeptide chains and limited similarities in amino acid sequence around the catalytic amino acids are also taken into account.

The name of each clan is formed from the letter for the catalytic type of the peptidases it contains (as for families) followed by an arbitrary second capital letter. For example, the first clan of cysteine peptidases is clan CA. As with families, if a clan disappears, its name is not re-used. A few clans contain families of more than one of the types C, S and T; the names of these clans are built on the letter "P". For example, clan PA contains families of serine peptidases including S1 as well as homologous cysteine peptidases in families such as C3. About 50 clans of peptidases were recognised in MEROPS Release 9.9. (See also the definition of clan.)
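
The choice of the first letter of a clan name can be sketched from the catalytic types of its member families. This hypothetical helper follows only the two rules stated above (a single type gives that type's letter; a mixture of C, S and T gives "P") and rejects any other mixture; the arbitrary second letter is not derived here.

```python
def clan_prefix(family_names: list[str]) -> str:
    """Hypothetical: pick the leading letter of a clan name from its families.

    One catalytic type -> that type's letter; a mixture drawn from C, S and T
    -> 'P'. Other mixtures fall outside the rule and raise an error here.
    """
    letters = {name[0].upper() for name in family_names}
    if not letters:
        raise ValueError("a clan needs at least one family")
    if len(letters) == 1:
        return letters.pop()
    if letters <= {"C", "S", "T"}:
        return "P"
    raise ValueError(f"mixture not covered by the naming rule: {sorted(letters)}")


print(clan_prefix(["S1", "C3"]))  # P (as for clan PA)
print(clan_prefix(["C1", "C2"]))  # C (two cysteine families)
```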

The concept of a peptidase species

Peptidase proteins are assigned to peptidase species, and each species has a name (like 'trypsin' or 'cathepsin B') and a unique identifier in MEROPS. Each peptidase species is likely to be present in many organisms in addition to the one in which it was first found, and the species variants of it are expected to have closely similar properties. It is to be hoped that the definition of the peptidase species is such that, at least to a first approximation, one can assume that its biological functions are similar across the breadth of species in which it is expressed.

MEROPS attempts to recognise distinct peptidase species in a way that has functional significance. Many different properties are taken into account. These include all of the molecular-level criteria, such as the reaction catalysed, the chemical mechanism of catalysis, and the homology relationships revealed by analysis of the structure. Moreover, the position of each species variant on an evolutionary tree is expected to be consistent with the idea that it is indeed a species variant of the same protein. Beyond the peptidase unit, it is also necessary to consider which non-peptidase domains are present, because these can have profound effects on biological functions. But the molecular aspects alone are still not sufficient. One also needs to consider biological properties such as the number of genes and the regulation of gene expression. For example, the forms of elastase in the human pancreas and in polymorphonuclear leukocytes are quite similar peptidases by the molecular-level criteria, but they are the products of different genes and are expressed in different kinds of cells. They are then trafficked in different ways, and activated to perform different functions. By any useful definition they must be considered different peptidases (S01.153 and S01.131 in MEROPS). For further information, see Barrett, A.J. & Rawlings, N.D. 'Species' of peptidases. Biol Chem 388, 1151-1157, 2007. At the time of writing, MEROPS is not able to treat splice variants of a peptidase as different peptidases even when they have functional differences, but this may change in the future.

The Peptidase List

The Peptidase List (sometimes abbreviated PepList) is a selection of peptidases that have been particularly thoroughly characterised. Many of them are the type peptidases of families in MEROPS and/or have published three-dimensional structures. When a peptidase is added to the list it is assigned an accession number in the form "PL00000", and a history of the use of each accession number is maintained. Note that the entries in the EC List of IUBMB that are indicated as relevant are in many cases only approximately equivalent, because MEROPS and the EC List use different kinds of criteria.