 |
InterPro: Databases
| InterPro Member Databases |
|
UniProt (Universal Protein Resource) is the world's most comprehensive catalog
of information on proteins. It is a central repository of protein sequence and function, created by combining
the information contained in Swiss-Prot, TrEMBL, and PIR.
|
|
PROSITE is a database of protein
families and domains. It consists of biologically significant
sites, patterns and profiles that help to reliably identify
to which known protein family (if any) a new sequence
belongs.
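
To make the idea of a pattern concrete, here is a minimal Python sketch that translates a PROSITE-style pattern into a regular expression and scans a sequence with it. The function names `prosite_to_regex` and `find_sites` are hypothetical, the test sequence is made up, and only the basic syntax is handled (single residues, `x`, `[..]`, `{..}`, repeat counts, and `<`/`>` anchors on whole elements). It is an illustration of the pattern notation, not the scanning software PROSITE itself uses, and it does not cover profiles.

```python
import re

# One pattern element: a residue, x, [allowed], or {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(
    r"^(?P<core>[A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?$"
)


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        at_start = element.startswith("<")   # anchored at the N-terminus
        at_end = element.endswith(">")       # anchored at the C-terminus
        element = element.lstrip("<").rstrip(">")
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core = match.group("core")
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # any residue except these
            core = "[^" + core[1:-1] + "]"
        lo, hi = match.group("lo"), match.group("hi")
        if lo and hi:
            core += "{" + lo + "," + hi + "}"
        elif lo:
            core += "{" + lo + "}"
        pieces.append(("^" if at_start else "") + core + ("$" if at_end else ""))
    return "".join(pieces)


def find_sites(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# N-glycosylation site pattern (PROSITE PS00001) on a made-up sequence.
sites = find_sites("N-{P}-[ST]-{P}.", "MNGTANVSP")
```

Because `re.finditer` reports non-overlapping matches only, overlapping sites would need a lookahead-based scan instead.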
|
|
HAMAP stands for High-quality
Automated and Manual Annotation of microbial Proteomes.
HAMAP profiles are manually created by expert curators. They identify proteins that are part of
well-conserved bacterial, archaeal and plastid-encoded (i.e. chloroplasts, cyanelles, apicoplasts, non-photosynthetic plastids)
protein families or subfamilies.
|
|
Pfam is
a large collection of multiple sequence alignments and
hidden Markov models covering many common protein domains.
|
|
PRINTS
is a compendium of protein fingerprints. A fingerprint
is a group of conserved motifs used to characterise a
protein family; its diagnostic power is refined by iterative
scanning of UniProt. Usually
the motifs do not overlap, but are separated along a sequence,
though they may be contiguous in 3D-space. Fingerprints
can encode protein folds and functionalities more flexibly
and powerfully than can single motifs, their full diagnostic
potency deriving from the mutual context afforded by motif
neighbours.
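
The notion of several separated motifs acting together can be sketched in a few lines of Python. This is a deliberately simplified illustration: the motifs here are plain regular expressions invented for the example, whereas real PRINTS motifs are scored differently, and `match_fingerprint` is a hypothetical helper, not part of any PRINTS tool. The sketch looks for each motif in order, without overlap, and reports which ones were found.

```python
import re


def match_fingerprint(sequence, motifs):
    """Search for each motif in order, starting after the previous hit.

    Returns one entry per motif: (start, matched_text) or None if that
    motif was not found downstream of the previous hit.
    """
    hits = []
    position = 0
    for motif in motifs:
        found = re.compile(motif).search(sequence, position)
        if found is None:
            hits.append(None)
            continue
        hits.append((found.start(), found.group()))
        position = found.end()  # the next motif must not overlap this one
    return hits


def is_full_match(hits):
    """A full fingerprint match requires every motif to be present."""
    return all(hit is not None for hit in hits)


# Invented three-motif fingerprint and two invented sequences.
fingerprint = ["GK[ST]", "DE.D", "W..F"]
full = match_fingerprint("AAGKTLLDEADPPWQRF", fingerprint)
partial = match_fingerprint("AAGKTLLPPWQRF", fingerprint)
```

A sequence that matches only some motifs, as in `partial`, is weaker evidence of family membership than one that matches all of them in the expected order, which is the sense in which the motifs give each other context.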
|
|
The
ProDom
protein domain database consists of an automatic compilation
of homologous domains. Current versions of ProDom are
built using a novel procedure based on recursive PSI-BLAST
searches (Altschul SF, Madden TL, Schaffer AA, Zhang J,
Zhang Z, Miller W & Lipman DJ, 1997, Nucleic Acids
Res., 25:3389-3402; Gouzy J., Corpet F. & Kahn D.,
1999, Computers and Chemistry 23:333-340). Large families
are much better processed with this new procedure than
with the former DOMAINER program (Sonnhammer, E.L.L. &
Kahn, D., 1994, Protein Sci., 3:482-492).
|
|
SMART
(a Simple Modular Architecture Research Tool) allows the
identification and annotation of genetically mobile domains
and the analysis of domain architectures. More than 500
domain families found in signalling, extracellular and
chromatin-associated proteins are detectable. These domains
are extensively annotated with respect to phyletic distributions,
functional class, tertiary structures and functionally
important residues. Each domain found in a non-redundant
protein database as well as search parameters and taxonomic
information are stored in a relational database system.
User interfaces to this database allow searches for proteins
containing specific combinations of domains in defined
taxa.
|
|
TIGRFAMs
is a collection of protein families, featuring curated
multiple sequence alignments, hidden Markov models (HMMs)
and annotation, which provides a tool for identifying
functionally related proteins based on sequence homology.
Those entries which are "equivalogs" group homologous
proteins which are conserved with respect to function.
|
|
The PIRSF protein classification system is a network with multiple levels
of sequence diversity, from superfamilies to subfamilies, that reflects the
evolutionary relationships of full-length proteins and domains.
The primary PIRSF classification unit is the homeomorphic family, whose members are both
homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity
and a common domain architecture).
|
|
SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent the entire SCOP superfamily that the domain belongs to. SUPERFAMILY has been used to carry out structural assignments to all completely sequenced genomes. The results and analysis are available from the SUPERFAMILY website.
|
|
The Gene3D database describes protein families and domain architectures in
complete genomes. Protein families are formed using a Markov clustering
algorithm, followed by multi-linkage clustering according to sequence
identity. Mapping of predicted structure and sequence domains is
undertaken using hidden Markov model libraries representing CATH and Pfam
domains. Functional annotation of proteins is provided from multiple
resources. Functional prediction and analysis of domain architectures are
available from the Gene3D website.
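
For readers unfamiliar with Markov clustering, the following Python sketch shows the general technique on a toy similarity graph: alternate an expansion step (matrix squaring) with an inflation step (element-wise powering followed by column normalisation) until the matrix stops changing, then read clusters from the non-zero rows. The function names, the parameter defaults and the six-node example graph are all assumptions made for illustration. This is not the Gene3D pipeline, and it omits the subsequent multi-linkage clustering step.

```python
def _normalize_columns(matrix):
    """Scale each column so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iterations=100, tolerance=1e-9):
    """Cluster the nodes of a graph given as a square adjacency matrix."""
    n = len(adjacency)
    # Add self-loops so every column has a non-zero sum, then normalise.
    matrix = _normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iterations):
        # Expansion: one more step of the random walk (matrix squared).
        expanded = [
            [sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: strengthen strong flows, weaken weak ones.
        inflated = _normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
        change = max(
            abs(inflated[i][j] - matrix[i][j]) for i in range(n) for j in range(n)
        )
        matrix = inflated
        if change < tolerance:
            break
    # Each row with non-zero entries lists the members of one cluster.
    clusters = set()
    for row in matrix:
        members = tuple(j for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return sorted(list(cluster) for cluster in clusters)


# Toy graph: two triangles (nodes 0-2 and 3-5) with no edges between them.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
families = markov_cluster(toy_graph)
```

In a real setting the matrix entries would be pairwise sequence similarity scores rather than 0/1 edges, and the inflation parameter controls how finely the graph is split.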
|
|
PANTHER is a large collection of protein families that have been subdivided into functionally related subfamilies, using human expertise. These subfamilies model the divergence of specific functions within protein families, allowing more accurate association with function (human-curated molecular function and biological process classifications and pathway diagrams), as well as inference of amino acids important for functional specificity. Hidden Markov models (HMMs) are built for each family and subfamily for classifying additional protein sequences. PANTHER is publicly available without restriction. |
|
|
|