Protein Databases
Secondary protein databases - InterPro
Unfortunately, these secondary databases do not share the same formats or nomenclature,
which makes it difficult to use all of them in an automated way. In response, the
UniProtKB/Swiss-Prot group at the EBI developed the Integrated resource of Protein
domains and functional sites, more commonly known as InterPro (Apweiler et al., 2001). This database integrates the PROSITE, PRINTS, Pfam and
ProDom databases. InterPro gives users access to a wider,
complementary range of site and domain recognition methods in a single package.
In the task of sequence characterisation, we need more reliable, concerted
methods for identifying protein family traits and for inheriting functional annotation.
This is especially important given our dependence on automatic methods for assigning
functions to the raw sequence data issuing from genome projects. Rationalising this
process by creating a single coherent resource for diagnosis and documentation of protein
families is difficult, given entirely different database formats, different search tools
and different search outputs. InterPro is an attempt to address some of these issues.
This resource provides an integrated view of a number of commonly used pattern databases,
together with an intuitive interface for text- and sequence-based searches.
Flat-files submitted by each of the groups were systematically merged and dismantled.
Where relevant, family annotations were amalgamated, and all method-specific annotation
separated out. This process was complicated by the relationships that can exist, both
between entries in the same database, and between entries in different databases. Different
types of parent-child relationship were evident, leading to the differentiation
into ‘sub-types’ and ‘sub-strings’. A sub-string means that a motif or motifs
are contained within a region of sequence encoded by a wider pattern. For example, a PROSITE
pattern is typically contained within a PRINTS fingerprint, and a fingerprint might be
contained within a Pfam domain. A sub-type means that one or more motifs are specific for
a sub-set of sequences captured by another, more general pattern. For example, a super-family
fingerprint may contain several family- and sub-family-specific fingerprints, and a
generic Pfam domain may include several family fingerprints.
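
The two relationships can be illustrated with a minimal Python sketch. It is not InterPro's actual integration procedure: the Match record, the function names and the protein identifiers (P1, P2, P3) are all invented for illustration, and each signature hit is simplified to a single start-end span, whereas a real fingerprint consists of several motifs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One hit of a signature on a protein (hypothetical record, inclusive coordinates)."""
    protein: str
    start: int
    end: int


def proteins(matches):
    """Set of protein identifiers hit by a signature."""
    return {m.protein for m in matches}


def is_sub_string(inner, outer):
    """True if every inner hit lies inside some outer hit on the same protein."""
    return bool(inner) and all(
        any(o.protein == i.protein and o.start <= i.start and i.end <= o.end
            for o in outer)
        for i in inner
    )


def is_sub_type(specific, general):
    """True if `specific` hits a proper subset of the proteins hit by `general`."""
    return proteins(specific) < proteins(general)


# Invented example data: a short pattern lying inside a longer fingerprint.
pattern = [Match("P1", 40, 55), Match("P2", 38, 53)]
fingerprint = [Match("P1", 30, 120), Match("P2", 28, 118)]

# Invented example data: a family fingerprint hitting fewer proteins
# than a super-family fingerprint.
family = [Match("P1", 30, 150)]
superfamily = [Match("P1", 25, 130), Match("P2", 28, 118), Match("P3", 10, 100)]

print(is_sub_string(pattern, fingerprint))  # positional containment
print(is_sub_type(family, superfamily))     # subset of matched proteins
```

In this sketch a sub-string is a question about positions within a sequence, while a sub-type is a question about which sequences are matched at all; the two tests are independent, so a pair of signatures can satisfy one, both or neither.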
References:
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D.R., Durbin R., Falquet L., Fleischmann W., Gouzy J., Hermjakob H., Hulo N., Jonassen I., Kahn D., Kanapin A., Karavidopoulou Y., Lopez R., Marx B., Mulder N.J., Oinn T.M., Pagni M., Servant F., Sigrist C.J.A., Zdobnov E.M. (2001)
Nucl. Acids Res. 29, 37-40.