
Protein Databases


Secondary protein databases - InterPro

Unfortunately, these secondary databases do not share the same formats or nomenclature, which makes it difficult to use all of them in an automated way. In response, the UniProtKB/Swiss-Prot group at the EBI has developed the Integrated resource of Protein domains and functional sites, more commonly known as InterPro (Apweiler et al., 2001). This database integrates the PROSITE, PRINTS, Pfam and ProDom databases. InterPro gives users access to a wider, complementary range of site and domain recognition methods in a single package.

For sequence characterisation, we need more reliable, concerted methods for identifying protein family traits and for inheriting functional annotation. This is especially important given our dependence on automatic methods for assigning functions to the raw sequence data issuing from genome projects. Rationalising this process by creating a single coherent resource for the diagnosis and documentation of protein families is difficult, because the member databases have entirely different formats, search tools and search outputs. InterPro is an attempt to address some of these issues. It provides an integrated view of a number of commonly used pattern databases, together with an intuitive interface for text- and sequence-based searches.

Flat-files submitted by each of the groups were systematically merged and dismantled. Where relevant, family annotations were amalgamated, and all method-specific annotation was separated out. This process was complicated by the relationships that can exist, both between entries in the same database and between entries in different databases. Different types of parent-child relationship were evident, leading to a distinction between 'sub-types' and 'sub-strings'.

A sub-string means that a motif or motifs are contained within a region of sequence encoded by a wider pattern. For example, a PROSITE pattern is typically contained within a PRINTS fingerprint, and a fingerprint might be contained within a Pfam domain.

A sub-type means that one or more motifs are specific for a sub-set of the sequences captured by another, more general pattern. For example, a super-family fingerprint may contain several family- and sub-family-specific fingerprints, and a generic Pfam domain may include several family fingerprints.
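The two relationships can be made concrete with a minimal Python sketch. It is only an illustration under simplifying assumptions, not the procedure used to build InterPro: the `Signature` class, the signature names, the sequence identifiers and the coordinates are all hypothetical. A sub-string is modelled as region containment on shared sequences, and a sub-type as a strict subset of matched sequences.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    """Hypothetical, simplified view of one member-database signature."""
    name: str
    database: str
    # sequence id -> (start, end) of the matched region, 1-based inclusive
    matches: dict


def is_substring(inner, outer):
    """True if, on every sequence both signatures match, the region
    matched by `inner` lies within the region matched by `outer`."""
    shared = inner.matches.keys() & outer.matches.keys()
    if not shared:
        return False
    return all(
        outer.matches[seq][0] <= inner.matches[seq][0]
        and inner.matches[seq][1] <= outer.matches[seq][1]
        for seq in shared
    )


def is_subtype(specific, general):
    """True if `specific` matches a strict subset of the sequences
    matched by `general`."""
    return set(specific.matches) < set(general.matches)


# Invented example data (not real database entries).
pattern = Signature("PATTERN_X", "PROSITE",
                    {"seqA": (40, 52), "seqB": (38, 50)})
fingerprint = Signature("FINGERPRINT_X", "PRINTS",
                        {"seqA": (30, 120), "seqB": (28, 118)})
domain = Signature("DOMAIN_X", "Pfam",
                   {"seqA": (10, 200), "seqB": (8, 198), "seqC": (12, 205)})
family_fingerprint = Signature("FAMILY_FP_X", "PRINTS",
                               {"seqA": (30, 120)})

print(is_substring(pattern, fingerprint))           # pattern inside fingerprint
print(is_substring(fingerprint, domain))            # fingerprint inside domain
print(is_subtype(family_fingerprint, fingerprint))  # family within super-family
```

In this toy data the pattern sits inside the fingerprint, which in turn sits inside the domain, while the family fingerprint matches only one of the two sequences matched by the more general fingerprint. Real signatures would need more careful handling, for example of multiple matches per sequence.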




References:

Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D.R., Durbin R., Falquet L., Fleischmann W., Gouzy J., Hermjakob H., Hulo N., Jonassen I., Kahn D., Kanapin A., Karavidopoulou Y., Lopez R., Marx B., Mulder N.J., Oinn T.M., Pagni M., Servant F., Sigrist C.J.A., Zdobnov E.M. (2001) Nucl. Acids Res. 29, 37-40.


