2Can Support Portal - Protein and Proteomic Analysis
Introduction
Determination of protein/peptide sequences is a basic requirement for biomedical research, including cancer research. It is absolutely essential for characterising and identifying proteins or peptides.
Imagine you are a biologist who has discovered an unknown peptide, perhaps conceptually translated from a nucleotide sequence, or isolated from a gel and then sequenced. You will want to find out as much as you can about it. The first step is to look for similarities with known peptide and protein sequences. This is done by comparing the novel sequence with those held in protein databases, the most important of which is UniProt.
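As a rough illustration of what "comparing the novel sequence with those contained in protein databases" involves, the sketch below ranks entries in a toy database by the number of three-letter words (k-mers) they share with a query peptide. Word matching of this kind is a common first step in heuristic search tools, but this is a simplified sketch and not the algorithm of any particular tool in this tutorial. The database entries, their names, and the function names are all hypothetical.

```python
from collections import Counter


def kmers(seq, k=3):
    """Return a multiset of all overlapping k-letter words in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmer_score(query, subject, k=3):
    """Count the k-mers common to both sequences (multiset intersection)."""
    return sum((kmers(query, k) & kmers(subject, k)).values())


def rank_hits(query, database, k=3):
    """Return (name, score) pairs, best-scoring database entry first."""
    scores = [(name, shared_kmer_score(query, seq, k))
              for name, seq in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy database: these are made-up sequences, not real entries.
toy_db = {
    "toy_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "toy_B": "MSDNGELEDKPPAPPVRMSSTIF",
    "toy_C": "GGGGGGGGGG",
}

if __name__ == "__main__":
    for name, score in rank_hits("TAYIAKQRQISF", toy_db):
        print(name, score)
```

A real search would follow the word-matching step with a proper alignment and a statistical significance estimate; the shared-word count here only shows why closely related sequences rise to the top of a hit list.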
The UniProt Knowledgebase (accessible as UniProt on the EBI SRS server) is a central database of protein sequence and function created by joining the information contained in UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, and PIR-PSD.
Until recently, the EBI/SIB UniProtKB/Swiss-Prot + UniProtKB/TrEMBL databases and the PIR Protein Sequence Database (PIR-PSD) coexisted as protein databases with differing protein sequence coverage and annotation priorities. In 2002, EBI, SIB, and PIR (at the Georgetown University Medical Center and National Biomedical Research Foundation) joined forces as the UniProt consortium.
Consisting of richly annotated entries, the UniProtKB is the centerpiece of the consortium's activities. Initially, the knowledgebase was derived by merging the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and PIR-PSD protein sequences, together with their sequence and functional annotations. Future knowledgebase entries will be derived from those UniProt archive sequences that are considered essential for the UniProt knowledgebase. For example, sequences for which novel functional, structural, and biochemical data have been published have high annotation priority. The UniProt knowledgebase consists of two sections: one containing fully manually annotated records, built from information extracted from the literature and from curator-evaluated computational analyses, and one containing computationally analysed records awaiting full manual annotation. For the sake of continuity and name recognition, the two sections are referred to as "UniProtKB/Swiss-Prot" and "UniProtKB/TrEMBL" respectively.
UniProt is composed of three parts:
- UniProtKB/Swiss-Prot: The current UniProtKB/Swiss-Prot release (including all updates and new entries since the last release).
- UniProtKB/TrEMBL (SpTrEMBL): A computer-annotated protein sequence database supplementing UniProtKB/Swiss-Prot. It contains translations of protein-coding sequences in the EMBL Nucleotide Sequence Database.
- PIR-PSD (PIR Protein Sequence Database): The PIR, in collaboration with MIPS and JIPID, produces and distributes the PIR-International Protein Sequence Database (PSD), a comprehensive and expertly annotated protein sequence database in the public domain. The primary objectives for its continuing development and enhancement are comprehensiveness, timeliness, non-redundancy, quality annotation, and full classification.
All three UniProt components are updated bi-weekly. UniProt therefore provides an up-to-date view of the protein sequences currently publicly available.
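The UniProtKB/TrEMBL entry in the list above refers to translations of protein-coding sequences. A minimal sketch of such a conceptual translation is shown below, assuming the standard genetic code, a sequence that already starts in the correct reading frame, and no introns. The function name `translate` and its `to_stop` parameter are illustrative choices, not part of any UniProt or EMBL tool.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, to_stop=True):
    """Translate a coding sequence in frame 1 into one-letter amino acids.

    Codons containing characters other than T, C, A, G become "X".
    Any trailing partial codon is ignored. If to_stop is true, translation
    ends at the first stop codon, which is not included in the result.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCAAATGA"))
```

Production pipelines also have to choose the reading frame, handle alternative genetic codes, and use the coding-region annotation of the nucleotide entry; none of that is attempted here.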
Comparing a novel sequence with those contained in protein databases can be done with any of the tools in this protein analysis tutorial, each of which has its own particular merits and its own trade-offs in search speed and sensitivity.