Introduction
Exon/Intron Annotation of Proteins (ENTRAP) is a resource for
annotating protein chains of known structure with intron and
exon data.
The ENTRAP web pages are an interface to two databases:
- The Intron Exon Sequence Database (INXS)
- The Protein Structure Database (PSDB)
INXS is a database of protein coding sequences that contain introns.
These coding sequences have been extracted from GenBank
(release 136.0).
PSDB is a database of protein chains that have been derived from the
Protein Data Bank. Related annotation derived from
CATH, SCOP, and PDBsum is also stored. Homologous sequences in the
INXS database are identified using BLAST sequence alignments.
Searching the database
The database search page can be accessed using the Tools control box. A protein is retrieved by entering a PDB identifier and an optional chain identifier. The results page then shows all the coding sequences in INXS that match this protein chain. If more than one chain matches the identifier, a list of possible matches is displayed.
The search results can be filtered by percentage sequence identity and e-value. There are also options to restrict the results to full length matches and to filter out predicted coding sequences.
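The filtering options above can be pictured as a simple predicate applied to each hit. The following Python sketch is illustrative only: the `Hit` record, its field names, and the `filter_hits` function are hypothetical and are not part of ENTRAP, and the default thresholds are chosen so that no filter is active.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one INXS coding sequence matched to a chain."""
    inxs_id: str
    percent_identity: float  # percentage sequence identity to the query
    e_value: float           # BLAST e-value of the match
    full_length: bool        # True if the match covers the full length
    predicted: bool          # True if the coding sequence is predicted


def filter_hits(hits, min_identity=0.0, max_e_value=float("inf"),
                full_length_only=False, exclude_predicted=False):
    """Keep only the hits that pass every active filter."""
    kept = []
    for hit in hits:
        if hit.percent_identity < min_identity:
            continue
        if hit.e_value > max_e_value:
            continue
        if full_length_only and not hit.full_length:
            continue
        if exclude_predicted and hit.predicted:
            continue
        kept.append(hit)
    return kept
```

For example, `filter_hits(hits, min_identity=50.0, exclude_predicted=True)` would keep only non-predicted hits with at least 50% identity to the query chain.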
The result pages produced by ENTRAP consist of a number of control boxes, which appear in the left-hand border, and a central results section. If homologous sequences are found in the INXS database then two results pages are produced: a graphical representation page and a sequence alignment page. The "Related Pages" control box switches between them.
Graphical Representation
The graphical alignment representation is the default page loaded after a search is completed. It is split into three sections: a protein summary, annotation control buttons (allowing different annotation schemes to be displayed), and a graphical representation of the protein alignments.
This graphic is a representation of how the matching INXS coding
sequences align to the query protein sequence. The query is
displayed at the top of the image and is annotated by SCOP domain.
The annotation control buttons allow CATH domains to be displayed if
required. Beneath the query protein is a consensus view of all the
intron positions found in the matching coding sequences. Below this
consensus the individual INXS hits are displayed.
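One way to think about the consensus row is as the set of all intron positions seen in any hit, expressed in query residue numbering. The Python sketch below assumes that interpretation; how ENTRAP actually builds the consensus is not described here, and the function name and input layout are hypothetical.

```python
from collections import Counter


def consensus_intron_positions(hits):
    """Merge the intron positions of all hits into one sorted list.

    `hits` maps a hit identifier to the intron positions of that hit,
    given as residue numbers on the query chain (assumed layout).
    Returns (position, number_of_hits_with_an_intron_there) pairs.
    """
    counts = Counter()
    for positions in hits.values():
        # Use a set so a hit contributes at most once per position.
        counts.update(set(positions))
    return sorted(counts.items())
```

With `{"hitA": [31, 64], "hitB": [31, 90]}` as input, position 31 is supported by two hits and positions 64 and 90 by one hit each.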
Each hit is aligned in a pairwise fashion to the query chain. Gaps
are not shown so that all equivalent amino acids will be aligned.
Each exon in the hit is individually coloured. Above a coding
sequence the position and phase (0,1,2) of any introns are shown.
If the intron occurs in a gap then it is represented by a "G".
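Intron phase is conventionally defined by where the intron falls relative to the codons: a phase 0 intron lies between two codons, a phase 1 intron after the first base of a codon, and a phase 2 intron after the second base. The Python sketch below derives position and phase from the coding lengths of the exons. It is a minimal illustration, not ENTRAP's own code: the function name is hypothetical, residues are numbered from 1 along the hit's own protein sequence, and a phase 0 intron is assigned to the residue that follows it, matching the colouring convention of the sequence alignment page.

```python
def intron_phases(exon_coding_lengths):
    """Return (residue_number, phase) for each intron in a coding sequence.

    `exon_coding_lengths` lists the number of coding nucleotides in each
    exon, in order. An intron follows every exon except the last.
    """
    introns = []
    coding_bases = 0
    for length in exon_coding_lengths[:-1]:
        coding_bases += length
        phase = coding_bases % 3          # bases of the split codon before the intron
        residue = coding_bases // 3 + 1   # 1-based residue at or after the intron
        introns.append((residue, phase))
    return introns
```

For exons with 90, 100 and 53 coding nucleotides, the first intron falls after 90 bases (phase 0, before residue 31) and the second after 190 bases (phase 1, inside the codon for residue 64).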
On the far left of the graphic are select boxes assigned to each hit.
These are used with the recalculate button to reproduce the output
containing only the selected hits. Next to these boxes is a column
containing the PDB identifier of the protein chain and the GenBank
accession numbers of all the hits. These link to the appropriate
database entry for the row. The column starting with "Shade All"
contains the unique INXS ID for each hit. Clicking any of these
IDs or "Shade All" annotates a 3D model of the protein using
Rasmol.
If your browser is configured correctly, clicking on the "launch"
link in the "Rasmol Control" control box will open a Rasmol window
displaying the protein chain. Clicking any of the IDs for a hit
will colour all the displayed annotation on the Rasmol structure.
Any individual annotated element can also be displayed on its own by
clicking the related section of the output. The "exit" link of the
"Rasmol Control" control box will close Rasmol.
Sequence Alignment
The sequence alignment results can be accessed via the
"Seq Alignment" link of the "Related Pages" control box.
The page is split into three sections: a summary of all the hits, the
sequence alignment, and a summary of the BLAST results for each hit.
The alignment shows all the matching INXS coding sequences in
relation to the query chain. The query protein is coloured by
SCOP domain and secondary structure is given in letter codes above
the chain. Each INXS sequence is aligned in a pairwise fashion
to the query (so equivalent residues are aligned).
Gaps are represented by a "-" and those regions where a gap occurs
in another hit but not this one are represented by a "*".
The start and stop of the alignment are coloured for each hit. Any
amino acids associated with introns are also coloured (in the case
of phase 0 introns the residue after the intron is coloured). If
the alignment start is also associated with an intron then the
intron annotation takes precedence.
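The precedence rule can be expressed by applying the intron annotation last, so that it overwrites any alignment boundary on the same residue. The Python sketch below is illustrative only: the function and label names are hypothetical, and applying the same precedence to the alignment stop is an assumption, since only the start is mentioned above.

```python
def colour_classes(aln_start, aln_end, intron_residues):
    """Map residue numbers to the annotation that should be coloured.

    The alignment boundaries are labelled first and the intron residues
    afterwards, so an intron on a boundary residue replaces the boundary
    label (assumed to hold for the stop as well as the start).
    """
    classes = {aln_start: "alignment_start", aln_end: "alignment_stop"}
    for residue in intron_residues:
        classes[residue] = "intron"
    return classes
```

For an alignment spanning residues 10 to 50 with introns at residues 10 and 31, residue 10 is labelled as an intron rather than as the alignment start.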
Maintained by: Gordon Whamond
Contact: whamond@ebi.ac.uk