Introduction

Exon/Intron Annotation of Proteins (ENTRAP) is a resource for annotating protein chains of known structure with intron and exon data.
The ENTRAP web pages are an interface to two databases:

  1. The Intron Exon Sequence Database (INXS)
  2. The Protein Structure Database (PSDB)

INXS is a database of protein coding sequences that contain introns. These coding sequences have been extracted from GenBank (release 136.0).
PSDB is a database of protein chains derived from the Protein Data Bank (PDB). Related annotation derived from CATH, SCOP, and PDBsum is also stored. Homologous sequences in the INXS database are identified using BLAST sequence alignments.

Searching the database

The database search page can be accessed using the Tools control box. A protein is retrieved by entering a PDB identifier and an optional chain identifier. The results page is then displayed, showing all the coding sequences in INXS that match this protein chain. If more than one matching chain is found with the same identifier, a list of possible matches is displayed.

The search results can be filtered by percentage sequence identity and e-value. There are also options to restrict the results to full length matches and to filter out predicted coding sequences.
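The filters described above can be pictured as simple predicates applied to each hit. The following is a minimal sketch only: the `Hit` record, its field names, and the `filter_hits` function are hypothetical illustrations and do not describe how ENTRAP itself is implemented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical record for one matching INXS coding sequence.
    accession: str      # GenBank accession number
    identity: float     # percentage sequence identity to the query chain
    evalue: float       # BLAST e-value
    full_length: bool   # True if the match covers the whole query chain
    predicted: bool     # True if the coding sequence is a predicted one


def filter_hits(hits, min_identity=0.0, max_evalue=float("inf"),
                full_length_only=False, exclude_predicted=False):
    """Return the hits that pass every selected filter."""
    kept = []
    for hit in hits:
        if hit.identity < min_identity:
            continue
        if hit.evalue > max_evalue:
            continue
        if full_length_only and not hit.full_length:
            continue
        if exclude_predicted and hit.predicted:
            continue
        kept.append(hit)
    return kept


example_hits = [
    Hit("A1", 95.0, 1e-50, True, False),
    Hit("B2", 40.0, 1e-5, False, False),
    Hit("C3", 80.0, 1e-30, True, True),
]
strict = filter_hits(example_hits, min_identity=50.0, exclude_predicted=True)
```

In this example only the hit with accession "A1" passes both the identity threshold and the predicted-sequence filter.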

The result pages produced by ENTRAP consist of a number of control boxes, which appear in the left-hand border, and a central results section. If homologous sequences are found in the INXS database, two results pages are produced: a graphical representation page and a sequence alignment page. The "Related Pages" control box switches between them.

Graphical Representation

The graphical alignment representation is the default page loaded after a search is completed. It is split into three sections: a protein summary, annotation control buttons (allowing different annotation schemes to be displayed), and a graphical representation of the protein alignments.

Annotated Graphical Result Example

This graphic is a representation of how the matching INXS coding sequences align to the query protein sequence. The query is displayed at the top of the image and is annotated by SCOP domain. The annotation control buttons allow CATH domains to be displayed if required. Beneath the query protein is a consensus view of all the intron positions found in the matching coding sequences. Below this consensus the individual INXS hits are displayed.
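The text does not say how the consensus view is built. One plausible reading is the union of all intron positions seen in the hits, expressed in query coordinates, with a count of how many hits share each one. The sketch below assumes that reading; the function name and the input layout are hypothetical.

```python
from collections import Counter


def consensus_introns(introns_by_hit):
    """Merge intron positions from all hits into one sorted list.

    introns_by_hit maps a hit ID to a list of (query_position, phase)
    tuples. Returns ((query_position, phase), hit_count) pairs sorted
    by position. This is an assumed definition of "consensus".
    """
    counts = Counter()
    for introns in introns_by_hit.values():
        # Use a set so a hit contributes at most once per intron position.
        counts.update(set(introns))
    return sorted(counts.items())


example = {
    "hit1": [(10, 0), (25, 1)],
    "hit2": [(10, 0), (40, 2)],
}
merged = consensus_introns(example)
```

Here the phase 0 intron at query position 10 is counted twice because both hits contain it.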

Each hit is aligned in a pairwise fashion to the query chain. Gaps are not shown, so that all equivalent amino acids are aligned. Each exon in the hit is individually coloured. The position and phase (0, 1, or 2) of any introns are shown above a coding sequence. If the intron occurs in a gap, it is represented by a "G".
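Intron phase follows the standard definition: the number of coding nucleotides that precede the intron, modulo 3. A phase 0 intron lies between two codons, while phase 1 and phase 2 introns interrupt a codon after its first or second nucleotide. The sketch below assumes each coding sequence is described by a list of exon lengths in nucleotides; the function names are illustrative and are not part of ENTRAP.

```python
def intron_phase_and_residue(coding_nt_before):
    """Return (phase, residue) for an intron.

    coding_nt_before is the number of coding nucleotides that precede
    the intron. The residue is 1-based: it is the residue whose codon
    is interrupted or, for phase 0, the residue just after the intron.
    """
    if coding_nt_before < 0:
        raise ValueError("nucleotide count must be non-negative")
    phase = coding_nt_before % 3
    residue = coding_nt_before // 3 + 1
    return phase, residue


def introns_from_exon_lengths(exon_lengths):
    """List (phase, residue) for every intron between consecutive exons."""
    introns = []
    total = 0
    for length in exon_lengths[:-1]:  # no intron follows the last exon
        total += length
        introns.append(intron_phase_and_residue(total))
    return introns


example_introns = introns_from_exon_lengths([10, 5, 9])
```

With exons of 10, 5, and 9 nucleotides, the first intron falls after 10 coding nucleotides (phase 1, in the codon for residue 4) and the second after 15 (phase 0, before residue 6).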

On the far left of the graphic are select boxes assigned to each hit. These are used with the recalculate button to reproduce the output containing only the selected hits. Next to these boxes is a column containing the PDB identifier of the protein chain and the GenBank accession numbers of all the hits. These link to the appropriate database entry for the row. The column starting with "Shade All" contains the unique INXS ID for each hit. Clicking any of these IDs or "Shade All" annotates a 3D model of the protein using RasMol.

If your browser is configured correctly, clicking the "launch" link in the "Rasmol Control" control box opens a RasMol window displaying the protein chain. Clicking the ID of a hit colours the RasMol structure with all the annotation displayed for that hit. Any individual annotated element can also be displayed on its own by clicking the related section of the output. The "exit" link of the "Rasmol Control" control box closes RasMol.



Sequence Alignment

The sequence alignment results can be accessed via the "Seq Alignment" link of the "Related Pages" control box. The page is split into three sections: a summary of all the hits, the sequence alignment, and a summary of the BLAST results for each hit.

The alignment shows all the matching INXS coding sequences in relation to the query chain. The query protein is coloured by SCOP domain, and secondary structure is given in letter codes above the chain. Each INXS sequence is aligned in a pairwise fashion to the query (so equivalent residues are aligned). Gaps are represented by a "-", and those regions where a gap occurs in another hit but not this one are represented by a "*".

The start and stop of the alignment are coloured for each hit. Any amino acids associated with introns are also coloured (in the case of phase 0 introns, the residue after the intron is coloured). If the alignment start is also associated with an intron, the intron annotation takes precedence.
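The precedence rule can be expressed by applying the intron annotation last, so that it overwrites any start marker on the same residue. This is a minimal sketch with hypothetical names and 1-based residue positions. The text states the rule only for the alignment start; applying it to the alignment stop as well is an assumption made here.

```python
def colour_residues(length, align_start, align_stop, intron_residues):
    """Return one annotation label (or None) per residue of a hit.

    Positions are 1-based. Intron labels are written last so that they
    take precedence over the start label (and, by assumption, over the
    stop label too).
    """
    positions = [align_start, align_stop, *intron_residues]
    if any(p < 1 or p > length for p in positions):
        raise ValueError("residue position outside the sequence")
    colours = [None] * length
    colours[align_start - 1] = "start"
    colours[align_stop - 1] = "stop"
    for residue in intron_residues:
        colours[residue - 1] = "intron"
    return colours


labels = colour_residues(5, 1, 5, [1, 3])
```

In this example residue 1 is both the alignment start and an intron-associated residue, so it is labelled "intron" rather than "start".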