Guided example: searching InterPro with an amino acid sequence
First, we need to locate the sequence search box on the InterPro homepage.
Now we need a sequence to search against the database.
We will now go through the search results in the slideshow below.
Sequence search results: overview page
The search result page should look something like the one shown in Figure 13 below. This is an interactive page that contains a large amount of information.
Section A shows the family to which InterPro predicts the sequence belongs. This is displayed as a hierarchy, where appropriate. Clicking the link will take you to the InterPro entry page for the family, where detailed information about its function may be found.
Section B summarises the domains and repeats that InterPro predicts the protein to contain. The sequence is represented as a grey bar with its length in amino acids displayed along the bottom. Domains and repeats are indicated as coloured bars. Mousing over the bars reveals the type of domain or repeat that they represent, along with their position on the sequence and a link to the relevant InterPro entry page.
Section C holds detailed signature match information, showing the raw match positions on the sequence for all the different signatures in InterPro. Where available, this includes signatures representing families, domains, repeats and sites, as well as unintegrated signatures that are not associated with InterPro entries. The information displayed in this section can be controlled using the interactive menu on the left-hand side of the screen (Section D).
Section E shows the Gene Ontology (GO) terms predicted for the protein. These terms are assigned based on the matches to the InterPro entries shown above.
Figure 13. InterPro sequence search results overview page.
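The way Section E derives GO terms from entry matches can be pictured as a simple lookup followed by a union. The sketch below is only an illustration of that idea, not InterPro's actual implementation: the mapping table, the function name and the GO identifiers are hypothetical placeholders, and only the accession IPR000488 comes from this example.

```python
# Hypothetical entry-to-GO table. The GO identifiers are placeholders,
# not real annotations; only IPR000488 is taken from the guided example.
ENTRY_TO_GO = {
    "IPR000488": {"GO:PLACEHOLDER_1"},
    "IPR_EXAMPLE": {"GO:PLACEHOLDER_1", "GO:PLACEHOLDER_2"},
}


def predicted_go_terms(matched_entries, entry_to_go):
    """Return the sorted union of GO terms for all matched entries.

    Entries without a mapping (for example unintegrated signatures)
    contribute no terms.
    """
    terms = set()
    for accession in matched_entries:
        terms |= entry_to_go.get(accession, set())
    return sorted(terms)


if __name__ == "__main__":
    print(predicted_go_terms(["IPR000488", "IPR_EXAMPLE"], ENTRY_TO_GO))
```

A term shared by several matched entries appears only once in the result, which mirrors how a single list of predicted terms is shown for the whole protein.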
Sequence search results: domain information (1)
Looking at the results overview page, we will first examine the domains that InterPro predicts the query protein to contain.
Figure 14. Mousing over domains reveals their name, position and the InterPro entries with which they are associated.
Mousing over the left hand domain, we see its name (death domain) and InterPro accession number (IPR000488), its position on the sequence (amino acid residues 5-125), and that it belongs to a wider class of death-like domains.
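The information revealed by mousing over a domain (name, accession and position) can be thought of as a small record attached to the sequence. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, the sequence length of 300 is an illustrative value rather than the real length of the query, and only the death domain name, accession and residue range are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainMatch:
    """One predicted domain, with 1-based inclusive residue positions."""
    name: str
    accession: str
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1

    def contains(self, residue: int) -> bool:
        return self.start <= residue <= self.end


def render_bar(sequence_length, matches, width=60):
    """Draw the sequence as '-' and matched regions as '#', scaled to width columns."""
    bar = ["-"] * width
    for match in matches:
        first = (match.start - 1) * width // sequence_length
        last = (match.end - 1) * width // sequence_length
        for column in range(first, last + 1):
            bar[column] = "#"
    return "".join(bar)


# Values from the guided example: death domain, IPR000488, residues 5-125.
death = DomainMatch("Death domain", "IPR000488", 5, 125)

if __name__ == "__main__":
    # 300 is an assumed sequence length, used here for illustration only.
    print(render_bar(300, [death]))
    print(death.length(), death.contains(100))
```

Storing positions as inclusive, 1-based residue numbers keeps the record consistent with how positions are displayed on the results page.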
Examining the entry page for the death domain (Figure 15), we can see that our domain is a protein-interaction module that is involved in the association of receptors so that they can signal downstream events, including programmed cell death (apoptosis). The other proteins in UniProtKB that InterPro predicts to contain such a domain can be accessed by clicking the 'Proteins matched' link in the left-hand menu. The species in which these proteins are found, the pathways in which they may be involved, and any solved protein structures can also be accessed from the left-hand menu, via the 'Species', 'Pathways & interactions' and 'Structures' links, respectively.
Figure 15. InterPro entry page for the death domain.
Sequence search results: domain information (2)
Navigating back to the search results page, we can examine the second domain that InterPro predicts the query protein to contain.
Figure 16. Mousing over the second domain reveals its name, amino acid residue position and a link to the relevant InterPro page.
Mousing over the domain, we can see that it is a Toll/interleukin-1 receptor homology (TIR) domain. Following the link to the InterPro entry page, we can find out more about this domain, including its involvement in signalling (Figure 17). As with the death domain entry page, more information on the proteins predicted to contain a TIR domain (such as the species in which they are found and the pathways in which they are involved) can be found using the left-hand menu.
Figure 17. InterPro entry page for the TIR domain.
Sequence search results: family information
Returning to the results overview page, next we look at the family membership section (Figure 18). We can see that InterPro predicts that the protein belongs to the myeloid differentiation primary response protein MyD88 family.
Figure 18. Predicted protein family membership for the query sequence.
Clicking on the link will take us to the InterPro entry page for this family (Figure 19). This page explains what MyD88 is, what its functions are, and which GO terms can be applied to members of the family. The left-hand menu allows us to examine the different UniProt proteins that are predicted to belong to the family (Proteins matched), their different domain architectures (Domain organisations) and the species in which they are found (Species). Clicking on any of these menu items opens a new page with the appropriate information, along with links that allow you to download specific datasets (e.g., all the proteins in a family, all the proteins with a particular domain architecture, or all the protein family members from a particular kingdom, class or species).
Figure 19. InterPro entry page for the myeloid differentiation primary response protein MyD88 family of proteins.
Sequence search results: exploring other proteins in the family
Figure 20. The protein families side menu page, where links to more information on proteins belonging to the same family can be found.
Following the 'Proteins matched' link takes us to a page like the one shown in Figure 21. This is a paginated list of UniProtKB proteins, showing their name, accession number, species and domain architectures (if known). Sequences with a gold star are from Swiss-Prot, the manually annotated section of UniProtKB, whilst those with a silver star are from TrEMBL, the automatically annotated section. Proteins for which structural information is available are shown with a '3D' icon next to their accession number. A FASTA file containing the sequences of all of the proteins can be downloaded by clicking the 'Export FASTA' button at the top right of the page.
Figure 21. The proteins matched page, displaying UniProtKB proteins belonging to the same protein family as the query sequence.
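Once the FASTA file has been exported from the proteins matched page, it can be read with a few lines of code. The sketch below assumes only the general FASTA convention (a header line starting with '>' followed by one or more sequence lines) and makes no assumption about how the exported headers are laid out. The function name, the file name in the comment and the two example records are hypothetical and exist only to show the parsing.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of header -> sequence.

    The full header text (without the leading '>') is used as the key.
    Sequence lines that are wrapped over several lines are joined.
    """
    records = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up records for illustration; these are not real protein sequences.
EXAMPLE = [
    ">example_protein_1",
    "MAAGG",
    "PLLKV",
    "",
    ">example_protein_2",
    "MSTNE",
]

if __name__ == "__main__":
    # With a downloaded file, one might instead write:
    #   with open("exported.fasta") as handle:   # hypothetical file name
    #       records = read_fasta(handle)
    for name, sequence in read_fasta(EXAMPLE).items():
        print(name, len(sequence))
```

Because the function accepts any iterable of lines, the same code works on an open file handle or on a list of strings held in memory.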