Guided example: searching InterPro with an amino acid sequence

Getting started

In this guided example, we will look at searching InterPro with an amino acid sequence.

First, we need to locate the sequence search box on the InterPro homepage.

 

Steps

Navigate to the InterPro homepage and find the sequence search box.

Now we need a sequence to search against the database.

Steps

Copy and paste the following sequence into the sequence search box, and press 'Search'.

>MY_SEQ
MAAGGSGAESAPPTPSMSSLPLAALNVRVRHRLSLFLNVRTQVAADWTGLAEEM
NFEYLEIRRLETHPDPTRSLLDDWQGRPGASVGRLLELLAKLGRDDVLVELGPS
IEEDCRKYILKQQQEAAEKPLQVDSVDSSIPWMSGITIRDDPLGQMPEHFDAFI
CYCPSDIQFVQEMIRQLEQTNYRLKLCVSDRDVLPGTCVWSIASELIEKRCRRM
VVVVSDDYLQSKECDFQTKFALSLSPGAHQKRLIPVKYKSMKKEFPSILRFITV
CDYTNPCTKSWFWTRLARALSLP
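
Before pasting a sequence into the search box, it can be useful to confirm that the record is well-formed FASTA. The following is a minimal Python sketch and is not part of InterPro: the function name `parse_fasta_record` and the set of accepted residue letters are assumptions made for illustration.

```python
# Letters accepted as amino acid codes in this sketch (an assumption): the 20
# standard residues plus the ambiguity/rare codes B, J, O, U, X and Z.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" "BJOUXZ")


def parse_fasta_record(text):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA record must start with a '>' header line")
    header = lines[0][1:]
    # Join the wrapped sequence lines into one continuous string.
    sequence = "".join(lines[1:]).upper()
    invalid = sorted(set(sequence) - VALID_RESIDUES)
    if invalid:
        raise ValueError(f"Unexpected characters in sequence: {invalid}")
    return header, sequence


# The query record from this guided example.
record = """>MY_SEQ
MAAGGSGAESAPPTPSMSSLPLAALNVRVRHRLSLFLNVRTQVAADWTGLAEEM
NFEYLEIRRLETHPDPTRSLLDDWQGRPGASVGRLLELLAKLGRDDVLVELGPS
IEEDCRKYILKQQQEAAEKPLQVDSVDSSIPWMSGITIRDDPLGQMPEHFDAFI
CYCPSDIQFVQEMIRQLEQTNYRLKLCVSDRDVLPGTCVWSIASELIEKRCRRM
VVVVSDDYLQSKECDFQTKFALSLSPGAHQKRLIPVKYKSMKKEFPSILRFITV
CDYTNPCTKSWFWTRLARALSLP
"""

header, sequence = parse_fasta_record(record)
print(header, len(sequence))
```

A record that fails this check (for example, one containing digits or a missing header line) raises a `ValueError`, which is a cue to fix the input before running the search.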

We will now go through the search results, section by section.

Sequence search results: overview page

The search result page should look something like the one shown in Figure 13 below. This is an interactive page that contains a large amount of information.

Section A shows the family to which InterPro predicts the sequence belongs. This is displayed as a hierarchy, where appropriate. Clicking the link will take you to the InterPro entry page for the family, where detailed information about its function may be found.

Section B summarises the domains and repeats that InterPro predicts the protein to contain. The sequence is represented as a grey bar with its length in amino acids displayed along the bottom. Domains and repeats are indicated as coloured bars. Mousing over the bars reveals the type of domain or repeat that they represent, along with their position on the sequence and a link to the relevant InterPro entry page.

Section C holds detailed signature match information, showing the raw match positions on the sequence for all the different signatures in InterPro, including (where available) signatures representing families, domains, repeats and sites, and unintegrated signatures that are not associated with InterPro entries. The information displayed in this section can be controlled using the interactive menu on the left-hand side of the screen (Section D).

Section E shows the Gene Ontology (GO) terms predicted for the protein. These terms are assigned based on the matches to the InterPro entries shown above.

Figure 13. InterPro sequence search results overview page. 

Sequence search results: domain information (1)

Looking at the results overview page, we will first examine the domains that InterPro predicts the query protein to contain. 

 

Steps

On the results page, place your mouse over the left-hand domain in the Domains and Repeats section (Figure 14).

 

Figure 14. Mousing over domains reveals their name, position and the InterPro entries with which they are associated.

Mousing over the left hand domain, we see its name (death domain) and InterPro accession number (IPR000488), its position on the sequence (amino acid residues 5-125), and that it belongs to a wider class of death-like domains.  
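
Match positions such as 5-125 are 1-based and inclusive, so this match spans 125 - 5 + 1 = 121 residues. As an illustration only, the sketch below shows how such coordinates map onto a Python string slice; the function name `extract_region` and the short example sequence are made up for this purpose.

```python
def extract_region(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, so only the start shifts.
    return sequence[start - 1:end]


example = "ACDEFGHIKL"  # made-up 10-residue sequence for illustration
print(extract_region(example, 3, 6))  # prints "DEFG"
```

Applied to the query sequence with `start=5` and `end=125`, the same function would return the residues covered by the death domain match.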

Steps

Click on the link to visit the entry page for the death domain (IPR000488).

 

Examining the entry page for the death domain (Figure 15), we can see that our domain is a protein-interaction module that is involved in the association of receptors so that they can signal downstream events, including programmed cell death (apoptosis). The other proteins in UniProtKB that InterPro predicts to contain such a domain can be accessed by clicking the 'Proteins matched' link on the left hand side menu. The species in which these proteins are found, the pathways in which they may be involved, and any solved protein structures can also be accessed from the left hand menu, via the 'Species', 'Pathways & interactions' and 'Structures' links, respectively.
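
The same entry information can, in principle, be retrieved programmatically. The sketch below assumes that InterPro exposes a public REST API under `https://www.ebi.ac.uk/interpro/api` with the endpoint paths shown; both the base URL and the path layout are assumptions that should be checked against the current InterPro API documentation before use. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the public InterPro REST API; verify before relying on it.
API_BASE = "https://www.ebi.ac.uk/interpro/api"


def entry_url(accession):
    """Build the (assumed) URL for the metadata of one InterPro entry."""
    return f"{API_BASE}/entry/interpro/{accession}"


def matched_proteins_url(accession):
    """Build the (assumed) URL listing UniProtKB proteins matched by an entry."""
    return f"{API_BASE}/protein/uniprot/entry/interpro/{accession}"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the URLs are built here; no network request is made.
print(entry_url("IPR000488"))
print(matched_proteins_url("IPR000488"))
```

Calling `fetch_json(entry_url("IPR000488"))` would then request the death domain entry, provided the assumptions above hold.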

 

Figure 15. InterPro entry page for the death domain.

Sequence search results: domain information (2)

Navigating back to the search results page, we can examine the second domain that InterPro predicts the query protein to contain.

Steps

On the results page, place your mouse over the right-hand domain in the Domains and Repeats section (Figure 16).

 

Figure 16. Mousing over the second domain reveals its name, amino acid residue position and a link to the relevant InterPro page.

Mousing over the domain, we can see that it is a Toll/interleukin-1 receptor homology (TIR) domain. Following the link to the InterPro entry page, we can find out more about this domain, including its involvement in signalling (Figure 17). As with the death domain entry page, more information on the proteins predicted to contain a TIR domain (such as the species in which they are found and the pathways in which they are involved) can be found using the left hand side menu.

Figure 17. InterPro entry page for the TIR domain.

Sequence search results: family information

Returning to the results overview page, next we look at the family membership section (Figure 18). We can see that InterPro predicts that the protein belongs to the myeloid differentiation primary response protein MyD88 family.

 

Figure 18. Predicted protein family membership for the query sequence.

Steps

Click on the link to visit the entry page for myeloid differentiation primary response protein MyD88.

Clicking on the link will take us to the InterPro entry page for this family (Figure 19). This page explains what MyD88 is, what its functions are, and the GO terms that can be applied to members of the family. Clicking on the left hand menu allows us to examine the different UniProt proteins that are predicted to belong to the family (Proteins matched), their different domain architectures (Domain organisations) and the species in which they are found (Species). Clicking on any of these menu items opens a new page with the appropriate information and links that allow you to download specific datasets (e.g., all the proteins in a family, or all the proteins with a particular domain architecture, or all the protein family members from a particular kingdom, class or species).

Figure 19. InterPro entry page for the myeloid differentiation primary response protein MyD88 family of proteins.

Sequence search results: exploring other proteins in the family

It is possible to find all of the UniProtKB proteins that match a protein family, using the 'Proteins matched' link on the left hand side of the family page (Figure 20).

                                                                                                            

Figure 20. The side menu of the protein family page, where links to more information on proteins belonging to the same family can be found.

Steps

Click the 'Proteins matched' link to see other proteins in UniProtKB with the same family membership.

Following the 'Proteins matched' link takes us to a page like the one shown in Figure 21. This is a paginated list of UniProtKB proteins, showing their name, accession number, species and domain architectures (if known). Those sequences with a gold star are from Swiss-Prot, the manually annotated section of UniProtKB, whilst those with a silver star are from the automatically annotated section TrEMBL. Proteins for which structural information is available are shown with a '3D' icon, next to their accession number. A FASTA file containing the sequences of all of the proteins can be downloaded by clicking the 'Export FASTA' button at the top right of the page.
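
Once a FASTA file has been downloaded, its records can be read with a few lines of Python. This is a minimal sketch that assumes only the general FASTA layout (a `>` header line followed by one or more sequence lines); the function name `read_fasta` and the two demo records are made up for illustration and do not reflect the exact header format of the exported file.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up two-record file standing in for a real download.
demo = io.StringIO(">protein_one\nMAAGG\nSGAES\n>protein_two\nMKKEF\n")
for name, seq in read_fasta(demo):
    print(name, len(seq))
```

For a real download, the `io.StringIO` object would be replaced by a file opened with `open(...)` on the saved file.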

Figure 21. The proteins matched page, displaying UniProtKB proteins belonging to the same protein family as the query sequence.