IDA search tool documentation
This documentation relates to the search by domain architecture tool. Here you will find a short description of the tool and some basic information on how to use it.
The InterPro Domain Architecture (IDA) tool allows you to search the InterPro database with a particular set of domains, and returns all of the domain architectures and associated proteins that match the query. This makes it easy to rapidly identify all of the different domain combinations where one type of domain co-occurs with another, or a particular domain is followed by another (e.g., an SH3 domain is found C-terminal to a protein kinase domain, or vice versa), and to list the proteins that match each domain architecture.
The tool uses a specialised algorithm, developed in-house, that rapidly searches through all domain matches within InterPro and returns proteins that match the domains in the order specified in the query. Since InterPro integrates data from a number of different member databases whose domain boundary predictions do not always agree, InterPro domains may overlap. This is in contrast to the ‘beads on a string’ representation that is sometimes used to display domain architectures.
To start a query, use the filter panel on the left-hand side, and either click on the add/remove button or click in the domain viewer box, where it says ‘Click here to add domains’. A pop-up window (see Figure 1) will then list all of the domain entries in the InterPro database. This interactive list can be refined using the free text ‘Search’ box in the top right of the pop-up window. Alternatively, the list can be narrowed down alphabetically or numerically, using the ‘Select’ dropdown menu on the top left-hand side. Once an appropriate domain has been identified from the list, it can be added to or removed from the query using the plus or minus buttons next to each individual domain name. The same domain can be added multiple times in this way; the number of copies is displayed in the box to the right of the plus and minus buttons.
Once the required domains have been selected, pressing the ‘Apply’ button will perform the appropriate database query. The selected domains will be shown as coloured squares in the domain viewer on the left-hand side, and the domain architecture results, along with the number of proteins matched by each domain architecture, will be displayed on the results page (see Figure 2).
The results page shows the proteins in UniProt that match the query, grouped by IDA, with different IDAs displayed on separate rows. For each IDA, a summary view is provided, showing the positions where the domains match on a representative protein sequence, with exact amino acid coordinates available on mouseover. Each IDA is also listed with the total number of matching UniProt proteins. This number is a link: clicking on it returns the full list of proteins, which will become available to download in FASTA format in future versions of the tool.
In addition, a text string accompanies the graphical version of each IDA, summarising its composition. InterPro accession numbers are shown, linked by ‘~’ or ‘-’ symbols, depending on whether the corresponding domains overlap or are adjacent to each other, respectively (two domains are considered to overlap where at least 20% of the shorter domain is covered by the longer one). For example, the string ‘IPR001452 ~ IPR000980 - IPR020635’ would indicate an SH3 domain that overlaps with an SH2 domain, with a tyrosine-protein kinase catalytic domain adjacent to the SH2 domain. As the string is essentially a 2D representation of the IDA, each linking symbol refers only to the relationship with the preceding domain. Note that if two signatures start at exactly the same amino acid, their order in the IDA is chosen at random. In rare cases, this can lead to the same domain architecture being classified into two different IDAs.
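The overlap rule and the text string can be illustrated with a short Python sketch. This is not the tool's in-house algorithm, only a minimal reading of the description above. The function names, the (accession, start, end) tuple layout and the amino acid coordinates are all hypothetical, and domains that start at the same position simply keep their input order here rather than being ordered at random.

```python
from typing import List, Tuple

# Hypothetical layout: (InterPro accession, start, end), 1-based inclusive.
Domain = Tuple[str, int, int]


def overlaps(a: Domain, b: Domain, threshold: float = 0.2) -> bool:
    """True if at least `threshold` of the shorter domain is covered by the other."""
    shared = min(a[2], b[2]) - max(a[1], b[1]) + 1
    if shared <= 0:
        return False  # the two domains do not share any residue
    shorter = min(a[2] - a[1] + 1, b[2] - b[1] + 1)
    return shared / shorter >= threshold


def ida_string(domains: List[Domain]) -> str:
    """Join accessions from N- to C-terminus with '~' (overlap) or '-' (adjacent)."""
    # sorted() is stable, so domains with equal starts keep their input order.
    ordered = sorted(domains, key=lambda d: d[1])
    if not ordered:
        return ""
    parts = [ordered[0][0]]
    # Each symbol describes only the relationship with the preceding domain.
    for previous, current in zip(ordered, ordered[1:]):
        parts.append("~" if overlaps(previous, current) else "-")
        parts.append(current[0])
    return " ".join(parts)


# Made-up coordinates, chosen only to reproduce the example string.
example = [
    ("IPR001452", 80, 140),   # SH3 domain
    ("IPR000980", 120, 220),  # SH2 domain
    ("IPR020635", 250, 500),  # tyrosine-protein kinase catalytic domain
]
print(ida_string(example))
```

With these invented coordinates, 21 of the 61 residues of the SH3 domain are shared with the SH2 domain, which is above the 20% threshold, so the two are joined by ‘~’.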
Activating ‘Order sensitivity’ via the checkbox in the domain architecture panel (see Figure 3) means that the order in which the domains are placed (from left to right on the panel, indicating N- to C-terminal) is reflected in the search results. The domains can be reordered by dragging and dropping their graphical representations, which will automatically update the search results. Note that domain overlaps are not currently catered for with this type of search.
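The difference between the two search modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the function name `matches` is hypothetical, an architecture is reduced to a list of accessions in N- to C-terminal order, and the order-sensitive mode is assumed to allow other domains to sit between the queried ones.

```python
from collections import Counter
from typing import Sequence


def matches(architecture: Sequence[str], query: Sequence[str],
            order_sensitive: bool = False) -> bool:
    """Check whether an architecture (N- to C-terminal accessions) satisfies a query."""
    if not order_sensitive:
        # Order ignored: each queried domain must be present at least as many
        # times as it was added to the query.
        present = Counter(architecture)
        return all(present[acc] >= n for acc, n in Counter(query).items())
    # Order respected: the query must appear as a subsequence of the
    # architecture (assumption: unrelated domains may occur in between).
    remaining = iter(architecture)
    return all(acc in remaining for acc in query)


architecture = ["IPR001452", "IPR000980", "IPR020635"]
query = ["IPR020635", "IPR001452"]  # kinase domain placed before SH3
print(matches(architecture, query))                        # order ignored
print(matches(architecture, query, order_sensitive=True))  # order respected
```

In this example the architecture satisfies the query only while order sensitivity is switched off, because the kinase domain lies C-terminal to the SH3 domain in the protein.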
The domain selection can be reconfigured using the ‘Add/remove domains’ button. Individual domains can also be removed from the selection by dragging their graphical representation to the bin icon or by clicking on the [x] icon next to the domain’s name and InterPro accession number in the domain architecture panel.
What does the colour of domains mean?
Each InterPro domain has a specific colour. The colour of a domain in the domain architecture viewer will match the colour of that domain in the resulting protein sequence summary views. If two domains have the same colour, it can mean that they belong to the same hierarchy (domain relationship).
Future versions of the IDA search tool will include some of the following features:
- the option to insert gaps and overlaps between domains, and to exclude domains.
- the possibility of applying multiple filters to refine the search (for example, by species or sequence length).
- the ability to export the full list of matching proteins.
- the possibility of sorting columns and changing the number of records per page in the table of proteins.
More questions? How to contribute?
Please do not hesitate to contact us if you have more questions, or if there is something that doesn't work properly when you use the domain architecture search.