What is IntAct?

IntAct is a central, public repository where molecular interaction data can be stored and accessed (Figure 1). It is hosted at the European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK, where it is maintained by a group of curators and developers.

Figure 1. Graphical display of the IntAct entry page. The homepage may change with new releases, but the above is a correct representation at the time of writing (October 2014).

Where do the data come from?

We populate IntAct with interaction data from literature curation or direct user submissions (Figure 2). Most of the data refer to protein–protein interactions, but interactions involving other types of molecules, such as small chemical compounds or nucleic acids, can also be found in IntAct.

 

Figure 2. Sources of interaction data used in an IntAct entry.

Because our aim is to provide molecular interaction datasets to the scientific community, IntAct is freely available and uses an open-source database system and analysis tools. It can be installed locally and adapted to the needs of the local organisation. This reduces development time and encourages researchers to build consistent interaction datasets by using the same infrastructure and annotation system.

Controlled vocabularies

The database makes extensive use of controlled vocabularies (lists of defined terms), which allow a consistent description of the experimental detail that was used to generate the data. These vocabularies are hierarchically organised and, where possible, cross-referenced with other controlled vocabularies such as the NCBI taxonomy or Gene Ontology. The main vocabulary that IntAct uses to define and describe molecular interactions is the PSI-MI controlled vocabulary, defined by the IMEx consortium (see the next section). It can be navigated using the Ontology Lookup Service hosted by the EBI (look it up under "Molecular Interactions (PSI-MI 2.5)" in the drop-down menu, Figure 3).

 

Figure 3. A glimpse of how the PSI-MI controlled vocabulary is organised. Here we see how a controlled vocabulary/ontology term is shown in the Ontology Lookup Service. In these three views you can navigate the ontology following its tree-like structure (left), see the relationships and dependencies that a particular term has in this structure (middle), or check its definition and other information such as accessions, synonyms or cross-references (right).
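
To make the idea of a hierarchically organised vocabulary more concrete, here is a minimal Python sketch. It is not IntAct or Ontology Lookup Service code: the `PARENT` and `NAME` tables are a small hand-copied fragment of the interaction detection method branch, the helper names `ancestors` and `is_a` are made up for this illustration, and each term is given a single parent for simplicity (in the real vocabulary a term may have several). The identifiers and names should be checked against the Ontology Lookup Service before being relied on.

```python
# Hand-copied fragment of the PSI-MI detection-method branch (child -> parent).
# Assumption: IDs and names match the current vocabulary; verify them in the
# Ontology Lookup Service. One parent per term is a simplification.
PARENT = {
    "MI:0018": "MI:0232",
    "MI:0232": "MI:0090",
    "MI:0090": "MI:0045",
    "MI:0045": "MI:0001",
    "MI:0001": "MI:0000",
    "MI:0000": None,  # root term
}

NAME = {
    "MI:0000": "molecular interaction",
    "MI:0001": "interaction detection method",
    "MI:0045": "experimental interaction detection",
    "MI:0090": "protein complementation assay",
    "MI:0232": "transcriptional complementation assay",
    "MI:0018": "two hybrid",
}


def ancestors(term_id):
    """Return the parent term IDs from term_id up to the root, nearest first."""
    chain = []
    parent = PARENT[term_id]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(term_id, ancestor_id):
    """True if term_id is ancestor_id itself or one of its descendants."""
    return term_id == ancestor_id or ancestor_id in ancestors(term_id)


if __name__ == "__main__":
    # Print the path from "two hybrid" up to the root of the vocabulary.
    for tid in ["MI:0018"] + ancestors("MI:0018"):
        print(tid, NAME[tid])
```

The practical benefit of the hierarchy is that a search for a broad term, such as "experimental interaction detection", can also match records annotated with any of its more specific child terms.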

International Molecular Exchange Consortium

IntAct is a member of the International Molecular Exchange (IMEx) Consortium, a group of major public interaction data providers that share curation effort and exchange completed records on molecular interaction data (Figure 4). The PSI-MI controlled vocabulary described in the previous section is one of the tools that the consortium has produced with the aim of standardising the way that interactions are represented by consortium members.

When you query data in IntAct, you also access millions of interactions from different data resources via the PSICQUIC (Proteomics Standards Initiative Common QUery InterfaCe) service, or a consistently annotated, non-redundant, experimentally determined subset from the IMEx Consortium.
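
PSICQUIC services can also be queried programmatically over REST. The following Python sketch shows roughly what such a query might look like. Several things in it are assumptions rather than facts taken from this course: the base URL in `INTACT_PSICQUIC` should be checked against the PSICQUIC registry, the helper names (`build_query_url`, `parse_mitab_line`, `fetch_interactions`) and the column labels are made up for this example, and `SAMPLE_LINE` is a synthetic row with placeholder accessions, not real IntAct data. The results are assumed to come back in the tab-delimited PSI-MI TAB 2.5 format, one interaction per line with 15 columns.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of IntAct's PSICQUIC REST service; verify in the registry.
INTACT_PSICQUIC = (
    "https://www.ebi.ac.uk/Tools/webservices/psicquic/intact/"
    "webservices/current/search/"
)

# The 15 columns of PSI-MI TAB 2.5, with illustrative (non-official) labels.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def build_query_url(miql, base=INTACT_PSICQUIC, first=0, max_results=50):
    """Build a REST URL for a MIQL query, with paging parameters."""
    encoded = quote(miql, safe="")
    return f"{base}query/{encoded}?firstResult={first}&maxResults={max_results}"


def parse_mitab_line(line):
    """Map one tab-delimited line onto the column labels above.

    Columns beyond the first 15 (as in later MITAB versions) are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    return dict(zip(MITAB25_COLUMNS, fields))


def fetch_interactions(miql, **kwargs):
    """Run the query over the network and return a list of parsed rows."""
    with urlopen(build_query_url(miql, **kwargs)) as response:
        text = response.read().decode("utf-8")
    return [parse_mitab_line(row) for row in text.splitlines() if row.strip()]


# Synthetic example row (placeholder accessions), used to show the parsing.
SAMPLE_LINE = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "-", "-",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)',
    'psi-mi:"MI:0469"(IntAct)', "intact:EBI-0000000", "-",
])

if __name__ == "__main__":
    print(build_query_url("brca2"))
    print(parse_mitab_line(SAMPLE_LINE)["detection_method"])
```

Calling `fetch_interactions("brca2", max_results=10)` would send the query to the service. Note how the detection method, interaction type and source database columns carry PSI-MI controlled vocabulary terms, which is what makes results from different providers comparable.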

 

Figure 4. Consortium members that contribute to IMEx and PSICQUIC.

 

Each data provider in the IMEx Consortium curates to a different depth, as illustrated in the previous figure. Shallow curation is used when the main goal is to re-publish selected content with minimum effort so that it complies with the MIMIx standard: the minimum information required for reporting a molecular interaction experiment, a lightweight version of the IMEx guidelines [1]. Deep curation requires a detailed description of all the features involved in the interaction and of the interacting partners, and complies with the full version of the IMEx guidelines; it therefore requires more time and resources.