Understanding physical and functional interactions between molecules in living systems is of vital importance in biology. Several powerful methodologies and techniques have been developed to generate molecular interaction data, concentrating mainly on protein–protein interactions (Figure 1) 10.
Given the importance of protein–protein interactions, and the far larger volume of data available for them than for other types of molecules, we focus on them in this course.
Molecular interaction data can be generated using many different techniques, all of which have their strengths and weaknesses. However, it is important to stress that all molecular interaction data is to some degree artifactual. No single method can accurately reproduce a true binary interaction observed under physiological conditions.
The "boom" in molecular interaction research that we have experienced in the past few years has been caused by the increasingly wide availability of high-throughput technologies that can potentially provide information on several thousand pairwise interactions at a time 11. Such high-throughput studies can provide a global 'snapshot' of the molecular interactions that take place in a cell, an organism or a particular physiological context. The full set of these interactions is known as the interactome.
Understanding the cellular machinery and identifying interactions that underpin particular physiological processes relies on the retrieval, organisation and analysis of these valuable data 11. Efforts have been made in the protein interaction field towards addressing this challenge.
Figure 1. Data obtained by interaction detection methods are stored and represented in databases. This figure is adapted from Koh et al. 5.
This course will review some of the main techniques used to produce protein interaction data and discuss their respective advantages and disadvantages. We will discuss how you should regard the reliability of each of the methods. We will also explain how the experimental data are captured electronically.
In nucleotide sequence databases, sequence data is represented simply as a string of letters. Representing protein interaction data is somewhat more complex. We need to use the correct identifiers for the molecules reported to interact; we also need to record the method used to detect the interaction, among other relevant information. Although significant international efforts are under way to standardise how such information is reported and described, and to enable the exchange of data among different public repositories, this field is not as mature as the nucleotide sequence data field.
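
To make the contrast with a plain sequence string concrete, the following is a minimal Python sketch of what a single interaction record might need to hold. It is an illustration only, not the format used by any particular database: the class names `Interactor` and `InteractionRecord`, their fields, and the identifiers `EX0001` and `EX0002` are all hypothetical placeholders, and a real record would carry further information such as the publication reporting the interaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interactor:
    """One molecule taking part in an interaction (hypothetical structure)."""
    database: str   # database that issued the identifier
    accession: str  # identifier of the molecule in that database


@dataclass(frozen=True)
class InteractionRecord:
    """A reported pairwise interaction plus the method that detected it."""
    interactor_a: Interactor
    interactor_b: Interactor
    detection_method: str  # experimental method reported for the interaction

    def key(self):
        """Order-independent key, so A-B and B-A describe the same pair."""
        pair = frozenset({self.interactor_a, self.interactor_b})
        return (pair, self.detection_method)


# Placeholder values for illustration only; not real accessions or results.
a = Interactor("example-db", "EX0001")
b = Interactor("example-db", "EX0002")

record = InteractionRecord(a, b, "two hybrid")
swapped = InteractionRecord(b, a, "two hybrid")

# The same pair reported in the opposite order yields the same key.
print(record.key() == swapped.key())  # True
```

Even this small sketch needs two identifiers, the databases they come from and a detection method, whereas a sequence entry is essentially a single string. Treating the pair as unordered is a design choice made here for simplicity; some detection methods distinguish the roles of the two molecules, in which case the order would need to be kept.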