What is proteomics?


Proteomics is the large-scale study of proteomes. A proteome is the set of proteins produced in an organism, system, or biological context. We may refer to, for instance, the proteome of a species (for example, Homo sapiens) or an organ (for example, the liver). The proteome is not constant; it differs from cell to cell and changes over time. To some degree, the proteome reflects the underlying transcriptome. However, protein activity (often assessed by the reaction rate of the processes in which the protein is involved) is modulated by many factors besides the expression level of the relevant gene.

Proteomics is used to investigate (see Figure 1):

  • when and where proteins are expressed;
  • rates of protein production, degradation, and steady-state abundance;
  • how proteins are modified (for example, post-translational modifications (PTMs) such as phosphorylation);
  • the movement of proteins between subcellular compartments;
  • the involvement of proteins in metabolic pathways;
  • how proteins interact with one another.

Proteomics can provide valuable information on many biological problems, such as:

  • Which proteins interact with a particular protein of interest (for example, the tumour suppressor protein p53)?
  • Which proteins are localised to a subcellular compartment (for example, the mitochondrion)?
  • Which proteins are involved in a biological process (for example, circadian rhythm)?

Several high-throughput technologies have been developed to investigate proteomes in depth. The most commonly applied are mass spectrometry (MS)-based techniques such as tandem MS (MS/MS) and gel-based techniques such as differential in-gel electrophoresis (DIGE). These technologies generate huge amounts of data.

Databases are critical for recording and carefully storing these data, allowing researchers to connect their results with existing knowledge. The EBI hosts up-to-date and accurate databases to enable rapid searching and retrieval of these data. The four major databases related to proteomic research (UniProtKB, IntAct, Reactome and PRIDE) are described in the next section. These four databases (especially UniProtKB) draw on gene sequence data (e.g. Ensembl) and annotation tools (e.g. InterPro) that are also hosted by the EBI. You can find out more about such resources in other training courses, such as the Introduction to Functional Genomics Resources and the InterPro Quick Tour.

Figure 1. Areas of proteomics. Proteomic experiments generally collect data on three properties of proteins in a sample: location, abundance/turnover and post-translational modifications. Depending on the experimental design, researchers may be directly interested in these data, or may use them to infer additional information. For example, it may be possible to infer a protein's interaction partners among others that are colocalised with it, or to assess whether a protein is active from its post-translational modifications.