Curated Datasets

Curated datasets are publications tagged, either computationally or manually by a curator, as being relevant to a specific area of biology. These are actively maintained and grow with every release.
New datasets can be requested, if relevant to your work, by mailing

Manually selected datasets

  • Affinomics - Interactions curated for the Affinomics consortium.

    This dataset contains interactions derived in the context of the EU Affinomics project (Grant number 241481). It comprises interactions directly submitted by the consortium partners as well as interactions derived from the literature. The current focus is on interactions detected by Proximity Ligation Assay (MI:0813), a method pioneered by the group of consortium partner Ulf Landegren.

  • Alzheimers - Interaction dataset based on proteins with an association to Alzheimer's disease

    This dataset was compiled and curated in collaboration with V.M. Perreau, University of Melbourne, Australia. Interactions were investigated in the context of Alzheimer's disease, with a particular focus on the APP (A4) protein. The articles to be curated were selected on the basis of protein annotations and literature scanning.

    Publications based on this dataset: PMID: 20391539

  • BioCreative - Critical Assessment of Information Extraction systems in Biology

    The BioCreative dataset is a large set of publications from the Journal of Biological Chemistry (2006) and the Nature publishing house, manually curated by IntAct curators. It was used in the Protein-Protein Interaction Task of BioCreative II (Critical Assessment of Information Extraction systems in Biology). That task focused on predicting protein interactions from full-text articles, which are represented in the BioCreative dataset. The dataset provided by IntAct is a resource for developing and testing text-mining methods. The data file (source-text.txt), which maps IntAct interactions to the sentence(s) of the publication that allowed an IntAct curator to identify the interaction, is available here.

    Publications based on this dataset: PMID: 18834496, 18834487, 19208158

  • Cancer - Interactions investigated in the context of cancer

    This dataset consists of interactions of proteins that are involved in cancer. Publications of interest are identified through an ongoing literature survey. Protein annotations are also considered when choosing the publications to be curated.

  • Cardiac - Interactions involving cardiac-related proteins

    A collection of interactions involving proteins identified as associated with the cardiovascular system. These annotations create a PPI network that can be used to advance the understanding of protein interactions within the cardiovascular system. The gene lists were assembled by the Cardiovascular Gene Annotation group at University College London. That group, funded by British Heart Foundation grant RG/13/5/30112, also assembled most of the dataset, with the help of IntAct curators located at the EBI. This work is a collaboration with the Cardiac Proteomics and Signalling Laboratory at UCLA, funded by NHLBI Proteomics Center Award HHSN268201000035C.

  • Chromatin - Epigenetic interactions resulting in chromatin modulation

    Chromatin-relevant protein-protein interaction studies have been curated by IntAct curators from peer-reviewed literature. They comprise interactions involved in modulating, modifying or forming chromatin. This dataset aims to capture major epigenetic interactions resulting in chromatin modulation. Most of the publications were taken from the 'Chromatin Papers ListServe' maintained by J. Bone.

  • Cyanobacteria - Interaction dataset based on Cyanobacteria proteins and related species

    This dataset was obtained in a collaborative effort with Franck Chauvat, Corinne Cassier-Chauvat, Jean-Christophe Aude, Magali Michaut, and Pierre Legrain from DBJC, CEA Saclay, Gif-Sur-Yvette, France. Cyanobacteria such as Synechocystis sp. can be used as model organisms because they carry out both oxidative respiration and photosynthesis. Cyanobacteria share many features with other bacteria, including a lack of compartmentalisation. This dataset gathers articles showing interactions relevant to plant photosynthesis, redox metabolism, and resistance to metal and oxidative stress. Most interactors belong to Cyanobacteria species, with a focus on Synechocystis sp. (strain PCC 6803), TaxID 1148, but some belong to species in which biological events seen in Cyanobacteria may also occur, such as plants for photosynthesis. The dataset also contains a number of hybrid experiments using electron transfer between proteins from different species.

    Publications based on this dataset: PMID: 18508856

  • Diabetes - Interactions investigated in the context of diabetes

    This dataset consists of interactions of proteins that are involved in diabetes.

  • Parkinsons - Interactions investigated in the context of Parkinson's disease

    Interactions were investigated in the context of Parkinson's disease, with a particular focus on the LRRK2 protein, and were derived in the context of the Michael J. Fox Foundation for Parkinson's Research LRRK2 Biology LEAPS Award 2012.


Computationally maintained datasets

These datasets are computationally maintained, but a curator may manually add further papers to them during the curation process. When dataset tags are added to publications computationally, large-scale papers (more than 100 interactions per experiment) are excluded.
  • AFCS - Interactions from the Alliance for Cellular Signaling database

    This dataset was obtained from the Alliance for Cellular Signaling (AfCS) database. The AfCS consisted of around 20 institutions engaged in a collaborative effort to investigate and understand cellular signalling networks. The AfCS used high-throughput methods to detect protein-protein interactions between signalling molecules expressed in B cells and cardiac myocytes, and arranged a collaboration with Myriad Genetics to perform large-scale yeast two-hybrid screens. IntAct acted as a repository for the protein-protein interaction data generated by the AfCS project.

  • Apoptosis - Interactions involving proteins with a function related to apoptosis

    Apoptosis-relevant protein-protein interaction studies are curated by IntAct curators from peer-reviewed literature. This dataset is a resource for biologists seeking to understand protein interaction networks and cell death. Small-scale interactions involving proteins annotated with the GO term "Apoptosis" are included in this set.

  • Archaea - Interaction dataset based on Archaea proteins

    Archaea are phylogenetically very different from Bacteria and Eukarya and differ from other forms of life in many aspects of their biochemistry. Because this is of interest, curated peer-reviewed literature is scanned for interactions involving proteins from this group.

  • PDBe - Data obtained from the Protein Data Bank in Europe

    The Protein Data Bank in Europe (PDBe) is the European project for the collection, management and distribution of data about macromolecular structures, in collaboration with the Worldwide Protein Data Bank (wwPDB). IntAct has incorporated a subset of the data from this database involving heterodimeric protein interactions.

  • NDPK - Interactions involving proteins containing InterPro domain IPR001564, Nucleoside diphosphate kinase, core.

    NDPKs, which play a major role in the synthesis of nucleoside triphosphates other than ATP, also possess other enzymatic activities and are required for cell proliferation, differentiation and development.

    Publications based on this dataset: PMID: 19415463

  • Synapse - Interactions of proteins with an established role in the presynapse.

    This dataset has been created for protein-protein interactions involving at least one protein with an established link to the synapse. The list of human, rat and mouse gene names used for computationally maintaining this dataset is available here. Interactions made by orthologous proteins have been added manually by IntAct curators.

  • Virus - Publications including interactions involving viral proteins.

    The MINT and HPIDb databases are major contributors to this dataset.

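
The size rule stated for computationally maintained datasets (papers with more than 100 interactions per experiment are excluded) can be illustrated with a minimal sketch. This is not IntAct's actual pipeline: the `Experiment` and `Publication` classes, the function names and the placeholder publication identifiers are all hypothetical, and treating a paper as large scale when any one of its experiments exceeds the threshold is an assumption about how the rule is applied.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the text: more than 100 interactions per experiment.
LARGE_SCALE_THRESHOLD = 100


@dataclass
class Experiment:
    """Hypothetical record: one experiment and its number of interactions."""
    label: str
    interaction_count: int


@dataclass
class Publication:
    """Hypothetical record: one publication with its curated experiments."""
    identifier: str
    experiments: List[Experiment]


def is_large_scale(publication: Publication) -> bool:
    # Assumption: a single experiment above the threshold makes the
    # whole paper count as large scale.
    return any(
        exp.interaction_count > LARGE_SCALE_THRESHOLD
        for exp in publication.experiments
    )


def eligible_for_computational_tagging(
    publications: List[Publication],
) -> List[Publication]:
    """Keep only the papers that the size rule does not exclude."""
    return [pub for pub in publications if not is_large_scale(pub)]


# Placeholder identifiers, not real publications.
small = Publication("paper-A", [Experiment("pull down", 12)])
boundary = Publication("paper-B", [Experiment("coip", 100)])
large = Publication("paper-C", [Experiment("two hybrid", 2500)])

kept = eligible_for_computational_tagging([small, boundary, large])
print([pub.identifier for pub in kept])
```

In this sketch a paper with exactly 100 interactions in an experiment is kept, because the text says "more than 100". Papers filtered out this way could still be added to a dataset manually by a curator.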

Species-based datasets

Species-specific datasets are generated from the protein-protein interaction data curated from peer-reviewed journals and are available here. The data are grouped according to the taxonomy of the proteins taking part in the interaction. An analysis of one such dataset, involving Arabidopsis proteins, is discussed in PMID: 20371643.
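
As a rough illustration of grouping by the taxonomy of the participating proteins, the sketch below assigns each interaction to the dataset of every species represented among its participants. The data layout, the function name and the interaction identifiers are hypothetical, and placing a cross-species interaction in more than one dataset is an assumption, not a documented IntAct rule. TaxID 1148 is the Synechocystis identifier quoted earlier on this page; 3702 is the NCBI taxonomy identifier for Arabidopsis thaliana.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: (interaction id, taxonomy id of each participant).
Interaction = Tuple[str, List[int]]


def group_by_species(interactions: List[Interaction]) -> Dict[int, List[str]]:
    """Map each taxonomy id to the interactions that involve it."""
    datasets: Dict[int, List[str]] = defaultdict(list)
    for interaction_id, tax_ids in interactions:
        # Assumption: an interaction joins the dataset of every species
        # that contributes at least one participant.
        for tax_id in sorted(set(tax_ids)):
            datasets[tax_id].append(interaction_id)
    return dict(datasets)


# Placeholder interaction identifiers, not real IntAct accessions.
example = [
    ("interaction-1", [1148, 1148]),  # both proteins from Synechocystis
    ("interaction-2", [1148, 3702]),  # cross-species (hybrid) experiment
]
species_datasets = group_by_species(example)
print(species_datasets)
```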