Illuminating the ‘druggable genome’

23 Feb 2018 - 10:50

Scientists in the international Illuminating the Druggable Genome (IDG) project have revealed that around 40% of the genome remains unexplored, and propose an elegant method for redressing the balance. Published in Nature Reviews Drug Discovery, the study describes a new way to survey all regions of the human genome – including the ‘dark’ areas – to determine whether the proteins they encode are likely targets for novel drug therapies.

The problem

Biological research tends to focus on areas of the human genome that are already well described, in part because funding models favour previously studied targets. Scientists in the Illuminating the Druggable Genome (IDG) project, a National Institutes of Health Common Fund initiative, have proposed a way to address this imbalance.

“Almost 9,000 proteins are not currently being subjected to NIH-sponsored research,” said lead author Tudor Oprea, MD, PhD, professor and chief of the Translational Informatics Division in the University of New Mexico’s Department of Internal Medicine. “This study attracts attention to that situation. We hope to encourage scientists and funders to explore the unknown.”

What they did

The new method involves categorising areas of the human genome according to how extensively genes – and the proteins they code for – have been studied. Based on the new classification, steps could be taken to establish whether proteins in different categories are likely targets for an array of novel drug therapies.

In this study, Oprea led IDG scientists in tracking the target development level of human proteins. To do so, they conducted an exhaustive review of a wide array of genomic, proteomic, chemical and disease-related data.

ChEMBL is EMBL-EBI’s public database of bioactive entities. SureChEMBL is an associated database containing molecules reported in the patent literature. Scientists working at EMBL-EBI mined ChEMBL and SureChEMBL to find relevant information about underexplored targets.

Map of Illuminating the Druggable Genome

How the solution works

In the new method, human proteins are grouped into one of four target development levels:

  • Clinical: These are the most-studied proteins – ones that have known interactions with approved drugs. Scientists have already identified their mechanism of action. These proteins represent 3% of the human proteome.
  • Chemical: These are proteins that bind to small molecules (i.e. drugs) with relatively high potency. This group accounts for 6% of the proteome.
  • Biological: Experimental evidence shows that these proteins are relevant to disease. Scientists have some understanding of their structure and function, but they have not been fully developed as drug targets. This group accounts for about 53% of the proteome.
  • Dark: The ‘dark genome’ includes all proteins that do not meet the criteria for inclusion in any of the other categories. According to the study, some of these proteins represent “unexplored opportunities within the druggable human genome”. These proteins account for 38% of the proteome.
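
The four levels above can be read as a simple decision cascade. The Python sketch below is illustrative only: the evidence flags, the protein names and the order in which the checks are applied are assumptions made for this example, not the IDG project's actual criteria or data schema. It also checks the arithmetic of the proteome shares quoted in this article.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ProteinEvidence:
    """Hypothetical evidence flags for one protein (not the IDG schema)."""
    name: str
    approved_drug_with_known_mechanism: bool = False
    binds_small_molecule_with_high_potency: bool = False
    disease_or_function_evidence: bool = False


def target_development_level(protein: ProteinEvidence) -> str:
    """Assign one of the four levels described in the article.

    Assumption: the most developed level that applies wins, so the
    checks run from Clinical down to Dark.
    """
    if protein.approved_drug_with_known_mechanism:
        return "Clinical"
    if protein.binds_small_molecule_with_high_potency:
        return "Chemical"
    if protein.disease_or_function_evidence:
        return "Biological"
    # Dark: meets none of the criteria for the other categories.
    return "Dark"


# Proteome shares quoted in the article, in percent.
REPORTED_SHARE = {"Clinical": 3, "Chemical": 6, "Biological": 53, "Dark": 38}

# Made-up placeholder proteins, for illustration only.
proteins = [
    ProteinEvidence("protein_A", approved_drug_with_known_mechanism=True),
    ProteinEvidence("protein_B", binds_small_molecule_with_high_potency=True),
    ProteinEvidence("protein_C", disease_or_function_evidence=True),
    ProteinEvidence("protein_D"),
]

level_counts = Counter(target_development_level(p) for p in proteins)
drug_linked_share = REPORTED_SHARE["Clinical"] + REPORTED_SHARE["Chemical"]

if __name__ == "__main__":
    print(dict(level_counts))
    print(f"Clinical + Chemical share: {drug_linked_share}%")
```

In this sketch the Dark level is the fall-through case, mirroring the article's definition of it as everything that does not qualify for another category. The quoted shares add up to 100%, and the Clinical and Chemical shares together come to 9%, in line with the article's figure of around 10% of the proteome verified as targets on which medicines can act.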


“People working in drug discovery need readily available, integrated data and information about proteins that are linked to disease, and the IDG project has delivered this, particularly for those proteins that are less studied,” says Andrew Leach, head of Chemistry resources at EMBL-EBI.

“Projects like the IDG make it easier to make informed decisions,” he adds. “They offer tools and platforms that help you get into your research project and make decisions on possible areas of interest more quickly because you have information at your fingertips. In particular, the IDG is actively encouraging the community to look beyond the relatively small proportion of well-studied targets everyone’s been working on for years, and expand their horizons.”

Looking into the unknown

Research funding is often focused on previously studied protein targets, and scientists can have difficulty in attracting funding to study the so-called ‘dark genome’.

Partitioning the human genome into these categories highlights how unevenly science and drug discovery efforts have been spread across different targets. More importantly, it helps to identify less-studied but potentially promising new areas. This provides valuable insights for both researchers and funders.

Only around 10% of all the proteins made in a human being – the human proteome – have been verified as biological targets on which medicines can act. “This underlines the extent to which we don’t fully understand human biology. There’s a lot more work to be done,” says Oprea.

Source article

OPREA, T., et al. (2018). Unexplored therapeutic opportunities in the human genome. Nature Reviews Drug Discovery (in press). Published online 23 February; DOI: 10.1038/s41467-018-03149-4

Read more on the University of New Mexico website.

Illuminating the Druggable Genome

The goal of the Illuminating the Druggable Genome (IDG) Program is to improve our understanding of the properties and functions of proteins that are currently not well studied within common drug-targeted protein families such as kinases and G protein-coupled receptors.

The pilot phase of the program developed a website that integrates information about understudied proteins so that researchers everywhere can easily access it, catalyzing their own research and helping them find new proteins that may be of interest. During the recently announced Implementation Phase, IDG is expanding its informatics tools, elucidating the function of understudied proteins from three key druggable protein families and disseminating IDG resources to the greater scientific community. 


Contact the news team

Oana Stroe
Communications Officer
+44 (0)1223 494 369
