Driving genomics: EMBL-EBI and GA4GH

9 Nov 2017 - 16:37


  • The Global Alliance for Genomics and Health (GA4GH) is developing and testing the data standards needed to bring genomics into the clinic and enable precision medicine. This work is being spearheaded by 15 real-world ‘Driver Projects’.
  • EMBL-EBI is involved in three Driver Projects through its molecular archives (ENA, EGA and EVA), its membership in ELIXIR and its collaboration on the Human Cell Atlas data-coordination platform.
  • GA4GH has delivered its first specification: a data-retrieval interface called ‘htsget’, developed with significant input from EMBL-EBI.

EMBL-EBI’s genomic data resources are among 15 formal collaborations launched this year by the Global Alliance for Genomics and Health (GA4GH), the standards-setting body for genomics in healthcare. Other ‘Driver Projects’ include Genomics England, the Human Cell Atlas and the All of Us Research Program in the US. Chaired by EMBL-EBI Director Ewan Birney, GA4GH seeks to enable the responsible sharing of clinical-grade genomic data by 2022.

First standard delivered

‘Htsget’, an application programming interface (API) for genomic data retrieval developed with significant input from EMBL-EBI, is the first genomics standard delivered under ‘GA4GH Connect’, the five-year strategic plan announced in October.

Htsget lets researchers retrieve sequencing data selectively, downloading only the regions of a genome that are relevant to their question rather than entire files. It builds on existing standards, including the widely used BAM and CRAM sequence-alignment formats, and is a practical interface for anyone working with genomic data. Because whole files no longer need to be downloaded, it saves significant storage and compute power.
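To make the retrieval model concrete, here is a minimal Python sketch of the client side of an htsget exchange. It assumes the request and response shapes of the published htsget specification: the client asks a `/reads/<id>` endpoint for one region (`referenceName`, a 0-based inclusive `start` and an exclusive `end`), and the server replies with a JSON "ticket" listing URLs whose contents, concatenated in order, form a valid BAM or CRAM file. Some blocks (often the file header) arrive inline as `data:` URIs; others must be fetched, sometimes with headers such as a byte `Range`. The server address, read ID and the helper names `build_ticket_url` and `plan_downloads` are hypothetical, and the sketch makes no network calls.

```python
import base64
import json
from urllib.parse import quote, unquote_to_bytes, urlencode


def build_ticket_url(base_url, read_id, reference_name, start, end, fmt="BAM"):
    """Build an htsget reads request for a single genomic region.

    Assumes the htsget convention: start is 0-based inclusive, end is exclusive.
    """
    # Basic sanity check: reject negative or inverted ranges before asking the server.
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    params = {"format": fmt, "referenceName": reference_name, "start": start, "end": end}
    return f"{base_url.rstrip('/')}/reads/{quote(read_id, safe='')}?{urlencode(params)}"


def plan_downloads(ticket_json):
    """Turn a ticket into an ordered list of inline blocks and remote blocks to fetch."""
    ticket = json.loads(ticket_json)["htsget"]
    plan = []
    for entry in ticket["urls"]:
        url = entry["url"]
        if url.startswith("data:"):
            # data: URIs (RFC 2397) carry the bytes inline, either base64 or percent-encoded.
            meta, _, payload = url.partition(",")
            if meta.endswith(";base64"):
                data = base64.b64decode(payload)
            else:
                data = unquote_to_bytes(payload)
            plan.append(("inline", data))
        else:
            # Remote block: fetch later, sending any headers the server supplied.
            plan.append(("fetch", url, entry.get("headers", {})))
    return plan
```

A real client would then download each `fetch` entry with its headers and write every block, in ticket order, to one output file, so only the requested region's data crosses the network. Keeping URL building and ticket parsing separate from the actual transfer makes both steps easy to test offline.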

Real-world scenarios

To date, most human genome datasets have been generated through research, but in the coming years the vast majority are expected to come from healthcare. That is a paradigm shift for human genetics research.

GA4GH Driver Projects are established international genomic data initiatives. They were selected to identify, develop and pilot data-sharing frameworks and standards in real-world settings. This ensures GA4GH efforts are rooted in the immediate, practical needs of the scientific and clinical communities.

GA4GH: genomics in healthcare 2017

Detail from the ‘Sharing data saves lives’ infographic on the GA4GH website. This shows just a few examples of genomic-medicine programmes that were active in 2017.


One Driver Project is enabling a ‘federated’ approach to data sharing in healthcare. It focuses on interoperability between databases through open, standardised formats and APIs, and combines the efforts of three public data resources at EMBL-EBI:

  • The European Nucleotide Archive (ENA): an open data resource that captures and presents information about experimental workflows based on nucleotide sequencing, from raw sequence data to assembly and functional annotation.
  • The European Genome-phenome Archive (EGA): a resource for secure, controlled-access, permanent archiving and sharing of human data – both genetic and phenotypic – produced in biomedical research projects (co-managed by EMBL-EBI and the Centre for Genomic Regulation, Barcelona).
  • The European Variation Archive (EVA): an open-access database of all types of genetic variation data from all species.

These services already provide the infrastructure for sharing genetic data in research. Now, they are helping GA4GH create practical data-sharing standards for clinical research and healthcare around the world.

“Now, genomic data for healthcare is warehoused in many locations, in many countries. Our goal is to connect them, kind of like a postal service between data infrastructures – a secure, consistent delivery system that gets the right data to the right recipient,” says Thomas Keane, Team Leader for EGA infrastructure at EMBL-EBI and co-lead of the GA4GH Large Scale Genomics Work Stream.

Responsible data sharing

EMBL-EBI serves the scientific community by providing open data and analysis tools. Its involvement in GA4GH will help it address the needs of new end users and adapt its services to the changing needs of clinical research communities.

“Healthcare is harnessing the power of genomics to make better diagnoses and treatment decisions in rare disease and cancer across the world,” says Birney. “We have a responsibility to enable this future for everyone, and to harness the resulting data for further research on human health and fundamental biology.”

Discover more

Read the GA4GH Roadmap.


Birney E, Vamathevan J, Goodhand P (2017) Genomics in healthcare: GA4GH looks to 2022. bioRxiv (preprint). Published online 15 October; doi: 10.1101/203554

GA4GH Driver Projects in 2017

  1. NIH All of Us Research Program
  2. Australian Genomics
  3. BRCA Challenge
  4. CanDIG
  5. ClinGen
  6. ELIXIR Beacon
  7. EMBL-EBI archives: ENA / EVA / EGA
  8. Genomics England
  10. Matchmaker Exchange
  11. Monarch Initiative
  12. NCI Genomic Data Commons
  13. Variant Interpretation for Cancer Consortium
  14. Human Cell Atlas
  15. TOPMed: Trans-Omics for Precision Medicine Program

Get involved

If you are interested in getting involved, start by looking through the GA4GH Work Streams, which guide GA4GH projects on regulation and ethics, data security and standards development in genomics.

Contact the news team

Oana Stroe
Communications Officer
+44 (0)1223 494 369