EMBL-EBI’s genomic data resources are among 15 formal collaborations launched this year by the Global Alliance for Genomics and Health (GA4GH), the standards-setting body for genomics in healthcare. Other ‘Driver Projects’ include Genomics England, the Human Cell Atlas and the All of Us Research Program in the US. Chaired by EMBL-EBI Director Ewan Birney, GA4GH seeks to enable the responsible sharing of clinical-grade genomic data by 2022.
‘Htsget’, an application programming interface (API) for genomic data retrieval developed with significant input from EMBL-EBI, is the first genomics standard approved under ‘GA4GH Connect’, the five-year strategic plan announced in October.
Htsget enables researchers to download sequencing data in bulk, selecting only the most relevant sections of their genome of interest. Htsget builds on existing standards and is a practical interface for anyone working with genomic data. It saves significant storage and compute power by avoiding the need to download whole files.
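To make the idea of retrieving only a region of interest concrete, here is a minimal Python sketch of the request side of htsget. It assumes the published protocol's `reads` endpoint, its `format`, `referenceName`, `start` and `end` query parameters, and its JSON 'ticket' response listing the URLs of the data blocks to fetch. The server address, dataset identifier and sample ticket are hypothetical placeholders, not real EMBL-EBI endpoints, and the sketch makes no network calls.

```python
import json
from urllib.parse import urlencode


def build_htsget_url(base_url, dataset_id, reference_name, start, end, fmt="BAM"):
    """Build a htsget reads request for a single genomic region.

    base_url and dataset_id are hypothetical; substitute the values
    published by the htsget server you are actually using.
    """
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,  # 0-based, inclusive
        "end": end,      # 0-based, exclusive
    })
    return f"{base_url.rstrip('/')}/reads/{dataset_id}?{query}"


def ticket_urls(ticket_json):
    """Return the data-block URLs listed in a htsget JSON ticket, in order."""
    ticket = json.loads(ticket_json)
    return [block["url"] for block in ticket["htsget"]["urls"]]


# Hypothetical server and dataset, for illustration only.
url = build_htsget_url("https://htsget.example.org", "sample1", "chr1", 1000, 2000)

# A hand-written ticket in the shape the protocol describes (not real data).
sample_ticket = json.dumps({
    "htsget": {
        "format": "BAM",
        "urls": [
            {"url": "https://data.example.org/sample1/header"},
            {"url": "https://data.example.org/sample1/block1"},
        ],
    }
})
blocks = ticket_urls(sample_ticket)
```

A client would then download each listed URL in order and concatenate the results to obtain a valid file covering just the requested region, which is where the savings in storage and compute come from.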
Until now, most human-genome datasets have been generated in research, but in future the vast majority are expected to be generated in healthcare. That is a paradigm shift for human genetics research.
"We have a responsibility to enable this future for everyone."
-Ewan Birney, GA4GH Chair
GA4GH Driver Projects are established international genomic data initiatives. They were selected to identify, develop and pilot data-sharing frameworks and standards in real-world settings. This ensures GA4GH efforts are rooted in the immediate, practical needs of the scientific and clinical communities.
One Driver Project is enabling a ‘federated’ approach to data sharing in healthcare. It focuses on interoperability between databases through open, standardised formats and APIs. It combines the efforts of three public data resources at EMBL-EBI.
These services already provide the infrastructure for sharing genetic data in research. Now, they are helping GA4GH create practical data-sharing standards for clinical research and healthcare around the world.
“Now, genomic data for healthcare is warehoused in many locations, in many countries. Our goal is to connect them, kind of like a postal service between data infrastructures – a secure, consistent delivery system that gets the right data to the right recipient,” says Thomas Keane, Team Leader for EGA infrastructure at EMBL-EBI and co-lead of the GA4GH Large Scale Genomics Work Stream.
EMBL-EBI serves the scientific community by providing open data and tools for analysis. Its involvement in GA4GH will help it address the needs of new end-users and adapt its service delivery as the needs of clinical research communities change.
“Healthcare is harnessing the power of genomics to make better diagnoses and treatment decisions in rare disease and cancer across the world,” says Birney. “We have a responsibility to enable this future for everyone, and to harness the resulting data for further research on human health and fundamental biology.”
Read the GA4GH Roadmap here.
Birney E, Vamathevan J, Goodhand P (2017) Genomics in healthcare: GA4GH looks to 2022. bioRxiv (preprint). Published online 15 October; doi: 10.1101/203554
If you are interested in getting involved, start by looking through the GA4GH Work Streams, which provide guidance to GA4GH projects on regulatory and ethics issues, data security and standards development in genomics.