Life-science data: a global responsibility
28 Mar 2017 - 17:10
“The long-term sustainability of publicly funded data resources has always been a challenge – even as they’ve become increasingly central to the way we do research,” says Rolf Apweiler, Director of the European Bioinformatics Institute (EMBL-EBI). “Over the years, investment has increased, but the user base is growing at a much faster rate. If we want science to continue advancing as swiftly as it has over the past decade, we will need a reliable, supported, core set of resources that operate worldwide.”
Apweiler is referring to the first meeting of the Global Life Science Data Resources Working Group on 18 November 2016 in Strasbourg, France, where he met with Eric Green of the National Human Genome Research Institute (NHGRI) in the US, Niklas Blomberg of ELIXIR Europe, Paul Lasko of the Canadian Institutes of Health Research, Warwick Anderson of the Human Frontier Science Program and many other international stakeholders. The meeting, which prompted a Commentary in Nature, also included representatives of the European Commission, the UK’s Biotechnology and Biological Sciences Research Council (BBSRC) and other funders.
“A relatively small number of funders have been putting their weight behind research infrastructures that benefit scientists all over the world,” says Apweiler. “As researchers from more and more countries are accessing publicly available data, we are looking to engage funders in regions where public data resources are heavily used, but investment is missing.”
A global model for open data
ELIXIR provides an excellent model for how national research infrastructures can work together and transform into a united European operation, but designing and implementing it was no easy task. Because the life sciences are so diverse, it took close to a decade of planning and consensus building to establish the permanent structure that now allows countries to fund bioinformatics infrastructures jointly, across boundaries.
“ELIXIR is a sound model, but there are many data infrastructures operating outside Europe we can also learn from,” explains Apweiler.
“Funding schemes in high-usage countries such as China are fundamentally different to those of Europe’s national and private funding agencies, and work to shorter timescales. Ideally, we would have funding agencies from the more robust economies working together on finding ways to finance this in a federated, distributed way, as ELIXIR and other European Research Infrastructures do.”
How would it work?
The global coalition would be dedicated to sustaining core data resources for the scientific community. How it plans to do so will emerge over the coming year.
“We have this diverse, distributed data infrastructure to fund, which is a much more complicated task than financing one big machine,” explains Apweiler. “We are essentially proposing to put all the ‘core’ data resources into one pot, so we can focus our support more efficiently. There are many ways to do this, with countries either contributing funding or building capacities to do their share of work – or perhaps a combination.”
It’s too early to tell what a truly international funding model would look like for a wider data infrastructure, but the coalition is not short on experience.
For example, the European Nucleotide Archive (ENA), the DNA Data Bank of Japan and GenBank in the US all share the load of funding raw nucleotide sequence data, but each maintains its own portal. Similarly, the International Molecular Exchange (IMEx) Consortium partners share the work of curating molecular interaction records to a common standard, but each provider is responsible for maintaining a separate resource.
What’s in the pot?
Agreeing on a funding model is one thing – agreeing on what to fund is a bit more complicated. Members with very different interests will need to decide how to select the most useful resources, how to evaluate their impact and how often to review their status.
“We want to ensure scientists can always ask the questions they want. We also want to collaborate with scientific communities to build the tools and solutions they need. The core resources should deliver a solid foundation, so any researcher can build from a strong starting point,” says Apweiler.
If usage and impact were the only criteria, selecting core public data resources would be fairly straightforward, as a small number of them get the most use by a wide margin.
“We can select, right now, the resources that are most heavily used and return the most impact – but what is important today won’t necessarily be important tomorrow,” says Apweiler. “If we want to support future research, we have to plan beyond what’s popular now and structure the way we evaluate and refine that selection of services and tools available.”
The Working Group recommends using a broad set of indicators for selecting resources and ensuring that those indicators are transparent.
“We have faced similar challenges when developing selection criteria for the ELIXIR Core Data Resources,” says Niklas Blomberg, ELIXIR Director. “Our indicators comprise scientific quality, usage and impact indicators, as well as resource governance and management aspects. It is crucial that evaluation is ongoing, so we can react to new scientific developments. Identifying Core Data Resources provides a solid basis with which to begin discussions with funders.”
A follow-up meeting of the Working Group, on 5-6 June 2017 in London, will include a broader range of funders, and will focus on agreeing basic principles and a way towards an implementation plan.
“If we work with funders, get scientists engaged and build consensus that supporting infrastructure must be an ongoing commitment, that would be a great step forward,” concludes Apweiler.
Anderson W, Apweiler R, Bateman A, et al. (2017) Towards coordinated international support of core data resources for the life sciences. BioRxiv, published online 27 February. DOI: 10.1101/110825 [link: http://biorxiv.org/content/early/2017/02/27/110825]
Anderson W (2017) Data management: A global coalition to sustain core data. Nature 543:179; published online 09 March 2017. DOI: 10.1038/543179a [link: http://www.nature.com/nature/journal/v543/n7644/full/543179a.html]
Durinx C, McEntyre J, Appel R, et al. (2016) Identifying ELIXIR Core Data Resources. F1000Research 5(ELIXIR):2422.
News item from ELIXIR Europe: https://www.elixir-europe.org/news/elixir-among-signatories-call-action-...
News item from the Global Alliance for Genomics and Health: https://genomicsandhealth.org/news-blog/ga4gh-and-others-call-internatio...
More about research infrastructures
ELIXIR: a pan-European research infrastructure for biological information [link: http://www.elixir-europe.org]
The Global Alliance for Genomics and Health [link: http://genomicsandhealth.org/about-global-alliance]
European Strategy Forum on Research Infrastructures (ESFRI) [link: http://ec.europa.eu/research/infrastructures/index_en.cfm?pg=esfri]