Summary of EMBL-EBI COVID-19 Action Plan

The world is facing its worst public health crisis in many decades, with all the consequences that entails for society. To combat COVID-19, we need to intensify research efforts that enable the scientific community to understand the biology, epidemiology, transmission and evolution of SARS-CoV-2, the virus responsible for the outbreak.

EMBL’s European Bioinformatics Institute (EMBL-EBI) has recognised the urgent need to respond and is creating a European COVID-19 Data Platform for data and information exchange, connected to the European Open Science Cloud (EOSC).

The goal is to rapidly collect and share available research data from different sources and of different types, so that the research community can access and combine diverse data sets with different degrees of aggregation, validation and/or completeness, enabling synergies and cross-fertilisation.

We envisage the European COVID-19 Data Platform as two connected components: the SARS-CoV-2 Data Hubs, which will organise the flow of SARS-CoV-2 outbreak sequence data and provide comprehensive open data sharing for the European and global research communities, and a broader COVID-19 Portal.

For a rapid launch, we will assemble, extend and enhance existing elements of informatics infrastructure, initially leveraging the key strengths of EMBL-EBI. These strengths rest on the molecular biology data infrastructure and services that EMBL-EBI provides, and on its unique position in offering connectivity with national public health data infrastructures, the EOSC, relevant European Research Infrastructures and research projects, and international and national research organisations.

The SARS-CoV-2 Data Hubs will form one of the two components of the European COVID-19 Data Platform. They will be built upon existing EMBL-EBI infrastructure, and we will offer them to the public health agencies and other scientific groups responsible for generating sequence data from the virus at national or regional levels. We expect that there will ultimately be numerous SARS-CoV-2 Data Hubs with similar configurations in terms of analysis, but with different preferred submission tools.

Sequence data in the SARS-CoV-2 Data Hubs will be highly contextualised. Essential metadata, such as sampling tracking identifiers, sampling time, geographical location, method of sampling, health status of host and sequencing platform/strategy, will be captured alongside sequence data. The SARS-CoV-2 Data Hubs will also provide systematic data processing and analysis, visualisation and phylogenetic analysis tools.
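As an illustration only, the essential metadata listed above could be modelled as a simple record with a completeness check. This is a minimal sketch and not the actual Data Hub schema: the class name, field names, the `missing_fields` helper and the example values are all assumptions made for this illustration.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class SampleMetadata:
    """Hypothetical record; field names are illustrative, not an EMBL-EBI schema."""
    sample_tracking_id: Optional[str] = None
    sampling_date: Optional[date] = None
    geographic_location: Optional[str] = None
    sampling_method: Optional[str] = None
    host_health_status: Optional[str] = None
    sequencing_platform: Optional[str] = None


def missing_fields(record: SampleMetadata) -> List[str]:
    """Return the names of metadata fields that are unset or blank."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing


# All values below are made up for the example.
record = SampleMetadata(
    sample_tracking_id="SAMPLE-0001",
    sampling_date=date(2020, 3, 15),
    geographic_location="Example City",
    sequencing_platform="example-platform",
)
print(missing_fields(record))  # ['sampling_method', 'host_health_status']
```

A check of this kind would let a submission tool flag incomplete context before sequence data is shared, whatever form the real metadata standard takes.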

We are currently in discussions with EMBL Member States about the SARS-CoV-2 Data Hubs and expect to launch the first of these new Data Hubs in the coming days.

As the second component of the European COVID-19 Data Platform, EMBL-EBI plans to launch a COVID-19 Portal to provide the primary entry point into the functions of the European COVID-19 Data Platform and the data and tools that it makes available. To populate the COVID-19 Portal rapidly and make it immediately useful, we will, as a first step, bring together and continuously update all relevant COVID-19 datasets from EMBL-EBI data resources.

We will then enrich the COVID-19 Portal with datasets and tools from EU projects in which EMBL-EBI is a partner, then from our ELIXIR network and, last but not least, from additional European stakeholders, while also leveraging our international connections with major bioinformatics data and service providers across the globe.