SARS-CoV-2 open access data sharing through INSDC databases

Artist's impression of COVID-19 data sharing. Credit: Spencer Phillips

14 Aug 2020 - 10:15

Summary 

  • The INSDC databases allow open access sharing of SARS-CoV-2 sequence data
  • Open access to SARS-CoV-2 sequence data is vital for the international scientific community to understand the disease and develop treatments and a vaccine
  • INSDC has released a public statement asking researchers to share SARS-CoV-2 sequence data in its open databases

17 August 2020, Cambridge. The International Nucleotide Sequence Database Collaboration (INSDC) has released a public statement entitled ‘INSDC Statement on SARS-CoV-2 sequence data sharing during COVID-19’, which highlights the importance of sharing SARS-CoV-2 sequence data within the international scientific community.

Open access and rapid international sharing of SARS-CoV-2 data are essential for our understanding of the biology and spread of COVID-19. This will help the international research community collaborate to understand the disease, and develop treatments and a vaccine.

The INSDC has issued advice on this matter, demonstrating how researchers can submit their SARS-CoV-2 sequence data to INSDC databases and the benefits of using these resources. 

The advice issued by INSDC is also supported by the INSDC member institutions – the EMBL’s European Bioinformatics Institute (EMBL-EBI), the NIG DNA Data Bank of Japan (NIG-DDBJ) and the National Library of Medicine’s National Center for Biotechnology Information at NIH (NCBI).

Using INSDC databases

The INSDC recommends that all researchers working with SARS-CoV-2 sequence data submit both their raw and consensus – or assembled – SARS-CoV-2 data to the INSDC databases, which are freely available to the scientific community. This should be done in parallel with any submissions already accepted by other databases collecting SARS-CoV-2 data.

“The COVID-19 pandemic has made international data sharing more important than ever before,” says Guy Cochrane, Head of European Nucleotide Archive at EMBL-EBI. “For this to happen effectively, it is critical that researchers submit their data – both raw and assembled – to the open access INSDC databases alongside any submissions to other databases, such as GISAID.”

Why INSDC? 

All data submitted to the INSDC databases are rapidly and freely available to everyone. Raw sequence data submitted to the databases are linked to genome assemblies, allowing researchers to easily monitor changes in the viral genome. It is also simple to use INSDC databases to compare SARS-CoV-2 sequences with those from other viral species to help further understand virus evolution.

In addition to this, the INSDC member institutions have used these open access SARS-CoV-2 sequence data to develop other COVID-19 data resources. These are the European COVID-19 Data Portal from EMBL-EBI, the DDBJ’s Research Data Resources on New Coronavirus and the NCBI SARS-CoV-2 Resources. Continued submission of SARS-CoV-2 sequence data to the INSDC databases will help develop these bioinformatics initiatives and further support global COVID-19 research.

Read the full statement below

INSDC Statement on SARS-CoV-2 sequence data sharing during COVID-19

Guy Cochrane (EMBL-EBI), Ilene Karsch-Mizrachi (NCBI-NLM-NIH) and Masanori Arita (DDBJ) on behalf of the International Nucleotide Sequence Database Collaboration

The databases of the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) capture, organise, preserve and present nucleotide sequence data as part of the open scientific record. INSDC member institutions – the EMBL European Bioinformatics Institute (EMBL-EBI), the NIG DNA Data Bank of Japan (NIG-DDBJ) and the National Library of Medicine’s National Center for Biotechnology Information at NIH (NCBI) – are committed to the continued delivery of this critical element of scientific infrastructure.

The global COVID-19 crisis has brought an urgent need for the rapid open sharing of data relating to the outbreak. Most importantly, access to sequence data from the SARS-CoV-2 viral genome is essential for our understanding of the biology and spread of COVID-19. To aid in that effort, all three INSDC members have prioritized processing of SARS-CoV-2 sequence data and have streamlined the submission process.

Availability of data through INSDC databases provides:

  • Rapid open access – INSDC quickly makes submitted data freely available to everyone, without restrictions on reuse
  • Linkage of raw sequence read data to genome assemblies, providing researchers with the ability to validate the integrity of assemblies and investigate asserted mutations and changes in genome sequences
  • Integration of SARS-CoV-2 sequences with the entirety of INSDC data, including related coronavirus genome sequences, enabling comparison across species
  • Linkage of sequences to the published literature, enhancing the discovery process
  • Integrated data analysis tools, such as BLAST, to further understanding of the virus

In support of the global response to the COVID-19 crisis, the INSDC calls upon the research community to:

  • Submit raw SARS-CoV-2 data to the databases of the INSDC
  • Submit consensus/assembled SARS-CoV-2 data to the databases of the INSDC
  • Provide information relating to the sequenced isolate or sample as part of the sequence submission; minimally the time and place of isolation/sampling and an isolate/sample identifier should be provided to maximise the value of the sequences.
  • Continue any submissions already established with other databases, in parallel with the INSDC submission

The integration of INSDC databases with the global bioinformatics data infrastructure, including tools, secondary databases, compute capacity and curation processes, assures the rapid dissemination of data and drives its maximal impact.

In addition to these fundamental roles of INSDC member institutions in the sharing of viral sequence data, each institution has rapidly established COVID-19-specific programmes and resources: the European COVID-19 Data Platform from EMBL-EBI, the DDBJ’s Research Data Resources on New Coronavirus and the NCBI SARS-CoV-2 Resources. These resources both demonstrate the connectedness of INSDC databases to broader bioinformatics initiatives and serve to add immediate value to COVID-19 research.

Contact the news team

Oana Stroe
Senior Communications Officer
stroe@ebi.ac.uk
+44 (0)1223 494 369
