Ensembl launches COVID-19 resource

18 May 2020 - 12:56

Today, Ensembl has joined the international scientific effort to tackle the COVID-19 pandemic. COVID-19 is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which has spread rapidly since emerging in late 2019. Ensembl's SARS-CoV-2 genome browser and related resources at covid-19.ensembl.org are intended to support both basic research and ongoing work to develop treatments, diagnostics and vaccines.

This initial release of the new Ensembl resource includes the following data:

  • Gene annotation of the reference genome ‘Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1’ (MN908947.3) using a modified Ensembl genebuild supported by protein evidence
  • Gene annotation from the Shanghai Public Health Clinical Center & School of Public Health, Fudan University, Shanghai, China, via ENA
  • Information on gene functions from Gene Ontology
  • Variation data from Nextstrain with consequence predictions and nucleotide frequencies for individual isolates, as well as grouped by clades and countries
  • Problematic variant sites as defined by De Maio et al.
  • Protein and genomic features from InterProScan, which include SARS-CoV-2-oriented annotations from Pfam as well as annotations from other protein annotation resources (including Superfamily, SMART and Gene3D)
  • Links to RefSeq peptides, UniProt and INSDC proteins, PDBe protein structures
  • Links to the genome sequence in ENA and NCBI genes
  • Alignments of Rfam covariance models to the genome
  • UCSC community annotation tracks displaying annotations made via a public spreadsheet, to which anyone can contribute freely

Ensembl will continue to add new data and expand this resource in future releases. As with all Ensembl data, there are no restrictions on the use of its COVID-19 resources. Users can download sequences, images and tables via the browser, and can add further data by attaching custom tracks. Whole genome databases can be downloaded from Ensembl's FTP server.

In addition, the GENCODE project is updating the annotation of human protein-coding genes linked to COVID-19. See this blog post for details and how to access the data in Ensembl.

Ensembl's new COVID-19 resource is part of a wider effort at EMBL-EBI to advance SARS-CoV-2 research by open data sharing through the COVID-19 Data Portal.

The Ensembl project launched in 1999 to annotate the human genome. It has since broadened its scope into a comprehensive genomics resource that now includes model organisms and many vertebrates, as well as bacteria, protists, invertebrate metazoa, plants and fungi, available across the Ensembl and Ensembl Genomes websites. SARS-CoV-2 is the first virus Ensembl has added to its growing resource of more than 45,000 genomes. The team hopes the new resource will be useful and welcomes your thoughts on it. Please email Ensembl to give feedback.
