Distinguishing coronavirus genome mutations from inadvertent errors

Artist's impression of the research. Credit: Holly Joynes, Rayne Zaayman-Gallant/EMBL; Adobe Stock


26 May 2020

Summary 

  • When viruses produce copies of their genomes inside host cells, mutations can occur
  • Researchers in the Goldman Group at EMBL-EBI analysed thousands of genome sequences from the novel coronavirus and found that many of the mutations reported so far are likely to be technical artefacts rather than biological mutations
  • It is important to identify and catalogue mutations to better understand how viruses spread and evolve over time

When viruses produce copies of their genomes inside host cells, mutations – changes in their genome sequence – can occur. Mutations can affect the way viruses infect cells and replicate within them. They can lead to subtle changes in viral proteins that prevent existing antibodies from recognising the virus, and they can also reduce the effectiveness of antiviral treatments. It’s important to identify and catalogue mutations to better understand how viruses, such as the SARS-CoV-2 coronavirus that causes COVID-19, spread and evolve over time.

When scientists sequence a virus’s genome and it appears to contain changes, these could be the result of actual mutations, or they could be due to inadvertent errors introduced during the experiment, called technical artefacts. Such artefacts can arise from the way virus samples are prepared for analysis, the way their genome sequences are determined, and the way the resulting data are analysed.

Scientists working in the Goldman Group at EMBL’s European Bioinformatics Institute (EMBL-EBI), together with colleagues in Vienna and Cambridge, systematically analysed over 4700 SARS-CoV-2 genome sequences from laboratories all over the world. They found that many of the most interesting changes in the SARS-CoV-2 genome reported so far are likely to be technical artefacts rather than biological mutations. Some changes were observed only in genome sequences reported by particular laboratories, indicating that specific combinations of sample handling procedures, sequencing technology and data analysis can cause recurrent errors. When the same mistake happens repeatedly in one lab, it can falsely suggest that the viruses studied there share a common evolutionary origin, or that the same mutation is arising repeatedly in just one part of the world.
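To illustrate the kind of pattern described above, the sketch below flags genome changes that recur across several samples but are only ever reported by a single laboratory. It is a simplified illustration under stated assumptions, not the method used by De Maio et al.: each sample is assumed to be a hypothetical (lab name, set of variants) pair, with each variant written as a (position, reference base, alternative base) tuple, and the `min_samples` threshold is an arbitrary choice.

```python
from collections import defaultdict


def lab_specific_variants(samples, min_samples=2):
    """Return variants seen in at least `min_samples` samples but from only one lab.

    samples: iterable of (lab_name, set_of_variants) pairs. Each variant is a
    hypothetical (position, ref_base, alt_base) tuple relative to a reference genome.
    """
    labs_per_variant = defaultdict(set)   # variant -> labs that reported it
    count_per_variant = defaultdict(int)  # variant -> number of samples carrying it
    for lab, variants in samples:
        for variant in variants:
            labs_per_variant[variant].add(lab)
            count_per_variant[variant] += 1
    # A recurrent change confined to one lab is a candidate artefact, not proof of one.
    return sorted(
        variant
        for variant, labs in labs_per_variant.items()
        if len(labs) == 1 and count_per_variant[variant] >= min_samples
    )


samples = [
    ("Lab A", {(100, "C", "T"), (500, "G", "A")}),
    ("Lab A", {(100, "C", "T")}),
    ("Lab B", {(500, "G", "A")}),
    ("Lab C", {(500, "G", "A"), (900, "A", "G")}),
]
print(lab_specific_variants(samples))  # [(100, 'C', 'T')]
```

Here the change at position 100 appears in two samples, both from Lab A, so it is flagged; the change at position 500 is reported by three different labs and is not. A flagged change may still be genuine (for example, a local transmission cluster), so in practice such hits would be checked against phylogenetic context and sequencing metadata.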

Based on their analysis, the EMBL scientists and their colleagues developed a set of recommendations for filtering and masking specific parts of the SARS-CoV-2 genome when analysing sequence data. These recommendations, which they hope will be further refined by the research community as more information is shared, will help other researchers to interpret SARS-CoV-2 genome sequences and avoid potential pitfalls. This will help ensure that the mutations they identify are real, driving forward coronavirus research.
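Masking means replacing the bases at suspect positions with an ambiguity character, usually "N", so that downstream analyses ignore them. The sketch below shows the general idea for a set of aligned sequences. The site list and the number of columns masked at each end of the alignment are hypothetical placeholders, not the positions recommended by the researchers.

```python
def mask_alignment(sequences, sites, mask_start=0, mask_end=0, mask_char="N"):
    """Mask the same alignment columns in every sequence.

    sequences: dict mapping sequence name -> aligned sequence (all the same length).
    sites: 1-based alignment columns to mask (hypothetical example positions).
    mask_start / mask_end: number of columns to mask at the start / end of the
        alignment (hypothetical; ends of genomes are often less reliable).
    """
    masked = {}
    for name, seq in sequences.items():
        bases = list(seq)
        n = len(bases)
        # Collect all 1-based columns to mask: leading block, trailing block, listed sites.
        columns = (
            set(range(1, mask_start + 1))
            | set(range(n - mask_end + 1, n + 1))
            | set(sites)
        )
        for col in columns:
            if 1 <= col <= n:  # silently skip positions outside the alignment
                bases[col - 1] = mask_char
        masked[name] = "".join(bases)
    return masked


alignment = {"s1": "ACGTACGTAC", "s2": "ACGTTCGTAC"}
print(mask_alignment(alignment, sites=[5], mask_start=1, mask_end=2))
```

In this toy example the two sequences differ only at column 5; once that column is masked they become identical, which is the intended effect when a site is known to be prone to technical errors rather than genuine mutation.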

Source article

De Maio et al. (2020). Issues with SARS-CoV-2 sequencing data. Virological.org. Published online 5 May 2020.

Contact the news team

Oana Stroe
Senior Communications Officer
stroe@ebi.ac.uk
+44 (0)1223 494 369