Virtual course

CABANA Virtual Workshop: Innovative methods for viral detection and discovery in genomic and metagenomic data

Emerging viruses can cause new diseases in humans and animals, and proper epidemiological surveillance is essential to detect and characterize them. Given the enormous impact of the COVID-19 pandemic, this course will cover the assembly of SARS-CoV-2 genomes from human patients, the characterization of viral variants and the identification of variants of concern (VOCs). The course will also present innovative methods that can improve the detection of evolutionarily remote viruses, including the construction of profile hidden Markov models (profile HMMs) and their application to screening metagenomic datasets for both known and novel viruses.

Virtual machines:

  • All students will receive an account on a virtual machine (VM).
  • The VMs will run Linux, and all programs used throughout the course will be available on them.
  • Access to the VMs will be granted for the entire duration of the course.
  • No software is needed on the local computer except Zoom.

Who is this course for?

This course is intended for graduate students, postdocs and young researchers working in the fields of metagenomics and viral discovery in the CABANA grand challenge areas of communicable diseases, protection of biodiversity, and/or sustainable crop production. 

Applicants must be employed in Latin America. Additionally, we cannot accept applications from Chile or Uruguay due to funding restrictions.

Prerequisites

Please note that this course will be taught in English; however, the trainers are fluent in Spanish and/or Portuguese and can offer language support where feasible. Priority will also be given to applicants who have not yet attended a CABANA event.

Students should be familiar with the Linux command line. As the course will be held remotely, all students must have Zoom installed on their computers beforehand. Also, as classes will be held synchronously, a reliable Internet connection is mandatory.

Knowledge of virology, especially from previous research experience, is also desirable.

Scientists from underrepresented ethnic and gender groups are especially encouraged to apply for this workshop, including women and people of Black and/or Indigenous heritage.

What will I learn?

Learning outcomes

After this course you should be able to:  

  • Perform SARS-CoV-2 genome assemblies.
  • Run and interpret phylogenetic analyses using viral sequence data. 
  • Detect variants of concern (VOCs).
  • Identify recombination events in SARS-CoV-2.
  • Design and apply profile HMMs for viral detection, classification and discovery.

Course content

During this course you will learn about: 

  • SARS-CoV-2: the GISAID repository, Pangolin lineage assignment, IQ-TREE, PhyML, BEAST, RDP and SimPlot.
  • Viral discovery: TABAJARA (profile HMM construction), HMM-Prospector (metagenomic data screening), GenSeed-HMM (seed-driven progressive assembly) and e-Finder (multigene element finder).
  • EMBL-EBI resources, including the COVID-19 Data Portal and MGnify.

Trainers

Arthur Gruber
Institute of Biomedical Sciences, USP, Brazil
Liliane S. Oliveira Kashiwabara
UTFPR/EMBRAPA, Brazil
Felipe Naveca
FIOCRUZ Amazonas, Brazil
Guillermo Rangel-Pineros
University of Copenhagen, Denmark
Nadim Mahdi Rahman
EMBL-EBI
Piv Gopalasingam
EMBL-EBI
Robson Francisco de Souza
Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
Renato Oliveira
ITV, Belém, Brazil
Tulio de Lima Campos
Bioinformatics Core Facility, Instituto Aggeu Magalhães, IAM-Fiocruz, Recife, Pernambuco, Brazil
Antonio Marinho da Silva Neto
Laboratório de Imunopatologia Keizo Asami, Universidade Federal de Pernambuco, Pernambuco, Brazil

Programme

Time | Subject | Trainer | Activity

Day 1 - Monday 8th November 2021
08:30-09:00 | Welcome announcements | Arthur Gruber / Felipe Naveca | -
09:00-10:15 | SARS-CoV-2 genome sequencing and other viruses | Guilherme Oliveira / Renato Oliveira | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors | - | -
10:45-12:00 | The COVID-19 Data Portal | Nadim Rahman | Theoretical and practical class
12:00-12:45 | Introduction to EMBL-EBI: resources, services and tools | Piv Gopalasingam | Theoretical class
12:45-14:00 | Lunch | - | -
14:00-15:45 | Using the MGnify microbiome analysis resource | Guillermo Rangel-Pineros | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors | - | -
16:00-17:30 | Brainstorming: limitations and challenges of using bioinformatics tools for viral discovery | Arthur Gruber | Theoretical and practical class
17:30-18:00 | Questions & Answers | - | -
18:00 | End of day 1 | - | -

Day 2 - Tuesday 9th November 2021
08:30-09:00 | Open time for discussion with instructors | - | -
09:00-10:15 | Challenges for viral detection and discovery | Arthur Gruber | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors | - | -
10:45-12:00 | Rational design of profile HMMs | Liliane S.O. Kashiwabara | Theoretical class
12:00-12:30 | Questions & Answers | - | -
12:30-14:00 | Lunch | - | -
14:00-15:45 | Viral profile HMM construction | Arthur Gruber / Liliane S.O. Kashiwabara / Robson F. Souza | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors | - | -
16:00-17:30 | Screening datasets using profile HMMs | Arthur Gruber / Liliane S.O. Kashiwabara / Robson F. Souza | Theoretical and practical class
17:30-18:00 | Questions & Answers | - | -
18:00 | End of day 2 | - | -

Day 3 - Wednesday 10th November 2021
08:30-09:00 | Open time for discussion with instructors | - | -
09:00-10:15 | Databases of viral profile HMMs; seed-driven progressive assembly | Arthur Gruber | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors | - | -
10:45-12:00 | Finding multigene elements with profile HMMs; viral discovery and classification using profile HMMs | Arthur Gruber | Theoretical class
12:00-12:30 | Questions & Answers | - | -
12:30-14:00 | Lunch | - | -
14:00-15:45 | Progressive assembly using profile HMMs as seeds | Arthur Gruber / Liliane S.O. Kashiwabara / Robson F. Souza | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors | - | -
16:00-17:30 | Finding proviruses in bacterial genomes with profile HMMs | Arthur Gruber / Liliane S.O. Kashiwabara / Robson F. Souza | Theoretical and practical class
17:30-18:00 | Questions & Answers | - | -
18:00 | End of day 3 | - | -

Day 4 - Thursday 11th November 2021
08:30-09:00 | Open time for discussion with instructors | - | -
09:00-10:15 | SARS-CoV-2: genome sequencing and variant detection | Tulio de Lima Campos | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors | - | -
10:45-12:00 | SARS-CoV-2: genome sequencing and variant detection (continued) | Tulio de Lima Campos | Theoretical class
12:00-12:30 | Questions & Answers | - | -
12:30-14:00 | Lunch | - | -
14:00-15:45 | ViralFlow: a hands-on tutorial of the SARS-CoV-2 genome sequencing and variant detection pipeline of the FIOCRUZ genomic surveillance network | Antonio Marinho da Silva Neto | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors | - | -
16:00-17:30 | ViralFlow: a practical guide to SARS-CoV-2 genome sequencing and variant detection in the FIOCRUZ genomic surveillance network | Antonio Marinho da Silva Neto | Theoretical and practical class
17:30-18:00 | Questions & Answers | - | -
18:00 | End of day 4 | - | -

Day 5 - Friday 12th November 2021
08:30-09:00 | Open time for discussion with instructors | - | -
09:00-10:15 | Introduction to viral phylogenomic analysis | Felipe Naveca | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors | - | -
10:45-12:00 | Paper presentation: "COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence" | Felipe Naveca | Theoretical class
12:00-12:30 | Questions & Answers | - | -
12:30-14:00 | Lunch | - | -
14:00-15:45 | 1) Running maximum-likelihood (ML) phylogenetic analysis with IQ-TREE; 2) Initial exploration of temporal signal with TempEst; 3) Estimating evolutionary rates and dates from viral sequences | Felipe Naveca | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors | - | -
16:00-17:30 | 4) Visualizing, analyzing and summarizing BEAST output; 5) Editing phylogenetic trees | Felipe Naveca | Theoretical and practical class
17:30-18:00 | Questions & Answers | - | -
18:00 | End of course | - | -

To be considered for a place on this course, applicants must complete the online application form.

Incomplete applications will NOT be considered.

If you have any general queries about the workshop application or registration process, please email Guilherme Oliveira and Piv Gopalasingam.

For specific workshop enquiries, please email Arthur Gruber.

Please note that this course is free, but unexplained absence will result in blacklisting for future courses and opportunities.

The course will have a maximum of 25 participants, and applications will be open from 1 to 31 October 2021. Registration is subject to selection: places will be allocated, in order of receipt, to applicants who successfully complete the application process.

Application/registration will close on 31 October 2021 at 12:00 (GMT).

A bibliography for this course is available to view below. Articles can be accessed here.

Part 1 - Viral discovery 

Alves JM, de Oliveira AL, Sandberg TO, Moreno-Gallego JL, de Toledo MA, de Moura EM, Oliveira LS, Durham AM, Mehnert DU, Zanotto PM, Reyes A, Gruber A. GenSeed-HMM: A Tool for Progressive Assembly Using Profile HMMs as Seeds and its Application in Alpavirinae Viral Discovery from Metagenomic Data. Front Microbiol. 2016 Mar 4;7:269. doi: 10.3389/fmicb.2016.00269. PMID: 26973638; PMCID: PMC4777721.

Cobbin JC, Charon J, Harvey E, Holmes EC, Mahar JE. Current challenges to virus discovery by meta-transcriptomics. Curr Opin Virol. 2021 Sep 27;51:48-55. doi: 10.1016/j.coviro.2021.09.007. Epub ahead of print. PMID: 34592710.

Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14(9):755-63. doi: 10.1093/bioinformatics/14.9.755. PMID: 9918945.

Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011 Oct;7(10):e1002195. doi: 10.1371/journal.pcbi.1002195. Epub 2011 Oct 20. PMID: 22039361; PMCID: PMC3197634.

Fonseca P, Ferreira F, da Silva F, Oliveira LS, Marques JT, Goes-Neto A, Aguiar E, Gruber A. Characterization of a Novel Mitovirus of the Sand Fly Lutzomyia longipalpis Using Genomic and Virus-Host Interaction Signatures. Viruses. 2020 Dec 23;13(1):9. doi: 10.3390/v13010009. PMID: 33374584; PMCID: PMC7822452.

Oliveira LS, Gruber A. Rational Design of Profile Hidden Markov Models for Viral Classification and Discovery. In: Helder I. N, editor. Bioinformatics [Internet]. Brisbane (AU): Exon Publications; 2021 Mar 20. Chapter 9. PMID: 33877768.

Reyes A, Alves JM, Durham AM, Gruber A. Use of profile hidden Markov models in viral discovery: current insights. Adv Genomics Genet. 2017;7:29-45. doi: 10.2147/AGG.S136574.

Part 2 - SARS-CoV-2 

Dezordi FP, Campos TL, Jeronimo PMC, Aksenen CF, Almeida SP, Wallau GL. ViralFlow: an automated workflow for SARS-CoV-2 genome assembly, lineage assignment, mutations and intrahost variants detection. medRxiv. 2021. doi: 10.1101/2021.10.01.21264424.

Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017 Jun;14(6):587-589. doi: 10.1038/nmeth.4285. Epub 2017 May 8. PMID: 28481363; PMCID: PMC5453245.

Naveca FG et al. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence. Nat Med. 2021 Jul;27(7):1230-1238. doi: 10.1038/s41591-021-01378-7. Epub 2021 May 25. PMID: 34035535.

Resende PC et al. A Potential SARS-CoV-2 Variant of Interest (VOI) Harboring Mutation E484K in the Spike Protein Was Identified within Lineage B.1.1.33 Circulating in Brazil. Viruses. 2021 Apr 21;13(5):724. doi: 10.3390/v13050724. PMID: 33919314; PMCID: PMC8143327.

Resende PC et al. The ongoing evolution of variants of concern and interest of SARS-CoV-2 in Brazil revealed by convergent indels in the amino (N)-terminal domain of the spike protein. Virus Evol. 2021 Aug 14;7(2):veab069. doi: 10.1093/ve/veab069. PMID: 34532067; PMCID: PMC8438916.

Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018 Jun 8;4(1):vey016. doi: 10.1093/ve/vey016. PMID: 29942656; PMCID: PMC6007674.

This course has ended

08 - 12 November 2021
Online
Free - $0
Contact
Arthur Gruber

Organisers
  • Arthur Gruber
    Institute of Biomedical Sciences, USP, Brazil
  • Felipe Naveca
    FIOCRUZ Amazonas, Brazil
  • Piv Gopalasingam
    EMBL-EBI
  • Guilherme Oliveira
    ITV, Belém, Brazil
