Virtual course
CABANA Virtual Workshop: Innovative methods for viral detection and discovery in genomic and metagenomic data
Emerging viruses can cause new diseases in humans and animals, and proper epidemiological surveillance is essential to detect and characterize such viruses. Given the enormous impact of the COVID-19 pandemic, this course will cover the assembly of the SARS-CoV-2 genome from human patients, characterization of viral variants, and identification of variants of concern (VOCs). The course will also present some innovative methods that can improve the detection of evolutionarily remote viruses. Among other methods, the course will cover the construction of profile HMMs and their application to screening metagenomic datasets for both known and novel viruses.
Virtual machines:
- All students will receive an account on a virtual machine (VM).
- The VMs will run Linux, and all programs used throughout the course will be available to students.
- Access to the VMs will be granted for the entire duration of the course.
- No programs are needed on the local computer, except Zoom.
Who is this course for?
This course is intended for graduate students, postdocs and young researchers working in the fields of metagenomics and viral discovery in the CABANA grand challenge areas of communicable diseases, protection of biodiversity, and/or sustainable crop production.
Applicants must be employed within Latin America. Additionally, we cannot accept applications from Chile or Uruguay due to funding restrictions.
Prerequisites
Please note that this course will be taught in English; however, the trainers are fluent in Spanish or Portuguese and can offer language support where feasible. Priority will be given to those who have not yet attended a CABANA event.
Students should be familiar with using the Linux command line. As the course will be held remotely, all students must have Zoom installed on their computers beforehand. Also, as classes will be held synchronously, a good Internet connection is mandatory.
Knowledge of virology, especially from previous research experience, is also desirable.
Scientists from underrepresented ethnic and gender groups, for example women and those with Black and/or Indigenous heritage, are especially encouraged to apply for this workshop.
What will I learn?
Learning outcomes
After this course you should be able to:
- Perform SARS-CoV-2 genome assemblies.
- Run and interpret phylogenetic analyses using viral sequence data.
- Detect Variants of Concern.
- Identify recombination events in SARS-CoV-2.
- Design and apply profile HMMs for viral detection, classification and discovery.
Course content
During this course you will learn about:
- SARS-CoV-2: Pangolin and GISAID repositories, IQ-TREE, PhyML, BEAST, RDP, and Simplot.
- Viral discovery: TABAJARA (profile HMM construction), HMM-Prospector (metagenomic data screening), GenSeed-HMM (seed-driven progressive assembly) and e-Finder (multigene element finder).
- EMBL-EBI resources including COVID-19 Data Portal and MGnify.
Trainers
- Arthur Gruber (Institute of Biomedical Sciences, USP, Brazil)
- Liliane S. Oliveira Kashiwabara (UFTPR/EMBRAPA, Brazil)
- Felipe Naveca (FIOCRUZ Amazonas, Brazil)
- Guillermo Rangel-Pineros (Uni. Copenhagen, Denmark)
- Nadim Mahdi Rahman (EMBL-EBI)
- Piv Gopalasingam (EMBL-EBI)
- Robson Francisco de Souza (Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil)
- Renato Oliveira (ITV, Belem, Brazil)
- Tulio de Lima Campos (Bioinformatics Core Facility, Instituto Aggeu Magalhães, IAM-Fiocruz, Recife, Pernambuco, Brazil)
- Antonio Marinho da Silva Neto (Laboratório de Imunopatologia Keizo Asami, Universidade Federal de Pernambuco, Pernambuco, Brazil)
Programme
Time | Subject | Trainer | Activity
Day 1 - Monday 8th November 2021
08:30-09:00 | Welcome announcements | Arthur Gruber / Felipe Naveca
09:00-10:15 | SARS-CoV-2 genome sequencing and other viruses | Guilherme Oliveira / Renato Oliveira | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors
10:45-12:00 | The COVID-19 Data Portal | Nadim Rahman | Theoretical and practical class
12:00-12:45 | Introduction to EMBL-EBI, resources, services and tools | Piv Gopalasingam | Theoretical class
12:45-14:00 | Lunch time
14:00-15:45 | Using the MGnify microbiome analysis resource | Guillermo Rangel-Pineros | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors
16:00-17:30 | Brainstorming: limitations and challenges for the use of bioinformatic tools for viral discovery | Arthur Gruber | Theoretical and practical class
17:30-18:00 | Questions & Answers
18:00 | End of day 1
Day 2 - Tuesday 9th November 2021
08:30-09:00 | Open time for discussion with instructors
09:00-10:15 | Challenges for viral detection and discovery | Arthur Gruber | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors
10:45-12:00 | Rational design of profile HMMs | Liliane S.O. Kashiwabara | Theoretical class
12:00-12:30 | Questions & Answers
12:30-14:00 | Lunch time
14:00-15:45 | Viral profile HMM construction | Arthur Gruber / Liliane S.O. Kashiwabara / Robson F. Souza | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors
16:00-17:30 | Screening datasets using profile HMMs | Arthur Gruber / Liliane S.O. Kashiwabara / Robson F. Souza | Theoretical and practical class
17:30-18:00 | Questions & Answers
18:00 | End of day 2
Day 3 - Wednesday 10th November 2021
08:30-09:00 | Open time for discussion with instructors
09:00-10:15 | Databases of viral profile HMMs. Seed-driven progressive assembly | Arthur Gruber | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors
10:45-12:00 | Finding multigene elements with profile HMMs. Viral discovery and classification using profile HMMs | Arthur Gruber | Theoretical class
12:00-12:30 | Questions & Answers
12:30-14:00 | Lunch time
14:00-15:45 | Progressive assembly using profile HMMs as seeds | Arthur Gruber / Liliane S.O. Kashiwabara / Robson F. Souza | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors
16:00-17:30 | Finding proviruses in bacterial genomes with pHMMs | Arthur Gruber / Liliane S.O. Kashiwabara / Robson F. Souza | Theoretical and practical class
17:30-18:00 | Questions & Answers
18:00 | End of day 3
Day 4 - Thursday 11th November 2021
08:30-09:00 | Open time for discussion with instructors
09:00-10:15 | SARS-CoV-2: genome sequencing and variant detection | Tulio de Lima Campos | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors
10:45-12:00 | SARS-CoV-2: genome sequencing and variant detection | Tulio de Lima Campos | Theoretical class
12:00-12:30 | Questions & Answers
12:30-14:00 | Lunch time
14:00-15:45 | ViralFlow: a hands-on tutorial of the SARS-CoV-2 genome sequencing and variant detection pipeline of FIOCRUZ genomic surveillance network | Antonio Marinho da Silva Neto | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors
16:00-17:30 | ViralFlow: a practical guide to SARS-CoV-2 genome sequencing and variant detection of FIOCRUZ genomic surveillance | Antonio Marinho da Silva Neto | Theoretical and practical class
17:30-18:00 | Questions & Answers
18:00 | End of day 4
Day 5 - Friday 12th November 2021
08:30-09:00 | Open time for discussion with instructors
09:00-10:15 | Introduction to Viral Phylogenomic Analysis | Felipe Naveca | Theoretical class
10:15-10:45 | Coffee break, open time for discussion with instructors
10:45-12:00 | Paper presentation: COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence | Felipe Naveca | Theoretical class
12:00-12:30 | Questions & Answers
12:30-14:00 | Lunch time
14:00-15:45 | 1) Running ML phylogenetic analysis with IQ-TREE; 2) Initial Temporal Signal Data exploration with TempEst; 3) Estimating Evolutionary Rates and Dates from Viral Sequences | Felipe Naveca | Theoretical and practical class
15:45-16:00 | Coffee break, open time for discussion with instructors
16:00-17:30 | 4) Visualizing, analyzing, and summarizing BEAST output; 5) Editing phylogenetic trees | Felipe Naveca | Theoretical and practical class
17:30-18:00 | Questions & Answers
18:00 | End of course
In order to be considered for a place on this course applicants must complete the online application form.
Incomplete applications will NOT be considered.
If you have any general queries about the workshop application/registration process please email Guilherme Oliveira and Piv Gopalasingam.
For specific workshop enquiries, please email Arthur Gruber.
Please note that this course is free, but unexplained absence will result in blacklisting for future courses and opportunities.
The course will have a maximum of 25 participants, and applications will be open from 1 to 31 October 2021. Registration is subject to selection after successful completion of the application process, and applications are considered in order of arrival.
Application/Registration will close on 31 October 2021 at 12:00 (GMT).
A bibliography for this course is available to view below. Articles can be accessed here.
Part 1 - Viral discovery
Alves JM, de Oliveira AL, Sandberg TO, Moreno-Gallego JL, de Toledo MA, de Moura EM, Oliveira LS, Durham AM, Mehnert DU, Zanotto PM, Reyes A, Gruber A. GenSeed-HMM: A Tool for Progressive Assembly Using Profile HMMs as Seeds and its Application in Alpavirinae Viral Discovery from Metagenomic Data. Front Microbiol. 2016 Mar 4;7:269. doi: 10.3389/fmicb.2016.00269. PMID: 26973638; PMCID: PMC4777721.
Cobbin JC, Charon J, Harvey E, Holmes EC, Mahar JE. Current challenges to virus discovery by meta-transcriptomics. Curr Opin Virol. 2021 Sep 27;51:48-55. doi: 10.1016/j.coviro.2021.09.007. Epub ahead of print. PMID: 34592710.
Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14(9):755-63. doi: 10.1093/bioinformatics/14.9.755. PMID: 9918945.
Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011 Oct;7(10):e1002195. doi: 10.1371/journal.pcbi.1002195. Epub 2011 Oct 20. PMID: 22039361; PMCID: PMC3197634.
Fonseca P, Ferreira F, da Silva F, Oliveira LS, Marques JT, Goes-Neto A, Aguiar E, Gruber A. Characterization of a Novel Mitovirus of the Sand Fly Lutzomyia longipalpis Using Genomic and Virus-Host Interaction Signatures. Viruses. 2020 Dec 23;13(1):9. doi: 10.3390/v13010009. PMID: 33374584; PMCID: PMC7822452.
Oliveira LS, Gruber A. Rational Design of Profile Hidden Markov Models for Viral Classification and Discovery. In: Helder I. N, editor. Bioinformatics [Internet]. Brisbane (AU): Exon Publications; 2021 Mar 20. Chapter 9. PMID: 33877768.
Reyes A, Alves JM, Durham AM, Gruber A. Use of profile hidden Markov models in viral discovery: current insights. Advances in Genomics and Genetics. 2017;7:29-45. doi: 10.2147/AGG.S136574.
Part 2 - SARS-CoV-2
Dezordi FP, Campos TL, Jeronimo PMC, Aksenen CF, Almeida SP, Wallau GL. ViralFlow: an automated workflow for SARS-CoV-2 genome assembly, lineage assignment, mutations and intrahost variants detection. medRxiv 2021.10.01.21264424. doi: 10.1101/2021.10.01.21264424.
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017 Jun;14(6):587-589. doi: 10.1038/nmeth.4285. Epub 2017 May 8. PMID: 28481363; PMCID: PMC5453245.
Naveca FG et al. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence. Nat Med. 2021 Jul;27(7):1230-1238. doi: 10.1038/s41591-021-01378-7. Epub 2021 May 25. PMID: 34035535.
Resende PC et al. A Potential SARS-CoV-2 Variant of Interest (VOI) Harboring Mutation E484K in the Spike Protein Was Identified within Lineage B.1.1.33 Circulating in Brazil. Viruses. 2021 Apr 21;13(5):724. doi: 10.3390/v13050724. PMID: 33919314; PMCID: PMC8143327.
Resende PC et al. The ongoing evolution of variants of concern and interest of SARS-CoV-2 in Brazil revealed by convergent indels in the amino (N)-terminal domain of the spike protein. Virus Evol. 2021 Aug 14;7(2):veab069. doi: 10.1093/ve/veab069. PMID: 34532067; PMCID: PMC8438916.
Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018 Jun 8;4(1):vey016. doi: 10.1093/ve/vey016. PMID: 29942656; PMCID: PMC6007674.