Computational Molecular Evolution

Date:

Monday 8 - Friday 19 May 2017

Venue: 

European Bioinformatics Institute (EMBL-EBI) - Training Room 2 - Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom

Application opens: 

Friday 02 September 2016

Application deadline: 

Friday 13 January 2017

Participation: 

Open application with selection

Contact: 

Yvonne Thornton

Registration fee: 

£1660

Registration closed

Overview

The need for phylogenetic comparisons of molecular sequences has increased steadily with the explosive growth of genomic sequence data. Estimation of species phylogenies and species divergence times, inference of population demographic processes and migration patterns, and delineation of species boundaries are central to our understanding of biodiversity and to interpreting genomic sequence data. Furthermore, molecular evolutionary analyses can provide important insights into the evolutionary process of sequences and genes: for example, detecting adaptive molecular evolution may help to disentangle viral infection dynamics. These processes can be analysed with sophisticated statistical inference methods, using efficient algorithms that are implemented in a plethora of software packages. However, empirical biologists often find it challenging to make effective use of these computational tools, partly because their underlying statistical and computational principles are difficult to understand.

Run biennially at the Genome Campus (jointly with Wellcome Genome Campus Advanced Courses and Conferences), this hands-on computational course aims to provide early-career researchers with the theoretical knowledge and practical skills to carry out molecular evolutionary analyses on sequence data. The extensive programme comprises a mixture of lectures and computer practicals, and covers: data retrieval and assembly, alignment techniques, phylogeny reconstruction methods including maximum likelihood and Bayesian methods, hypothesis testing, and coalescent-based inference methods at the interface of phylogenetics and population genetics. Besides teaching the skills to properly deploy major software packages such as PhyML, RAxML, MrBayes, BEAST and BPP, the course also focuses on statistical inference methods and algorithms. This will allow participants to attain a thorough understanding of the underlying principles of the software they use.

Audience

The course is aimed primarily at biology and bioinformatics PhD students or postdocs in the early stages of their research career who already have some familiarity with phylogenetic methods (i.e., have already used some of the computer programs). Programming experience is not required, although knowledge of R and experience in a scripting language such as Python or Perl will be very useful. Candidates without prior experience of the Unix/Linux command line will be required to acquire these skills before the course. Training materials and exercises for improving participants' Unix/Linux skills will be provided before the course.

Outcomes

At the end of the course participants should be able to:

  • Interpret evolutionary trees, and recognise and discuss the power of molecular phylogenies for understanding real-world biological questions relating to evolutionary history, current-day biodiversity and future diversification of living organisms.
  • Browse, query and extract genome sequences from public databases, and create multiple sequence alignments.
  • Employ appropriate bioinformatics skills that also allow for the analysis of large genome-scale datasets, including command line use of specialist software, simple scripting, compiling programs and submitting jobs on multi-core servers and compute clusters. 
  • Select and apply appropriate commonly used phylogenetic software packages (such as PhyML, RAxML, PAML, MrBayes, BEAST) to infer phylogenetic trees, estimate divergence times, and test phylogenetic hypotheses.
  • Explain the underlying principles of major phylogenetic methods such as distance matrix-based, maximum likelihood, and Bayesian methods, including the MCMC method. 
  • Explain the use of Markov models of nucleotide, amino acid and codon substitution, hypothesis testing using the likelihood ratio test, coalescent and multispecies coalescent models in species tree estimation and species delimitation.
  • Apply likelihood ratio tests to infer the existence and location of molecular adaptation affecting protein-coding genes.
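
As an illustration of the likelihood ratio test mentioned in the outcomes above, the following is a minimal sketch in Python and is not part of the course materials. It assumes two nested models whose maximised log-likelihoods have already been obtained from a phylogenetic program. The log-likelihood values and the degrees of freedom are made up for the example, and the function names are hypothetical. The p-value relies on the usual chi-square approximation, which may not hold when parameters lie on the boundary of the parameter space.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with
    integer degrees of freedom df, using the standard recurrence for
    the regularised upper incomplete gamma function Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-h)                # Q(1, h)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(h))     # Q(1/2, h)
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the test statistic 2 * (lnL_alt - lnL_null) and its
    chi-square p-value for two nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods; the alternative model is assumed to
# have two more free parameters than the null model (df = 2).
stat, p_value = likelihood_ratio_test(lnl_null=-1234.56, lnl_alt=-1228.10, df=2)
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value here would suggest that the more parameter-rich model fits the data significantly better than the null model.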