BEAST (Bayesian Evolutionary Analysis Sampling Trees) is a powerful software package widely used for inferring time-measured phylogenies from molecular sequence data. It employs a Bayesian framework to simultaneously estimate phylogenetic trees, evolutionary parameters, and divergence times, allowing researchers to integrate molecular sequence data with temporal information to reconstruct the evolutionary history of species and pathogens. BEAST is particularly valuable for timed phylogeny building due to its ability to model complex evolutionary processes, account for uncertainty in phylogenetic inference, and incorporate prior knowledge about the evolutionary rates and divergence times.
One of the key features that makes BEAST an important tool for timed phylogeny building is its flexibility in modeling evolutionary processes. BEAST allows users to specify sophisticated evolutionary models, including substitution models, molecular clock models, demographic models, and phylogeographic models, tailored to the specific characteristics of the data and the biological question of interest. Additionally, BEAST provides rigorous statistical methods, such as Bayesian Markov Chain Monte Carlo (MCMC) sampling, for estimating posterior distributions of model parameters, allowing for the quantification of uncertainty in the inferred phylogenies and divergence times. Overall, BEAST’s combination of flexibility, sophistication, and statistical rigor makes it an indispensable tool for timed phylogeny building and advancing our understanding of evolutionary history across diverse taxa.
The data we will be using in this exercise are:
TB_cluster.fasta – A FASTA alignment file of concatenated SNPs from 37 M. tuberculosis samples collected between 2005 and 2014 in British Columbia. These isolates all share a MIRU-VNTR type, suggesting they may be linked by transmission.
TB_cluster.txt – A text file with two columns: the names of the 37 M. tuberculosis samples and their collection dates.
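Before loading the two files into BEAUti, it can help to confirm that every sequence in the alignment has a matching collection date. The following is a minimal Python sketch of such a check, not part of BEAST2 itself. It assumes the dates file is whitespace-separated with no header row and that dates are written as YYYY-MM-DD; neither detail is stated above, so adjust the parsing to match the real file. All function names and the toy records are illustrative.

```python
from datetime import date


def read_fasta_names(lines):
    """Return sequence names: the text after '>' up to the first whitespace."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def to_decimal_year(d):
    """Convert a calendar date to a fractional year (1 July 2010 -> ~2010.5)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def read_sample_dates(lines):
    """Parse 'name  YYYY-MM-DD' rows into {name: decimal_year}.

    Assumption: two whitespace-separated columns, ISO dates, no header row.
    """
    dates = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"expected two columns, got: {line!r}")
        dates[fields[0]] = to_decimal_year(date.fromisoformat(fields[1]))
    return dates


def missing_dates(names, dates):
    """Return the sequence names that have no entry in the dates table."""
    return sorted(n for n in names if n not in dates)


# Toy, made-up records standing in for the real files.
toy_fasta = [">sampleA", "ACGT", ">sampleB", "ACGA"]
toy_dates = ["sampleA\t2005-01-01", "sampleB\t2014-07-02"]

names = read_fasta_names(toy_fasta)
dates = read_sample_dates(toy_dates)
print(len(names), "sequences;", "missing dates:", missing_dates(names, dates))
```

To run this on the real data, replace the toy lists with the lines read from TB_cluster.fasta and TB_cluster.txt, for example `open("TB_cluster.fasta").read().splitlines()`.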
To run BEAST2, we first need to create an XML file in BEAUti that can be read by the main tool, BEAST. BEAUti is included in the BEAST2 suite of software. BEAUti can also be run from the command line, but here we will take you through the user interface.
Other types of data can be included, such as binary character data, and multiple data sources for the same samples can be used together with different models applied. For example, you could use SNPs in the form of nucleotides and the presence/absence of indels as binary data to estimate the phylogeny, with different evolutionary models applied to each type of data.
This can be set to ‘strict’ if you believe that the molecular clock rate is constant across all branches of the tree, or to a relaxed clock if you want a more flexible model that allows clock rates to vary across branches. Here we can also set a prior value for our molecular clock in the box below. To read more about molecular clocks see here:
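To make the difference between the two clock types concrete, the sketch below assigns a rate to each branch under a strict clock and under one common relaxed-clock formulation, the uncorrelated lognormal clock. This is an illustration only and does not reproduce BEAST2's implementation; the function names, the mean rate and the spread `sigma` are arbitrary values chosen for the example, not estimates for M. tuberculosis.

```python
import math
import random


def strict_clock_rates(n_branches, rate):
    """Strict clock: every branch evolves at the same rate."""
    return [rate] * n_branches


def relaxed_clock_rates(n_branches, mean_rate, sigma, rng=None):
    """Uncorrelated lognormal clock: each branch draws its own rate.

    The lognormal is parameterised so that its mean (in real space)
    equals mean_rate; sigma controls how much rates vary among branches.
    """
    if mean_rate <= 0 or sigma <= 0:
        raise ValueError("mean_rate and sigma must be positive")
    rng = rng or random.Random()
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


# Arbitrary illustrative values (substitutions per site per year).
rng = random.Random(42)
print(strict_clock_rates(4, 1e-7))
print(relaxed_clock_rates(4, 1e-7, 0.5, rng))
```

Under the strict clock the list contains one repeated value, whereas the relaxed clock gives a different rate per branch scattered around the same mean.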
This parameter sets the population demographic model. Demographic models are used to infer historical changes in population size over time from molecular sequence data. They allow researchers to estimate parameters related to population dynamics, such as changes in effective population size, population growth rates, demographic bottlenecks, and migration rates.
Here we will set the population model as ‘Coalescent Constant Model’. Please read more about the different population demographic models here.
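As a rough intuition for what a constant-size coalescent prior implies, the sketch below computes and simulates the time to the most recent common ancestor under a simplified version of the model. It assumes all samples are taken at the same time (unlike our dated samples) and that the population size parameter is expressed in the same time units as the tree, so that k lineages coalesce at rate k(k-1)/2 divided by that parameter. The function names and numbers are illustrative and this is not BEAST2 code.

```python
import random


def expected_tmrca(n_samples, pop_size):
    """Expected time to the common ancestor of n_samples lineages.

    While k lineages remain, the expected wait for the next coalescence
    is pop_size / (k choose 2); the total is the sum over k = n .. 2.
    """
    if n_samples < 2 or pop_size <= 0:
        raise ValueError("need n_samples >= 2 and pop_size > 0")
    return sum(pop_size / (k * (k - 1) / 2) for k in range(2, n_samples + 1))


def simulate_tmrca(n_samples, pop_size, rng):
    """Draw one TMRCA by simulating successive coalescent waiting times."""
    t = 0.0
    for k in range(n_samples, 1, -1):
        rate = (k * (k - 1) / 2) / pop_size  # coalescence rate with k lineages
        t += rng.expovariate(rate)
    return t


rng = random.Random(0)
draws = [simulate_tmrca(37, 10.0, rng) for _ in range(2000)]
print("expected:", expected_tmrca(37, 10.0))
print("simulated mean:", sum(draws) / len(draws))
```

The expected value simplifies to 2 × pop_size × (1 − 1/n), so for large samples the tree height is expected to be close to twice the population size parameter.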
The chain length needed (the number of MCMC steps to run) depends on the complexity of the data and the underlying models; longer runs are more likely to allow the MCMC chain to converge (reach an equilibrium):
This includes an Effective Sample Size (ESS) for each parameter. Your results will look slightly different to these, and to each other's, as BEAST is a stochastic program. A small ESS (< 100) shows that the estimate of the posterior distribution of that parameter is likely poor, whereas a larger ESS (> 200) is accepted as good. More information on ESS and how to improve scores can be found here.
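Tracer calculates the ESS for you, but it can be useful to see why a long chain may still yield few effectively independent samples. The sketch below estimates ESS as the number of samples divided by the autocorrelation time, summing autocorrelations until the first non-positive value. That truncation rule is a simple choice made for this illustration and is not necessarily the estimator Tracer uses; the function name is hypothetical.

```python
def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first lag whose autocorrelation is <= 0,
    a simple truncation rule chosen for this sketch.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    variance = sum(c * c for c in centred) / n
    if variance == 0:
        raise ValueError("trace is constant; ESS is undefined")
    autocorr_time = 1.0
    for lag in range(1, n):
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0:
            break
        autocorr_time += 2.0 * rho
    return n / autocorr_time


# Two toy traces of equal length: one that mixes quickly, one that is "sticky".
well_mixed = [0.0, 1.0] * 50
sticky = [0.0] * 50 + [1.0] * 50
print(effective_sample_size(well_mixed), effective_sample_size(sticky))
```

Both toy traces contain 100 values, but the sticky trace, which stays at one value for half the run before switching, has a far smaller ESS. This is the pattern that a longer chain or better-tuned operators are meant to fix.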
Convergence refers to the property of an MCMC algorithm whereby it reaches a stationary distribution that accurately represents the posterior distribution of the model parameters. In simpler terms, it indicates that the chain has explored the parameter space sufficiently and is sampling from the true underlying distribution.
Discussion: Explore the different posterior parameters in Tracer. How long ago was the last common ancestor of all sequences in the tree inferred to have been present? What is the uncertainty around this value (for example, the 95% HPD interval)?
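Tracer summarises the uncertainty of each parameter with a 95% highest posterior density (HPD) interval. As a sketch of what that interval means, the function below finds the narrowest range that contains a chosen fraction of the posterior samples. The function name and the toy values are made up for illustration; in practice the samples would be a column of the BEAST log file after burn-in has been discarded.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the shortest interval containing `mass` of the samples."""
    if not samples:
        raise ValueError("no samples given")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


# Made-up, right-skewed posterior samples of a tree height (in years).
toy_heights = [0, 10, 11, 12, 13, 14, 30, 40, 50, 60]
print(hpd_interval(toy_heights, mass=0.5))
```

If the parameter is a tree height measured back from the most recent sample, subtracting each bound from that sample's collection date gives the interval in calendar time.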
Extra exercises: Re-run the BEAST analysis changing the ‘clock model’ parameter in BEAUti to “Optimised relaxed clock” and examine the log file in Tracer. Which run appears optimal? You can also change the nucleotide substitution model: try changing it to HKY and examine the log files in Tracer. Does this improve the posterior ESS scores?
Question: What is the posterior support for the node representing the most recent common ancestor?
Along with inferring timed phylogenies, we can use BEAST (Bayesian Evolutionary Analysis Sampling Trees) to carry out a range of phylogenetic and phylodynamic analyses. BEAST2 is a newer and more advanced version of BEAST, developed by the same team but with a redesigned architecture to provide a modular and extensible framework. As such, a variety of packages have been developed for BEAST2 for a range of different analyses, such as jointly reconstructing transmission networks with a phylogeny and inferring recombination. Furthermore, BEAST2 allows users to write their own packages that use its underlying Bayesian architecture, and provides tutorials on writing these packages.
Here, we will run a phylodynamics analysis using the BEAST2 package ‘BDSKY’ to infer past population dynamics of our TB cluster.
Have all parameters reached convergence? How could we improve these if not?
We can now view the results of the Skyline analysis. Click ‘Analysis -> Bayesian Skyline Reconstruction’:
BEAST is very well documented with numerous basic and advanced tutorials. The excellent ‘Taming the BEAST’ workshop has many different walkthroughs and tutorials to guide you through all aspects of running BEAST.
In addition, BEAST2 contains many other packages and analysis types that you can explore. Again, the ‘Taming the BEAST’ workshop is a great resource to learn more, including this tutorial on skyline plots.