Inspecting the temporality of sequences using TempEst

Before building a timed phylogeny, it is important to test for temporality in your data. Temporality in this context refers to the presence of a molecular clock signal, meaning that genetic changes accumulate at a roughly constant rate over time. This is essential for reliably estimating divergence times in a phylogenetic tree. The analysis can also help you choose a clock model for the timed phylogeny and identify problematic sequences, such as those with incorrect dates or substantial rate variation.

The root-to-tip distance in a phylogenetic tree is the evolutionary distance from the root of the tree (representing the most recent common ancestor of all the sequences in the tree) to a tip (representing an observed sequence). Tools like TempEst take an un-timed phylogeny and plot the root-to-tip distance of each sequence against its sampling date. In a dataset where a molecular clock holds, you would expect a positive correlation between these distances and time, because sequences sampled later have had more time to accumulate changes.
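The regression behind a root-to-tip plot can be sketched in a few lines of Python. This is only an illustration of the idea, not TempEst's own code: the function name `root_to_tip_regression` and the data values are hypothetical and were invented for this example. The slope of the fitted line is a rough estimate of the evolutionary rate, and the point where the line crosses zero distance is a rough estimate of the age of the root.

```python
# Hypothetical (sampling time in decimal years, root-to-tip distance) pairs.
# These values are invented for illustration; they are NOT from TB_cluster.
tips = [
    (2005.0, 0.0011),
    (2007.0, 0.0013),
    (2009.0, 0.0019),
    (2011.0, 0.0021),
    (2013.0, 0.0027),
]


def root_to_tip_regression(points):
    """Ordinary least-squares fit of root-to-tip distance against time."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    s_dd = sum((d - mean_d) ** 2 for _, d in points)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in points)

    slope = s_td / s_tt                  # distance units per year
    intercept = mean_d - slope * mean_t  # fitted distance at time zero
    r = s_td / (s_tt * s_dd) ** 0.5      # Pearson correlation coefficient
    x_intercept = -intercept / slope     # time at which fitted distance is zero
    return {
        "slope": slope,
        "x_intercept": x_intercept,
        "r": r,
        "r_squared": r * r,
    }


fit = root_to_tip_regression(tips)
for name, value in fit.items():
    print(name, value)
```

A positive slope with a high correlation suggests temporal signal. A slope near zero or below zero suggests little or none, and in that case the x-intercept is not meaningful.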

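The data we will be using in this exercise are the TB_cluster_ML.tree file (the un-timed phylogeny) and the TB_cluster.txt file (the sequence dates).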

Open TempEst:

[Screenshot: opening TempEst]


1. You will be prompted to open a file. Select the TB_cluster_ML.tree file. This will open the following screen:

[Screenshot: TempEst after loading the tree file]


2. Select Import Dates and open the “TB_cluster.txt” file to import the dates. This will bring up the following screen, where you specify the format used to parse the dates. The dates are in the “yyyy-MM-dd” format:

[Screenshot: date parsing screen]


3. You should see the following screen showing that the dates have been loaded correctly:

[Screenshot: TempEst with the dates loaded]


4. Finally, we can click on the ‘Root-to-tip’ tab. This brings up a plot of the root-to-tip distances against the sequence dates. We should see a positive correlation if there is temporality in the data:

[Screenshot: root-to-tip plot]
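Conceptually, the date import in step 2 turns each “yyyy-MM-dd” string into a number that can be placed on the time axis of the plot. The sketch below shows one common convention, the decimal year. The helper name `decimal_year` and the example dates are hypothetical, and this is not a description of how TempEst stores dates internally.

```python
from datetime import date


def decimal_year(iso_date):
    """Convert a 'yyyy-MM-dd' string to a decimal year.

    The fraction is the share of the calendar year that has elapsed
    at the start of the given day, so leap years are handled correctly.
    """
    d = date.fromisoformat(iso_date)  # raises ValueError on a malformed date
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


# Hypothetical example dates, not taken from TB_cluster.txt.
for text in ["2010-01-01", "2011-07-02", "2012-07-02"]:
    print(text, decimal_year(text))
```

A date that fails to parse, or that parses to an implausible year, is worth checking before you interpret the root-to-tip plot.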


If your analysis shows a good temporal signal, you can proceed with more confidence to build a timed phylogeny using software like BEAST, which can estimate divergence times and evolutionary rates.

If there’s no clear temporal signal, consider reviewing your data and methods. It might be necessary to exclude problematic sequences, refine your sampling strategy, or reevaluate the assumption of a molecular clock for your dataset.
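One way to look for problematic sequences is to rank the tips by how far they fall from the fitted regression line. The following is a minimal sketch under stated assumptions: the tip names and values are invented, `rank_by_residual` is a hypothetical helper, and a large residual is only a prompt to check that sequence's date and quality, not proof that it is wrong.

```python
# Hypothetical tips: name -> (decimal year, root-to-tip distance).
# Values are invented for illustration; tip_F is deliberately inconsistent.
tips = {
    "tip_A": (2005.0, 0.0011),
    "tip_B": (2007.0, 0.0013),
    "tip_C": (2009.0, 0.0019),
    "tip_D": (2011.0, 0.0021),
    "tip_E": (2013.0, 0.0027),
    "tip_F": (2010.0, 0.0045),
}


def fit_line(points):
    """Least-squares slope and intercept of distance against time."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in points)
    slope = s_td / s_tt
    return slope, mean_d - slope * mean_t


def rank_by_residual(named_tips):
    """Return (name, residual) pairs, largest absolute residual first."""
    slope, intercept = fit_line(list(named_tips.values()))
    residuals = [
        (name, d - (slope * t + intercept))  # observed minus fitted distance
        for name, (t, d) in named_tips.items()
    ]
    return sorted(residuals, key=lambda item: abs(item[1]), reverse=True)


ranked = rank_by_residual(tips)
for name, residual in ranked:
    print(name, residual)
```

A tip with a large positive residual is more divergent than its date predicts, and a large negative residual means it is less divergent. Either can result from a mislabelled date, a sequencing or assembly problem, or genuine rate variation.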

Questions to discuss:

What is the correlation in our data?
Do you think there is temporality?
What can we do to increase the temporal signal in our dataset?
If we cannot increase the temporality, can we still build a timed phylogeny?
