Local Branching Index (LBI)

The Local Branching Index (LBI) is a node-level measure of recent evolutionary expansion in a phylogenetic tree. It estimates how densely the neighborhood of a node is populated with nearby branches, giving less weight to branches that lie further from the node, so high values reflect recent, rapid diversification.

LBI is often used to identify rapidly expanding lineages, particularly in bacterial and viral populations.
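To make the definition concrete, here is a minimal sketch of LBI as originally defined by Neher, Russell and Shraiman (2014). For a node i, every branch in the tree contributes the integral of exp(-d/τ) along its length, where d is the path distance from i and τ sets the size of the neighborhood. The function name ‘lbi_sketch’ and the default τ are hypothetical choices. This is not the code in “lbi.R”, which may scale or normalize values differently. The sketch assumes a ‘phylo’ tree with branch lengths and computes every node in two passes (tips to root, then root to tips).

```r
library(ape)

# Hypothetical helper: LBI of every node, following the exponentially
# discounted tree-length definition (Neher et al., 2014). Not the code in lbi.R.
# tau (neighbourhood scale, in branch-length units) is an assumed default.
lbi_sketch <- function(tree, tau = 0.1) {
  stopifnot(inherits(tree, "phylo"), !is.null(tree$edge.length))
  tree <- reorder(tree, "postorder")  # descendant edges come before parent edges
  e <- tree$edge
  n_nodes <- Ntip(tree) + Nnode(tree)
  len <- tree$edge.length
  w <- tau * (1 - exp(-len / tau))  # integral of exp(-x / tau) over one branch
  decay <- exp(-len / tau)          # discount for crossing a whole branch

  # Pass 1 (postorder): up[ch] is what ch's subtree, plus the branch joining
  # ch to its parent, contributes to the parent's LBI.
  up <- numeric(n_nodes)
  below <- numeric(n_nodes)  # below[p] = sum of up[] over p's children
  for (k in seq_len(nrow(e))) {
    p <- e[k, 1]; ch <- e[k, 2]
    up[ch] <- w[k] + decay[k] * below[ch]
    below[p] <- below[p] + up[ch]
  }

  # Pass 2 (preorder): down[ch] is what everything outside ch's subtree,
  # including the branch to its parent, contributes to ch's LBI.
  down <- numeric(n_nodes)  # the root receives nothing from above
  for (k in rev(seq_len(nrow(e)))) {
    p <- e[k, 1]; ch <- e[k, 2]
    down[ch] <- w[k] + decay[k] * (down[p] + below[p] - up[ch])
  }

  down + below  # indexed by ape node number: tips 1..Ntip, then internal nodes
}

# Usage on a toy cherry: tip A (branch length 1) and tip B (branch length 2)
toy <- read.tree(text = "(A:1,B:2);")
round(lbi_sketch(toy, tau = 1), 3)
```

On the toy cherry, both tips get 1 - exp(-3) ≈ 0.950, because each sees the same path of length 3 through the root, and the root gets (1 - exp(-1)) + (1 - exp(-2)) ≈ 1.497. Passing messages in two passes keeps the cost linear in the number of branches instead of summing over every pair of nodes.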

In this exercise, we will calculate and compare the LBI of lineage 2 and lineage 4 strains in the TB dataset from Moldova. The LBI calculation is provided in the “lbi.R” script in the data folder; it is taken from the TreeImbalance package in R.

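We will use the following data for this exercise: the “lbi.R” script, the Moldova TB tree, and the text files “TB_Lineage2.txt” and “TB_Lineage4.txt”, which list the tip labels of the lineage 2 and lineage 4 strains.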

1. First, we load the required packages and the LBI script:

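# Load required packages (library() stops with an error if a package is missing)
library(ape)
library(ggplot2)

# Load the lbi() function (assumes the working directory is the data folder)
source("lbi.R")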

2. Next, we load the Moldova TB tree and assign it to the variable ‘tree’ (change the file name below if your copy is saved as “TB_Moldova.tree”):

tree <- read.tree("TB_Large.tree")
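Because LBI is computed from branch lengths, it can be worth checking the loaded tree before continuing. Below is a minimal sketch that uses only ‘ape’; the helper ‘describe_tree’ is hypothetical, and a toy Newick string stands in for the real tree file.

```r
library(ape)

# Hypothetical helper: summarise the tree properties that matter for LBI
describe_tree <- function(tree) {
  stopifnot(inherits(tree, "phylo"))
  list(
    n_tips = Ntip(tree),                              # number of strains
    rooted = is.rooted(tree),                         # LBI passes start from a root
    has_branch_lengths = !is.null(tree$edge.length)   # LBI is undefined without them
  )
}

toy <- read.tree(text = "((A:1,B:1):1,C:2);")
str(describe_tree(toy))
```

In your own session, ‘describe_tree(tree)’ should report that the tree has branch lengths; a tree without them cannot be used for LBI.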

3. We want to compare the LBI estimates of lineage 2 and lineage 4, so we must run the LBI calculation independently on a tree containing only lineage 2 strains and a tree containing only lineage 4 strains. To do this, we load the text files listing the lineage 2 and lineage 4 tip names, then use the ‘keep.tip()’ function in the ‘ape’ package to extract each subtree from the input phylogeny:

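# [, 1] extracts the single column of each text file as a vector of tip labels;
# stringsAsFactors = FALSE keeps the labels as text on R versions before 4.0
lineage2 <- read.table("TB_Lineage2.txt", stringsAsFactors = FALSE)[, 1]
lineage4 <- read.table("TB_Lineage4.txt", stringsAsFactors = FALSE)[, 1]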

lineage2_tree <- keep.tip(tree, lineage2)
lineage4_tree <- keep.tip(tree, lineage4)
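If a name in a lineage file does not exactly match a tip label (for example, because of trailing whitespace or a different naming scheme), the subset tree can fail to build or come out smaller than expected. Below is a minimal sketch of a check, using a toy tree and a hypothetical helper ‘subset_by_labels’.

```r
library(ape)

# Hypothetical helper: report unmatched labels, then keep only the matched tips
subset_by_labels <- function(tree, labels) {
  labels <- trimws(as.character(labels))  # guard against stray whitespace
  unmatched <- setdiff(labels, tree$tip.label)
  if (length(unmatched) > 0) {
    warning(length(unmatched), " label(s) not found in the tree: ",
            paste(head(unmatched, 5), collapse = ", "))
  }
  keep.tip(tree, intersect(labels, tree$tip.label))
}

toy <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
sub <- subset_by_labels(toy, c("A", "B", "X"))  # warns that "X" is missing
sub$tip.label
```

In the exercise, ‘subset_by_labels(tree, lineage2)’ would take the place of ‘keep.tip(tree, lineage2)’.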

4. We can now use the ‘lbi()’ function to calculate the LBI for each node in the lineage 2 and lineage 4 trees:

lineage2_lbi <- lbi(lineage2_tree)
lineage4_lbi <- lbi(lineage4_tree)

5. This gives us a vector of LBI values for each tree. To compare the LBI of lineage 2 and lineage 4 strains with density plots, we first combine these values into a two-column table for plotting: the first column gives the lineage (“Lineage”) and the second the estimated LBI (“LBI”).

# Label each LBI value with its lineage and combine into one long-format table
results <- data.frame(
  Lineage = c(rep("Lineage2", length(lineage2_lbi)),
              rep("Lineage4", length(lineage4_lbi))),
  LBI = c(lineage2_lbi, lineage4_lbi)
)

results <- results[!is.na(results$LBI), ]  # remove NA values
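The same long-format table can also be built with base R's ‘stack()’, which turns a named list of vectors into ‘values’ and ‘ind’ columns and extends easily to more than two lineages. Below is a sketch with made-up toy numbers (not results from this dataset); ‘toy_lbi’ and ‘toy_results’ are hypothetical names.

```r
# Toy LBI vectors standing in for lineage2_lbi and lineage4_lbi (illustrative only)
toy_lbi <- list(Lineage2 = c(0.5, NA, 1.2), Lineage4 = c(0.8, 0.3))

toy_results <- stack(toy_lbi)                          # columns: values, ind
names(toy_results) <- c("LBI", "Lineage")              # match the plotting code
toy_results <- toy_results[!is.na(toy_results$LBI), ]  # drop NA values
toy_results
```

Here ‘Lineage’ becomes a factor whose levels follow the list names, which ggplot2 uses directly for the fill and colour legends.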

6. Finally, we use the ‘geom_density()’ function in ggplot2 to plot the LBI distribution of each lineage.

ggplot(results, aes(x = LBI, fill = Lineage, color = Lineage)) +
  geom_density(alpha = 0.4) +
  theme_minimal() +
  labs(title = "LBI Density Plot by Lineage",
       x = "LBI",
       y = "Density")
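A density plot can be hard to read precisely, so a numeric summary per lineage can help when answering the questions below. Below is a minimal sketch using base R's ‘aggregate()’ on a toy table with the same two columns as ‘results’; ‘toy_results’ is a hypothetical name and its values are made up.

```r
# Toy table with the same columns as 'results' (values are illustrative only)
toy_results <- data.frame(
  Lineage = c("Lineage2", "Lineage2", "Lineage4", "Lineage4", "Lineage4"),
  LBI = c(0.9, 1.1, 0.2, 0.4, 0.6)
)

# Median and interquartile range of LBI for each lineage
aggregate(LBI ~ Lineage, data = toy_results,
          FUN = function(x) c(median = median(x), IQR = IQR(x)))
```

Running the same call on ‘results’ gives one row per lineage, which complements the shape shown in the density plot.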

Questions:

  1. Which lineage has the higher LBI values overall?

  2. Why might you see multiple peaks in the density plot of one of the lineages?

Next activity: Detecting homoplasy