Recombination

Recombination plays a crucial role in bacterial evolution by enabling the horizontal exchange of genetic material, often leading to the rapid spread of antimicrobial resistance and virulence traits.

Detecting recombination in whole genome sequencing (WGS) data typically involves identifying regions of elevated sequence divergence, or patterns inconsistent with inheritance by descent. For example, a recombined region may carry a dense cluster of SNPs, or mutations that conflict with the phylogeny built from the rest of the genome. Regions identified as likely recombinant can then be masked in the alignment to reduce the bias these events introduce into phylogenetic inference.
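
As a toy illustration of the SNP-density idea, the sketch below finds polymorphic columns in an alignment, flags fixed windows that hold more SNPs than expected if SNPs were spread evenly, and masks those windows with N. This is not the algorithm any particular tool uses (Gubbins, for example, applies a spatial scanning statistic to the substitutions reconstructed on each branch of the tree); the window size, the 2x threshold and the toy alignment are arbitrary assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

GAP_CHARS = {"-", "N", "?"}


def polymorphic_sites(aln: Dict[str, str]) -> List[int]:
    """0-based alignment columns where more than one base is observed (gaps/N ignored)."""
    seqs = list(aln.values())
    sites = []
    for i in range(len(seqs[0])):
        bases = {s[i].upper() for s in seqs} - GAP_CHARS
        if len(bases) > 1:
            sites.append(i)
    return sites


def dense_windows(sites: List[int], length: int, window: int,
                  factor: float) -> List[Tuple[int, int]]:
    """Non-overlapping windows whose SNP count exceeds `factor` times the uniform expectation.

    A trailing partial window shorter than `window` is not tested.
    """
    expected = len(sites) * window / length  # SNPs per window if spread evenly
    flagged = []
    for start in range(0, length - window + 1, window):
        count = sum(start <= s < start + window for s in sites)
        if count > factor * expected:
            flagged.append((start, start + window))
    return flagged


def mask(aln: Dict[str, str], regions: List[Tuple[int, int]]) -> Dict[str, str]:
    """Replace every column in the half-open regions [start, end) with N."""
    masked = {}
    for name, seq in aln.items():
        chars = list(seq)
        for start, end in regions:
            chars[start:end] = "N" * (end - start)
        masked[name] = "".join(chars)
    return masked


# Toy alignment: isolate C carries one isolated SNP and a cluster at columns 10-14.
aln = {
    "A": "A" * 10 + "C" * 10,
    "B": "A" * 10 + "C" * 10,
    "C": "AAG" + "A" * 7 + "G" * 5 + "C" * 5,
}
sites = polymorphic_sites(aln)
regions = dense_windows(sites, length=20, window=5, factor=2.0)
print(regions)
print(mask(aln, regions)["C"])
```

The isolated SNP at column 2 stays in the alignment, while the clustered differences are masked, which is the behaviour masking aims for: keep signal from vertical inheritance, drop imported blocks.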

Tools such as Gubbins, ClonalFrameML, and BRATNextGen analyze whole-genome alignments (built from assemblies or from reads mapped to a reference) to detect recombinant regions and estimate the relative contribution of recombination versus mutation.
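
For ClonalFrameML, the relative contribution is usually summarised as r/m, the ratio of substitutions introduced by recombination to those introduced by point mutation. It combines three estimated parameters: R/θ (the ratio of recombination to mutation rates), δ (the mean length of imported DNA) and ν (the mean divergence of imported DNA), as r/m = (R/θ) × δ × ν. The helper below just encodes that product; the parameter values in the example are hypothetical, and if your output reports 1/δ rather than δ you need to invert it first.

```python
def r_over_m(r_theta: float, delta: float, nu: float) -> float:
    """Relative effect of recombination to mutation, r/m = (R/theta) * delta * nu.

    r_theta: ratio of recombination to mutation rates (R/theta)
    delta:   mean length of imported DNA fragments, in bp
    nu:      mean divergence (per-site substitution probability) of imported DNA
    """
    return r_theta * delta * nu


# Hypothetical parameter values, for illustration only.
print(r_over_m(r_theta=0.1, delta=500.0, nu=0.05))
```

An r/m above 1 means that, on average, recombination has introduced more substitutions than point mutation, even if recombination events themselves are rarer than mutations (a low R/θ).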

When working with genome assemblies, recombination detection often requires accurate alignment and consideration of reference bias, as fragmented or misassembled regions can obscure true recombination signals.
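
Before running any of these tools, it can help to check how much missing data each genome contributes to the alignment, since heavily fragmented assemblies tend to show up as long runs of gaps or N. The sketch below is one minimal way to do this; the characters treated as missing and the 10% cut-off are assumptions for illustration, not recommendations from any particular tool.

```python
from typing import Dict, Iterable, List, Optional

MISSING = set("-Nn?")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records from an iterable of lines (e.g. an open file handle)."""
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""  # keep the ID up to the first whitespace
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def missing_fraction(seq: str) -> float:
    """Proportion of positions that are gaps ('-'), N or '?'."""
    return sum(c in MISSING for c in seq) / len(seq) if seq else 0.0


def flag_incomplete(aln: Dict[str, str], max_missing: float = 0.1) -> List[str]:
    """Sorted names of sequences whose missing fraction exceeds max_missing."""
    return sorted(n for n, s in aln.items() if missing_fraction(s) > max_missing)


# Usage (the 10% cut-off is an arbitrary assumption):
#   with open("Klebsiella.fasta") as fh:
#       print(flag_incomplete(parse_fasta(fh), max_missing=0.1))
```

Flagged genomes are not necessarily wrong, but they are worth inspecting, since a block of missing or misassembled sequence can hide or distort a recombination signal.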

Here we will use ClonalFrameML to conduct a recombination analysis on Klebsiella pneumoniae data from Taiwan.

The data we will use for this exercise are:


  1. Install ClonalFrameML into your conda environment:

       conda install -c conda-forge -c bioconda -c defaults clonalframeml

  2. Run ClonalFrameML to estimate the recombination events present in the genomes. The arguments are the starting tree (Klebsiella.tree), the alignment (Klebsiella.fasta) and a prefix for the output files (Klebsiella_recomb):

       ClonalFrameML Klebsiella.tree Klebsiella.fasta Klebsiella_recomb

     This can take around 15 minutes to run, so please start it before lunch.

     After ClonalFrameML has completed, it will have produced the following files:

  3. Open the "Klebsiella_recomb.importation_status.txt" file. This lists the estimated start and end positions of each recombination event in the alignment, and the branch of the tree on which it occurred.

  4. ClonalFrameML comes with an R script, cfml_results.R, that plots the recombination events along the alignment next to the tree. I have provided a copy of this script. Run it with the following command:

       Rscript cfml_results.R Klebsiella_recomb

  5. Open the resulting PDF file, Klebsiella_recomb.cfml.pdf, and discuss what you think the plot shows.
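
If you want a quick numerical summary alongside the plot, the sketch below reads the importation status file and counts events and affected bases per branch. It assumes the file is whitespace-separated with one header row and three columns (branch label, start, end), and that start and end are inclusive positions; check these assumptions against your own file before relying on the numbers. Branch labels such as NODE_3 in the test data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int, int]  # (branch label, start, end)


def parse_importation_status(lines: Iterable[str]) -> List[Event]:
    """Assumes whitespace-separated columns: branch label, start, end."""
    events: List[Event] = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and any header row (non-numeric start/end columns).
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        events.append((fields[0], int(fields[1]), int(fields[2])))
    return events


def summarise_by_branch(events: Iterable[Event]) -> Dict[str, Tuple[int, int]]:
    """Map each branch to (number of events, total bases affected)."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for node, beg, end in events:
        counts[node][0] += 1
        counts[node][1] += end - beg + 1  # assumes inclusive coordinates
    return {node: (n, bp) for node, (n, bp) in counts.items()}


# Usage:
#   with open("Klebsiella_recomb.importation_status.txt") as fh:
#       summary = summarise_by_branch(parse_importation_status(fh))
#   for node, (n, bp) in sorted(summary.items()):
#       print(node, n, bp)
```

Branches with many events or many affected bases are good places to start when interpreting the plot.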

This analysis identifies potential recombination events in your alignment and plots them. To find out which genes the recombination events fall in, you will need to compare their alignment positions with the annotation from the pangenome construction.
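
Once event positions and gene positions are in the same coordinate system, finding the genes hit by each event is an interval-overlap check. The sketch below assumes you already have gene boundaries expressed as inclusive alignment coordinates (for example, if the alignment is a concatenation of core genes and you recorded where each gene starts and ends); translating the pangenome annotation into those coordinates is the harder step and is not shown. The gene names, coordinates and branch label are hypothetical.

```python
from typing import List, Sequence, Tuple

Gene = Tuple[str, int, int]   # (name, start, end), inclusive, alignment coordinates
Event = Tuple[str, int, int]  # (branch label, start, end), inclusive


def genes_overlapping(beg: int, end: int, genes: Sequence[Gene]) -> List[str]:
    """Names of genes sharing at least one position with the interval [beg, end]."""
    return [name for name, g_beg, g_end in genes if g_beg <= end and beg <= g_end]


def annotate_events(events: Sequence[Event],
                    genes: Sequence[Gene]) -> List[Tuple[str, int, int, List[str]]]:
    """Attach the overlapping gene names to each recombination event."""
    return [(node, beg, end, genes_overlapping(beg, end, genes))
            for node, beg, end in events]


# Hypothetical gene coordinates, for illustration only.
genes = [("geneA", 1, 1000), ("geneB", 1001, 2500), ("geneC", 2501, 4000)]
print(annotate_events([("NODE_3", 950, 1100)], genes))
```

A linear scan over all genes is fine for a few thousand genes and events; for much larger inputs an interval tree or a sorted-coordinate sweep would scale better.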

