Recombination
Recombination plays a crucial role in bacterial evolution by enabling the horizontal exchange of genetic material, often leading to the rapid spread of antimicrobial resistance and virulence traits.
Detecting recombination in whole-genome sequencing (WGS) data typically involves identifying regions of elevated sequence divergence, or patterns of variation that are inconsistent with vertical inheritance (clonal descent). For example, a genomic region or a cluster of mutations may support a different branching pattern from the phylogeny built from the rest of the genome. Regions identified as likely recombinant can then be masked in the alignment to reduce the bias these events introduce into phylogenetic analyses.
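Once the recombinant regions are known, masking them is straightforward. The sketch below assumes the regions are supplied as (sequence name, start, end) tuples using 1-based, inclusive alignment positions. It replaces only those columns, and only in the affected sequences, with N. The function name mask_regions is hypothetical and is not part of any of the tools discussed here.

```python
def mask_regions(alignment, regions, mask_char="N"):
    """Return a copy of `alignment` with each region replaced by `mask_char`.

    Hypothetical helper. `alignment` maps sequence names to aligned sequences;
    `regions` is a list of (name, start, end) tuples, assumed 1-based and inclusive.
    """
    masked = {name: list(seq) for name, seq in alignment.items()}
    for name, start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"invalid region {name}:{start}-{end}")
        seq = masked[name]
        # Convert the 1-based inclusive interval to 0-based indices,
        # clipping the end so it never runs past the sequence.
        for i in range(start - 1, min(end, len(seq))):
            seq[i] = mask_char
    return {name: "".join(chars) for name, chars in masked.items()}


aln = {"strain_A": "ACGTACGTAC", "strain_B": "ACGTTCGTAC"}
print(mask_regions(aln, [("strain_B", 4, 6)]))
# {'strain_A': 'ACGTACGTAC', 'strain_B': 'ACGNNNGTAC'}
```

Masking with N rather than deleting columns keeps every sequence the same length, so alignment coordinates stay unchanged for downstream tools.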
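Tools such as Gubbins, ClonalFrameML and BRATNextGen analyze whole-genome alignments (for example, core genome alignments built from genome assemblies) to detect recombinant regions and to estimate the relative contribution of recombination versus mutation to the observed sequence variation.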
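In ClonalFrameML, this relative contribution is commonly summarised as r/m: the ratio of substitutions introduced by recombination to those introduced by mutation. It can be derived from three parameters the program estimates. These are R/θ (the ratio of the recombination rate to the mutation rate), δ (the mean length of recombined fragments) and ν (the mean divergence of the imported DNA). The numbers in the worked example below are illustrative only and are not estimates for this dataset.

```latex
\frac{r}{m} = \frac{R}{\theta} \times \delta \times \nu
\qquad\text{e.g.}\qquad
\frac{R}{\theta} = 0.5,\quad \delta = 200\ \text{bp},\quad \nu = 0.02
\;\Rightarrow\;
\frac{r}{m} = 0.5 \times 200 \times 0.02 = 2
```

Because δ is a length in bp and ν is a per-site divergence, their product is dimensionless. An r/m of 2 would mean recombination introduced about twice as many substitutions as mutation.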
When working with genome assemblies, recombination detection depends on an accurate alignment and on accounting for reference bias, because fragmented or misassembled regions can obscure true recombination signals.
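One quick way to spot sequences that may come from fragmented or incomplete assemblies is to check how much of each aligned sequence consists of gaps or ambiguous bases. This is a minimal sketch that uses only the Python standard library. Treating '-', 'N' and '?' as missing characters is an assumption, and the function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict (hypothetical helper)."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def missing_fraction(seqs, missing="-N?"):
    """Fraction of each aligned sequence made up of gap/ambiguous characters (assumed set)."""
    return {
        name: (sum(ch in missing for ch in seq) / len(seq) if seq else 0.0)
        for name, seq in seqs.items()
    }


example = ">s1\nACGT-N\n>s2\nACGTAC\n"
print(missing_fraction(read_fasta(example)))
```

To run it on the exercise data, read the alignment first, for example with `open("Klebsiella.fasta").read()`, and pass the text to `read_fasta`. Sequences with an unusually high missing fraction are worth checking before interpreting the recombination calls on their branches.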
Here we will use ClonalFrameML to conduct a recombination analysis on Klebsiella pneumoniae data from Taiwan.
The data we will use for this exercise are:
- Klebsiella.fasta - A FASTA file containing a core genome alignment of 11 de novo assembled Klebsiella pneumoniae strains from Taiwan.
- Klebsiella.tree - A Newick-format tree file built from Klebsiella.fasta with IQ-TREE.
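ClonalFrameML expects the sequence names in the alignment to match the tip labels in the tree, so it is worth confirming that Klebsiella.fasta and Klebsiella.tree agree before starting a long run. The sketch below is a simplified check. It assumes unquoted Newick labels without spaces or special characters; for real-world trees, a proper Newick parser is safer. The helper names are hypothetical.

```python
import re


def fasta_names(text):
    """Sequence names from FASTA text (first word of each header line)."""
    return {line[1:].split()[0] for line in text.splitlines() if line.startswith(">")}


def newick_tip_labels(newick):
    """Tip labels from a Newick string (simplified: unquoted, space-free labels).

    Tips are labels that directly follow "(" or ","; labels that follow ")"
    are internal-node labels (e.g. support values) and are not matched.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


tree = "((KP1:0.01,KP2:0.02)95:0.005,KP3:0.03);"
aln = ">KP1\nACGT\n>KP2\nACGA\n>KP4\nACGG\n"
tips, seqs = newick_tip_labels(tree), fasta_names(aln)
print("In tree only:", sorted(tips - seqs))       # In tree only: ['KP3']
print("In alignment only:", sorted(seqs - tips))  # In alignment only: ['KP4']
```

If both lists are empty for the real files, the names are consistent.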
- Install the required dependencies into your conda environment:
    conda install -c conda-forge -c bioconda -c defaults clonalframeml
- Run ClonalFrameML to estimate the recombination events present in the alignment. The three arguments are the input tree, the input alignment and a prefix for the output files:
    ClonalFrameML Klebsiella.tree Klebsiella.fasta Klebsiella_recomb
  This can take around 15 minutes to run, so please start it before lunch.
- After ClonalFrameML has completed, it will have produced the following files:
  - Klebsiella_recomb.ML_sequence.fasta - Sequences reconstructed by maximum likelihood for all internal nodes of the phylogeny.
  - Klebsiella_recomb.labelled_tree.newick - Tree with all nodes labelled, so that branches can be referred to in the other output files.
  - Klebsiella_recomb.em.txt - Point estimates for R/theta, nu, delta and the branch lengths.
  - Klebsiella_recomb.importation_status.txt - Table of recombination events, one line per event. The columns give the branch on which the event was found and the first and last genomic positions affected by the event.
  - Klebsiella_recomb.position_cross_reference.txt - Comma-separated values giving, for each position in the input sequence file, the corresponding position in the ML_sequence.fasta file.
- Open "Klebsiella_recomb.importation_status.txt". This shows the estimated positions of recombination events in the alignment and the branches on which they fall.
- We can also use an R script provided with ClonalFrameML to view the recombination events along the alignment and their locations on the tree. I have provided the R script, cfml_results.R. You can run it with the following command:
    Rscript cfml_results.R Klebsiella_recomb
- Open the resulting PDF file, Klebsiella_recomb.cfml.pdf, and discuss what you think the plot shows.
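For a quick overview of the importation table, you can count the events on each branch along with the number of alignment positions they cover. This minimal sketch makes three assumptions. The file is whitespace-separated, with the branch label in the first column and the first and last positions in the next two. Positions are 1-based and inclusive. Any row whose position columns are not integers, such as a header, can be skipped. The example rows are made up for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def summarise_importations(text):
    """Per-branch (number_of_events, total_bases) from an importation_status table.

    Assumes columns: branch, first position, last position (1-based, inclusive).
    Rows whose position columns are not integers (e.g. a header) are skipped.
    Overlapping events on the same branch are counted separately.
    """
    events = defaultdict(int)
    bases = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        branch, beg, end = fields[0], fields[1], fields[2]
        if not (beg.isdigit() and end.isdigit()):
            continue  # header or malformed row
        events[branch] += 1
        bases[branch] += int(end) - int(beg) + 1
    return {b: (events[b], bases[b]) for b in events}


example = "Node\tBeg\tEnd\nNODE_3\t1001\t1500\nKP2\t200\t249\nNODE_3\t4000\t4099\n"
for branch, (n, total) in sorted(summarise_importations(example).items()):
    print(f"{branch}\t{n} event(s)\t{total} bp")
# KP2	1 event(s)	50 bp
# NODE_3	2 event(s)	600 bp
```

To use it on the real output, pass in the text of Klebsiella_recomb.importation_status.txt. Branches with many events or long imported segments are good places to start when comparing against the plot from cfml_results.R.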