ANI

Average Nucleotide Identity

The average nucleotide identity (ANI) is the mean sequence identity across all shared orthologous regions of two genomes. It is widely used to define species boundaries in prokaryotes: two genomes with an ANI of roughly 95–96% or higher are generally considered to belong to the same species.

ANI provides higher resolution than 16S rRNA comparisons and is usually computed with tools such as fastANI or pyANI, though these can be quite complex to run. It also gives you a more formal measure of how close an assembly is to known reference strains, and potentially whether they belong to different species.
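
To make the definition above concrete, here is a minimal Python sketch of the idea. It is not how computeANI.py, fastANI or pyANI work internally: the per-fragment identities are made-up illustrative values, and the function names, the 30% minimum identity filter and the 95% cutoff are assumptions chosen for the example.

```python
from statistics import mean


def average_nucleotide_identity(fragment_identities, min_identity=30.0):
    """Mean identity (%) of the fragments that aligned at or above min_identity.

    Fragments below the threshold are treated as not shared between the genomes.
    Returns None when no fragment passes the filter.
    """
    kept = [i for i in fragment_identities if i >= min_identity]
    if not kept:
        return None
    return mean(kept)


def same_species(ani, cutoff=95.0):
    """Apply the commonly used ~95% ANI species cutoff (assumed value)."""
    return ani is not None and ani >= cutoff


# Hypothetical identities (%) for five query fragments; the last one is
# too divergent to count as a shared region and is filtered out.
identities = [98.0, 97.0, 99.0, 96.0, 12.0]
ani = average_nucleotide_identity(identities)
print(f"ANI = {ani:.2f}%  same species: {same_species(ani)}")
```

Real tools differ in how they fragment the genome, which alignments they keep and how they weight them, so values from different tools are not always directly comparable.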

Here we will use a custom function in Python (computeANI.py) to calculate the ANI from an assembly to different reference strains that have been provided to you.

The data we will use for this exercise are:

E.coli_K12.fasta - A FASTA file containing the reference genome of E. coli strain K-12.
E.coli_O157H7.fasta - A FASTA file containing the reference genome of E. coli O157:H7.
S.flexneri.fasta - A FASTA file containing the reference genome of Shigella flexneri.
S.sonnei.fasta - A FASTA file containing the reference genome of Shigella sonnei.
H37Rv.fasta - A FASTA file containing the reference genome of Mycobacterium tuberculosis strain H37Rv.
unknown.fasta - A FASTA file containing an unknown assembled bacterial genome.

Install the required dependencies into your conda environment.

conda install -c bioconda blast
pip install biopython

These should already be installed, but we want to make sure they are available before running the next command.
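
If you want to confirm that both dependencies are visible from your environment, a quick check along these lines can help. This is only a sketch: it assumes the script relies on the blastn executable (one of the programs shipped in the blast package), and the function name check_dependencies is made up for this example.

```python
import importlib.util
import shutil


def check_dependencies():
    """Return a dict mapping each dependency to True if it can be found."""
    return {
        # blastn is one of the executables installed by the bioconda blast package
        "blastn": shutil.which("blastn") is not None,
        # Biopython is imported under the package name "Bio"
        "biopython": importlib.util.find_spec("Bio") is not None,
    }


for name, found in check_dependencies().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If either entry is reported as missing, rerun the install commands above inside the activated conda environment.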

We can now calculate the ANI between our unknown assembly and each reference strain, one at a time. The following command calculates the ANI between the unknown assembly and the M. tuberculosis H37Rv strain:

python computeANI.py unknown.fasta H37Rv.fasta

Run the analysis again for each of the reference strains.
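
Rather than typing the command five times, the runs can be looped. The sketch below assumes that computeANI.py takes the query and the reference as two positional arguments, as in the command shown earlier, and that all files sit in the current directory under the names given in the data list. Because the script's output format is not specified here, the loop simply prints whatever it reports. The helper names build_commands and run_all are made up for this example.

```python
import subprocess
import sys
from pathlib import Path

# Reference file names taken from the data list for this exercise
REFERENCES = [
    "E.coli_K12.fasta",
    "E.coli_O157H7.fasta",
    "S.flexneri.fasta",
    "S.sonnei.fasta",
    "H37Rv.fasta",
]


def build_commands(query, references, script="computeANI.py"):
    """Build one argument list per reference: python <script> <query> <reference>."""
    return [[sys.executable, script, query, ref] for ref in references]


def run_all(query="unknown.fasta", references=REFERENCES):
    """Run each comparison in turn and print the script's output unchanged."""
    for cmd in build_commands(query, references):
        print(f"== {cmd[2]} vs {cmd[3]} ==")
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Show stdout if there is any, otherwise the error message
        print(result.stdout or result.stderr)


if __name__ == "__main__":
    if Path("computeANI.py").exists():
        run_all()
    else:
        print("computeANI.py not found in the current directory")
```

Collecting the five results side by side makes it easier to compare the references when answering the questions.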

Questions

Which species is the unknown sample most likely to belong to?
Can you gain any more insights from this analysis about the relatedness among all the species tested?

This is the end of the activities in practical session 5. Navigate back to the homepage for other activities.