Average Nucleotide Identity

The average nucleotide identity (ANI) measures the mean sequence identity of all shared orthologous regions between two genomes. It is widely used to define species boundaries in prokaryotes, with a typical cutoff of ~95–96% ANI indicating the same species.
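
As a rough illustration of the definition above, ANI can be thought of as an average of the percent identities of the aligned regions shared by two genomes. The sketch below assumes a length-weighted mean over hypothetical (percent identity, alignment length) pairs. Real tools apply their own fragmenting, filtering, and weighting rules, so this is not how any specific tool computes it. The function names and the 95% default cutoff are illustrative choices.

```python
def average_nucleotide_identity(hits):
    """Length-weighted mean percent identity over aligned regions.

    hits: iterable of (percent_identity, alignment_length) pairs.
    These pairs are hypothetical inputs, e.g. parsed from alignment output.
    """
    hits = list(hits)
    total_length = sum(length for _, length in hits)
    if total_length == 0:
        raise ValueError("no aligned regions to average")
    # Weight each region's identity by how many bases it covers.
    return sum(identity * length for identity, length in hits) / total_length


def likely_same_species(ani, cutoff=95.0):
    """Apply the commonly quoted ~95% species cutoff (an approximate guide)."""
    return ani >= cutoff
```

For example, two aligned regions of 3000 bp at 98% identity and 1000 bp at 94% identity give a length-weighted ANI of 97%, which is above the cutoff.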

ANI provides higher resolution than 16S rRNA comparisons and is usually computed with tools like fastANI or pyANI, though these can be quite complex to run. It also gives a more formal measure of how close an assembly is to known reference strains, and can indicate whether they belong to different species.

Here we will use a custom Python script (computeANI.py) to calculate the ANI between an assembly and each of the reference strains that have been provided to you.

The data we will use for this exercise are:


  1. Install the required dependencies into your conda environment:

    conda install -c bioconda blast
    pip install biopython

    These should already be installed, but we want to make sure they are available before running the next command.

  2. We can now calculate the ANI between our unknown assembly and each reference strain, one at a time, with the following command (this example calculates the ANI between the unknown assembly and the M. tuberculosis H37Rv strain):

    python computeANI.py Unknown.fasta H37Rv.fasta

Run the analysis again for each of the reference strains.
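
If you would rather not retype the command for every reference, the repeated runs can be scripted. The following is a minimal sketch, not part of the provided materials. It assumes computeANI.py takes the query and the reference FASTA files as its two positional arguments (as in the command above) and that it relies on the blastn program and Biopython installed earlier. The helper names and any reference file name other than H37Rv.fasta are placeholders.

```python
import importlib.util
import shutil
import subprocess
import sys


def missing_dependencies():
    """Return the names of the assumed requirements that cannot be found."""
    missing = []
    if shutil.which("blastn") is None:  # blastn ships with the bioconda blast package
        missing.append("blastn")
    if importlib.util.find_spec("Bio") is None:  # "Bio" is Biopython's import name
        missing.append("biopython")
    return missing


def build_commands(query, references, script="computeANI.py"):
    """Build one command line per reference genome, in the order given."""
    return [[sys.executable, script, str(query), str(ref)] for ref in references]


def run_all(query, references):
    """Run computeANI.py once per reference, stopping at the first failure."""
    missing = missing_dependencies()
    if missing:
        raise RuntimeError("missing dependencies: " + ", ".join(missing))
    for command in build_commands(query, references):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)


# Example call (the second reference name is a placeholder):
# run_all("Unknown.fasta", ["H37Rv.fasta", "OtherReference.fasta"])
```

Running the references one after another in the same terminal also makes it easier to compare the reported ANI values side by side.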

Questions

  1. Which species is the unknown sample most likely to belong to?

  2. Can you gain any more insights from this analysis about the relatedness among all the species tested?



This is the end of the activities in practical session 5.