anvi-estimate-scg-taxonomy

Estimates taxonomy at genome and metagenome level. This program is the entry point to estimate taxonomy for a given set of contigs (i.e., all contigs in a contigs database, or contigs described in collections as bins). For this, it uses single-copy core gene sequences and the GTDB database.



Can consume

profile-db contigs-db scgs-taxonomy collection bin metagenomes

Can provide

genome-taxonomy genome-taxonomy-txt

Usage

This program makes quick taxonomy estimates for genomes, metagenomes, or bins stored in your contigs-db using single-copy core genes.

You can run this program on an anvi’o contigs database only if you have already set up the databases needed to assign taxonomy on your computer by running anvi-setup-scg-taxonomy, and have annotated the contigs-db you are working with using anvi-run-scg-taxonomy. Both steps are described in greater detail in this document, which also offers a comprehensive overview of what anvi-estimate-scg-taxonomy can do.

Keep in mind that the scg-taxonomy framework currently uses single-copy core genes found in GTDB genomes, thus it will not work well for low-completion, viral, or eukaryotic genomes.

The functionality of anvi-estimate-scg-taxonomy is also accessed implicitly through the anvi’o interactive interface when you turn on real-time taxonomy estimation for bins. So, if you’ve ever wondered where those estimates were coming from, now you know.

So, what can this program do?

1. Estimate the taxonomy of a single genome

By default, this program will assume your contigs-db contains only one genome and will use the single-copy core genes (those that were annotated with taxonomy when you ran anvi-run-scg-taxonomy) to identify the taxonomy of your genome.

When you run

anvi-estimate-scg-taxonomy -c contigs-db

it will give you the best taxonomy hit for your genome. If you would like to see how it got there (by looking at the hits for each of the single-copy core genes), just add the --debug flag, like so:

anvi-estimate-scg-taxonomy -c contigs-db \
                           --debug
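The per-gene hits that --debug exposes still have to be reconciled into a single genome-level answer. A minimal sketch of one way to do that is a rank-by-rank majority vote, assuming each hit is a full lineage tuple. This is an illustrative technique, not necessarily how anvi’o computes its consensus; RANKS, consensus_taxonomy, and the min_fraction threshold are hypothetical.

```python
from collections import Counter

# Hypothetical rank order for this sketch; each hit is a tuple with one
# entry per rank (None where a rank is unknown).
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def consensus_taxonomy(hits, min_fraction=0.5):
    """Collapse per-SCG lineage hits into one genome-level estimate.

    Illustrative rank-by-rank majority vote, not anvi'o's own algorithm.
    A rank is resolved only if its most common name is carried by more than
    min_fraction of the hits that agreed on every higher rank.
    """
    consensus = {}
    remaining = list(hits)
    for depth, rank in enumerate(RANKS):
        names = [lineage[depth] for lineage in remaining if lineage[depth]]
        if not names:
            break  # nothing left to vote with at this rank
        name, count = Counter(names).most_common(1)[0]
        if count / len(remaining) <= min_fraction:
            break  # no clear majority: leave deeper ranks unresolved
        consensus[rank] = name
        # Only hits that agree so far get a vote at the next rank.
        remaining = [lineage for lineage in remaining if lineage[depth] == name]
    return consensus
```

With four hits that agree down to family, three of which share a genus but each name a different species, this sketch reports the lineage down to genus and stops there rather than guessing a species.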

2. Estimate the taxa within a metagenome

In metagenome mode, the program assumes that your contigs-db contains multiple genomes and tries to give you an overview of the taxa within it. To do this, it determines which single-copy core gene has the most hits in your contigs (for example, Ribosomal_S6) and then reports the taxonomy hits for that gene across your contigs. The output is this list of taxonomy results.

anvi-estimate-scg-taxonomy -c contigs-db \
                           --metagenome-mode

If you want to look at a specific gene instead of the one with the most hits, you can ask for it with --scg-name. For example, to look at Ribosomal_S9, run

anvi-estimate-scg-taxonomy -c contigs-db \
                           --metagenome-mode \
                           --scg-name Ribosomal_S9
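The gene-selection step of metagenome mode can be sketched as follows, assuming the SCG hits are available as (scg_name, gene_callers_id, taxonomy) tuples. That tuple layout and the metagenome_mode_hits function are hypothetical; they only mirror the behavior described above, with the optional scg_name argument playing the role of --scg-name.

```python
from collections import defaultdict


def metagenome_mode_hits(scg_hits, scg_name=None):
    """Pick one SCG and return (its name, its taxonomy hits).

    scg_hits: hypothetical list of (scg_name, gene_callers_id, taxonomy)
    tuples. By default the SCG with the most hits is chosen; passing
    scg_name forces a specific gene, similar to --scg-name.
    """
    by_scg = defaultdict(list)
    for name, gene_id, taxonomy in scg_hits:
        by_scg[name].append((gene_id, taxonomy))
    if not by_scg:
        raise ValueError("no SCG hits to choose from")
    if scg_name is None:
        # The most frequently hit SCG gives the broadest view of the community.
        scg_name = max(by_scg, key=lambda name: len(by_scg[name]))
    elif scg_name not in by_scg:
        raise ValueError(f"no hits for {scg_name}")
    return scg_name, by_scg[scg_name]
```

Picking the most frequently hit gene is a trade-off: it maximizes the number of genomes represented, but each genome is still seen through a single gene.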

3. Look at relative abundance of taxa across samples

If you provide a merged profile-db or a single-profile-db, you can also look at the relative abundance of your taxonomy hits (through a single-copy core gene) across your samples. Essentially, this adds columns to your output (one per sample) that describe the relative abundance of each hit in each sample.

The command will look something like this:

anvi-estimate-scg-taxonomy -c contigs-db \
                           --metagenome-mode \
                           -p profile-db \
                           --compute-scg-coverages

For an example output, take a look at this page.
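One plausible way to turn per-sample coverages of SCG hits into relative abundances is to divide each hit’s coverage by the total coverage of all hits in that sample. The sketch below assumes exactly that normalization, which may differ from what anvi’o does internally; relative_abundances and its nested-dict input are hypothetical.

```python
def relative_abundances(coverages):
    """Turn per-sample coverages of SCG hits into per-sample relative abundances.

    coverages: hypothetical dict mapping taxonomy hit -> {sample: coverage}.
    Assumed normalization: each hit's coverage divided by the summed coverage
    of all hits in that sample, so each sample's values add up to 1
    (a sample with zero total coverage stays at 0).
    """
    samples = {s for per_sample in coverages.values() for s in per_sample}
    totals = {
        s: sum(per_sample.get(s, 0.0) for per_sample in coverages.values())
        for s in samples
    }
    return {
        hit: {
            s: (per_sample.get(s, 0.0) / totals[s] if totals[s] else 0.0)
            for s in sorted(samples)
        }
        for hit, per_sample in coverages.items()
    }
```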

4. Estimate the taxonomy of your bins

In this mode, the program treats each bin in your collection as a single genome and tries to assign taxonomy to it. To do this, simply provide a collection, like this:

anvi-estimate-scg-taxonomy -c contigs-db \
                           -C collection

You can also look at the relative abundances across your samples at the same time, by running something like this:

anvi-estimate-scg-taxonomy -c contigs-db \
                           -C collection \
                           -p profile-db \
                           --compute-scg-coverages
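Treating each bin as its own genome amounts to pooling the SCG hits from the splits in that bin and estimating one taxonomy from the pool. The sketch below assumes a collection given as a mapping from bin names to split names and, as a deliberately simple stand-in for a real consensus, reports the most common lineage per bin along with the fraction of hits supporting it; estimate_bins and both input layouts are hypothetical.

```python
from collections import Counter


def estimate_bins(hits_by_split, collection):
    """Estimate one taxonomy per bin by treating each bin as a single genome.

    hits_by_split: hypothetical dict mapping split name -> list of lineage
        strings, one per SCG hit found in that split.
    collection: hypothetical dict mapping bin name -> list of split names.
    Returns bin name -> (most common lineage, fraction of the bin's hits
    supporting it), or (None, 0.0) for bins without any SCG hits.
    """
    estimates = {}
    for bin_name, splits in collection.items():
        # Pool every SCG hit from every split in this bin.
        hits = [h for split in splits for h in hits_by_split.get(split, [])]
        if not hits:
            estimates[bin_name] = (None, 0.0)
            continue
        lineage, count = Counter(hits).most_common(1)[0]
        estimates[bin_name] = (lineage, count / len(hits))
    return estimates
```

Reporting the support fraction alongside the lineage makes it easy to spot bins whose hits disagree, which often hints at contamination or a low-quality bin.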

Pro tip: you can use the output of the following command,

anvi-estimate-scg-taxonomy -c contigs-db \
                           -p profile-db \
                           -C collection \
                           -o TAXONOMY.txt

to display the taxonomy of your bins in the anvi’o interactive interface in collection mode:

anvi-interactive -c contigs-db \
                 -p profile-db \
                 -C collection \
                 --additional-layers TAXONOMY.txt

That simple.
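Before launching the interface, it can help to confirm that every bin in the collection has a row in TAXONOMY.txt. The sketch below assumes the file is TAB-delimited with a header row and bin names in its first column, on the assumption that this is how rows are matched to bins in collection mode; check_layers_file is a hypothetical helper, not part of anvi’o.

```python
import csv


def check_layers_file(handle, bin_names):
    """Compare the first column of a TAB-delimited file against bin names.

    Assumes (for this sketch) a header row followed by one row per bin,
    with the bin name in the first column. `handle` can be an open file
    or any iterable of lines. Returns (missing bins, unexpected rows).
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    rows = {row[0] for row in reader if row}
    expected = set(bin_names)
    return sorted(expected - rows), sorted(rows - expected)
```

For example, `check_layers_file(open("TAXONOMY.txt"), ["bin_1", "bin_2"])` would list bins with no taxonomy row and rows that match no bin; the bin names here are placeholders.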

5. Look at multiple metagenomes at the same time

You can even use this program to look at multiple metagenomes by providing a metagenomes artifact. This is useful to get an overview of what kinds of taxa might be in your metagenomes, and what kinds of taxa they share.

Running this

anvi-estimate-scg-taxonomy --metagenomes metagenomes \
                           --output-file-prefix EXAMPLE

will give you an output file containing all taxonomic levels found and their coverages in each of your metagenomes.

For a concrete example, check out this page.
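The multi-metagenome output is essentially a taxon-by-metagenome coverage table. A minimal sketch of how per-metagenome results could be merged into such a table, and how shared taxa could then be read off it, assuming each metagenome’s result is already a mapping from taxon to coverage; merge_metagenomes, shared_taxa, and that input layout are hypothetical and not part of anvi’o.

```python
def merge_metagenomes(per_metagenome):
    """Merge per-metagenome taxon coverages into one taxon x metagenome table.

    per_metagenome: hypothetical dict mapping metagenome name ->
        {taxon: coverage}.
    Returns (metagenome names in column order, {taxon: [coverage, ...]}),
    with 0.0 wherever a taxon was not detected in a metagenome.
    """
    names = sorted(per_metagenome)
    taxa = sorted({t for cov in per_metagenome.values() for t in cov})
    table = {t: [per_metagenome[m].get(t, 0.0) for m in names] for t in taxa}
    return names, table


def shared_taxa(table):
    """Return the taxa with non-zero coverage in every metagenome."""
    return sorted(t for t, row in table.items() if all(c > 0 for c in row))
```

Filling absent taxa with 0.0 keeps every row the same length, so the table can be compared column by column across metagenomes.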

