anvi-compute-metabolic-enrichment [program]

A program that computes metabolic enrichment across groups of genomes and metagenomes.



Iva Veseli

Can consume

kegg-metabolism groups-txt external-genomes internal-genomes functions

Can provide

functional-enrichment-txt



This program computes metabolic module enrichment across groups of genomes or metagenomes and returns a functional-enrichment-txt file (throughout this text, we will use the term genome to describe both for simplicity).

For its sister programs, see anvi-compute-functional-enrichment-in-pan and anvi-compute-functional-enrichment-across-genomes.

Module enrichment

To run this program, you must already have estimated the completeness of metabolic modules in your genomes using the program anvi-estimate-metabolism and obtained a “modules” mode output file (which is the default output mode of that program). You will also need to provide a groups-txt file that declares which group each genome belongs to for the enrichment analysis.

How does it work?

  1. Determine the presence of modules. Each module in the “modules” mode output has a completeness score associated with it in each genome, and any module with a completeness score over a given threshold (set by --module-completion-threshold) will be considered to be present in that genome.

  2. Quantify the distribution of modules in each group of genomes. The distribution of a given module across the genomes in each group determines its enrichment. This is done by fitting a generalized linear model (GLM) with a logit link function in anvi-script-enrichment-stats, which produces a functional-enrichment-txt file.
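The two steps above can be sketched in a few lines of Python. This is only an illustration, not the program's actual implementation: the input rows, the group map, and the function names are hypothetical, and treating a score exactly equal to the threshold as present is an assumption (otherwise a threshold of 1 could never match anything). The sketch stops at per-group tallies; the GLM fit itself is left to anvi-script-enrichment-stats.

```python
from collections import defaultdict

# Hypothetical inputs: (genome, module, completeness) rows and a genome -> group map.
rows = [
    ("G1", "M00001", 0.90),
    ("G2", "M00001", 0.80),
    ("G3", "M00001", 0.40),
    ("G4", "M00001", 0.10),
    ("G1", "M00002", 0.50),
    ("G3", "M00002", 1.00),
]
groups = {"G1": "A", "G2": "A", "G3": "B", "G4": "B"}
threshold = 0.75  # same value as the documented default


def module_presence(rows, threshold):
    """Step 1: return the set of (genome, module) pairs considered present."""
    # Assumption: a score equal to the threshold counts as present.
    return {(genome, module) for genome, module, score in rows if score >= threshold}


def count_by_group(present, groups):
    """Step 2 (tally only): genomes per group in which each module is present."""
    counts = defaultdict(lambda: defaultdict(int))
    for genome, module in present:
        counts[module][groups[genome]] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {module: dict(per_group) for module, per_group in counts.items()}


present = module_presence(rows, threshold)
counts = count_by_group(present, groups)
print(counts)
```

Per-group counts like these, together with the number of genomes in each group, are the kind of summary that an enrichment test would work from.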

The script anvi-script-enrichment-stats was implemented by Amy Willis, and described first in this paper.

Basic usage

See kegg-metabolism for more information on how to generate a “modules” mode output format from anvi-estimate-metabolism. Please note that the genome names in the modules file must match those that you will mention in the groups-txt file.
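Because the names must match exactly, it can help to compare the two sets of names before running the program. The following is a minimal sketch with made-up genome names; the function name check_names is hypothetical and is not part of anvi'o.

```python
def check_names(module_genomes, group_genomes):
    """Report genome names that appear in only one of the two files."""
    module_genomes, group_genomes = set(module_genomes), set(group_genomes)
    return {
        "only_in_modules": sorted(module_genomes - group_genomes),
        "only_in_groups": sorted(group_genomes - module_genomes),
    }


# Made-up names: note the case difference in the last genome.
report = check_names(
    ["E_coli", "B_subtilis", "P_marinus"],
    ["E_coli", "B_subtilis", "p_marinus"],
)
print(report)
```

Names that differ only in case or spacing show up on both sides of the report, which makes this kind of mismatch easy to spot.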

anvi-compute-metabolic-enrichment -M MODULES.TXT \
                                  -G groups-txt \
                                  -o functional-enrichment-txt

Additional parameters

The default completeness threshold for a module to be considered ‘present’ in a genome is 0.75 (=75%). To change this, provide a different threshold in the range (0, 1] using the --module-completion-threshold parameter:

anvi-compute-metabolic-enrichment -M MODULES.TXT \
                                  -G groups-txt \
                                  -o functional-enrichment-txt \
                                  --module-completion-threshold 0.9
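The interval (0, 1] excludes 0 and includes 1. A small sketch of such a range check, assuming you want to validate a value in your own wrapper script before passing it along (validate_threshold is a hypothetical helper, not an anvi'o function):

```python
def validate_threshold(value):
    """Return the threshold as a float if it lies in (0, 1], else raise ValueError."""
    threshold = float(value)
    if not (0 < threshold <= 1):
        raise ValueError(f"threshold must be in (0, 1], got {threshold}")
    return threshold


print(validate_threshold("0.9"))
```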

By default, the column containing genome names in your MODULES.TXT file will have the header db_name, but in some cases the names of your genomes or metagenomes may be stored under a different column header (for example, when you did not run anvi-estimate-metabolism in multi-mode). In those cases, you can tell this program which column to use with the --sample-header parameter. For example, if your metagenome names are listed under the metagenome_name column, you would do the following:

anvi-compute-metabolic-enrichment -M MODULES.TXT \
                                  -G groups-txt \
                                  -o functional-enrichment-txt \
                                  --sample-header metagenome_name
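To see which header applies to your file, you can inspect the column names of the tab-delimited modules output. The sketch below is illustrative only: apart from db_name and metagenome_name, the column names in the example text (module, module_completeness) are assumptions, and genome_names is a hypothetical helper.

```python
import csv
import io


def genome_names(modules_text, sample_header="db_name"):
    """Return the unique sample names found under the given column, in file order."""
    reader = csv.DictReader(io.StringIO(modules_text), delimiter="\t")
    if sample_header not in (reader.fieldnames or []):
        raise KeyError(f"column '{sample_header}' not found; available: {reader.fieldnames}")
    names = []
    for row in reader:
        if row[sample_header] not in names:
            names.append(row[sample_header])
    return names


# Made-up file content whose sample column is 'metagenome_name' rather than 'db_name'.
example = (
    "metagenome_name\tmodule\tmodule_completeness\n"
    "MG1\tM00001\t0.9\n"
    "MG2\tM00001\t0.3\n"
    "MG1\tM00002\t1.0\n"
)
names = genome_names(example, sample_header="metagenome_name")
print(names)
```

Calling the helper with the default header on this example raises a KeyError listing the available columns, which mirrors the situation in which --sample-header is needed.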

If you ran anvi-estimate-metabolism on a larger set of genomes but only want to include a subset of them in the groups-txt, that is fine. By default, any samples from the MODULES.TXT file that are missing from the groups-txt will be ignored. However, there is also an option to include those missing samples in the analysis as one big group called ‘UNGROUPED’. To do this, use the --include-samples-missing-from-groups-txt parameter. Be careful: if you are also using the --include-ungrouped flag, any samples without a specified group in the groups-txt will also be included in the ‘UNGROUPED’ group.

anvi-compute-metabolic-enrichment -M MODULES.TXT \
                                  -G groups-txt \
                                  -o functional-enrichment-txt \
                                  --include-samples-missing-from-groups-txt
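The difference between the default behavior and --include-samples-missing-from-groups-txt can be sketched as follows. This is a simplified illustration with made-up sample names, not the program's code; assign_groups is a hypothetical function, and the interaction with --include-ungrouped is not modeled.

```python
def assign_groups(module_samples, groups, include_missing=False):
    """Map each sample to its group; samples absent from `groups` are
    dropped by default, or placed in 'UNGROUPED' when include_missing is True."""
    assigned = {}
    for sample in module_samples:
        if sample in groups:
            assigned[sample] = groups[sample]
        elif include_missing:
            assigned[sample] = "UNGROUPED"
    return assigned


samples = ["G1", "G2", "G3", "G4"]  # samples found in the modules file
groups = {"G1": "A", "G2": "B"}     # samples declared in the groups file

default_view = assign_groups(samples, groups)
inclusive_view = assign_groups(samples, groups, include_missing=True)
print(default_view)
print(inclusive_view)
```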
