A program that computes metabolic enrichment across groups of genomes and metagenomes.
kegg-metabolism user-metabolism groups-txt external-genomes internal-genomes functions
This program computes metabolic module enrichment across groups of genomes or metagenomes and returns a functional-enrichment-txt file (throughout this text, we will use the term genome to describe both for simplicity).
For its sister programs, see anvi-compute-functional-enrichment-in-pan and anvi-compute-functional-enrichment-across-genomes.
To run this program, you must already have estimated the completeness of metabolic modules in your genomes using the program anvi-estimate-metabolism and obtained a "modules" mode output file (which is the default output mode of that program). In addition, you will need to provide a groups-txt file that declares which genome belongs to which group for the enrichment analysis.
Determine the presence of modules. Each module in the "modules" mode output has a completeness score associated with it in each genome, and any module with a completeness score over a given threshold (set by --module-completion-threshold) will be considered to be present in that genome.
Quantify the distribution of modules in each group of genomes. The distribution of a given module across genomes in each group will determine its enrichment. This is done by fitting a generalized linear model (GLM) with a logit link function in anvi-script-enrichment-stats, which produces a functional-enrichment-txt file.
The script anvi-script-enrichment-stats was implemented by Amy Willis, and was first described in this paper.
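The two steps above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping that happens before the statistics, not the program's actual code: the function names are hypothetical, the column names `module`, `db_name` and `pathwise_module_completeness` are assumptions about the layout of the "modules" mode file, and the comparison uses `>=` on the assumption that a threshold of 1.0 should still be reachable. The GLM fit done by anvi-script-enrichment-stats is not reproduced here.

```python
import csv
import io


def module_presence(modules_tsv, threshold=0.75,
                    sample_column="db_name",
                    score_column="pathwise_module_completeness"):
    """Return {sample: set of modules considered present in that sample}."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in the interval (0, 1]")
    present = {}
    for row in csv.DictReader(modules_tsv, delimiter="\t"):
        # Register the sample even if none of its modules pass the threshold.
        modules = present.setdefault(row[sample_column], set())
        if float(row[score_column]) >= threshold:  # assumption: inclusive
            modules.add(row["module"])
    return present


def count_by_group(present, groups):
    """Return {module: {group: (genomes with module, genomes in group)}}."""
    group_sizes = {}
    for sample in present:
        if sample in groups:
            group_sizes[groups[sample]] = group_sizes.get(groups[sample], 0) + 1
    table = {}
    for module in sorted(set().union(*present.values())):
        hits = dict.fromkeys(group_sizes, 0)
        for sample, modules in present.items():
            if sample in groups and module in modules:
                hits[groups[sample]] += 1
        table[module] = {g: (hits[g], n) for g, n in group_sizes.items()}
    return table


# Made-up example data: three genomes, two modules, two groups.
example = io.StringIO(
    "module\tdb_name\tpathwise_module_completeness\n"
    "M00001\tgenome_A\t1.0\n"
    "M00001\tgenome_B\t0.8\n"
    "M00001\tgenome_C\t0.5\n"
    "M00002\tgenome_A\t0.2\n"
    "M00002\tgenome_C\t0.9\n"
)
groups = {"genome_A": "G1", "genome_B": "G1", "genome_C": "G2"}

present = module_presence(example)
table = count_by_group(present, groups)
```

With the example data, M00001 is present in both G1 genomes and in none of the G2 genomes. Per-group counts of this kind are what an enrichment test would then compare.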
See kegg-metabolism or user-metabolism for more information on how to generate a "modules" mode output file from anvi-estimate-metabolism. Please note that the genome names in the modules file must match those that you list in the groups-txt file.
anvi-compute-metabolic-enrichment -M MODULES.TXT \
                                  -G groups-txt \
                                  -o functional-enrichment-txt
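Because the genome names in the two input files must match, it can help to compare them before running the program. The following is a minimal sketch of such a check, not part of anvi'o: the helper names are hypothetical, and it assumes that both files are TAB-delimited with a header line, that the modules file keeps genome names in a column called `db_name`, and that the groups-txt file keeps them in its first column.

```python
import csv
import io


def column_values(handle, column):
    """Distinct values of a named column in a TAB-delimited file."""
    return {row[column] for row in csv.DictReader(handle, delimiter="\t")}


def first_column_values(handle):
    """Distinct values of the first column, skipping the header line."""
    rows = csv.reader(handle, delimiter="\t")
    next(rows, None)  # skip the header
    return {row[0] for row in rows if row}


def compare_names(modules_handle, groups_handle, sample_column="db_name"):
    """Report names that occur in only one of the two files."""
    in_modules = column_values(modules_handle, sample_column)
    in_groups = first_column_values(groups_handle)
    return {
        "only_in_modules": sorted(in_modules - in_groups),
        "only_in_groups": sorted(in_groups - in_modules),
    }


# Made-up example: 'genome_b' in the groups file is a typo for 'genome_B'.
modules_file = io.StringIO(
    "module\tdb_name\n"
    "M00001\tgenome_A\n"
    "M00001\tgenome_B\n"
)
groups_file = io.StringIO(
    "item\tgroup\n"
    "genome_A\tG1\n"
    "genome_b\tG1\n"
)

report = compare_names(modules_file, groups_file)
```

To check real files, pass open file handles (for example `open("MODULES.TXT")`) instead of the `io.StringIO` objects used here.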
The default completeness threshold for a module to be considered "present" in a genome is 0.75 (=75%). If you wish to change this, you can provide a different threshold in the interval (0, 1] using the --module-completion-threshold parameter:
anvi-compute-metabolic-enrichment -M MODULES.TXT \
                                  -G groups-txt \
                                  -o functional-enrichment-txt \
                                  --module-completion-threshold 0.9
By default, this program uses the pathwise completeness score to determine which modules are "present" in a genome, but you can ask it to use stepwise completeness instead by using the --use-stepwise-completeness flag.
anvi-compute-metabolic-enrichment -M MODULES.TXT \
                                  -G groups-txt \
                                  -o functional-enrichment-txt \
                                  --use-stepwise-completeness
By default, this program expects the genome names in your MODULES.TXT file to be in a column with the header db_name, but in certain cases the names of your genomes or metagenomes may be under a different header (for example, when you did not run anvi-estimate-metabolism in multi-mode). In those cases, you can tell this program which column to use with the --sample-header parameter. For example, if your metagenome names are listed under the metagenome_name column, you would do the following:
anvi-compute-metabolic-enrichment -M MODULES.TXT \
                                  -G groups-txt \
                                  -o functional-enrichment-txt \
                                  --sample-header metagenome_name
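If you are unsure which header your file uses, you can inspect its first line. The sketch below shows the idea with a hypothetical helper (it is not an anvi'o function) that looks up the requested header and reports the available columns when it is missing. The example header line is made up.

```python
import csv
import io


def resolve_sample_column(handle, sample_header="db_name"):
    """Return the index of the column that holds genome or metagenome names."""
    header = next(csv.reader(handle, delimiter="\t"))
    if sample_header not in header:
        raise KeyError(
            f"column '{sample_header}' not found; "
            f"available columns: {', '.join(header)}"
        )
    return header.index(sample_header)


# Made-up header of a file that uses 'metagenome_name' instead of 'db_name'.
modules_file = io.StringIO(
    "module\tmetagenome_name\tpathwise_module_completeness\n"
)
column_index = resolve_sample_column(modules_file, sample_header="metagenome_name")
```

Calling the helper on the same header without `sample_header="metagenome_name"` raises a `KeyError` that lists the columns that are actually present.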
If you ran anvi-estimate-metabolism on a bunch of extra genomes but only want to include a subset of them in the groups-txt, that is fine. By default, any samples from the MODULES.TXT file that are missing from the groups-txt will be ignored. However, there is also an option to include those missing samples in the analysis, as one big group called "UNGROUPED". To do this, you can use the --include-samples-missing-from-groups-txt parameter.
anvi-compute-metabolic-enrichment -M MODULES.TXT \
                                  -G groups-txt \
                                  -o functional-enrichment-txt \
                                  --include-samples-missing-from-groups-txt
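The difference between the two behaviours can be illustrated with a small Python sketch. The function `assign_groups` and its `include_missing` argument are hypothetical names used only for this illustration; they mirror the behaviour described above rather than the program's internals.

```python
def assign_groups(samples, groups, include_missing=False):
    """Map each sample to its group.

    Samples missing from `groups` are dropped by default, or placed in a
    single 'UNGROUPED' group when `include_missing` is True.
    """
    assigned = {}
    for sample in samples:
        if sample in groups:
            assigned[sample] = groups[sample]
        elif include_missing:
            assigned[sample] = "UNGROUPED"
    return assigned


# Made-up example: genome_X has no entry in the groups file.
samples = ["genome_A", "genome_B", "genome_X"]
groups = {"genome_A": "G1", "genome_B": "G2"}

default = assign_groups(samples, groups)
with_flag = assign_groups(samples, groups, include_missing=True)
```

Here `default` contains only genome_A and genome_B, while `with_flag` also contains genome_X in the UNGROUPED group.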