A TXT-type anvi’o artifact. This artifact is typically provided by the user for anvi’o to import into its databases, process, and/or use.
There are no anvi’o tools that generate this artifact, which means it is most likely provided to the anvi’o ecosystem by the user.
This artifact is a TAB-delimited file that describes a set of enzymes.
The user can generate this file to define an arbitrary set of enzymes that they want to estimate metabolism on, using the program anvi-estimate-metabolism.
Each row (besides the header) in this file represents one enzyme in the set. At minimum, the file must contain three columns:
gene_id: column containing a unique value to identify a gene for the enzyme. The value can be either a string (like a gene name) or an integer (like a gene callers id), but it has to be unique because sometimes multiple genes can have the same enzyme annotation.
enzyme_accession: column containing the accession of the enzyme, such as a KEGG Ortholog accession for KOfams, a COG accession for NCBI COGs, a Pfam, etc.
source: column containing the name of the database that would be used to annotate the enzyme. For example, “KOfam”, “COG20_FUNCTION”, “Pfam”, etc.
Ideally, all annotation sources in this column would match those used to define the metabolic pathways you are estimating completeness for (whether those are KEGG Modules or user-defined modules as in user-modules-data), but in practice, we don’t currently check for this. If you include some enzymes that are not part of any metabolic modules, they simply will not contribute to the completeness scores of any pathways, and you would therefore only see them in “hits” mode output files.
The source column is (at this time) mostly for you to make sure you know which database these enzymes are coming from and that at least some (hopefully most) will actually be part of the metabolic pathways you are interested in, because otherwise the results from anvi-estimate-metabolism might not make much sense. However, you do have complete freedom to define the ‘source’ value arbitrarily, if you want. But please keep in mind that this may change in the very near future - one day these source values might actually matter for the functioning of anvi-estimate-metabolism (in which case this documentation will be updated to reflect that). So it is best to get used to setting them properly. :)
Here is an example file with the minimum set of columns:
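As a rough illustration, a file with this layout could be produced with Python's standard csv module. The gene IDs and accessions in this sketch are hypothetical placeholders chosen only to show the format; they are not taken from a real dataset. Note that two genes share one enzyme accession, which is why the gene_id values must be unique.

```python
import csv
import io

# Hypothetical rows: the gene IDs, accessions, and sources are illustrative only.
ROWS = [
    {"gene_id": "gene_1", "enzyme_accession": "K00001", "source": "KOfam"},
    {"gene_id": "gene_2", "enzyme_accession": "K00001", "source": "KOfam"},
    {"gene_id": "3", "enzyme_accession": "COG0001", "source": "COG20_FUNCTION"},
]

# The three columns the file must contain at minimum.
COLUMNS = ["gene_id", "enzyme_accession", "source"]


def render_enzymes_txt(rows, columns=COLUMNS):
    """Return the rows as TAB-delimited text, preceded by a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


text = render_enzymes_txt(ROWS)
print(text, end="")
```

To save the result, write the returned text to a file of your choosing (the file name is up to you).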
If you want downstream programs like anvi-estimate-metabolism to have access to the coverage and detection data for each enzyme (well, technically, its gene), then you can add two additional columns to this file:
coverage: column should contain the numerical coverage value for the gene encoding the enzyme
detection: column should contain the numerical detection value for the gene encoding the enzyme
If these columns are included, you can use the --add-coverage flag with anvi-estimate-metabolism so that this data is included in the output for each metabolic pathway and/or enzyme. However, you do need to include both of the columns - that program does not currently support adding just coverage or just detection.
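Before handing a file to anvi’o, it can be handy to check these rules yourself. The following is a minimal sketch of such a pre-flight check, not anvi’o's own validation code: the function name check_enzymes_txt and its messages are hypothetical, and the checks simply mirror the requirements described on this page (three required columns, unique gene_id values, and coverage and detection supplied together as numbers).

```python
import csv
import io

REQUIRED = {"gene_id", "enzyme_accession", "source"}
OPTIONAL_PAIR = {"coverage", "detection"}


def check_enzymes_txt(handle):
    """Return a list of problems found in a TAB-delimited enzymes file.

    Hypothetical helper for illustration; it is not part of anvi'o.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = set(reader.fieldnames or [])
    problems = []

    missing = REQUIRED - header
    if missing:
        problems.append("missing required columns: " + ", ".join(sorted(missing)))

    # coverage and detection must appear together or not at all.
    present = OPTIONAL_PAIR & header
    if len(present) == 1:
        problems.append("coverage and detection must be provided together")

    seen = set()
    for row in reader:
        gene_id = row.get("gene_id")
        if "gene_id" in header:
            if gene_id in seen:
                problems.append(f"duplicate gene_id: {gene_id}")
            seen.add(gene_id)
        # Any coverage/detection value that is present should be numeric.
        for col in sorted(present):
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append(f"non-numeric {col} for gene_id {gene_id}")
    return problems


# Hypothetical inputs: one well-formed file and one that breaks several rules.
good = (
    "gene_id\tenzyme_accession\tsource\tcoverage\tdetection\n"
    "gene_1\tK00001\tKOfam\t12.5\t0.98\n"
)
bad = (
    "gene_id\tenzyme_accession\tsource\tcoverage\n"
    "gene_1\tK00001\tKOfam\t12.5\n"
    "gene_1\tK00002\tKOfam\tNA\n"
)

good_problems = check_enzymes_txt(io.StringIO(good))
bad_problems = check_enzymes_txt(io.StringIO(bad))
print(good_problems)
print(bad_problems)
```

The well-formed input yields an empty list, while the second input is flagged for the lone coverage column, the repeated gene_id, and the non-numeric coverage value.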