A DB-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).
A directory of data downloaded from the KEGG database resource for use in function annotation and metabolism estimation.
It is created by running the program anvi-setup-kegg-data. Not everything from KEGG is included in this directory, only the information relevant to downstream programs. The most critical components of this directory are the KOfam HMM profiles and the modules-db, which contains information on metabolic pathways as described in the KEGG MODULE resource, as well as functional classification hierarchies from KEGG BRITE.
The default location of this data is in the anvi’o folder, at
You can change this location when you run anvi-setup-kegg-data by providing a different path to the --kegg-data-dir parameter:
anvi-setup-kegg-data --kegg-data-dir /path/to/directory/KEGG
If you do this, you will need to provide this path to downstream programs that require this data as well.
Here is a schematic of how the kegg-data folder will look after setup:
KEGG
 |- MODULES.db
 |- ko_list.txt
 |- modules.keg
 |- hierarchies.json
 |- HMMs
 |   |- Kofam.hmm
 |   |- Kofam.hmm.h3f
 |   |- (....)
 |
 |- modules
 |   |- M00001
 |   |- M00002
 |   |- (....)
 |
 |- BRITE
 |   |- ko00001
 |   |- ko00194
 |   |- (....)
 |
 |- orphan_data
     |- 01_ko_fams_with_no_threshold.txt
     |- 02_hmm_profiles_with_ko_fams_with_no_threshold.hmm
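If you want to confirm that a setup run produced the layout in the schematic, a quick check along the following lines may help. This is only a sketch and not part of anvi’o: the function name `missing_entries` is hypothetical, and the split into files and directories is inferred from the schematic above.

```python
from pathlib import Path

# Entry names come from the schematic above; whether each one is a file or a
# directory is an assumption inferred from that schematic.
EXPECTED_FILES = ["MODULES.db", "ko_list.txt", "modules.keg", "hierarchies.json"]
EXPECTED_DIRS = ["HMMs", "modules", "BRITE", "orphan_data"]


def missing_entries(kegg_dir):
    """Return the expected top-level entries that are absent from kegg_dir."""
    root = Path(kegg_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling `missing_entries("/path/to/directory/KEGG")` returns an empty list when every top-level entry from the schematic is present. It does not inspect the contents of the subfolders.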
Typically, users will not have to work directly with any of these files, as downstream programs will interface directly with the modules-db.
However, for the curious, here is a description of each component in this data directory:
ko_list.txt: a tab-delimited file from the KEGG KOfam resource that describes the KOfam profile for each KEGG Ortholog (KO). It contains information like the bitscore threshold (used to differentiate between ‘good’ and ‘bad’ hits when annotating sequences), the function definition, and various data about the sequences used to generate the profile.
HMMs subfolder: contains a file of concatenated KOfam profiles (also originally downloaded from KEGG), as well as the indexes for this file.
orphan_data subfolder: contains KOfam profiles for KOs that do not have a bitscore threshold in the ko_list.txt file (in the .hmm file) and their corresponding ko_list.txt entries (in 01_ko_fams_with_no_threshold.txt). Please note that KOs from the orphan_data directory will not be annotated in your contigs-db when you run anvi-run-kegg-kofams. However, if you ever need to take a look at these profiles or use them in any way, here they are. :)
modules.keg: a flat text file describing all metabolic modules available in the KEGG MODULE resource. This includes pathway and signature modules, but not reaction modules.
modules subfolder: contains flat text files, one for each metabolic module, downloaded using the KEGG REST API. Each file describes a metabolic module’s definition, classification, component orthologs, metabolic reactions, compounds, and miscellaneous data such as references. For an example, see the module file for M00001.
hierarchies.json: a JSON-formatted file describing the available functional hierarchies in the KEGG BRITE resource.
BRITE subfolder: contains JSON-formatted files, each one of which describes a BRITE hierarchy.
MODULES.db: a SQLite database containing data parsed from the module files and BRITE hierarchies. See modules-db.
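To make the distinction between thresholded KOs and the ones that end up in orphan_data more concrete, here is a minimal sketch that splits the entries of ko_list.txt by whether they carry a bitscore threshold. It is not how anvi’o does this internally. It assumes the file has a header row with `knum` and `threshold` columns and marks a missing threshold with `-`, so please check these assumptions against your own copy of the file. The function name `split_by_threshold` is hypothetical.

```python
import csv


def split_by_threshold(ko_list_path):
    """Split KO identifiers by whether they have a bitscore threshold.

    Assumes a tab-delimited file with a header row containing 'knum' and
    'threshold' columns, where '-' (or an empty field) means no threshold.
    """
    with_threshold, without_threshold = [], []
    with open(ko_list_path, newline="") as handle:
        # QUOTE_NONE keeps quote characters inside definitions as plain text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row["threshold"] in ("", "-"):
                without_threshold.append(row["knum"])
            else:
                with_threshold.append(row["knum"])
    return with_threshold, without_threshold
```

The second list returned corresponds, under the assumptions above, to the kind of KOs described for the orphan_data subfolder.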
The KOfam profiles are used directly by anvi-run-kegg-kofams for annotating genes with KEGG Orthologs. The MODULE and BRITE data in the above files are processed and organized into the modules-db for easier programmatic access. anvi-run-kegg-kofams uses this database to annotate genes with BRITE categories and with the modules they participate in, when relevant. anvi-estimate-metabolism uses this database to get module information when computing completeness scores for each metabolic module.
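Since MODULES.db is a SQLite database, the curious can peek at it with Python’s standard `sqlite3` module. The sketch below only lists table names and makes no assumption about the schema (see modules-db for that). The function name `list_tables` is hypothetical, and the database is opened read-only so the file cannot be modified by accident.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    # Build a file: URI so the database can be opened in read-only mode.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

For example, `list_tables("/path/to/directory/KEGG/MODULES.db")` returns the table names as a list of strings. Downstream anvi’o programs remain the intended way to work with this database.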