A DB-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).
A database containing information from the KEGG MODULE database for use in metabolic reconstruction and functional annotation of KEGG Orthologs (KOs).
This database is part of the kegg-data directory. You can get it on your computer by running anvi-setup-kegg-kofams. Programs that rely on this database include anvi-run-kegg-kofams and anvi-estimate-metabolism.
Most users will never have to interact directly with this database. However, for the brave few who want to try this (or who are figuring out how anvi’o works under the hood), there is some relevant information below.
In the current implementation, data about each metabolic pathway from the KEGG MODULE database is present in the
kegg_modules table, which looks like this:
|module|data_name|data_value|data_definition|line|
|:--|:--|:--|:--|:--|
|M00001|NAME|Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate|NULL|2|
|M00001|DEFINITION|(K00844,K12407,K00845,K00886,K08074,K00918) (K01810,K06859,K13810,K15916) (K00850,K16370,K21071,K00918) (K01623,K01624,K11645,K16305,K16306) K01803 ((K00134,K00150) K00927,K11389) (K01834,K15633,K15634,K15635) K01689 (K00873,K12406)|NULL|3|
|M00001|ORTHOLOGY|K00844|hexokinase/glucokinase [EC:2.7.1.1 2.7.1.2] [RN:R01786]|4|
|M00001|ORTHOLOGY|K12407|hexokinase/glucokinase [EC:2.7.1.1 2.7.1.2] [RN:R01786]|4|
These data correspond to the information that can be found on the KEGG website for each metabolic module - for an example, you can see the page for M00001 (or, alternatively, its flat text file version from the KEGG REST API).
The module column indicates the module ID number while the
data_name column indicates what type of data the row is describing about the module. These data names are usually fairly self-explanatory - for instance, the
DEFINITION rows describe the module definition and the
ORTHOLOGY rows describe the KEGG Orthologs (KOs) belonging to the module - however, for an official explanation, you can check the KEGG help page.
The data_value and data_definition columns hold the information corresponding to the row’s data_name; for
ORTHOLOGY fields these are the KO number and the KO’s functional annotation, respectively. Not all rows have a data_definition value; in those rows the column is NULL.
Finally, some rows of data originate from the same line in the original KEGG MODULE text file; these rows will have the same number in the
line column. Perhaps this is a useless field. But it is there.
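To make the table layout concrete, here is a minimal Python sketch that builds a toy in-memory copy of the kegg_modules table and groups one module's rows by data_name. It is an illustration rather than anvi’o's own code: the column types are assumptions, the data_value column name is taken from the SQL example on this page, the annotations are shortened, and the helper names build_toy_db and module_to_dict are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Toy rows modeled on the example table (annotations shortened).
ROWS = [
    ("M00001", "NAME", "Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate", None, 2),
    ("M00001", "ORTHOLOGY", "K00844", "hexokinase/glucokinase", 4),
    ("M00001", "ORTHOLOGY", "K12407", "hexokinase/glucokinase", 4),
]


def build_toy_db():
    """Create an in-memory table with the five columns described above (types assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kegg_modules "
        "(module TEXT, data_name TEXT, data_value TEXT, data_definition TEXT, line INTEGER)"
    )
    conn.executemany("INSERT INTO kegg_modules VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def module_to_dict(conn, module_id):
    """Group a module's rows as {data_name: [(data_value, data_definition), ...]}."""
    grouped = defaultdict(list)
    query = (
        "SELECT data_name, data_value, data_definition FROM kegg_modules "
        "WHERE module = ? ORDER BY line, rowid"
    )
    for data_name, value, definition in conn.execute(query, (module_id,)):
        grouped[data_name].append((value, definition))
    return dict(grouped)


conn = build_toy_db()
info = module_to_dict(conn, "M00001")
print(info["NAME"])                            # one (value, definition) pair; definition is None
print([ko for ko, _ in info["ORTHOLOGY"]])     # KO numbers listed for the module
```

Rows that share a line number (here, the two ORTHOLOGY rows) come from the same line of the original KEGG text file, which is why sorting on line alone would not fix their relative order.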
In the self table of this database, there is an entry called
hash. This string is a hash of the contents of the database, and it allows us to identify the version of the data within the database. This value is important for ensuring that the same MODULES.db is used both for annotating a contigs database with anvi-run-kegg-kofams and for estimating metabolism on that contigs database with anvi-estimate-metabolism.
You can easily check the hash value by running anvi-db-info on the MODULES.db file.
It will appear in the
DB Info section of the output, like so:
    DB Info (no touch also)
    ===============================================
    num_modules ..................................: 443
    total_entries ................................: 13720
    creation_date ................................: 1608740335.30248
    hash .........................................: 45b7cc2e4fdc
If you have annotated a contigs-db using anvi-run-kegg-kofams, you would find that the corresponding hash in that contigs database matches this one:
    DB Info (no touch also)
    ===============================================
    [....]
    modules_db_hash ..............................: 45b7cc2e4fdc
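The same comparison can be sketched in Python with the standard sqlite3 module. This is only an illustration under stated assumptions: it assumes the self table stores its entries in two columns named key and value, and the function names (read_self_value, hashes_match, make_toy_db) are hypothetical. The key names hash and modules_db_hash are the ones shown in the outputs above. Toy in-memory databases stand in for the real files.

```python
import sqlite3


def read_self_value(conn, key):
    """Return the value stored under `key` in the `self` table, or None if absent."""
    row = conn.execute("SELECT value FROM self WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def hashes_match(modules_conn, contigs_conn):
    """True only if both hashes are present and identical."""
    modules_hash = read_self_value(modules_conn, "hash")
    contigs_hash = read_self_value(contigs_conn, "modules_db_hash")
    return modules_hash is not None and modules_hash == contigs_hash


def make_toy_db(key, value):
    """Build an in-memory stand-in with a key/value `self` table (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE self (key TEXT, value TEXT)")
    conn.execute("INSERT INTO self VALUES (?, ?)", (key, value))
    return conn


modules_conn = make_toy_db("hash", "45b7cc2e4fdc")
contigs_conn = make_toy_db("modules_db_hash", "45b7cc2e4fdc")
print(hashes_match(modules_conn, contigs_conn))
```

With real files, the toy connections would be replaced by sqlite3.connect() calls on the MODULES.db and contigs database paths, opened read-only if possible since these tables are marked "no touch".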
If you want to extract information directly from this database, you can do it with a bit of SQL :)
Here is one example, which obtains the name of every module in the database:
    # learn where the MODULES.db is:
    export ANVIO_MODULES_DB=`python -c "import anvio; import os; print(os.path.join(os.path.dirname(anvio.__file__), 'data/misc/KEGG/MODULES.db'))"`

    # get module names:
    sqlite3 $ANVIO_MODULES_DB "select module,data_value from kegg_modules where data_name='NAME'" | \
        tr '|' '\t' > module_names.txt
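If you would rather stay in Python, the same query can be run through the standard sqlite3 module. The sketch below is a hedged equivalent, not part of anvi’o: write_module_names is a hypothetical helper, and a toy in-memory table (with assumed column types) stands in for the real MODULES.db so the example runs on its own.

```python
import csv
import os
import sqlite3
import tempfile


def write_module_names(conn, out_path):
    """Write one 'module<TAB>name' line per NAME row; return the number of rows written."""
    rows = conn.execute(
        "SELECT module, data_value FROM kegg_modules "
        "WHERE data_name = 'NAME' ORDER BY module"
    ).fetchall()
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
    return len(rows)


# Toy stand-in for MODULES.db; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kegg_modules "
    "(module TEXT, data_name TEXT, data_value TEXT, data_definition TEXT, line INTEGER)"
)
conn.executemany(
    "INSERT INTO kegg_modules VALUES (?, ?, ?, ?, ?)",
    [
        ("M00001", "NAME", "Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate", None, 2),
        ("M00001", "ORTHOLOGY", "K00844", "hexokinase/glucokinase", 4),
    ],
)

out_path = os.path.join(tempfile.mkdtemp(), "module_names.txt")
n_written = write_module_names(conn, out_path)
print(n_written, "module name(s) written to", out_path)
```

One practical difference from the shell pipeline: because the tab separator is written directly, a module name that happened to contain a `|` character would not be split into extra columns.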