A DB-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).
A database containing information from the KEGG MODULE database for use in metabolic reconstruction and functional annotation of KEGG Orthologs (KOs).
This database is part of the kegg-data directory. You can get it on your computer by running anvi-setup-kegg-kofams. Programs that rely on this database include anvi-run-kegg-kofams and anvi-estimate-metabolism.
Most users will never have to interact directly with this database. However, for the brave few who want to try this (or who are figuring out how anvi’o works under the hood), there is some relevant information below.
In the current implementation, data about each metabolic pathway from the KEGG MODULE database is present in the kegg_modules table, which looks like this:
module | data_name | data_value | data_definition | line |
---|---|---|---|---|
M00001 | ENTRY | M00001 | Pathway | 1 |
M00001 | NAME | Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate | NULL | 2 |
M00001 | DEFINITION | (K00844,K12407,K00845,K00886,K08074,K00918) (K01810,K06859,K13810,K15916) (K00850,K16370,K21071,K00918) (K01623,K01624,K11645,K16305,K16306) K01803 ((K00134,K00150) K00927,K11389) (K01834,K15633,K15634,K15635) K01689 (K00873,K12406) | NULL | 3 |
M00001 | ORTHOLOGY | K00844 | hexokinase/glucokinase [EC:2.7.1.1 2.7.1.2] [RN:R01786] | 4 |
M00001 | ORTHOLOGY | K12407 | hexokinase/glucokinase [EC:2.7.1.1 2.7.1.2] [RN:R01786] | 4 |
(…) | (…) | (…) | (…) | (…) |
These data correspond to the information that can be found on the KEGG website for each metabolic module - for an example, you can see the page for M00001 (or, alternatively, its flat text file version from the KEGG REST API).
The module column indicates the module ID number, while the data_name column indicates what type of data the row describes about the module. These data names are usually fairly self-explanatory: for instance, the DEFINITION rows describe the module definition and the ORTHOLOGY rows describe the KEGG Orthologs (KOs) belonging to the module. For an official explanation, you can check the KEGG help page.
The data_value and data_definition columns hold the information corresponding to the row’s data_name; for ORTHOLOGY fields these are the KO number and the KO’s functional annotation, respectively. Not all rows have a data_definition field.
Finally, some rows of data originate from the same line in the original KEGG MODULE text file; these rows will have the same number in the line column. Perhaps this is a useless field. But it is there.
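To make the column layout described above concrete, here is a minimal Python sketch that rebuilds a few of the example rows in an in-memory SQLite database and queries them. The in-memory database is a hypothetical stand-in for the real MODULES.db, and the column types and the use of SQL NULL for empty data_definition fields are assumptions, not something taken from the anvi’o source.

```python
import sqlite3

# Hypothetical stand-in for MODULES.db: an in-memory database holding some of
# the example rows from the table above. Column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kegg_modules ("
    "module TEXT, data_name TEXT, data_value TEXT, data_definition TEXT, line INTEGER)"
)
rows = [
    ("M00001", "ENTRY", "M00001", "Pathway", 1),
    # Assumption: an empty data_definition is stored as SQL NULL (None).
    ("M00001", "NAME", "Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate", None, 2),
    ("M00001", "ORTHOLOGY", "K00844", "hexokinase/glucokinase [EC:2.7.1.1 2.7.1.2] [RN:R01786]", 4),
    ("M00001", "ORTHOLOGY", "K12407", "hexokinase/glucokinase [EC:2.7.1.1 2.7.1.2] [RN:R01786]", 4),
]
conn.executemany("INSERT INTO kegg_modules VALUES (?, ?, ?, ?, ?)", rows)

# KO numbers and functional annotations listed for one module.
kos = conn.execute(
    "SELECT data_value, data_definition FROM kegg_modules "
    "WHERE module = ? AND data_name = 'ORTHOLOGY' ORDER BY data_value",
    ("M00001",),
).fetchall()

# Rows that share a `line` value came from the same line of the KEGG flat file.
same_line = conn.execute(
    "SELECT line, COUNT(*) FROM kegg_modules GROUP BY line HAVING COUNT(*) > 1"
).fetchall()

for ko, annotation in kos:
    print(ko, annotation, sep="\t")
print(same_line)
conn.close()
```

Against a real database, the same two queries should work after replacing `":memory:"` with the path to MODULES.db and dropping the CREATE and INSERT statements.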
In the self table of this database, there is an entry called hash. This string is a hash of the contents of the database, and it allows us to identify the version of the data within the database. This value is important for ensuring that the same MODULES.db is used both for annotating a contigs database with anvi-run-kegg-kofams and for estimating metabolism on that contigs database with anvi-estimate-metabolism.
You can easily check the hash value by running the following:
anvi-db-info modules-db
It will appear in the DB Info section of the output, like so:
DB Info (no touch also)
===============================================
num_modules ..................................: 443
total_entries ................................: 13720
creation_date ................................: 1608740335.30248
hash .........................................: 45b7cc2e4fdc
If you have annotated a contigs-db using anvi-run-kegg-kofams, you will find that the corresponding hash in that contigs database matches this one:
anvi-db-info contigs-db
DB Info (no touch also)
===============================================
[....]
modules_db_hash ..............................: 45b7cc2e4fdc
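The same comparison can be scripted. The sketch below is only an illustration: it assumes the self table stores its entries as key/value text pairs, and it uses two in-memory databases as hypothetical stand-ins for a real MODULES.db and contigs database. The helper names `read_self_value` and `make_demo_db` are made up for this example.

```python
import sqlite3


def read_self_value(conn, key):
    """Return the value stored under `key` in the self table, or None if absent."""
    row = conn.execute("SELECT value FROM self WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def make_demo_db(entries):
    """Build an in-memory database with a self table (assumed key/value layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE self (key TEXT, value TEXT)")
    conn.executemany("INSERT INTO self VALUES (?, ?)", list(entries.items()))
    return conn


# Hypothetical stand-ins carrying the hash values shown in the output above.
modules_db = make_demo_db({"hash": "45b7cc2e4fdc"})
contigs_db = make_demo_db({"modules_db_hash": "45b7cc2e4fdc"})

modules_hash = read_self_value(modules_db, "hash")
contigs_hash = read_self_value(contigs_db, "modules_db_hash")

# The contigs database was annotated with this MODULES.db only if both agree.
hashes_match = modules_hash is not None and modules_hash == contigs_hash
print("match" if hashes_match else "mismatch")
```

For real files, `sqlite3.connect()` would take the paths to the two databases instead of the demo helper. Reading these values is harmless, but as the output header says, the entries themselves are not meant to be edited by hand.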
If you want to extract information directly from this database, you can do it with a bit of SQL :)
Here is one example, which obtains the name of every module in the database:
# learn where the MODULES.db is:
export ANVIO_MODULES_DB=`python -c "import anvio; import os; print(os.path.join(os.path.dirname(anvio.__file__), 'data/misc/KEGG/MODULES.db'))"`
# get module names:
sqlite3 "$ANVIO_MODULES_DB" "select module,data_value from kegg_modules where data_name='NAME'" | \
tr '|' '\t' > module_names.txt
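If you would rather stay in Python, the query above can be sketched with the standard library as follows. This is an illustration rather than anything anvi’o ships: `write_module_names` is a hypothetical helper, and the in-memory table (with assumed column types) stands in for the real MODULES.db so that the example runs on its own.

```python
import csv
import io
import sqlite3


def write_module_names(conn, handle):
    """Write one 'module<TAB>name' line per NAME row and return the row count."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    cursor = conn.execute(
        "SELECT module, data_value FROM kegg_modules "
        "WHERE data_name = 'NAME' ORDER BY module"
    )
    count = 0
    for module, name in cursor:
        writer.writerow([module, name])
        count += 1
    return count


# Hypothetical stand-in for MODULES.db with two of the rows shown earlier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kegg_modules ("
    "module TEXT, data_name TEXT, data_value TEXT, data_definition TEXT, line INTEGER)"
)
conn.executemany(
    "INSERT INTO kegg_modules VALUES (?, ?, ?, ?, ?)",
    [
        ("M00001", "ENTRY", "M00001", "Pathway", 1),
        ("M00001", "NAME", "Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate", None, 2),
    ],
)

buffer = io.StringIO()
num_written = write_module_names(conn, buffer)
print(buffer.getvalue(), end="")
conn.close()
```

To write module_names.txt from a real database, connect to the path held in ANVIO_MODULES_DB (for example via `os.environ["ANVIO_MODULES_DB"]`) and pass an open file handle instead of the `StringIO` buffer.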