This program deals with populating tables that store HMM hits in an anvi'o contigs database.
Stores hmm-hits for a given hmm-source in a contigs-db. In short, this is the program that searches a contigs-db with HMMs and stores the resulting hits in that contigs-db’s hmm-hits.
This is one of the programs that users commonly run on newly generated contigs-db, along with anvi-scan-trnas, anvi-run-ncbi-cogs, anvi-run-scg-taxonomy, and so on.
In a nutshell, hidden Markov models are statistical models typically generated from known genes which enable ‘searching’ for similar genes in other sequence contexts.
The default anvi’o distribution includes numerous curated HMM profiles for single-copy core genes and ribosomal RNAs, and anvi’o can work with custom HMM profiles provided by the user. In anvi’o lingo, each of these HMM profiles, whether they are built-in or user defined, is called an hmm-source.
To run this program with all default settings (against all default anvi’o hmm-sources), you only need to provide a contigs-db:
anvi-run-hmms -c contigs-db
Multithreading will dramatically improve the performance of anvi-run-hmms. If you have multiple CPUs or cores, you may parallelize your search:
anvi-run-hmms -c contigs-db \
              --num-threads 6
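If you have several contigs databases to annotate, the same command can be wrapped in a small shell loop. The following is only a sketch and not part of anvi’o: the function name `run_hmms_on_dir`, the assumption that every `*.db` file in the directory is a contigs-db, and the `ANVI_RUN_HMMS` override variable are all hypothetical and introduced here for illustration.

```shell
# Hypothetical helper (not part of anvi'o): run anvi-run-hmms on every
# file matching *.db in a directory, assuming each one is a contigs-db.
# Set ANVI_RUN_HMMS (e.g. to "echo") to preview the commands as a dry run.
run_hmms_on_dir() {
    rhd_dir="$1"
    rhd_threads="${2:-1}"
    rhd_cmd="${ANVI_RUN_HMMS:-anvi-run-hmms}"
    for rhd_db in "$rhd_dir"/*.db; do
        # Skip the unexpanded pattern when the directory has no *.db files
        [ -e "$rhd_db" ] || continue
        # Stop at the first database that fails
        "$rhd_cmd" -c "$rhd_db" --num-threads "$rhd_threads" || return 1
    done
}
```

Calling `run_hmms_on_dir MY_DATABASES 6` would then process each database in turn with six threads. The databases are processed one after another, so the threads are spent within each search rather than across databases.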
You can also run this program on a specific built-in hmm-source:
anvi-run-hmms -c contigs-db \
              -I Bacteria_71
Running anvi-run-hmms with a custom model is easy. All you need to do is create a directory with the necessary files and point the program to it:
anvi-run-hmms -c contigs-db \
              -H MY_HMM_PROFILE
See the relevant section in the artifact hmm-source for details.
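Before running a search with `-H`, it can help to confirm that the directory is complete. The sketch below is not part of anvi’o; the function name `check_hmm_source_dir` is hypothetical, and the list of six file names reflects the hmm-source documentation as best recalled here, so treat it as an assumption and verify it against the hmm-source artifact page for your anvi’o version.

```shell
# Hypothetical helper (not part of anvi'o): report which of the files
# expected in a custom hmm-source directory are missing.
# ASSUMPTION: the file list below follows the hmm-source documentation;
# check that page for the authoritative list.
check_hmm_source_dir() {
    chs_dir="$1"
    chs_missing=0
    for chs_file in genes.hmm.gz genes.txt kind.txt reference.txt \
                    target.txt noise_cutoff_terms.txt; do
        if [ ! -f "$chs_dir/$chs_file" ]; then
            echo "missing: $chs_file"
            chs_missing=1
        fi
    done
    # Exit status 0 when every file is present, 1 otherwise
    return "$chs_missing"
}
```

A call such as `check_hmm_source_dir MY_HMM_PROFILE && anvi-run-hmms -c contigs-db -H MY_HMM_PROFILE` would only start the search when nothing is missing. The check looks at file names only and does not validate their contents.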
By default, HMM hits are not considered functional annotations and are kept in a distinct table (the ‘hmm_hits’ table) in the contigs database. However, there are certain cases when you may want them to be considered as functions instead. For instance, you may want to run anvi-estimate-metabolism on a set of user-defined metabolic pathways for which you have a set of custom HMMs for their enzymes.
To treat the HMM hits as functional annotations and add them to the ‘gene_functions’ table in your database, you must use the --add-to-functions-table flag:
anvi-run-hmms -c contigs-db \
              -H MY_HMM_PROFILE \
              --add-to-functions-table
By default, anvi-run-hmms will use HMMER’s hmmscan for amino acid HMM profiles, but you can use hmmsearch if you are searching a very large number of models against a relatively smaller number of sequences:
anvi-run-hmms -c contigs-db \
              --hmmer-program hmmsearch
This flag has no effect when your HMM profile source is for nucleotide sequences (like any of the Ribosomal RNA sources). In those cases anvi’o will use nhmmscan exclusively.
If you want to see the output from the HMMER program (e.g., hmmscan) used to annotate your data, you can request that it be saved in a directory of your choosing. Please note that this only works when you are running on a single HMM source, as in the example below:
anvi-run-hmms -c contigs-db \
              -I Bacteria_71 \
              --hmmer-output-dir OUTPUT_DIR
If you do this, file(s) with the prefix hmm will appear in that directory, with the file extension indicating the format of the output file. For example, the table output format would be called hmm.table.
These resulting files are not exactly the raw output of HMMER because anvi’o does quite a bit of pre-processing on the raw input and output file(s) while jumping through some hoops to make the HMM searches multi-threaded. If this is causing you headaches, please let us know.
Please also see anvi-script-filter-hmm-hits-table.
No matter what, anvi’o will use the regular table output to annotate your contigs database. However, if you are using --hmmer-output-dir to store the HMMER output, you can also request a domain table output using the flag --domain-hits-table:
anvi-run-hmms -c contigs-db \
              -I Bacteria_71 \
              --hmmer-output-dir OUTPUT_DIR \
              --domain-hits-table
In this case anvi’o will run HMMER using the --domtblout flag to generate this output file.
This flag will only work with HMM profiles made for amino acid sequences. Profiles for nucleotide sequences require the use of the program nhmmscan, which does not have an option to store domain output.
Please note that this output won’t be used to filter hits to be added to the contigs database. It will, however, give you the output file you need to investigate the coverage of HMM hits, and you can later use the program anvi-script-filter-hmm-hits-table with this file to remove weak hits from your HMM hits table.
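To get a quick feel for hit coverage before filtering, the domain table can be summarized with awk. The sketch below is not part of anvi’o and is no substitute for anvi-script-filter-hmm-hits-table. It assumes the file keeps HMMER’s standard --domtblout column layout and was produced by hmmscan, so that the target is the model and the query is the gene; because anvi’o pre-processes HMMER output, check a few lines of your file before trusting the numbers. The function name `domtable_coverage` is hypothetical.

```shell
# Hypothetical helper (not part of anvi'o): print, for each domain hit,
# the query name, the model name, the fraction of the model covered,
# and the fraction of the query sequence covered.
# ASSUMPTION: standard HMMER --domtblout columns from hmmscan:
#   $1 target (model) name   $3 target length
#   $4 query (gene) name     $6 query length
#   $16-$17 hmm from/to      $18-$19 ali from/to
domtable_coverage() {
    awk '!/^#/ && NF >= 22 && $3 > 0 && $6 > 0 {
        model_cov = ($17 - $16 + 1) / $3
        gene_cov  = ($19 - $18 + 1) / $6
        printf "%s\t%s\t%.2f\t%.2f\n", $4, $1, model_cov, gene_cov
    }' "$1"
}
```

The function takes the path to the domain table file in your --hmmer-output-dir as its only argument. If the search was run with hmmsearch instead, the roles of target and query are swapped and the length columns ($3 and $6) would need to be exchanged.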
Add the --also-scan-trnas flag to run anvi-scan-trnas for you at the same time. It’s very convenient. (But it only works if you are not using the -I or -H flags at the same time because reasons.)