anvi-run-hmms

This program deals with populating tables that store HMM hits in an anvi'o contigs database.

Can consume

contigs-db hmm-source

Can provide

hmm-hits

Usage

Stores hmm-hits for a given hmm-source in a contigs-db. In short, this program searches the genes in a contigs-db against the models in an hmm-source and stores the resulting hits in the contigs-db as hmm-hits.

This is one of the programs that users commonly run on a newly generated contigs-db, along with anvi-scan-trnas, anvi-run-ncbi-cogs, anvi-run-scg-taxonomy, and so on.
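As a rough illustration of how these programs fit together, the sketch below chains them over a single contigs-db. The helper name `annotate_contigs_db`, the `DRY_RUN` switch, and the file name `contigs.db` are hypothetical and only serve the example. It assumes that each program accepts the database through `-c` and that any one-time database setup the COG and SCG taxonomy programs need has already been done.

```shell
# Hypothetical helper: run the usual first-pass annotation programs on one contigs-db.
# With DRY_RUN=1 the commands are printed instead of executed.
annotate_contigs_db() {
    contigs_db="$1"
    # anvi-run-hmms comes before anvi-run-scg-taxonomy, which relies on HMM hits.
    for program in anvi-run-hmms anvi-scan-trnas anvi-run-ncbi-cogs anvi-run-scg-taxonomy; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$program -c $contigs_db"
        else
            "$program" -c "$contigs_db" || return 1
        fi
    done
}

# Preview the commands for a hypothetical database file.
DRY_RUN=1 annotate_contigs_db contigs.db
```

Dropping `DRY_RUN=1` runs the programs for real, stopping at the first one that fails.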

HMMs in the context of anvi’o

In a nutshell, hidden Markov models are statistical models typically generated from known genes which enable ‘searching’ for similar genes in other sequence contexts.

The default anvi’o distribution includes numerous curated HMM profiles for single-copy core genes and ribosomal RNAs, and anvi’o can also work with custom HMM profiles provided by the user. In anvi’o lingo, each of these HMM profiles, whether built-in or user-defined, is called an hmm-source.

Default Usage

To run this program with all default settings (against all default anvi’o hmm-sources), you only need to provide a contigs-db:

anvi-run-hmms -c contigs-db

Multithreading will dramatically improve the performance of anvi-run-hmms. If you have multiple CPUs or cores, you may parallelize your search:

anvi-run-hmms -c contigs-db \
              --num-threads 6
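If you would rather not hard-code the thread count, one option is to derive it from the machine. The sketch below is only a suggestion: the variable names and the `contigs.db` path are hypothetical, and it prints the resulting command rather than running it so you can check it first.

```shell
# Number of online CPUs; getconf is available on both Linux and macOS.
THREADS=$(getconf _NPROCESSORS_ONLN)
CONTIGS_DB="contigs.db"   # hypothetical path

# Print the command so it can be checked before it is run.
echo "anvi-run-hmms -c $CONTIGS_DB --num-threads $THREADS"
```

On a shared server or cluster you will probably want a smaller number than the total CPU count.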

You can also run this program on a specific built-in hmm-source:

anvi-run-hmms -c contigs-db \
              -I Bacteria_71

User-defined HMMs

Running anvi-run-hmms with a custom model is easy. All you need to do is create a directory with the necessary files and pass its path with the -H flag:

anvi-run-hmms -c contigs-db \
              -H MY_HMM_PROFILE

See the relevant section in the artifact hmm-source for details.
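Before starting a long search, it can help to confirm that the directory is complete. The sketch below is a hypothetical helper, not part of anvi’o. The list of file names in it is an assumption about the hmm-source layout, so please check it against the hmm-source artifact before relying on it.

```shell
# Hypothetical helper: verify that a directory looks like an anvi'o hmm-source.
# The file list below is an assumption; confirm it in the hmm-source artifact.
check_hmm_source_dir() {
    dir="$1"
    missing=0
    for f in genes.hmm.gz genes.txt kind.txt reference.txt target.txt noise_cutoff_terms.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: only start the search if the directory is complete.
# check_hmm_source_dir MY_HMM_PROFILE && anvi-run-hmms -c contigs.db -H MY_HMM_PROFILE
```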

Adding HMM hits as a functional annotation source

By default, HMM hits are not considered functional annotations and are kept in a distinct table (the ‘hmm_hits’ table) in the contigs database. However, there are cases in which you may want them to be treated as functions instead, for instance, if you want to run anvi-estimate-metabolism on a set of user-defined metabolic pathways and you have a set of custom HMMs for their enzymes.

To treat the HMM hits as functional annotations and add them to the ‘gene_functions’ table in your database, you must use the --add-to-functions-table flag:

anvi-run-hmms -c contigs-db \
              -H MY_HMM_PROFILE \
              --add-to-functions-table

Changing the HMMER program

By default, anvi-run-hmms will use HMMER’s hmmscan for amino acid HMM profiles, but you can use hmmsearch if you are searching a very large number of models against a relatively small number of sequences:

anvi-run-hmms -c contigs-db \
              --hmmer-program hmmsearch

This flag has no effect when your HMM profile source is for nucleotide sequences (like any of the Ribosomal RNA sources). In those cases anvi’o will use nhmmscan exclusively.

Saving the HMMER output

If you want to see the output from the HMMER program (e.g., hmmscan) used to annotate your data, you can request that it be saved in a directory of your choosing. Please note that this only works when you are running on a single HMM source, as in the example below:

anvi-run-hmms -c contigs-db \
              -I Bacteria_71 \
              --hmmer-output-dir OUTPUT_DIR

If you do this, file(s) with the prefix hmm will appear in that directory, with the file extension indicating the format of the output file. For example, the table output format would be called hmm.table.

These resulting files are not exactly the raw output of HMMER because anvi’o does quite a bit of pre-processing on the raw input and output file(s) while jumping through some hoops to make the HMM searches multi-threaded. If this is causing you a lot of headache, please let us know.
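Since the output directory only works with one HMM source per run, saving the HMMER output for several sources means running the program once per source. The sketch below prints one such command per source. The helper name, the `contigs.db` path, and the directory naming scheme are hypothetical, and it assumes that both sources named in the example are installed with your anvi’o.

```shell
CONTIGS_DB="contigs.db"   # hypothetical path

# Print one anvi-run-hmms command per source, each with its own output directory.
hmms_per_source() {
    for src in "$@"; do
        echo "anvi-run-hmms -c $CONTIGS_DB -I $src --hmmer-output-dir hmmer_output_$src"
    done
}

# Print the commands for two built-in sources; pipe the output to `sh` to run them.
hmms_per_source Bacteria_71 Archaea_76
```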

Requesting domain table output

Please also see anvi-script-filter-hmm-hits-table

No matter what, anvi’o will use the regular table output to annotate your contigs database. However, if you are using the --hmmer-output-dir flag to store the HMMER output, you can also request a domain table output using the flag --domain-hits-table:

anvi-run-hmms -c contigs-db \
              -I Bacteria_71 \
              --hmmer-output-dir OUTPUT_DIR \
              --domain-hits-table

In this case anvi’o will run HMMER using the --domtblout flag to generate this output file.

This flag will only work with HMM profiles made for amino acid sequences. Profiles for nucleotide sequences require the use of the program nhmmscan, which does not have an option to store domain output.

Please note that this output won’t be used to filter the hits that are added to the contigs database. It will, however, give you the output file you need to investigate the coverage of HMM hits, and you can later use the program anvi-script-filter-hmm-hits-table with this file to remove weak hits from your HMM hits table.
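For a quick look at coverage before any filtering, a few lines of awk may be enough. The sketch below is not an anvi’o feature: the helper name is hypothetical, and it assumes the saved file keeps HMMER’s standard --domtblout column order as written by hmmscan (model name in column 1, model length in column 3, sequence name in column 4, and the first and last model positions of the domain in columns 16 and 17). Since anvi’o processes the HMMER output, please verify the columns in your own file first.

```shell
# Print: model name, sequence name, fraction of the model spanned by the domain.
# Assumes hmmscan-style --domtblout columns (see the assumptions stated above).
model_coverage() {
    awk '!/^#/ && NF >= 22 {
        printf "%s\t%s\t%.2f\n", $1, $4, ($17 - $16 + 1) / $3
    }' "$1"
}

# Usage, with an assumed file name for the domain table:
# model_coverage OUTPUT_DIR/hmm.domtable | sort -k3,3n | head
```

Sorting by the third column, as in the usage line, lists the hits that cover the smallest fraction of their model first.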

Other things anvi-run-hmms can do

  • Add the flag --also-scan-trnas to basically run anvi-scan-trnas for you at the same time. It’s very convenient. (But it only works if you are not using the -I or -H flags at the same time because reasons.)
