Run InteracDome on a contigs database.
binding-frequencies-txt misc-data-amino-acids
This program predicts per-residue binding scores for genes in your contigs-db via the InteracDome database.
The full process is detailed in this blog post. Ideally, all of that information would live in this document, but the blog post came first and its content has not been carried over yet. Read the blog post if you want the nitty-gritty details; otherwise, the quick reference here should be sufficient.
In summary, this program runs an HMM search of the genes in your contigs-db against all the Pfam gene families that have been annotated with InteracDome binding frequencies. It then parses and filters the results, associates the binding frequencies of HMM match states with the matching residues of your genes, and stores the resulting per-residue binding frequencies for each gene in the contigs-db as misc-data-amino-acids.
Before running this program, you’ll have to run anvi-setup-interacdome to set up a local copy of InteracDome’s tab-separated files.
A basic run of this program looks like this:
anvi-run-interacdome -c contigs-db -T 4
In addition to storing per-residue binding frequencies as misc-data-amino-acids in your contigs-db, this program also outputs additional files prefixed with INTERACDOME by default (the prefix can be changed with -O). These are provided as binding-frequencies-txt files named INTERACDOME-match_state_contributors.txt and INTERACDOME-domain_hits.txt. See binding-frequencies-txt for details.
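If you change the prefix, the two output file names should change with it. Here is a minimal sketch, assuming the prefix is joined to the rest of the file name with a hyphen, as in the default names above. The prefix MY_RUN is a made-up example, and the guard is only there so the snippet does nothing on a machine without anvi'o or without a file named contigs-db.

```shell
# Hypothetical prefix, chosen only for illustration
prefix="MY_RUN"

# Expected output file names, assuming the "<prefix>-<name>.txt" pattern
# seen in the default INTERACDOME-* file names
contributors="${prefix}-match_state_contributors.txt"
domain_hits="${prefix}-domain_hits.txt"

# Run only if anvi'o is on the PATH and the contigs database exists
if command -v anvi-run-interacdome >/dev/null 2>&1 && [ -e contigs-db ]; then
    anvi-run-interacdome -c contigs-db -O "$prefix" -T 4
    # List the two files to confirm they were written
    ls -l "$contributors" "$domain_hits"
fi
```

Using a distinct prefix per run is a simple way to keep the text outputs of runs with different settings from overwriting each other.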
InteracDome offers two different binding frequency datasets that can be chosen with --interacdome-dataset. Choose ‘representable’ to include Pfams that correspond to domain-ligand interactions that had nonredundant instances across three or more distinct PDB structures. InteracDome authors recommend using this collection to learn more about domain binding properties. Choose ‘confident’ to include Pfams that correspond to domain-ligand interactions that had nonredundant instances across three or more distinct PDB entries and achieved a cross-validated precision of at least 0.5. The default is ‘representable’, and you can change it like so:
anvi-run-interacdome -c contigs-db \
                     --interacdome-dataset confident
This program is multi-threaded, so be sure to take advantage of it:
anvi-run-interacdome -c contigs-db \
                     --interacdome-dataset confident \
                     -T 8
Additionally, there are several thresholds that you can set:
--min-binding-frequency to ignore very low frequencies. The InteracDome scale is from 0 (most likely not involved in binding) to 1 (most likely involved in binding). The default cutoff is 0.2.
--min-hit-fraction to remove poor-quality HMM hits. The default value is 0.5, so at least half of a profile HMM’s length must align to your gene, otherwise the hit will be discarded.
--information-content-cutoff to ignore low-quality domain hits. The default value is 4, which means that for every match state with information content greater than 4, the aligned amino acid of your gene must match that match state’s consensus amino acid. Decreasing this cutoff yields an increasingly stringent filter.
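To make the thresholds concrete, here is a rough sketch that spells out the default values explicitly and shows the arithmetic behind --min-hit-fraction. The helper passes_min_hit_fraction is hypothetical and is not part of anvi'o. It assumes the hit fraction is the aligned profile length divided by the full profile length, and that a hit exactly at the cutoff is kept; the actual implementation may differ on both points.

```shell
# Toy helper (not an anvi'o function): succeeds if
# aligned_length / profile_length >= cutoff
passes_min_hit_fraction() {
    # $1 = aligned profile length, $2 = full profile length, $3 = cutoff
    awk -v a="$1" -v l="$2" -v c="$3" 'BEGIN { exit !(a / l >= c) }'
}

# Default thresholds as stated above, written out explicitly
min_binding_frequency=0.2
min_hit_fraction=0.5
information_content_cutoff=4

# A hit covering 60 of 100 match states would pass the default cutoff,
# while one covering only 40 of 100 would be discarded
passes_min_hit_fraction 60 100 "$min_hit_fraction" && echo "60/100: kept"
passes_min_hit_fraction 40 100 "$min_hit_fraction" || echo "40/100: discarded"

# Run only if anvi'o is on the PATH and the contigs database exists
if command -v anvi-run-interacdome >/dev/null 2>&1 && [ -e contigs-db ]; then
    anvi-run-interacdome -c contigs-db \
                         --min-binding-frequency "$min_binding_frequency" \
                         --min-hit-fraction "$min_hit_fraction" \
                         --information-content-cutoff "$information_content_cutoff" \
                         -T 4
fi
```

Keeping the thresholds in variables makes it easier to record which settings produced a given set of results.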