A program to find one or more sequence motifs in contig or gene sequences, and store their frequencies.
🔙 To the main page of anvi’o programs and artifacts.
profile-db contigs-db genes-db
misc-data-items misc-data-layers
anvi-search-sequence-motifs will search one or more sequence motifs in applicable anvi’o databases and will report their frequency. If you have more than one motif to search, you can list them as comma-separated sequences
In this context we assume a motif is a 4 to 10 nucleotide-long string, although, anvi’o will not impose any limit to length, and will search any motif it is given along with its reverse-complement across all sequences and report frequencies.
The most primitive output is a TAB-delimited text file, but anvi’o will store frequency information also into your databases like a pro if you use the --store-in-db
flag.
The following subsections include some examples.
The minimum amount of stuff you need to run this program is a motif sequence and a contigs-db:
anvi-search-sequence-motifs -c contigs-db \ --motifs ATCG,TAAAT \ --output-file motifs.txt
Running this will yield an output file with as many columns as the number of sequence motifs that show their frequencies across each contig found in the contigs-db. Here is an example:
contig_name | ATCG | TAAAT |
---|---|---|
204_10M_contig_1720 | 101 | 159 |
204_10M_contig_6515 | 64 | 31 |
204_10M_contig_878 | 435 | 3 |
If you provide this program with a profile-db, this time it will count your motif sequences in split sequences rather than contigs,
anvi-search-sequence-motifs -c contigs-db \ -p profile-db --motifs ATCG,TAAAT \ --output-file motifs.txt
And the output will look like this:
split_name | ATCG | TAAAT |
---|---|---|
204_10M_contig_1720_split_00001 | 14 | 22 |
204_10M_contig_1720_split_00002 | 2 | 6 |
204_10M_contig_1720_split_00003 | 14 | 23 |
204_10M_contig_1720_split_00004 | 8 | 18 |
204_10M_contig_1720_split_00005 | 9 | 17 |
204_10M_contig_1720_split_00006 | 19 | 28 |
204_10M_contig_1720_split_00007 | 4 | 8 |
204_10M_contig_1720_split_00008 | 31 | 32 |
204_10M_contig_1720_split_00009 | 0 | 5 |
204_10M_contig_6515_split_00001 | 7 | 5 |
204_10M_contig_6515_split_00002 | 5 | 2 |
204_10M_contig_6515_split_00003 | 5 | 4 |
204_10M_contig_6515_split_00004 | 25 | 8 |
204_10M_contig_6515_split_00005 | 6 | 2 |
204_10M_contig_6515_split_00006 | 8 | 3 |
204_10M_contig_6515_split_00007 | 3 | 3 |
204_10M_contig_6515_split_00008 | 5 | 3 |
204_10M_contig_878_split_00001 | 17 | 0 |
204_10M_contig_878_split_00002 | 14 | 0 |
204_10M_contig_878_split_00003 | 108 | 1 |
204_10M_contig_878_split_00004 | 35 | 0 |
204_10M_contig_878_split_00005 | 7 | 0 |
204_10M_contig_878_split_00006 | 18 | 0 |
204_10M_contig_878_split_00007 | 42 | 0 |
204_10M_contig_878_split_00008 | 12 | 1 |
204_10M_contig_878_split_00009 | 13 | 0 |
204_10M_contig_878_split_00010 | 18 | 0 |
204_10M_contig_878_split_00011 | 28 | 0 |
204_10M_contig_878_split_00012 | 0 | 1 |
204_10M_contig_878_split_00013 | 24 | 0 |
204_10M_contig_878_split_00014 | 11 | 0 |
204_10M_contig_878_split_00015 | 33 | 0 |
204_10M_contig_878_split_00016 | 13 | 0 |
204_10M_contig_878_split_00017 | 2 | 0 |
204_10M_contig_878_split_00018 | 40 | 0 |
This output format may enable you to bin your splits based on their motif composition and use anvi-import-collection to import them as a new collection into your profile database, or use anvi-matrix-to-newick to cluster them based on this information to organize splits in the interface based on their motif composition.
You can also store this information into your profile database using the flag --store-in-db
. When you do that, running anvi-interactive on this profile database will include additional layers where these frequencies are displayed. Here is an example:
anvi-search-sequence-motifs -c contigs-db \ -p profile-db --motifs ATCG,TAAAT \ --store-in-db
And this is how things will look like in the interface:
anvi-interactive -c contigs-db \ -p profile-db
Layers for sequence motif frequencies will be automatically colored to a shade of blue (although the user can change this through the interactive interface and/or through state files).
Instead of a profile database, this program can also run on an anvi’o genes-db and search sequence motifs for each gene rather than split or contig sequences.
Edit this file to update this information.
Are you aware of resources that may help users better understand the utility of this program? Please feel free to edit this file on GitHub. If you are not sure how to do that, find the __resources__
tag in this file to see an example.