anvi-search-sequence-motifs [program]

A program to find one or more sequence motifs in contig or gene sequences, and store their frequencies.

🔙 To the main page of anvi’o programs and artifacts.

Authors

A. Murat Eren (Meren)

Can consume

profile-db contigs-db genes-db

Can provide

misc-data-items misc-data-layers

Usage

anvi-search-sequence-motifs will search one or more sequence motifs in applicable anvi’o databases and will report their frequency. If you have more than one motif to search, you can list them as comma-separated sequences

In this context we assume a motif is a 4 to 10 nucleotide-long string, although, anvi’o will not impose any limit to length, and will search any motif it is given along with its reverse-complement across all sequences and report frequencies.

The most primitive output is a TAB-delimited text file, but anvi’o will store frequency information also into your databases like a pro if you use the --store-in-db flag.

The following subsections include some examples.

A contigs database

The minimum amount of stuff you need to run this program is a motif sequence and a contigs-db:

anvi-search-sequence-motifs -c contigs-db \ --motifs ATCG,TAAAT \ --output-file motifs.txt

Running this will yield an output file with as many columns as the number of sequence motifs that show their frequencies across each contig found in the contigs-db. Here is an example:

contig_name ATCG TAAAT
204_10M_contig_1720 101 159
204_10M_contig_6515 64 31
204_10M_contig_878 435 3

Contigs database + profile database

If you provide this program with a profile-db, this time it will count your motif sequences in split sequences rather than contigs,

anvi-search-sequence-motifs -c contigs-db \ -p profile-db --motifs ATCG,TAAAT \ --output-file motifs.txt

And the output will look like this:

split_name ATCG TAAAT
204_10M_contig_1720_split_00001 14 22
204_10M_contig_1720_split_00002 2 6
204_10M_contig_1720_split_00003 14 23
204_10M_contig_1720_split_00004 8 18
204_10M_contig_1720_split_00005 9 17
204_10M_contig_1720_split_00006 19 28
204_10M_contig_1720_split_00007 4 8
204_10M_contig_1720_split_00008 31 32
204_10M_contig_1720_split_00009 0 5
204_10M_contig_6515_split_00001 7 5
204_10M_contig_6515_split_00002 5 2
204_10M_contig_6515_split_00003 5 4
204_10M_contig_6515_split_00004 25 8
204_10M_contig_6515_split_00005 6 2
204_10M_contig_6515_split_00006 8 3
204_10M_contig_6515_split_00007 3 3
204_10M_contig_6515_split_00008 5 3
204_10M_contig_878_split_00001 17 0
204_10M_contig_878_split_00002 14 0
204_10M_contig_878_split_00003 108 1
204_10M_contig_878_split_00004 35 0
204_10M_contig_878_split_00005 7 0
204_10M_contig_878_split_00006 18 0
204_10M_contig_878_split_00007 42 0
204_10M_contig_878_split_00008 12 1
204_10M_contig_878_split_00009 13 0
204_10M_contig_878_split_00010 18 0
204_10M_contig_878_split_00011 28 0
204_10M_contig_878_split_00012 0 1
204_10M_contig_878_split_00013 24 0
204_10M_contig_878_split_00014 11 0
204_10M_contig_878_split_00015 33 0
204_10M_contig_878_split_00016 13 0
204_10M_contig_878_split_00017 2 0
204_10M_contig_878_split_00018 40 0

This output format may enable you to bin your splits based on their motif composition and use anvi-import-collection to import them as a new collection into your profile database, or use anvi-matrix-to-newick to cluster them based on this information to organize splits in the interface based on their motif composition.

You can also store this information into your profile database using the flag --store-in-db. When you do that, running anvi-interactive on this profile database will include additional layers where these frequencies are displayed. Here is an example:

anvi-search-sequence-motifs -c contigs-db \ -p profile-db --motifs ATCG,TAAAT \ --store-in-db

And this is how things will look like in the interface:

motifs

Layers for sequence motif frequencies will be automatically colored to a shade of blue (although the user can change this through the interactive interface and/or through state files).

Contigs database + genes database

Instead of a profile database, this program can also run on an anvi’o genes-db and search sequence motifs for each gene rather than split or contig sequences.

Edit this file to update this information.

Additional Resources

Are you aware of resources that may help users better understand the utility of this program? Please feel free to edit this file on GitHub. If you are not sure how to do that, find the __resources__ tag in this file to see an example.