anvi-search-sequence-motifs

Authors
Can consume
Can provide
Usage
A contigs database
Contigs database + profile database
Contigs database + genes database
Additional Resources

A program to find one or more sequence motifs in contig or gene sequences, and store their frequencies.

🔙 To the main page of anvi’o programs and artifacts.

Authors

A. Murat Eren (Meren)

Can consume

profile-db contigs-db genes-db

Can provide

misc-data-items misc-data-layers

Usage

anvi-search-sequence-motifs will search one or more sequence motifs in applicable anvi’o databases and will report their frequency. If you have more than one motif to search, you can list them as comma-separated sequences

In this context we assume a motif is a 4 to 10 nucleotide-long string, although, anvi’o will not impose any limit to length, and will search any motif it is given along with its reverse-complement across all sequences and report frequencies.

The most primitive output is a TAB-delimited text file, but anvi’o will store frequency information also into your databases like a pro if you use the --store-in-db flag.

The following subsections include some examples.

A contigs database

The minimum amount of stuff you need to run this program is a motif sequence and a contigs-db:

anvi-search-sequence-motifs -c contigs-db \ --motifs ATCG,TAAAT \ --output-file motifs.txt

Running this will yield an output file with as many columns as the number of sequence motifs that show their frequencies across each contig found in the contigs-db. Here is an example:

contig_name	ATCG	TAAAT
204_10M_contig_1720	101	159
204_10M_contig_6515	64	31
204_10M_contig_878	435	3

Contigs database + profile database

If you provide this program with a profile-db, this time it will count your motif sequences in split sequences rather than contigs,

anvi-search-sequence-motifs -c contigs-db \ -p profile-db --motifs ATCG,TAAAT \ --output-file motifs.txt

And the output will look like this:

split_name	ATCG	TAAAT
204_10M_contig_1720_split_00001	14	22
204_10M_contig_1720_split_00002	2	6
204_10M_contig_1720_split_00003	14	23
204_10M_contig_1720_split_00004	8	18
204_10M_contig_1720_split_00005	9	17
204_10M_contig_1720_split_00006	19	28
204_10M_contig_1720_split_00007	4	8
204_10M_contig_1720_split_00008	31	32
204_10M_contig_1720_split_00009	0	5
204_10M_contig_6515_split_00001	7	5
204_10M_contig_6515_split_00002	5	2
204_10M_contig_6515_split_00003	5	4
204_10M_contig_6515_split_00004	25	8
204_10M_contig_6515_split_00005	6	2
204_10M_contig_6515_split_00006	8	3
204_10M_contig_6515_split_00007	3	3
204_10M_contig_6515_split_00008	5	3
204_10M_contig_878_split_00001	17	0
204_10M_contig_878_split_00002	14	0
204_10M_contig_878_split_00003	108	1
204_10M_contig_878_split_00004	35	0
204_10M_contig_878_split_00005	7	0
204_10M_contig_878_split_00006	18	0
204_10M_contig_878_split_00007	42	0
204_10M_contig_878_split_00008	12	1
204_10M_contig_878_split_00009	13	0
204_10M_contig_878_split_00010	18	0
204_10M_contig_878_split_00011	28	0
204_10M_contig_878_split_00012	0	1
204_10M_contig_878_split_00013	24	0
204_10M_contig_878_split_00014	11	0
204_10M_contig_878_split_00015	33	0
204_10M_contig_878_split_00016	13	0
204_10M_contig_878_split_00017	2	0
204_10M_contig_878_split_00018	40	0

This output format may enable you to bin your splits based on their motif composition and use anvi-import-collection to import them as a new collection into your profile database, or use anvi-matrix-to-newick to cluster them based on this information to organize splits in the interface based on their motif composition.

You can also store this information into your profile database using the flag --store-in-db. When you do that, running anvi-interactive on this profile database will include additional layers where these frequencies are displayed. Here is an example:

anvi-search-sequence-motifs -c contigs-db \ -p profile-db --motifs ATCG,TAAAT \ --store-in-db

And this is how things will look like in the interface:

anvi-interactive -c contigs-db \ -p profile-db

Layers for sequence motif frequencies will be automatically colored to a shade of blue (although the user can change this through the interactive interface and/or through state files).

Contigs database + genes database

Instead of a profile database, this program can also run on an anvi’o genes-db and search sequence motifs for each gene rather than split or contig sequences.

Edit this file to update this information.

Additional Resources

Are you aware of resources that may help users better understand the utility of this program? Please feel free to edit this file on GitHub. If you are not sure how to do that, find the __resources__ tag in this file to see an example.