anvi-analyze-synteny [program]

Extract ngrams, as in 'co-occurring genes in synteny', from genomes.

Can provide

ngrams

Can consume

genomes-storage-db functions pan-db

Usage

Briefly, anvi-analyze-synteny counts ngrams by converting contigs into strings of annotations for a given user-defined source of gene annotation. A functional annotation source must be provided to create ngrams; anvi’o then uses a sliding window of size N to deconstruct the loci of interest into ngrams and counts their frequencies.
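The sliding-window idea can be illustrated with a minimal Python sketch. This is not anvi’o's implementation: the function name `count_ngrams`, the example annotations, and the reading of a range such as 2:3 as "window sizes 2 and 3" are all assumptions made for illustration.

```python
from collections import Counter

def count_ngrams(contigs, window_sizes):
    """Count ngrams over contigs, each given as an ordered list of annotations."""
    counts = Counter()
    for annotations in contigs:
        for n in window_sizes:
            # slide a window of size n across the contig, one gene at a time
            for start in range(len(annotations) - n + 1):
                counts[tuple(annotations[start:start + n])] += 1
    return counts

# hypothetical contigs, already converted into strings of function annotations
contigs = [
    ["COG0001", "COG0002", "COG0003"],
    ["COG0001", "COG0002", "COG0004"],
]

# assumption: a window range of 2:3 covers window sizes 2 and 3
ngram_counts = count_ngrams(contigs, window_sizes=range(2, 4))

for ngram, count in ngram_counts.most_common():
    print(count, ngram)
```

In this toy input the pair (COG0001, COG0002) occurs in both contigs, so it receives the highest count, while each longer ngram occurs only once.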

Run for a given function annotation source

anvi-analyze-synteny -g genomes-storage-db \
                     --annotation-source functions \
                     --ngram-window-range 2:3 \
                     -o ngrams

For instance, if you have run anvi-run-ncbi-cogs on each contigs-db you have used to generate your genomes-storage-db, your --annotation-source can be NCBI_COGS:

anvi-analyze-synteny -g genomes-storage-db \
                     --annotation-source NCBI_COGS \
                     --ngram-window-range 2:3 \
                     -o ngrams

Handling genes with unknown functions

By default, anvi-analyze-synteny ignores genes with unknown functions based on the annotation source of interest. This can be circumvented either by providing a pan-db, in which case the program uses gene cluster identities as function names:

anvi-analyze-synteny -g genomes-storage-db \
                     -p pan-db \
                     --ngram-window-range 2:3 \
                     -o ngrams

or by explicitly asking the program to consider unknown functions, in which case the program would not discard ngrams that include genes without functions:

anvi-analyze-synteny -g genomes-storage-db \
                     --annotation-source functions \
                     --ngram-window-range 2:3 \
                     -o ngrams \
                     --analyze-unknown-functions

The disadvantage of the latter strategy is that since all genes with unknown functions will be considered the same, the frequency of ngrams that contain genes with unknown functions may be inflated in your final results.
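A small Python sketch may make this inflation concrete. It is only an illustration under stated assumptions: the placeholder label `unknown-function`, the helper `count_pairs`, and the example loci are hypothetical and do not reflect how anvi’o labels such genes internally.

```python
from collections import Counter

UNKNOWN = "unknown-function"  # hypothetical placeholder shared by all unannotated genes

def count_pairs(contigs):
    """Count ngrams of size 2 over contigs given as lists of annotations."""
    counts = Counter()
    for annotations in contigs:
        for start in range(len(annotations) - 1):
            counts[tuple(annotations[start:start + 2])] += 1
    return counts

# hypothetical loci; None marks a gene without a function in the chosen source
contigs = [
    ["COG0001", None, "COG0002"],
    ["COG0001", None, "COG0003"],
    ["COG0001", None, "COG0004"],
]

# collapse every unannotated gene onto the same label before counting
collapsed = [[a if a is not None else UNKNOWN for a in contig] for contig in contigs]
pair_counts = count_pairs(collapsed)

print(pair_counts[("COG0001", UNKNOWN)])
```

The three unannotated genes may be entirely unrelated, yet the pair (COG0001, unknown-function) is counted three times because they all share one label.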

Run with multiple annotations

If multiple gene annotation sources are provided (i.e., a pangenome for gene cluster identities as well as a functional annotation source), the user must define which annotation source will be used to create the ngrams using the parameter --ngram-source. The resulting ngrams will then be re-annotated with the second annotation source, and those annotations will also be reported.

anvi-analyze-synteny -g genomes-storage-db \
                     -p pan-db \
                     --annotation-source functions \
                     --ngram-source gene_clusters \
                     --ngram-window-range 2:3 \
                     -o ngrams
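Conceptually, re-annotation amounts to building ngrams from one source and then looking up each member in the other. The Python sketch below shows that idea only; the gene cluster names, the `cluster_to_function` mapping, and the layout of `report` are hypothetical and are not the program's actual output format.

```python
from collections import Counter

# hypothetical lookup from gene cluster name to a function from the second source
cluster_to_function = {
    "GC_01": "COG0001",
    "GC_02": "COG0002",
    "GC_03": None,  # no function known for this gene cluster
}

# hypothetical contigs expressed as ordered gene cluster names (the ngram source)
contigs = [
    ["GC_01", "GC_02", "GC_03"],
    ["GC_01", "GC_02"],
]

# build and count ngrams of size 2 from the gene cluster names
counts = Counter()
for clusters in contigs:
    for start in range(len(clusters) - 1):
        counts[tuple(clusters[start:start + 2])] += 1

# re-annotate each ngram with the second source and keep both in the report
report = [
    {
        "ngram": ngram,
        "count": count,
        "functions": tuple(cluster_to_function.get(gc) for gc in ngram),
    }
    for ngram, count in counts.items()
]

for row in report:
    print(row)
```

Because the ngrams are keyed on gene cluster names, a gene cluster without a function still contributes a distinct identity to the ngram; only its re-annotation is empty.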

Test cases for developers

If you are following the anvi’o master branch on your computer, you can create a test case for this program.

First, go to your source code directory. Then run the following commands:

cd anvio/anvio/tests
./run_all_tests.sh

# set output dir
output_dir=sandbox/test-output

# make an external-genomes file
echo -e "name\tcontigs_db_path\ng01\t$output_dir/01.db\ng02\t$output_dir/02.db\ng03\t$output_dir/03.db" > $output_dir/external-genomes-file.txt

Run one or more alternative scenarios and check output files:

anvi-analyze-synteny -e $output_dir/external-genomes-file.txt \
                     --annotation-source COG_FUNCTION \
                     --ngram-window-range 2:3 \
                     -o $output_dir/synteny_output_no_unknowns.tsv

anvi-analyze-synteny -e $output_dir/external-genomes-file.txt \
                     --annotation-source COG_FUNCTION \
                     --ngram-window-range 2:3 \
                     -o $output_dir/synteny_output_with_unknowns.tsv \
                     --analyze-unknown-functions

anvi-analyze-synteny -e $output_dir/external-genomes-cps.txt \
                     --annotation-source COG_FUNCTION \
                     --ngram-window-range 2:3 \
                     -o $output_dir/tsv.txt \
                     --analyze-unknown-functions
