Extract ngrams, as in 'co-occurring genes in synteny', from genomes.
Briefly, anvi-analyze-synteny counts ngrams by converting contigs into strings of annotations for a user-defined source of gene annotation. An annotation source for functions must be provided to create ngrams. Anvi’o then uses a sliding window of size N to deconstruct the loci of interest into ngrams and counts their frequencies.
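The sliding-window idea can be illustrated with a minimal Python sketch. This is not the program's actual implementation: the function name `count_ngrams`, the contig names, and the annotation labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_ngrams(contigs, n):
    """Count ngrams of size n across contigs.

    `contigs` maps a contig name to its ordered list of gene annotations
    (a hypothetical, simplified representation).
    """
    counts = Counter()
    for annotations in contigs.values():
        # Slide a window of size n along the contig, one gene at a time.
        # Contigs shorter than n yield no windows.
        for start in range(len(annotations) - n + 1):
            counts[tuple(annotations[start:start + n])] += 1
    return counts


# Made-up annotations, for illustration only.
contigs = {
    "contig_1": ["COG0001", "COG0002", "COG0003", "COG0004"],
    "contig_2": ["COG0002", "COG0003", "COG0005"],
}

ngram_counts = count_ngrams(contigs, n=2)
for ngram, count in ngram_counts.most_common():
    print(ngram, count)
```

In this toy input, the pair of COG0002 followed by COG0003 occurs on both contigs, so it is the only ngram counted more than once.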
By default, anvi-analyze-synteny ignores genes with unknown functions based on the annotation source of interest. This can be circumvented either by providing a pan-db, so that the program uses gene cluster identities as function names, or by explicitly asking the program to consider unknown functions with --analyze-unknown-functions, in which case the program does not discard ngrams that include genes without functions.
The disadvantage of the latter strategy is that since all genes with unknown functions will be considered the same, the frequency of ngrams that contain genes with unknown functions may be inflated in your final results.
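The inflation effect can be seen in a small Python sketch. It assumes a simplified model in which a gene without a function is represented by `None` and collapsed into a single placeholder label; the names `count_ngrams`, `UNKNOWN`, and `analyze_unknown` are hypothetical and do not reflect the program's internals.

```python
from collections import Counter

UNKNOWN = "unknown"  # hypothetical placeholder shared by all unannotated genes


def count_ngrams(contigs, n, analyze_unknown=False):
    """Count ngrams, optionally keeping windows that contain unknown genes."""
    counts = Counter()
    for genes in contigs.values():
        for start in range(len(genes) - n + 1):
            # Every unannotated gene (None) receives the same label.
            window = tuple(
                gene if gene is not None else UNKNOWN
                for gene in genes[start:start + n]
            )
            if UNKNOWN in window and not analyze_unknown:
                continue  # default behavior in this sketch: skip such ngrams
            counts[window] += 1
    return counts


# Made-up genomes; the two None genes may be entirely unrelated.
contigs = {
    "genome_A": ["COG0001", None, "COG0002"],
    "genome_B": ["COG0001", None, "COG0003"],
}

default_counts = count_ngrams(contigs, n=2)
with_unknowns = count_ngrams(contigs, n=2, analyze_unknown=True)
print(dict(default_counts))
print(dict(with_unknowns))
```

Here the ngram of COG0001 followed by an unknown gene is counted twice, even though nothing indicates that the two unannotated genes are the same gene.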
If multiple gene annotation sources are provided (i.e., a pangenome for gene cluster identities as well as a functional annotation source), the user must define which annotation source will be used to create the ngrams using the parameter --ngram-source. The resulting ngrams are then re-annotated with the second annotation source, and that annotation is reported as well.
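As a rough sketch of what building ngrams from one source and re-annotating them with a second one could look like, consider the Python example below. The dictionary keys `gene_cluster` and `function`, the function name, and all identifiers in the data are assumptions made for illustration; the program's real data structures and output layout may differ.

```python
from collections import Counter


def count_and_reannotate(contigs, n, ngram_source, second_source):
    """Build ngrams from `ngram_source` and record the matching
    annotations from `second_source` for each ngram."""
    counts = Counter()
    reannotation = {}
    for genes in contigs.values():
        for start in range(len(genes) - n + 1):
            window = genes[start:start + n]
            ngram = tuple(gene[ngram_source] for gene in window)
            counts[ngram] += 1
            # Keep the second-source annotation seen first for this ngram.
            reannotation.setdefault(
                ngram, tuple(gene[second_source] for gene in window)
            )
    return counts, reannotation


# Made-up genes carrying two kinds of annotation each.
contigs = {
    "genome_A": [
        {"gene_cluster": "GC_0001", "function": "COG0001"},
        {"gene_cluster": "GC_0002", "function": "COG0002"},
        {"gene_cluster": "GC_0003", "function": None},
    ],
    "genome_B": [
        {"gene_cluster": "GC_0001", "function": "COG0001"},
        {"gene_cluster": "GC_0002", "function": "COG0002"},
    ],
}

counts, reannotation = count_and_reannotate(
    contigs, n=2, ngram_source="gene_cluster", second_source="function"
)
for ngram, count in counts.items():
    print(ngram, count, reannotation[ngram])
```

Because the ngrams are built from gene cluster identities in this sketch, a window that contains a gene without a function is still counted; only its re-annotation carries a missing value.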
If you are following the anvi’o master branch on your computer, you can create a test case for this program.
First, go to any work directory, and run the following commands:
anvi-self-test --suite metagenomics-full \
               --output-dir TEST_OUTPUT
Run one or more alternative scenarios and check output files:
anvi-analyze-synteny -g TEST_OUTPUT/TEST-GENOMES.db \
                     --annotation-source COG20_FUNCTION \
                     --ngram-window-range 2:3 \
                     -o TEST_OUTPUT/synteny_output_no_unknowns.tsv

anvi-analyze-synteny -g TEST_OUTPUT/TEST-GENOMES.db \
                     --annotation-source COG20_FUNCTION \
                     --ngram-window-range 2:3 \
                     -o TEST_OUTPUT/synteny_output_with_unknowns.tsv \
                     --analyze-unknown-functions

anvi-analyze-synteny -g TEST_OUTPUT/TEST-GENOMES.db \
                     --annotation-source COG20_FUNCTION \
                     --ngram-window-range 2:3 \
                     -o TEST_OUTPUT/tsv.txt \
                     --analyze-unknown-functions