Summarizer for anvi'o pan, pan-graph, or profile databases. Depending on the input, the program produces an output directory that contains flat files for rigorous downstream analyses by humans 🧠 or LLMs 🤖.
profile-db
contigs-db
collection
pan-db
pan-graph-db
genomes-storage-db
pan-summary
pan-graph-summary
profile-summary
Anvi-summarize lets you export a comprehensive overview of your data from an anvi’o database. Depending on the input, it can summarize a collection of binned contigs (from a profile-db), a collection of binned gene clusters (from a pan-db), or the full contents of a pan-graph-db. The output is a directory of flat files and an HTML index that conveniently displays them for you. This makes the program useful for sharing information with collaborators, generating supplementary files for manuscripts, and exporting data for use in downstream analyses.
See also anvi-summarize-blitz.
What this program produces as output depends on its inputs. To summarize a collection stored in a profile-db:

anvi-summarize -c contigs-db \
               -p profile-db \
               -o MY_SUMMARY \
               -C collection
When running on a profile database, you also have options to:
initialize gene coverages (--init-gene-coverages)
reformat contig names (--reformat-contig-names)
report amino acid sequences for gene calls (--report-aa-seqs-for-gene-calls)

A collection is optional when summarizing a pan-db. Without one, anvi’o will still export the full gene clusters table with all functional annotations; the bin_name column will simply be empty. If you have organized your gene clusters into bins using anvi-interactive or anvi-import-collection, passing the collection name will populate bin_name with the bin each gene cluster belongs to, which makes downstream filtering by bin straightforward.
Run without a collection (exports everything):
anvi-summarize -g genomes-storage-db \
               -p pan-db
Run with a collection (adds bin_name to the output):
anvi-summarize -g genomes-storage-db \
               -p pan-db \
               -C collection
You can display DNA sequences instead of amino acid sequences with --report-DNA-sequences.
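When a pan-db summary has been run with a collection, filtering by bin can be done with standard command-line tools. The following is a minimal sketch, assuming the gene clusters table is an uncompressed, tab-delimited file with a header row that contains a bin_name column. The function name filter_by_bin, the toy file, and every column other than bin_name are hypothetical and serve only as an illustration.

```shell
# filter_by_bin FILE BIN
# Print the header row plus every row whose bin_name column equals BIN.
# Assumes FILE is tab-delimited with a header row (an assumption, not
# something this page specifies).
filter_by_bin() {
    awk -F '\t' -v bin="$2" '
        NR == 1 {
            # Locate the bin_name column by name rather than by position.
            for (i = 1; i <= NF; i++) if ($i == "bin_name") col = i
            if (!col) { print "no bin_name column found" > "/dev/stderr"; exit 1 }
            print
            next
        }
        $col == bin
    ' "$1"
}

# Toy table standing in for a real summary file (made-up values).
demo_file=$(mktemp)
printf 'gene_cluster_id\tbin_name\tgenome_name\n' >  "$demo_file"
printf 'GC_0001\tCore\tG01\n'                     >> "$demo_file"
printf 'GC_0002\t\tG01\n'                         >> "$demo_file"
printf 'GC_0003\tCore\tG02\n'                     >> "$demo_file"

filter_by_bin "$demo_file" Core
```

Looking the column up by its header name keeps the filter working even if the position of bin_name differs between output files.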
A collection is optional when summarizing a pan-graph-db. Without one, all output files are still produced in full — the bin_name column in SYNGCs.txt and GENESxSYNGCs.txt will simply be empty. If you have organized your SynGCs into bins, passing the collection name will populate bin_name in both files, making it easy to filter the output to any bin of interest with a single column filter.
Run without a collection (exports everything):
anvi-summarize -g genomes-storage-db \
               --pan-graph-db pan-graph-db
Run with a collection (adds bin_name to SYNGCs.txt and GENESxSYNGCs.txt):
anvi-summarize -g genomes-storage-db \
               --pan-graph-db pan-graph-db \
               -C collection
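To check quickly how a collection populated bin_name, the rows of SYNGCs.txt can be tallied per bin. This is a sketch under the assumption that SYNGCs.txt is tab-delimited with a header row. The function name count_by_bin, the syn_gene_cluster column, and the toy values are hypothetical.

```shell
# count_by_bin FILE
# Tally data rows per bin_name value; rows with an empty bin_name are
# reported as "(unbinned)". Assumes a tab-delimited file with a header row.
count_by_bin() {
    awk -F '\t' '
        NR == 1 {
            # Find the bin_name column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "bin_name") col = i
            if (!col) { print "no bin_name column found" > "/dev/stderr"; exit 1 }
            next
        }
        {
            key = ($col == "") ? "(unbinned)" : $col
            n[key]++
        }
        END { for (b in n) printf "%s\t%d\n", b, n[b] }
    ' "$1" | LC_ALL=C sort
}

# Toy file standing in for a real SYNGCs.txt (made-up values).
demo_dir=$(mktemp -d)
printf 'syn_gene_cluster\tbin_name\n' >  "$demo_dir/SYNGCs.txt"
printf 'SynGC_1\tBackbone\n'          >> "$demo_dir/SYNGCs.txt"
printf 'SynGC_2\t\n'                  >> "$demo_dir/SYNGCs.txt"
printf 'SynGC_3\tBackbone\n'          >> "$demo_dir/SYNGCs.txt"

count_by_bin "$demo_dir/SYNGCs.txt"
```

A summary produced without a collection would show every row under "(unbinned)", since the bin_name column is empty in that case.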
If you are unsure which collections are in your database, run this program with the --list-collections flag, or run anvi-show-collections-and-bins.
You can also use the flag --quick-summary to get a less comprehensive summary with a much shorter processing time. For profile-db summaries it skips several heavier computations; for pan-db summaries it omits sequences and annotation text from the gene clusters file; for pan-graph-db summaries it omits sequences from GENESxSYNGCs.txt.
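One way to switch between a full and a quick summary without retyping the command is a small wrapper function. The sketch below uses only flags that appear on this page. The function name summarize_profile, the QUICK and DRY_RUN environment variables, and the file and collection names are hypothetical.

```shell
# summarize_profile CONTIGS_DB PROFILE_DB COLLECTION OUT_DIR
# Assemble an anvi-summarize call for a profile-db. Set QUICK=1 to append
# --quick-summary and DRY_RUN=1 to print the command instead of running it.
summarize_profile() {
    set -- anvi-summarize -c "$1" -p "$2" -C "$3" -o "$4"
    if [ "${QUICK:-0}" = "1" ]; then
        set -- "$@" --quick-summary
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Dry run with placeholder names: prints the command, does not execute it.
DRY_RUN=1 QUICK=1 summarize_profile CONTIGS.db PROFILE.db default SUMMARY_quick
```

The dry run makes it possible to inspect the assembled command before starting a full summary, which takes considerably longer.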