Summarizer for anvi'o pan or profile databases. Essentially, this program takes a collection ID along with either a profile database and a contigs database, or a pan database and a genomes storage, and generates a static HTML output describing what is in that collection. The output directory will contain almost everything any downstream analysis may need, and it can be viewed in a browser without an anvi'o installation. For this reason alone, reporting summary outputs as supplementary data with publications is a great idea for transparency and reproducibility.
Inputs: profile-db, contigs-db, collection, pan-db, genomes-storage-db
Anvi-summarize gives you a comprehensive overview of your collection and the many statistics that anvi’o has calculated for it.
It will create a folder (by default called SUMMARY) that contains many different summary files, including an HTML output that conveniently displays them all for you. The exact contents of this folder depend on whether you run the program on a profile-db (i.e., to summarize a collection of binned contigs, such as metagenome-assembled genomes) or on a pan-db (i.e., to summarize a collection of binned gene clusters, such as when you want to compare the accessory and core genomes). Because of the extensive set of output files it produces, this program is useful for sharing information with collaborators, generating supplementary files for manuscripts, and exporting data for use as input to downstream programs and scripts.
Regardless of input type, this program always produces an index.html file, which you can open in a web browser to view all the summary information in a nicely formatted, interactive webpage.
When run on a profile-db, this program will:
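- Summarize each bin in your collection (in bins_summary.txt), with statistics like length, GC content, completion, and redundancy
- Create a directory of per-bin output files (in bin_by_bin), including:
  - statistics also reported in bins_summary.txt, like length, percent completeness, and redundancy
- Generate matrix files that describe your bins across samples (in bins_across_samples), including read-recruitment statistics and the number of ribosomal RNA annotations per bin (the rRNA info is not described across samples, but happens to live with the other matrix files regardless)
- Export miscellaneous data for items and layers (in misc_data_items and misc_data_layers)

Confused about the read-recruitment statistics?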
In case you want to learn about the definitions of statistics like coverage, detection, abundance, variability, and so on, you should first read Mike Lee’s explanation of these statistics. Our vocabulary page might also be helpful. Then, keep in mind that anvi’o computes these values on a per-contig (and per-split) basis. When you run anvi-summarize, the program will summarize this information for a given bin by taking the average of a statistic’s value across all splits in the bin, weighting that average by split length.
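The length-weighted average described above can be written as mean = Σ(l_i × v_i) / Σ l_i, summed over the splits i of a bin, where l_i is the split's length and v_i is the split's value for a given statistic (for example, mean coverage in one sample). The Python sketch below only illustrates that arithmetic; it is not anvi'o's own implementation, and the split names, lengths, and coverage values are hypothetical.

```python
from typing import Mapping, Tuple


def length_weighted_mean(splits: Mapping[str, Tuple[int, float]]) -> float:
    """Return the length-weighted mean of a per-split statistic.

    `splits` maps a split name to a (split_length, value) pair, where value
    is the statistic for that split (e.g., its mean coverage in one sample).
    """
    total_length = sum(length for length, _ in splits.values())
    if total_length == 0:
        raise ValueError("bin has no sequence length to weight by")
    # Each split contributes in proportion to its length.
    weighted_sum = sum(length * value for length, value in splits.values())
    return weighted_sum / total_length


# Hypothetical bin with three splits: (length in bp, mean coverage)
bin_splits = {
    "split_1": (20000, 10.0),
    "split_2": (10000, 40.0),
    "split_3": (10000, 10.0),
}
print(length_weighted_mean(bin_splits))  # 17.5
```

An unweighted mean of the same three values would be 20.0; weighting by length means the two shorter splits together count only as much as the one longer split.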
When run on a pan-db, this program will:
- Generate a table (in [NAME]_gene_clusters_summary.txt) describing every gene in every gene cluster of your pangenome (even those not in the specified collection)
- Export miscellaneous data for items and layers (in misc_data_items and misc_data_layers)

A standard run of anvi-summarize on a profile-db will look something like this:
anvi-summarize -c contigs-db \
               -p profile-db \
               -o MY_SUMMARY \
               -C collection
This will name the output directory MY_SUMMARY instead of the standard SUMMARY.
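Because bins_summary.txt is a plain tab-delimited table, it is easy to load for downstream filtering. The Python sketch below is a hypothetical helper, not part of anvi'o. It assumes the file has a header row, that the first column holds bin names, and that completion and redundancy live in columns named percent_completion and percent_redundancy; these column names are assumptions, so check the header of your own file. The default thresholds are illustrative only, not recommendations.

```python
import csv
from typing import Iterable, List


def select_bins(lines: Iterable[str], min_completion: float = 70.0,
                max_redundancy: float = 10.0) -> List[str]:
    """Return names of bins that pass completion/redundancy thresholds.

    `lines` is any iterable of text lines (an open file works). Assumes a
    tab-delimited table with a header row, bin names in the first column,
    and columns named 'percent_completion' and 'percent_redundancy'
    (assumed column names; adjust them to match your file).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if not reader.fieldnames:
        return []  # empty input: no header, no bins
    name_column = reader.fieldnames[0]  # first column holds bin names
    selected = []
    for row in reader:
        completion = float(row["percent_completion"])
        redundancy = float(row["percent_redundancy"])
        if completion >= min_completion and redundancy <= max_redundancy:
            selected.append(row[name_column])
    return selected
```

For the run above, you could open MY_SUMMARY/bins_summary.txt with open(path, newline="") and pass the file object straight to select_bins, since a file is an iterable of lines.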
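When running on a profile database, you also have options to:
- compute coverage and detection statistics for each individual gene (--init-gene-coverages)
- reformat contig names in the output so that they include the name of the bin they belong to (--reformat-contig-names)
- report amino acid sequences, in addition to DNA sequences, for gene calls (--report-aa-seqs-for-gene-calls)

When running on a pan-db, you’ll want to instead provide the associated genomes storage database.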
anvi-summarize -g genomes-storage-db \
               -p pan-db \
               -C collection
You can also report DNA sequences for your gene clusters instead of amino acid sequences with the flag --report-DNA-sequences.
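Since the gene clusters summary has one row per gene, a gene cluster appears once for each of its member genes. The Python sketch below, a hypothetical helper that is not part of anvi'o, counts distinct gene clusters per bin. It assumes a tab-delimited file with columns named gene_cluster_id and bin_name, and that genes outside the collection have an empty bin_name; both the column names and that empty-value convention are assumptions to verify against your own output.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def gene_clusters_per_bin(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct gene clusters per bin in a gene clusters summary.

    Assumes tab-delimited rows with 'gene_cluster_id' and 'bin_name'
    columns (assumed names). Each row is one gene, so the same gene
    cluster ID repeats once per member gene; a set removes the repeats.
    """
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        # Assumption: genes outside the collection have an empty bin_name.
        bin_name = row["bin_name"] or "UNBINNED"
        clusters[bin_name].add(row["gene_cluster_id"])
    return {name: len(ids) for name, ids in clusters.items()}
```

As with any tab-delimited output, you can pass an open file (opened with newline="") directly to this function.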
If you’re unsure what collections are in your database, you can run this program with the flag --list-collections or by running anvi-show-collections-and-bins. Don’t have a collection at all? If you want to summarize everything in the database, you can generate a default collection of everything by running anvi-script-add-default-collection.
You can also use the flag --quick-summary to get a less comprehensive summary with a much shorter processing time.
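If you drive anvi-summarize from a Python workflow, it can help to build the argument list in one place. The sketch below is a hypothetical helper, not part of anvi'o; it only uses flags described on this page (-c, -p, -g, -o, -C, --quick-summary, plus any extra flags such as --init-gene-coverages), and the database and collection names in the usage note are placeholders.

```python
from typing import List, Optional, Sequence


def summarize_command(collection: str, *, contigs_db: Optional[str] = None,
                      profile_db: Optional[str] = None,
                      pan_db: Optional[str] = None,
                      genomes_storage: Optional[str] = None,
                      output_dir: Optional[str] = None,
                      quick: bool = False,
                      extra_flags: Sequence[str] = ()) -> List[str]:
    """Build an anvi-summarize argument list for a profile or pan run."""
    if pan_db is not None:
        # Pan mode: a pan-db is summarized together with its genomes storage.
        if genomes_storage is None:
            raise ValueError("a pan-db run needs a genomes storage database")
        cmd = ["anvi-summarize", "-g", genomes_storage, "-p", pan_db]
    elif contigs_db is not None and profile_db is not None:
        # Profile mode: a profile-db is summarized with its contigs-db.
        cmd = ["anvi-summarize", "-c", contigs_db, "-p", profile_db]
    else:
        raise ValueError(
            "provide contigs_db and profile_db, or pan_db and genomes_storage")
    if output_dir is not None:
        cmd += ["-o", output_dir]  # otherwise anvi'o uses its default, SUMMARY
    cmd += ["-C", collection]
    if quick:
        cmd.append("--quick-summary")
    cmd.extend(extra_flags)  # e.g. ["--init-gene-coverages"]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from Python's standard library. Building a list rather than a single shell string avoids quoting problems with paths that contain spaces.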