anvi-summarize

Summarizer for anvi'o pan or profile databases. Essentially, this program takes a collection id along with either a profile database and a contigs database or a pan database and a genomes storage and generates a static HTML output for what is described in a given collection. The output directory will contain almost everything any downstream analysis may need, and can be displayed using a browser without the need for an anvi'o installation. For this reason alone, reporting summary outputs as supplementary data with publications is a great idea for transparency and reproducibility.



Can consume

profile-db contigs-db collection pan-db genomes-storage-db

Can provide

summary

Usage

anvi-summarize gives you a comprehensive overview of your collection, including the many statistics that anvi’o has calculated for it.

It will create a folder called SUMMARY that contains many different summary files, including an HTML output that conveniently displays them all for you. This folder contains everything a future user might need to import your collection, so it is a convenient way to share or transfer an entire anvi’o collection and all of its data.

In a little more detail, this program will

  • generate fasta files containing your original contigs,
  • estimate various stats about each of your bins, including completion, redundancy, and information about all of your hmm-hits,
  • generate various tab-delimited matrix files with information about your bins across your samples, including various statistics.

Running anvi-summarize

Running on a profile database

A standard run of anvi-summarize on a profile-db will look something like this:

anvi-summarize -c contigs-db \
               -p profile-db \
               -o MY_SUMMARY \
               -C collection

This will name the output directory MY_SUMMARY instead of the standard SUMMARY.

When running on a profile database, you also have options to

  • output very accurate (but computationally intensive) coverage and detection data for each gene (using --init-gene-coverages)
  • edit your contig names so that they contain the name of the bin that the contig is in (using --reformat-contig-names)
  • also display the amino acid sequences for your gene calls (using --report-aa-seqs-for-gene-calls)
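
The three options above can be combined in a single run. The following is a minimal sketch rather than an official recipe: the helper name summarize_bins is hypothetical, as are the file, collection, and directory names you pass to it. When the DRY_RUN variable is set, the command is printed instead of executed, which lets you check the flags before committing to the slow gene-coverage step.

```shell
# Hypothetical helper: wraps anvi-summarize with the three optional flags
# described above.
# Arguments: contigs db, profile db, collection name, output directory.
summarize_bins() {
    # If DRY_RUN is set to a non-empty value, the command line is echoed
    # rather than executed.
    ${DRY_RUN:+echo} anvi-summarize \
        -c "$1" \
        -p "$2" \
        -C "$3" \
        -o "$4" \
        --init-gene-coverages \
        --reformat-contig-names \
        --report-aa-seqs-for-gene-calls
}
```

For example, assuming databases named CONTIGS.db and PROFILE.db and a collection named default, calling summarize_bins CONTIGS.db PROFILE.db default MY_SUMMARY with DRY_RUN=1 set prints the full command, and the same call with DRY_RUN unset runs it.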

Running on a pan database

When running on a pan-db, provide the associated genomes-storage-db instead of a contigs-db.

anvi-summarize -g genomes-storage-db \
               -p pan-db \
               -C collection

You can also choose to display DNA sequences for your gene clusters instead of amino acid sequences with the flag --report-DNA-sequences.
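
A pangenome run with DNA sequences and a custom output directory might look like the sketch below. The helper name summarize_pan and every file, collection, and directory name passed to it are placeholders, and the use of -o here assumes it behaves as it does in the profile database example. As a convenience that is not part of anvi’o, setting DRY_RUN prints the command instead of running it.

```shell
# Hypothetical helper for summarizing a collection in a pan database.
# Arguments: genomes storage db, pan db, collection name, output directory.
summarize_pan() {
    # If DRY_RUN is set to a non-empty value, the command line is echoed
    # rather than executed.
    ${DRY_RUN:+echo} anvi-summarize \
        -g "$1" \
        -p "$2" \
        -C "$3" \
        -o "$4" \
        --report-DNA-sequences
}
```

For instance, summarize_pan GENOMES.db PAN.db default PAN_SUMMARY would write the summary to PAN_SUMMARY, assuming those files and that collection exist.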

Other notes

If you’re unsure what collections are in your database, you can either run this program with the flag --list-collections or run anvi-show-collections-and-bins.

You can also use the flag --quick-summary to get a less comprehensive summary with a much shorter processing time.
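
Put together, a first look at an unfamiliar project could follow the two steps sketched here: list the available collections, then produce a quick summary of one of them. This is only an illustration. The helper names (run, list_collections, show_collections, quick_summary) are hypothetical, the database and collection names are placeholders, and passing both -c and -p alongside --list-collections is an assumption based on the profile database example above.

```shell
# Hypothetical helper: echoes the command when DRY_RUN is set to a
# non-empty value, otherwise executes it.
run() {
    ${DRY_RUN:+echo} "$@"
}

# Step 1a: list collections through anvi-summarize itself.
# Arguments: contigs db, profile db.
list_collections() {
    run anvi-summarize -c "$1" -p "$2" --list-collections
}

# Step 1b: alternatively, use the dedicated program.
# Argument: profile db.
show_collections() {
    run anvi-show-collections-and-bins -p "$1"
}

# Step 2: quick, less comprehensive summary of one collection.
# Arguments: contigs db, profile db, collection name, output directory.
quick_summary() {
    run anvi-summarize -c "$1" -p "$2" -C "$3" -o "$4" --quick-summary
}
```

A typical session would call list_collections CONTIGS.db PROFILE.db, pick a collection name from its output, and then call quick_summary CONTIGS.db PROFILE.db with that name and an output directory of your choice.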

