Help pages for anvi'o programs and artifacts

Here you will find a list of all anvi’o programs and artifacts that enable constructing workflows for integrated multi ‘omics investigations.

If you need an introduction to the terminology used in ‘omics research or in anvi’o, please take a look at our vocabulary page. The anvi’o community is with you! If you have practical, technical, or science questions, please see this page to learn about resources available to you. If you are feeling overwhelmed, you can always scream towards the anvi’o

The help contents were last updated on 30 Sep 24 11:36:48 for anvi’o version 8-dev (marie).

The latest version of anvi’o is v8. See the release notes.

Anvi’o workflows

Anvi’o workflows are dynamic recipes for easy-to-use, scalable, and reproducible bioinformatics analyses through orchestrated use of anvi’o programs as well as third-party software. These workflows typically start with raw data files and a workflow-config and produce anvi’o artifacts, which enable you to outsource rudimentary and relatively well-understood initial steps of your ‘omics analyses so you can focus on more critical downstream research questions by further analyzing these data products inside or outside of the anvi’o software ecosystem.

The anvi’o 8-dev (marie) contains 5 workflows:

Anvi’o artifacts

Anvi’o artifacts represent concepts, file types, or data types anvi’o programs can work with. A given anvi’o artifact can be provided by the user (such as a FASTA file), produced by anvi’o (such as a profile database), or both (such as phylogenomic trees). Anvi’o artifacts link anvi’o programs to each other to build novel workflows.

Listed below are a total of 137 artifacts.

pan-db contigs-db trnaseq-db trnaseq-contigs-db trnaseq-profile-db modules-db structure-db pdb-db kegg-data user-modules-data reaction-ref-data single-profile-db profile-db genes-db genomes-storage-db
fasta contigs-fasta trnaseq-fasta concatenated-gene-alignment-fasta short-reads-fasta genes-fasta locus-fasta
dna-sequence
configuration-ini external-gene-calls external-structures bam-stats-txt bams-and-profiles-txt markdown-txt protein-structure-txt samples-txt primers-txt fasta-txt collection-txt pfam-accession hmm-file misc-data-items-txt misc-data-layers-txt misc-data-nucleotides-txt misc-data-amino-acids-txt misc-data-layer-orders-txt misc-data-items-order-txt linkmers-txt palindromes-txt inversions-txt gene-calls-txt binding-frequencies-txt functions-txt functional-enrichment-txt functions-across-genomes-txt hmm-hits-across-genomes-txt view-data layer-taxonomy-txt gene-taxonomy-txt genome-taxonomy-txt external-genomes internal-genomes metagenomes hmm-list coverages-txt detection-txt variability-profile-txt codon-frequencies-txt aa-frequencies-txt fixation-index-matrix trnaseq-seed-txt seeds-specific-txt seeds-non-specific-txt modifications-txt quick-summary kegg-metabolism user-metabolism augustus-gene-calls vcf blast-table splits-txt genbank-file groups-txt splits-taxonomy-txt clustering-configuration gene-clusters-txt enzymes-txt enzymes-list-for-module contig-rename-report-txt
paired-end-fastq
bam-file raw-bam-file
contigs-stats
svg
bin
collection
hmm-source
hmm-hits completion misc-data-items misc-data-layers misc-data-nucleotides misc-data-amino-acids genome-similarity misc-data-layer-orders misc-data-items-order metapangenome oligotypes functions kegg-functions reaction-network layer-taxonomy gene-taxonomy genome-taxonomy scgs-taxonomy-db scgs-taxonomy trna-taxonomy-db trna-taxonomy variability-profile split-bins state ngrams pn-ps-data gene-clusters metabolic-independence-score
cogs-data pfams-data cazyme-data interacdome-data
dendrogram phylogeny
reaction-network-json state-json workflow-config
kegg-pathway-map interactive trnaseq-plot contig-inspection gene-cluster-inspection
variability-profile-xml
summary
workflow

Anvi’o programs

Anvi’o programs perform atomic tasks that can be woven together to implement complete ‘omics workflows. Please note that there may be programs that are not listed on this page. You can type ‘anvi-’ in your terminal and press the TAB key twice to see the full list of programs available to you on your system, and type anvi-program-name --help to read the full list of command line options.

Listed below are a total of 155 programs.

🔥 anvi-analyze-synteny. Extract ngrams, as in 'co-occurring genes in synteny', from genomes.
🧀 genomes-storage-db functions pan-db
🍕 ngrams
🧠
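To make the idea of an ngram of co-occurring genes concrete, here is a minimal Python sketch (not anvi'o's implementation) that slides a window of n consecutive gene annotations along each genome and counts how often each window occurs. Treating a window and its reverse as the same ngram is an assumption made here so that the same neighbourhood read from either strand collapses to one key; the function name `count_ngrams` and the toy genomes are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_ngrams(genomes: Dict[str, List[str]], n: int) -> Counter:
    """Count windows of n consecutive gene annotations across genomes.

    Each genome maps to its gene annotations in positional order. A window
    and its reverse are treated as the same ngram (an assumption of this
    sketch), so the lexicographically smaller orientation is used as the key.
    """
    counts: Counter = Counter()
    for genes in genomes.values():
        for start in range(len(genes) - n + 1):
            window: Tuple[str, ...] = tuple(genes[start:start + n])
            counts[min(window, window[::-1])] += 1
    return counts


# Two toy genomes carrying the same gene neighbourhood in opposite orientation
genomes = {
    "genome_1": ["recA", "lexA", "uvrA", "uvrB"],
    "genome_2": ["uvrB", "uvrA", "lexA", "recA"],
}
for ngram, count in sorted(count_ngrams(genomes, 2).items()):
    print(ngram, count)
```

Because both toy genomes carry the same neighbourhood, every 2-gene window is counted twice, once per genome.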
🔥 anvi-cluster-contigs. A program to cluster items in a merged anvi'o profile using automatic binning algorithms.
🧀 profile-db contigs-db collection
🍕 collection bin
🧠
🔥 anvi-compute-completeness. A script to generate completeness info for a given list of splits.
🧀 contigs-db splits-txt hmm-source
🍕 n/a
🧠
🔥 anvi-compute-functional-enrichment-across-genomes. A program that computes functional enrichment across groups of genomes.
🧀 groups-txt genomes-storage-db external-genomes internal-genomes functions
🍕 functional-enrichment-txt
🧠
🔥 anvi-compute-functional-enrichment-in-pan. A program that computes functional enrichment within a pangenome.
🧀 misc-data-layers pan-db genomes-storage-db functions
🍕 functional-enrichment-txt
🧠
🔥 anvi-compute-gene-cluster-homogeneity. Compute homogeneity for gene clusters.
🧀 pan-db genomes-storage-db
🍕 n/a
🧠
🔥 anvi-compute-genome-similarity. Export sequences from sequence sources and compute a similarity metric (e.g., ANI). If a pan database is given, anvi'o will write the computed output to the misc data tables of the pan database.
🧀 external-genomes internal-genomes pan-db
🍕 genome-similarity
🧠
🔥 anvi-compute-metabolic-enrichment. A program that computes metabolic enrichment across groups of genomes and metagenomes.
🧀 kegg-metabolism user-metabolism groups-txt external-genomes internal-genomes functions
🍕 functional-enrichment-txt
🧠
🔥 anvi-db-info. Access self tables, display values, or set new ones entirely at your own risk.
🧀 pan-db profile-db contigs-db genomes-storage-db structure-db genes-db
🍕 n/a
🧠
🔥 anvi-delete-collection. Remove a collection from a given profile database.
🧀 profile-db collection
🍕 n/a
🧠
🔥 anvi-delete-functions. Remove functional annotation sources from an anvi'o contigs database.
🧀 contigs-db functions
🍕 n/a
🧠
🔥 anvi-delete-hmms. Remove HMM hits from an anvi'o contigs database.
🧀 contigs-db hmm-source hmm-hits
🍕 n/a
🧠
🔥 anvi-delete-misc-data. Remove data from the 'additional data' or 'order' tables for either items or layers in pan or profile databases, OR remove data from the 'additional data' tables for nucleotides or amino acids in contigs databases.
🧀 pan-db profile-db misc-data-items misc-data-layers misc-data-layer-orders misc-data-nucleotides misc-data-amino-acids
🍕 n/a
🧠
🔥 anvi-delete-state. Delete an anvi'o state from a pan or profile database.
🧀 pan-db profile-db state
🍕 n/a
🧠
🔥 anvi-dereplicate-genomes. Identify redundant (highly similar) genomes.
🧀 external-genomes internal-genomes fasta genome-similarity
🍕 fasta
🧠
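One way to think about identifying redundant genomes is greedy clustering over a pairwise similarity table: each genome joins the first existing cluster whose representative is at least as similar as a threshold, and otherwise starts a new cluster. The Python sketch below is a hypothetical illustration of that idea, not the exact algorithm anvi-dereplicate-genomes uses; the function name, similarity values, and threshold are made up for the example.

```python
from typing import Dict, FrozenSet, List


def dereplicate(genomes: List[str],
                similarity: Dict[FrozenSet[str], float],
                threshold: float) -> Dict[str, List[str]]:
    """Greedily group genomes; the first genome in each cluster is its representative.

    similarity maps an unordered pair of genome names to a similarity score
    (for example ANI as a fraction); missing pairs are treated as dissimilar.
    """
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        for representative, members in clusters.items():
            if similarity.get(frozenset((genome, representative)), 0.0) >= threshold:
                members.append(genome)
                break
        else:
            # no representative is similar enough, so this genome starts a new cluster
            clusters[genome] = [genome]
    return clusters


pairs = {
    frozenset(("A", "B")): 0.99,
    frozenset(("A", "C")): 0.80,
    frozenset(("B", "C")): 0.81,
}
print(dereplicate(["A", "B", "C"], pairs, threshold=0.95))
# {'A': ['A', 'B'], 'C': ['C']}
```

The input order decides which genome represents each cluster, so in practice you would likely sort genomes by some quality measure first.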
🔥 anvi-display-contigs-stats. Start the anvi'o interactive interface for viewing or comparing contigs statistics.
🧀 contigs-db
🍕 contigs-stats interactive svg
🧠
🔥 anvi-display-functions. Start an anvi'o interactive display to see functions across genomes.
🧀 functions genomes-storage-db internal-genomes external-genomes groups-txt
🍕 interactive functional-enrichment-txt
🧠
🔥 anvi-display-metabolism. Start the anvi'o interactive interface for viewing KEGG metabolism data.
🧀 contigs-db kegg-data kegg-functions profile-db collection bin
🍕 interactive
🧠
🔥 anvi-display-pan. Start an anvi'o server to display a pangenome.
🧀 pan-db genomes-storage-db
🍕 collection bin interactive svg gene-cluster-inspection
🧠
🔥 anvi-display-structure. Interactively visualize sequence variants on protein structures.
🧀 structure-db variability-profile-txt contigs-db profile-db splits-txt
🍕 interactive
🧠
🔥 anvi-draw-kegg-pathways. Write KEGG pathway map files incorporating data sourced from anvi'o databases.
🧀 contigs-db external-genomes pan-db genomes-storage-db kegg-data
🍕 kegg-pathway-map
🧠
🔥 anvi-estimate-genome-completeness. Estimate completion and redundancy using domain-specific single-copy core genes.
🧀 contigs-db profile-db external-genomes collection
🍕 completion
🧠
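The completion and redundancy estimates above can be illustrated with a simplified Python sketch. It assumes you already have the names of single-copy core gene (SCG) hits found in a genome and the set of SCGs expected for its domain: completion is the percentage of expected SCGs seen at least once, and redundancy here counts every extra copy as a percentage of the expected set. This is one common way to define the two metrics and is stated as an assumption; anvi'o's own estimator also handles domain prediction and other details not shown. The gene names and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def completion_redundancy(scg_hits: Iterable[str], expected_scgs: Set[str]) -> Tuple[float, float]:
    """Return (percent completion, percent redundancy) for one genome.

    scg_hits: names of SCG HMM hits found in the genome (one entry per copy).
    expected_scgs: the domain-specific SCG set the genome is compared against.
    """
    copies = Counter(hit for hit in scg_hits if hit in expected_scgs)
    completion = 100.0 * len(copies) / len(expected_scgs)
    # every copy beyond the first counts towards redundancy
    redundancy = 100.0 * sum(n - 1 for n in copies.values()) / len(expected_scgs)
    return completion, redundancy


expected = {"Ribosomal_L1", "Ribosomal_L2", "Ribosomal_L3", "Ribosomal_L4"}
hits = ["Ribosomal_L1", "Ribosomal_L1", "Ribosomal_L2", "Ribosomal_L3", "not_an_scg"]
print(completion_redundancy(hits, expected))  # (75.0, 25.0)
```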
🔥 anvi-estimate-metabolism. Reconstructs metabolic pathways and estimates pathway completeness for a given set of contigs.
🧀 contigs-db kegg-data kegg-functions profile-db collection bin external-genomes internal-genomes metagenomes user-modules-data enzymes-txt pan-db genomes-storage-db
🍕 kegg-metabolism user-metabolism
🧠
🔥 anvi-estimate-scg-taxonomy. Estimates taxonomy at genome and metagenome level. This program is the entry point to estimate taxonomy for a given set of contigs (i.e., all contigs in a contigs database, or contigs described in collections as bins). For this, it uses single-copy core gene sequences and the GTDB database.
🧀 profile-db contigs-db scgs-taxonomy collection bin metagenomes
🍕 genome-taxonomy genome-taxonomy-txt
🧠
🔥 anvi-estimate-trna-taxonomy. Estimates taxonomy at genome and metagenome level using tRNA sequences.
🧀 profile-db contigs-db trna-taxonomy collection bin metagenomes dna-sequence
🍕 genome-taxonomy genome-taxonomy-txt
🧠
🔥 anvi-experimental-organization. Create an experimental clustering dendrogram.
🧀 clustering-configuration
🍕 dendrogram
🧠
🔥 anvi-export-collection. Export a collection from an anvi'o database.
🧀 profile-db collection
🍕 collection-txt
🧠
🔥 anvi-export-contigs. Export contigs (or splits) from an anvi'o contigs database.
🧀 contigs-db
🍕 contigs-fasta
🧠
🔥 anvi-export-functions. Export functions of genes from an anvi'o contigs database for a given annotation source.
🧀 contigs-db functions
🍕 functions-txt
🧠
🔥 anvi-export-gene-calls. Export gene calls from an anvi'o contigs database.
🧀 contigs-db
🍕 gene-calls-txt
🧠
🔥 anvi-export-gene-clusters. Export gene clusters in a pan-db as a three-column, TAB-delimited file that associates each gene call in each genome with a gene cluster.
🧀 pan-db
🍕 gene-clusters-txt
🧠
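Because the gene-clusters-txt output is a plain three-column, TAB-delimited file, it is easy to load outside anvi'o. The Python sketch below assumes a header row with the columns genome_name, gene_caller_id, and gene_cluster_name (check the header of your own export before relying on these names) and groups gene calls by gene cluster; the function name and example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def read_gene_clusters(handle: TextIO) -> Dict[str, List[Tuple[str, int]]]:
    """Group (genome_name, gene_caller_id) pairs by gene cluster name."""
    clusters: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        clusters[row["gene_cluster_name"]].append(
            (row["genome_name"], int(row["gene_caller_id"]))
        )
    return dict(clusters)


example = io.StringIO(
    "genome_name\tgene_caller_id\tgene_cluster_name\n"
    "G1\t0\tGC_00000001\n"
    "G2\t5\tGC_00000001\n"
    "G1\t1\tGC_00000002\n"
)
print(read_gene_clusters(example))
```

With a real export you would pass an open file handle instead, e.g. `with open(path) as f: read_gene_clusters(f)`.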
🔥 anvi-export-gene-coverage-and-detection. Export gene coverage and detection data for all genes associated with contigs described in a profile database.
🧀 profile-db contigs-db
🍕 coverages-txt detection-txt
🧠
🔥 anvi-export-items-order. Export an items order from an anvi'o database.
🧀 pan-db profile-db
🍕 misc-data-items-order-txt dendrogram phylogeny
🧠
🔥 anvi-export-locus. This program helps you cut a 'locus' from a larger genetic context (e.g., contigs, genomes). By default, anvi'o will locate a user-defined anchor gene, extend its selection upstream and downstream based on the --num-genes argument, then extract the locus to create a new contigs database. The anchor gene must be provided as --search-term, --gene-caller-ids, or --hmm-sources. If --flank-mode is designated, you MUST provide TWO flanking genes that define the locus region (please see the --flank-mode help for more information). If everything goes as planned, anvi'o will give you individual locus contigs databases for every matching anchor gene found in the original contigs database provided. Enjoy your mini contigs databases!
🧀 contigs-db
🍕 locus-fasta
🧠
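The default and flank modes described above boil down to selecting a window of genes around an anchor gene, or every gene between two flanking genes. The Python sketch below assumes the gene caller IDs of a single contig are already in positional order and that --num-genes means N genes on each side of the anchor; both helper names are hypothetical, and the real program does far more, including writing new contigs databases.

```python
from typing import List


def locus_window(gene_ids: List[int], anchor: int, num_genes: int) -> List[int]:
    """Return the anchor gene plus up to num_genes genes on each side.

    gene_ids must be the gene caller IDs of one contig in positional order;
    the window is clipped at the contig ends.
    """
    i = gene_ids.index(anchor)
    start = max(0, i - num_genes)
    end = min(len(gene_ids), i + num_genes + 1)
    return gene_ids[start:end]


def flanked_locus(gene_ids: List[int], first: int, second: int) -> List[int]:
    """Return every gene between two flanking genes, inclusive, in either order."""
    i, j = sorted((gene_ids.index(first), gene_ids.index(second)))
    return gene_ids[i:j + 1]


contig_genes = [10, 11, 12, 13, 14, 15]
print(locus_window(contig_genes, anchor=12, num_genes=2))  # [10, 11, 12, 13, 14]
print(locus_window(contig_genes, anchor=10, num_genes=2))  # [10, 11, 12]
print(flanked_locus(contig_genes, 14, 11))                 # [11, 12, 13, 14]
```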
🔥 anvi-export-misc-data. Export additional data or order tables in pan or profile databases for items or layers.
🧀 pan-db profile-db contigs-db misc-data-items misc-data-layers misc-data-layer-orders misc-data-nucleotides misc-data-amino-acids
🍕 misc-data-items-txt misc-data-layers-txt misc-data-layer-orders-txt misc-data-nucleotides-txt misc-data-amino-acids-txt
🧠
🔥 anvi-export-splits-and-coverages. Export split or contig sequences and coverages across samples stored in an anvi'o profile database. This program is especially useful if you would like to 'bin' your splits or contigs outside of anvi'o and import the binning results into anvi'o using the program anvi-import-collection.
🧀 profile-db contigs-db
🍕 contigs-fasta coverages-txt
🧠
🔥 anvi-export-splits-taxonomy. Export taxonomy for splits found in an anvi'o contigs database.
🧀 contigs-db
🍕 splits-taxonomy-txt
🧠
🔥 anvi-export-state. Export an anvi'o state from a pan or profile database.
🧀 pan-db profile-db state
🍕 state-json
🧠
🔥 anvi-export-structures. Export .pdb structure files from a structure database.
🧀 structure-db
🍕 protein-structure-txt
🧠
🔥 anvi-gen-contigs-database. Generate a new anvi'o contigs database.
🧀 contigs-fasta external-gene-calls
🍕 contigs-db
🧠
🔥 anvi-gen-fixation-index-matrix. Generate a pairwise matrix of fixation indices between samples.
🧀 contigs-db profile-db structure-db bin variability-profile-txt splits-txt
🍕 fixation-index-matrix
🧠
🔥 anvi-gen-gene-consensus-sequences. Collapse variability for a set of genes across samples.
🧀 profile-db contigs-db
🍕 genes-fasta
🧠
🔥 anvi-gen-gene-level-stats-databases. A program to compute genes databases for a given set of bins stored in an anvi'o collection. Genes databases store gene-level coverage and detection statistics, and they are usually computed and generated automatically when they are required (such as when running anvi-interactive with the --gene-mode flag). This program allows you to pre-compute them if you don't want them to be done all at once.
🧀 profile-db contigs-db collection bin
🍕 genes-db
🧠
🔥 anvi-gen-genomes-storage. Create a genomes storage from internal and/or external genomes for a pangenome analysis.
🧀 external-genomes internal-genomes
🍕 genomes-storage-db
🧠
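anvi-gen-genomes-storage reads its external-genomes input as a TAB-delimited table. A minimal Python sketch for writing one from a list of contigs database paths follows; it assumes the two-column layout with the header name and contigs_db_path, and derives genome names from file names by turning hyphens and dots into underscores. That naming rule is a hypothetical convention chosen for this illustration, since anvi'o may reject names containing certain characters; the function name and paths are also made up.

```python
import csv
import io
from pathlib import Path
from typing import Iterable, TextIO


def write_external_genomes(db_paths: Iterable[str], handle: TextIO) -> None:
    """Write a two-column external-genomes table (name, contigs_db_path)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "contigs_db_path"])
    for path in db_paths:
        # hypothetical naming rule: file stem with '-' and '.' turned into '_'
        name = Path(path).stem.replace("-", "_").replace(".", "_")
        writer.writerow([name, path])


out = io.StringIO()
write_external_genomes(["dbs/E_coli-K12.db", "dbs/B_subtilis.168.db"], out)
print(out.getvalue(), end="")
```

To write a real file, pass `open("external-genomes.txt", "w", newline="")` as the handle.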
🔥 anvi-gen-phylogenomic-tree. Generate a phylogenomic tree from an alignment file.
🧀 concatenated-gene-alignment-fasta
🍕 phylogeny
🧠
🔥 anvi-gen-structure-database. Creates a database of protein structures. Predict protein structures using template-based homology modelling of genes in your contigs database, or import pre-computed PDB structures you already have.
🧀 contigs-db pdb-db
🍕 structure-db
🧠
🔥 anvi-gen-variability-network. Generate a network description from an anvi'o variability profile.
🧀 variability-profile-txt
🍕 variability-profile-xml
🧠
🔥 anvi-gen-variability-profile. Generate a table that comprehensively summarizes the variability of nucleotide, codon, or amino acid positions. We call these single nucleotide variants (SNVs), single codon variants (SCVs), and single amino acid variants (SAAVs), respectively.
🧀 contigs-db profile-db structure-db bin variability-profile splits-txt
🍕 variability-profile-txt
🧠
🔥 anvi-get-aa-counts. Fetches the number of times each amino acid occurs from a contigs database in a given bin, set of contigs, or set of genes.
🧀 splits-txt contigs-db profile-db collection
🍕 aa-frequencies-txt
🧠
🔥 anvi-get-codon-frequencies. Get codon or amino acid frequency statistics from genomes, genes, and functions.
🧀 contigs-db profile-db collection bin internal-genomes external-genomes
🍕 codon-frequencies-txt aa-frequencies-txt
🧠
🔥 anvi-get-codon-usage-bias. Get codon usage bias (CUB) statistics of genes and functions.
🧀 contigs-db profile-db collection bin internal-genomes external-genomes
🍕 n/a
🧠
🔥 anvi-get-metabolic-model-file. This program exports a metabolic reaction network to a file suitable for flux balance analysis.
🧀 contigs-db reaction-network
🍕 reaction-network-json
🧠
🔥 anvi-get-pn-ps-ratio. Calculate the rates of non-synonymous and synonymous polymorphism for genes across environments using the output of anvi-gen-variability-profile.
🧀 contigs-db variability-profile-txt
🍕 pn-ps-data
🧠
🔥 anvi-get-sequences-for-gene-calls. A script to get back sequences for gene calls.
🧀 contigs-db genomes-storage-db
🍕 genes-fasta external-gene-calls
🧠
🔥 anvi-get-sequences-for-gene-clusters. Do cool stuff with gene clusters in anvi'o pangenomes.
🧀 pan-db genomes-storage-db
🍕 genes-fasta concatenated-gene-alignment-fasta misc-data-items
🧠
🔥 anvi-get-sequences-for-hmm-hits. Get sequences for HMM hits from many inputs.
🧀 contigs-db profile-db external-genomes internal-genomes hmm-source hmm-hits
🍕 genes-fasta concatenated-gene-alignment-fasta
🧠
🔥 anvi-get-short-reads-from-bam. Get short reads back from a BAM file with options for compression, splitting of forward and reverse reads, etc.
🧀 profile-db contigs-db bin bam-file
🍕 short-reads-fasta
🧠
🔥 anvi-get-short-reads-mapping-to-a-gene. Recover short reads from BAM files that were mapped to genes you are interested in. It is possible to work with a single gene call, or a bunch of them. Similarly, you can get short reads from a single BAM file, or from many of them.
🧀 contigs-db bam-file
🍕 short-reads-fasta
🧠
🔥 anvi-get-split-coverages. Export splits and the coverage table from a database.
🧀 profile-db contigs-db collection bin
🍕 coverages-txt
🧠
🔥 anvi-get-tlen-dist-from-bam. Report the distribution of template lengths from a BAM file. The purpose of this is to rapidly get an idea about the insert size distribution in a BAM file by summarizing the distances between paired-end reads in a given read recruitment experiment.
🧀 bam-file
🍕 n/a
🧠
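The template length (TLEN) summarized by anvi-get-tlen-dist-from-bam corresponds to the ninth column of a SAM record. The pure-Python sketch below is an illustration rather than the program's implementation: it reads TLEN values from SAM text lines (as you might get from `samtools view`) and summarizes them. It keeps only positive values so that each pair is counted once, because SAM reports TLEN as positive for the leftmost mate, negative for the other, and 0 when it is unavailable. Function names and the toy records are hypothetical.

```python
import statistics
from typing import Dict, Iterable, List


def tlens_from_sam(lines: Iterable[str]) -> List[int]:
    """Collect TLEN (the ninth SAM column) from alignment lines, skipping headers."""
    values = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        values.append(int(line.split("\t")[8]))
    return values


def summarize_tlen(tlens: Iterable[int]) -> Dict[str, float]:
    """Summarize template lengths, counting each pair once via its positive TLEN."""
    values = [t for t in tlens if t > 0]
    if not values:
        raise ValueError("no positive TLEN values to summarize")
    return {
        "pairs": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "max": max(values),
    }


sam = [
    "@HD\tVN:1.6",
    "r1\t99\tc1\t100\t60\t100M\t=\t300\t300\t*\t*",
    "r1\t147\tc1\t300\t60\t100M\t=\t100\t-300\t*\t*",
    "r2\t99\tc1\t500\t60\t100M\t=\t650\t250\t*\t*",
    "r2\t147\tc1\t650\t60\t100M\t=\t500\t-250\t*\t*",
]
print(summarize_tlen(tlens_from_sam(sam)))
```

`statistics.fmean` requires Python 3.8 or newer.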
🔥 anvi-import-collection. Import an external binning result into anvi'o.
🧀 contigs-db profile-db pan-db collection-txt
🍕 collection
🧠
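anvi-import-collection takes a collection-txt file. The Python sketch below treats that file, as an assumption to check against your own data, as two TAB-separated columns without a header: an item name (split or contig) followed by a bin name. It groups items by bin and rejects malformed lines, which can be a quick sanity check on binning results produced outside anvi'o. The function name and example items are hypothetical.

```python
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def read_collection_txt(handle: TextIO) -> Dict[str, List[str]]:
    """Map each bin name to the items (splits or contigs) assigned to it."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 TAB-separated columns, found {len(fields)}")
        item, bin_name = fields
        bins[bin_name].append(item)
    return dict(bins)


example = io.StringIO(
    "contig_0001_split_00001\tBin_1\n"
    "contig_0001_split_00002\tBin_1\n"
    "contig_0002_split_00001\tBin_2\n"
)
print(read_collection_txt(example))
```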
🔥 anvi-import-functions. Parse and store functional annotation of genes.
🧀 contigs-db functions-txt
🍕 functions
🧠
🔥 anvi-import-items-order. Import a new items order into an anvi'o database.
🧀 pan-db profile-db misc-data-items-order-txt dendrogram phylogeny
🍕 misc-data-items-order
🧠
🔥 anvi-import-metabolite-profile. This program imports metabolite abundance data and stores it in a profile database.
🧀 profile-db
🍕 n/a
🧠
🔥 anvi-import-misc-data. Populate additional data or order tables in pan or profile databases for items and layers, OR additional data in contigs databases for nucleotides and amino acids (the Swiss army knife-level serious stuff).
🧀 pan-db profile-db contigs-db misc-data-items-txt dendrogram phylogeny misc-data-layers-txt misc-data-layer-orders-txt misc-data-nucleotides-txt misc-data-amino-acids-txt
🍕 misc-data-items misc-data-layers misc-data-layer-orders misc-data-nucleotides misc-data-amino-acids
🧠
🔥 anvi-import-protein-profile. This program imports protein abundance data into a profile database.
🧀 profile-db
🍕 n/a
🧠
🔥 anvi-import-state. Import an anvi'o state into a pan or profile database.
🧀 pan-db profile-db state-json
🍕 state
🧠
🔥 anvi-import-taxonomy-for-genes. Import gene-level taxonomy into an anvi'o contigs database.
🧀 contigs-db gene-taxonomy-txt
🍕 gene-taxonomy
🧠
🔥 anvi-import-taxonomy-for-layers. Import layer-level taxonomy into an anvi'o additional layer data table in an anvi'o single-profile database.
🧀 single-profile-db layer-taxonomy-txt
🍕 layer-taxonomy
🧠
🔥 anvi-init-bam. Sort and index BAM files.
🧀 raw-bam-file
🍕 bam-file
🧠
🔥 anvi-inspect. Start an anvi'o inspect interactive interface.
🧀 profile-db contigs-db bin
🍕 interactive contig-inspection
🧠
🔥 anvi-interactive. Start an anvi'o server for the interactive interface.
🧀 profile-db single-profile-db contigs-db genes-db bin view-data dendrogram phylogeny
🍕 collection bin interactive svg contig-inspection
🧠
🔥 anvi-matrix-to-newick. Takes a distance matrix, returns a newick tree.
🧀 view-data
🍕 dendrogram
🧠
🔥 anvi-merge. Merge multiple anvi'o profiles.
🧀 single-profile-db contigs-db
🍕 profile-db misc-data-items-order
🧠
🔥 anvi-merge-bins. Merge a given set of bins in an anvi'o collection.
🧀 pan-db profile-db collection bin
🍕 n/a
🧠
🔥 anvi-merge-trnaseq. This program processes one or more anvi'o tRNA-seq databases produced by anvi-trnaseq and outputs anvi'o contigs and merged profile databases accessible to other tools in the anvi'o ecosystem. Final tRNA "seed sequences" are determined from a set of samples. Each sample yields a set of tRNA predictions stored in a tRNA-seq database, and these tRNAs may be shared among the samples. tRNAs may be 3' fragments and thereby subsequences of longer tRNAs from other samples, which would become seeds. The profile database produced by this program records the coverages of seeds in each sample. This program finalizes predicted nucleotide modification sites using tunable substitution rate parameters.
🧀 trnaseq-db
🍕 trnaseq-contigs-db trnaseq-profile-db
🧠
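The idea that short 3' fragments from one sample are subsequences of longer tRNAs from other samples can be illustrated with a greedy suffix-matching sketch in Python: sequences are processed from longest to shortest, and any sequence that is a 3' suffix of an existing seed is assigned to it. This is a deliberately simplified, hypothetical illustration; the actual program also tracks coverage and modification sites, which the sketch ignores. The function name and toy sequences are made up.

```python
from typing import Dict, List


def assign_to_seeds(sequences: List[str]) -> Dict[str, List[str]]:
    """Greedily collapse sequences that are 3' suffixes of longer ones into seeds.

    Sequences are visited from longest to shortest (ties broken alphabetically),
    and a sequence joins the first seed that ends with it. A short fragment that
    matches several seeds is simply given to the first one, a simplification
    made only for this sketch.
    """
    seeds: Dict[str, List[str]] = {}
    for seq in sorted(set(sequences), key=lambda s: (-len(s), s)):
        for seed, members in seeds.items():
            if seed.endswith(seq):
                members.append(seq)
                break
        else:
            # not a 3' subsequence of any existing seed, so it becomes a seed itself
            seeds[seq] = [seq]
    return seeds


reads = ["GCCGAAGTGGCGACCA", "GTGGCGACCA", "TTCGAACCA", "CCA"]
print(assign_to_seeds(reads))
```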
🔥 anvi-meta-pan-genome. Convert a pangenome into a metapangenome.
🧀 internal-genomes pan-db genomes-storage-db
🍕 metapangenome
🧠
🔥 anvi-migrate. Migrates any anvi'o artifact, whether it is a database or a config file, to a newer version. Pure magic.
🧀 contigs-db profile-db pan-db genes-db genomes-storage-db structure-db modules-db workflow-config
🍕 n/a
🧠
🔥 anvi-oligotype-linkmers. Takes an anvi'o linkmers report, generates an oligotyping output.
🧀 linkmers-txt
🍕 oligotypes
🧠
🔥 anvi-pan-genome. An anvi'o program to compute a pangenome from an anvi'o genomes storage.
🧀 genomes-storage-db gene-clusters-txt
🍕 pan-db misc-data-items-order gene-clusters
🧠
🔥 anvi-plot-trnaseq. A program to write plots of coverage and modification data from flexible groups of tRNA-seq seeds.
🧀 trnaseq-contigs-db trnaseq-seed-txt modifications-txt
🍕 trnaseq-plot
🧠
🔥 anvi-profile. The flagship anvi'o program to profile a BAM file. Running this program on a BAM file will quantify coverages per nucleotide position in read recruitment results and will average coverage and detection data per contig. It will also calculate single-nucleotide, single-codon, and single-amino acid variants, as well as structural variants such as insertions and deletions, and eventually store all data in a single anvi'o profile database. For very large projects, this program can demand a lot of time, memory, and storage resources. If all you want is to learn the coverages of your nucleotides, genes, contigs, or the bins in your collections from BAM files very rapidly, and/or you do not need anvi'o single profile databases for your project, please see other anvi'o programs that profile BAM files, anvi-script-get-coverage-from-bam and anvi-profile-blitz.
🧀 bam-file contigs-db
🍕 single-profile-db misc-data-items-order variability-profile
🧠
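The per-contig summaries that anvi-profile computes rest on two simple definitions: mean coverage is the average number of reads covering each nucleotide, and detection is the proportion of nucleotides covered by at least one read. A minimal Python sketch of both, starting from hypothetical read alignment intervals (0-based, end-exclusive), is below; it is not anvi'o's implementation, which works directly on BAM files and computes much more.

```python
from typing import Dict, List, Sequence, Tuple


def per_nucleotide_coverage(length: int, reads: Sequence[Tuple[int, int]]) -> List[int]:
    """Count reads covering each position; reads are (start, end), 0-based, end-exclusive."""
    coverage = [0] * length
    for start, end in reads:
        for position in range(max(0, start), min(length, end)):
            coverage[position] += 1
    return coverage


def contig_summary(coverage: Sequence[int]) -> Dict[str, float]:
    """Mean coverage and detection (fraction of positions with coverage >= 1)."""
    if not coverage:
        raise ValueError("empty coverage array")
    covered = sum(1 for depth in coverage if depth > 0)
    return {
        "mean_coverage": sum(coverage) / len(coverage),
        "detection": covered / len(coverage),
    }


coverage = per_nucleotide_coverage(8, [(2, 5), (3, 5), (6, 8)])
print(coverage)                  # [0, 0, 1, 2, 2, 0, 1, 1]
print(contig_summary(coverage))  # {'mean_coverage': 0.875, 'detection': 0.625}
```

Detection helps distinguish a contig with genuinely low coverage everywhere from one where a few regions recruit many reads while the rest is uncovered, since both can share the same mean coverage.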
🔥 anvi-profile-blitz. FAST profiling of BAM files to get contig- or gene-level coverage and detection stats. Unlike anvi-profile, which is another anvi'o program that can profile BAM files, this program is designed to be very quick and only report long-format files for various read recruitment statistics per item. Please also see the program anvi-script-get-coverage-from-bam for recovery of data from BAM files without an anvi'o contigs database.
🧀 bam-file contigs-db
🍕 bam-stats-txt
🧠
🔥 anvi-reaction-network. This program generates a metabolic reaction network in an anvi'o contigs or pan database.
🧀 contigs-db kegg-functions reaction-ref-data kegg-data
🍕 reaction-network
🧠
🔥 anvi-refine. Start an anvi'o interactive interface to manually curate or refine a genome, whether it is a metagenome-assembled, single-cell, or isolate genome.
🧀 profile-db contigs-db bin
🍕 bin
🧠
🔥 anvi-rename-bins. Rename all bins in a given collection (so they have pretty names).
🧀 collection bin profile-db contigs-db
🍕 collection bin
🧠
🔥 anvi-report-inversions. Reports inversions.
🧀 bams-and-profiles-txt
🍕 inversions-txt
🧠
🔥 anvi-report-linkmers. Reports sequences stored in one or more BAM files that cover one or more specific nucleotide positions in a reference.
🧀 bam-file
🍕 linkmers-txt
🧠
🔥 anvi-run-cazymes. Run dbCAN CAZymes on a contigs database.
🧀 contigs-db cazyme-data
🍕 functions
🧠
🔥 anvi-run-hmms. This program deals with populating tables that store HMM hits in an anvi'o contigs database.
🧀 contigs-db hmm-source
🍕 hmm-hits
🧠
🔥 anvi-run-interacdome. Run InteracDome on a contigs database.
🧀 contigs-db interacdome-data
🍕 binding-frequencies-txt misc-data-amino-acids
🧠
🔥 anvi-run-kegg-kofams. Run KOfam HMMs on an anvi'o contigs database.
🧀 contigs-db kegg-data
🍕 kegg-functions functions
🧠
🔥 anvi-run-ncbi-cogs. This program runs NCBI's COGs to associate genes in an anvi'o contigs database with functions. It can also run NCBI's COGs to annotate an amino acid sequence with function. The COGs database was designed as an attempt to classify proteins from completely sequenced genomes on the basis of the orthology concept.
🧀 cogs-data contigs-db fasta
🍕 functions functions-txt
🧠
🔥 anvi-run-pfams. Run Pfam on a contigs database.
🧀 contigs-db pfams-data
🍕 functions
🧠
🔥 anvi-run-scg-taxonomy. The purpose of this program is to affiliate single-copy core genes in an anvi'o contigs database with taxonomic names. A properly set up local SCG taxonomy database is required for this program to perform properly. After its successful run, anvi-estimate-scg-taxonomy will be useful to estimate taxonomy at the genome, collection, or metagenome level.
🧀 contigs-db scgs-taxonomy-db hmm-hits
🍕 scgs-taxonomy
🧠
🔥 anvi-run-trna-taxonomy. The purpose of this program is to affiliate tRNA gene sequences in an anvi'o contigs database with taxonomic names. A properly set up local tRNA taxonomy database is required for this program to perform properly. After its successful run, anvi-estimate-trna-taxonomy will be useful to estimate taxonomy at the genome, collection, or metagenome level.
🧀 contigs-db trna-taxonomy-db
🍕 trna-taxonomy
🧠
🔥 anvi-run-workflow. Execute, manage, parallelize, and troubleshoot entire 'omics workflows and chain together anvi'o and third-party programs.
🧀 workflow-config
🍕 workflow
🧠
🔥 anvi-scan-trnas. Identify and store tRNA genes in a contigs database.
🧀 contigs-db
🍕 hmm-hits
🧠
🔥 anvi-search-functions. Search functions in an anvi'o contigs database or genomes storage. Basically, this program searches for one or more search terms you define in functional annotations of genes in an anvi'o contigs database, and generates multiple reports. The default report simply tells you which contigs contain genes with functions matching the search terms you used, which is useful for viewing in the interface. You can also request a much more comprehensive report, which gives you anything you might need to know for each hit and search term.
🧀 contigs-db genomes-storage-db
🍕 functions-txt
🧠
🔥 anvi-search-palindromes. A program to find palindromes in sequences.
🧀 dna-sequence fasta contigs-db
🍕 palindromes-txt
🧠
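In DNA, a palindrome in this sense is a sequence that equals its own reverse complement, such as the EcoRI site GAATTC. The Python sketch below scans a sequence for exact, contiguous palindromes of a fixed length. It is a simplified illustration under that assumption; as we understand it, anvi-search-palindromes also handles palindromic arms separated by a gap and can tolerate mismatches, neither of which is modelled here. The function names are hypothetical.

```python
from typing import List, Tuple

# maps each base to its complement; lowercase is handled too
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq: str, length: int) -> List[Tuple[int, str]]:
    """Return (0-based start, window) for every window equal to its own reverse complement.

    Only even lengths can match, because the middle base of an odd-length
    window would have to be its own complement.
    """
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if window == reverse_complement(window):
            hits.append((start, window))
    return hits


print(find_palindromes("TTGAATTCAA", 6))  # [(2, 'GAATTC')]
```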
πŸ”₯ anvi-search-primers. You provide this program with FASTQ files for one or more samples AND one or more primer sequences, and it collects reads from FASTQ files that matches to your primers. This tool can be most powerful if you want to collect all short reads from one or more metagenomes that are downstream to a known sequence. Using the comprehensive output files you can analyze the diversity of seuqences visually, manually, or using established strategies such as oligotyping..
πŸ§€ samples-txt primers-txt
πŸ• short-reads-fasta
🧠
πŸ”₯ anvi-search-sequence-motifs. A program to find one or more sequence motifs in contig or gene sequences, and store their frequencies.
πŸ§€ profile-db contigs-db genes-db
πŸ• misc-data-items misc-data-layers
🧠
πŸ”₯ anvi-self-test. A program for anvi'o to test itself.
πŸ§€ n/a
πŸ• n/a
🧠
πŸ”₯ anvi-setup-cazymes. Download and setup Pfam data from the EBI.
πŸ§€ n/a
πŸ• cazyme-data
🧠
πŸ”₯ anvi-setup-interacdome. Setup InteracDome data.
πŸ§€ n/a
πŸ• interacdome-data
🧠
πŸ”₯ anvi-setup-kegg-data. Download and setup various databases from KEGG.
πŸ§€ n/a
πŸ• kegg-data modules-db
🧠
πŸ”₯ anvi-setup-modelseed-database. This program downloads and sets up the ModelSEED Biochemistry database..
πŸ§€ functions
πŸ• reaction-ref-data
🧠
πŸ”₯ anvi-setup-ncbi-cogs. Download and setup NCBI's Clusters of Orthologous Groups database.
πŸ§€ n/a
πŸ• cogs-data
🧠
πŸ”₯ anvi-setup-pdb-database. Setup or update an offline database of representative PDB structures clustered at 95%.
πŸ§€ n/a
πŸ• pdb-db
🧠
πŸ”₯ anvi-setup-pfams. Download and setup Pfam data from the EBI.
πŸ§€ n/a
πŸ• pfams-data
🧠
πŸ”₯ anvi-setup-scg-taxonomy. The purpose of this program is to download necessary information from GTDB (https://gtdb.ecogenomic.org/), and set it up in such a way that your anvi'o installation is able to assign taxonomy to single-copy core genes using anvi-run-scg-taxonomy and estimate taxonomy for genomes or metagenomes using anvi-estimate-scg-taxonomy).
πŸ§€ n/a
πŸ• scgs-taxonomy-db
🧠
πŸ”₯ anvi-setup-trna-taxonomy. The purpose of this program is to setup necessary databases for tRNA genes collected from GTDB (https://gtdb.ecogenomic.org/), genomes in your local anvi'o installation so taxonomy information for a given set of tRNA sequences can be identified using anvi-run-trna-taxonomy and made sense of via anvi-estimate-trna-taxonomy).
πŸ§€ n/a
πŸ• trna-taxonomy-db
🧠
πŸ”₯ anvi-setup-user-modules. Set up user-defined metabolic pathways into an anvi'o-compatible database.
πŸ§€ user-modules-data
πŸ• modules-db user-modules-data
🧠
πŸ”₯ anvi-show-collections-and-bins. A script to display collections stored in an anvi'o profile or pan database.
πŸ§€ pan-db profile-db
πŸ• n/a
🧠
πŸ”₯ anvi-show-misc-data. Show all misc data keys in all misc data tables.
πŸ§€ pan-db profile-db contigs-db
πŸ• n/a
🧠
πŸ”₯ anvi-split. Split an anvi'o pan or profile database into smaller, self-contained projects. Black magic..
πŸ§€ profile-db contigs-db genomes-storage-db pan-db collection
πŸ• split-bins
🧠
πŸ”₯ anvi-summarize. Summarizer for anvi'o pan or profile db's. Essentially, this program takes a collection id along with either a profile database and a contigs database or a pan database and a genomes storage and generates a static HTML output for what is described in a given collection. The output directory will contain almost everything any downstream analysis may need, and can be displayed using a browser without the need for an anvi'o installation. For this reason alone, reporting summary outputs as supplementary data with publications is a great idea for transparency and reproducibility.
πŸ§€ profile-db contigs-db collection pan-db genomes-storage-db
πŸ• summary
🧠
πŸ”₯ anvi-summarize-blitz. FAST summary of many anvi'o single profile databases (without having to use the program anvi-merge)..
πŸ§€ single-profile-db contigs-db
πŸ• quick-summary
🧠
πŸ”₯ anvi-tabulate-trnaseq. A program to write standardized tab-delimited files of tRNA-seq seed coverage and modification results.
πŸ§€ trnaseq-contigs-db trnaseq-profile-db
πŸ• trnaseq-seed-txt modifications-txt
🧠
πŸ”₯ anvi-trnaseq. A program to process reads from a tRNA-seq dataset to generate an anvi'o tRNA-seq database.
πŸ§€ trnaseq-fasta
πŸ• trnaseq-db
🧠
πŸ”₯ anvi-update-db-description. Update the description in an anvi'o database.
πŸ§€ pan-db profile-db contigs-db genomes-storage-db
πŸ• n/a
🧠
πŸ”₯ anvi-update-structure-database. Add or re-run genes from an already existing structure database. All settings used to generate your database will be used in this program.
πŸ§€ contigs-db structure-db
πŸ• n/a
🧠
🔥 anvi-script-add-default-collection. A script to add a 'DEFAULT' collection in an anvi'o pan or profile database with either (1) a single bin that describes all items available in the profile database, or (2) as many bins as there are items in the profile database, where every item has its own bin. The former is the default behavior that will be useful in most instances where you need to use this script. The latter is most useful if you are Florian and/or have something very specific in mind.
🧀 pan-db profile-db contigs-db
🍕 collection bin
🧠
🔥 anvi-script-as-markdown. Markdownizes TAB-delimited data with headers in the terminal.
🧀 n/a
🍕 markdown-txt
🧠
🔥 anvi-script-augustus-output-to-external-gene-calls. Takes gene calls from AUGUSTUS v3.3.3 and generates an anvi'o external gene calls file. It may work well with other versions of AUGUSTUS, too; it is just that no one has tested the script with other versions of the program.
🧀 augustus-gene-calls
🍕 external-gene-calls
🧠
🔥 anvi-script-checkm-tree-to-interactive. A helper script to convert CheckM trees into an anvi'o interactive display with taxonomy information.
🧀 phylogeny
🍕 interactive
🧠
🔥 anvi-script-compute-ani-for-fasta. Run ANI between contigs in a single FASTA file.
🧀 fasta
🍕 genome-similarity
🧠
🔥 anvi-script-compute-bayesian-pan-core. Runs mOTUpan on your gene clusters to estimate whether they are core or accessory.
🧀 pan-db genomes-storage-db
🍕 bin
🧠
🔥 anvi-script-estimate-metabolic-independence. Takes a genome as a contigs-db, and tells you whether it can be considered as an organism of high metabolic independence, or not.
🧀 contigs-db
🍕 metabolic-independence-score
🧠
🔥 anvi-script-filter-fasta-by-blast. Filter a FASTA file according to a BLAST table (remove sequences with bad BLAST alignments).
🧀 contigs-fasta blast-table
🍕 contigs-fasta
🧠
🔥 anvi-script-filter-hmm-hits-table. Filter weak HMM hits from a given contigs database using a domain hits table reported by anvi-run-hmms.
🧀 contigs-db hmm-source hmm-hits
🍕 hmm-hits
🧠
🔥 anvi-script-find-misassemblies. This script reports errors in long-read assemblies using read-recruitment information. The input file should be a BAM file of long reads mapped to an assembly made from these reads.
🧀 bam-file
🍕 n/a
🧠
🔥 anvi-script-fix-homopolymer-indels. Corrects homopolymer-region associated INDELs in a given genome based on a reference genome. The most effective use of this script is when the input genome is a genome reconstructed from MinION long reads, and the reference genome is one of high quality. Essentially, this script will BLAST the genome you wish to correct against the reference genome you provide, identify INDELs in the BLAST results that are exclusively associated with homopolymer regions, use the reference genome as a guide to correct the input sequences, and report a new FASTA file. You can use the fixed output FASTA file as the input FASTA file over and over again to see if you can eliminate all homopolymer-associated INDELs.
🧀 fasta
🍕 fasta
🧠
🔥 anvi-script-gen-defense-finder-models-to-hmm-directory. This program generates, from the MDMParis Defense Finder Models, an anvi'o-compatible HMM directory to be used with anvi-run-hmms.
🧀 hmm-file
🍕 hmm-source
🧠
🔥 anvi-script-gen-distribution-of-genes-in-a-bin. Quantify the detection of genes in genomes in metagenomes to identify the environmental core. This is a helper script for the anvi'o metapangenomic workflow.
🧀 contigs-db profile-db collection bin
🍕 view-data misc-data-items-txt
🧠
🔥 anvi-script-gen-function-matrix-across-genomes. A program to generate reports for the distribution of functions across genomes.
🧀 functions genomes-storage-db internal-genomes external-genomes groups-txt
🍕 functional-enrichment-txt functions-across-genomes-txt
🧠
🔥 anvi-script-gen-functions-per-group-stats-output. Generate a TAB delimited file for the distribution of functions across groups of genomes/metagenomes.
🧀 functions genomes-storage-db internal-genomes external-genomes
🍕 interactive
🧠
🔥 anvi-script-gen-genomes-file. Generate an external genomes or internal genomes file.
🧀 contigs-db profile-db collection
🍕 external-genomes internal-genomes
🧠
🔥 anvi-script-gen-hmm-hits-matrix-across-genomes. A simple script to generate a TAB-delimited file that reports the frequency of HMM hits for a given HMM source across contigs databases.
🧀 external-genomes internal-genomes hmm-source hmm-hits
🍕 hmm-hits-across-genomes-txt
🧠
🔥 anvi-script-gen-pseudo-paired-reads-from-fastq. A script that takes a FASTQ file that is not paired-end (i.e., R1 alone) and converts it into two FASTQ files that are paired-end (i.e., R1 and R2). This is a quick-and-dirty workaround that halves each read from the original FASTQ, puts the first half in the FASTQ file for R1, and puts the reverse complement of the second half in the FASTQ file for R2. If you've ended up here, things have clearly not gone very well for you, and Evan, who battled similar battles and ended up implementing this solution, wholeheartedly sympathizes.
🧀 short-reads-fasta
🍕 paired-end-fastq
🧠
🔥 anvi-script-gen-short-reads. Generate short reads from contigs. Useful to reconstruct mock data sets from already assembled contigs.
🧀 configuration-ini
🍕 short-reads-fasta
🧠
🔥 anvi-script-gen-user-module-file. This script generates a user-defined module file from a tab-delimited file of enzymes and other input parameters.
🧀 enzymes-list-for-module
🍕 user-modules-data
🧠
🔥 anvi-script-get-coverage-from-bam. Get nucleotide-level, contig-level, or bin-level coverage values from a BAM file very rapidly. For other anvi'o programs that are designed to profile BAM files, see anvi-profile and anvi-profile-blitz.
🧀 bam-file collection-txt
🍕 coverages-txt
🧠
🔥 anvi-script-get-hmm-hits-per-gene-call. A simple script to generate a TAB-delimited file of gene caller IDs and their HMM hits for a given HMM source.
🧀 contigs-db hmm-source hmm-hits
🍕 functions-txt
🧠
🔥 anvi-script-hmm-to-hmm-directory. You give this program one or more HMM files from hmmbuild, and it generates an anvi'o compatible HMM directory to be used with anvi-run-hmms.
🧀 hmm-file
🍕 hmm-source
🧠
🔥 anvi-script-merge-collections. Generate an additional data file from multiple collections.
🧀 contigs-db collection-txt
🍕 n/a
🧠
🔥 anvi-script-permute-trnaseq-seeds. This script generates a FASTA file of tRNA-seq seeds with permuted nucleotides at positions of predicted modification-induced substitutions. The underlying nucleotide without modification is not always the most common base call. The resulting FASTA file can be queried against a database of tRNA genes to validate nucleotides at modified positions and find the most similar sequences.
🧀 contigs-db profile-db
🍕 contigs-fasta
🧠
🔥 anvi-script-pfam-accessions-to-hmms-directory. You give this program one or more PFAM accession IDs, and it generates an anvi'o compatible HMM directory to be used with anvi-run-hmms.
🧀 pfam-accession
🍕 hmm-source
🧠
🔥 anvi-script-process-genbank. This script takes a GenBank file, and outputs a FASTA file, as well as two additional TAB-delimited output files for external gene calls and gene functions that can be used with the programs anvi-gen-contigs-database and anvi-import-functions.
🧀 genbank-file
🍕 contigs-fasta external-gene-calls functions-txt
🧠
🔥 anvi-script-process-genbank-metadata. This script takes the 'metadata' output of the program ncbi-genome-download (see https://github.com/kblin/ncbi-genome-download for details), and processes each GenBank file found in the metadata file to generate a FASTA file, as well as genes and functions files for each entry. Plus, it automatically generates a FASTA TXT file descriptor for anvi'o snakemake workflows. So it is a multi-talented program like that.
🧀 n/a
🍕 contigs-fasta functions-txt external-gene-calls
🧠
🔥 anvi-script-reformat-bam. Reformat a BAM file to match the updated sequence names after running anvi-script-reformat-fasta. You will need this script to fix your BAM file if you run anvi-script-reformat-fasta on a FASTA file of sequences after you already used the previous version of the FASTA file for read recruitment.
🧀 bam-file contig-rename-report-txt
🍕 bam-file
🧠
🔥 anvi-script-reformat-fasta. Reformat a FASTA file (remove contigs based on length, or based on a given list of deflines, and/or generate an output with simpler names).
🧀 fasta
🍕 contigs-fasta contig-rename-report-txt
🧠
🔥 anvi-script-snvs-to-interactive. Take the output of anvi-gen-variability-profile and prepare an output for the interactive interface.
🧀 variability-profile-txt
🍕 interactive
🧠
🔥 anvi-script-transpose-matrix. Transpose a TAB-delimited file.
🧀 view-data functions-txt misc-data-items-txt misc-data-layers-txt gene-calls-txt linkmers-txt
🍕 view-data functions-txt misc-data-items-txt misc-data-layers-txt gene-calls-txt linkmers-txt
🧠
🔥 anvi-script-variability-to-vcf. A script to convert SNV output obtained from anvi-gen-variability-profile to the standard VCF format.
🧀 variability-profile-txt
🍕 vcf
🧠