genomes-storage-db

DB

A DB-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).


Provided by

anvi-gen-genomes-storage

Required or used by

anvi-analyze-synteny anvi-compute-functional-enrichment-across-genomes anvi-compute-functional-enrichment-in-pan anvi-compute-gene-cluster-homogeneity anvi-db-info anvi-display-functions anvi-display-pan anvi-get-sequences-for-gene-calls anvi-get-sequences-for-gene-clusters anvi-meta-pan-genome anvi-migrate anvi-pan-genome anvi-search-functions anvi-split anvi-summarize anvi-update-db-description anvi-script-compute-bayesian-pan-core anvi-script-gen-function-matrix-across-genomes anvi-script-gen-functions-per-group-stats-output

Description

This is an anvi’o database that stores information about your genomes, primarily for use in pangenomic analyses.

You can think of it like this: a genomes-storage-db is to the pangenomic workflow what a contigs-db is to the metagenomic workflow. They both describe key information unique to your particular dataset and are required to run the vast majority of programs in their respective workflows.

What kind of information?

A genomes storage database contains information about the genomes that you provided to create it, as well as the genes within them.

Specifically, there are three tables stored in a genomes storage database:

  • A table describing each of your genomes, including its name, type (internal or external), GC content, number of contigs, completion, redundancy, number of genes, etc.
  • A table describing the genes within your genomes. For each gene, this includes its gene caller id, associated genome and position, sequence, length, and whether or not it is partial.
  • A table describing the functions of your genes, including their sources and e-values.
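
To check these tables for yourself, the sketch below lists every table and its columns in a database file. It assumes the genomes storage database is a regular SQLite file (as anvi’o databases are), and it deliberately hard-codes no table or column names, since those may differ between anvi’o versions. The function name `list_tables` is a hypothetical helper, not part of anvi’o.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: [column names]} for an SQLite database file.

    Hypothetical helper, not an anvi'o API. It only reads the file.
    """
    # Build a read-only URI so that inspecting the database cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master is SQLite's built-in catalog of tables.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        return {
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        conn.close()
```

Calling `list_tables("MY-GENOMES.db")` (a placeholder file name) returns a dictionary you can print to see which tables hold the genome, gene, and function information described above.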

Cool. How do I make one?

You can generate one of these from an internal-genomes (genomes described in bins), an external-genomes (genomes described in contigs-dbs), or both, using the program anvi-gen-genomes-storage.
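
As a rough illustration of that step, the sketch below writes an external-genomes file and assembles the matching anvi-gen-genomes-storage command. It assumes the external-genomes file is TAB-delimited with the columns `name` and `contigs_db_path`, and that the output file name should end in `-GENOMES.db`. The helper names and all file paths are hypothetical; check `anvi-gen-genomes-storage --help` for the flags available in your version.

```python
import csv


def write_external_genomes(genomes, out_path):
    """Write a two-column, TAB-delimited external-genomes file.

    `genomes` maps a genome name to the path of its contigs-db.
    Hypothetical helper; the column names are an assumption about the format.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["name", "contigs_db_path"])
        for name, contigs_db in genomes.items():
            writer.writerow([name, contigs_db])


def genomes_storage_command(external_genomes, output_db):
    """Return the command (as an argument list) that would build the database."""
    # Assumption: anvi'o expects genomes storage files to end in '-GENOMES.db'.
    if not output_db.endswith("-GENOMES.db"):
        raise ValueError("output file name should end with '-GENOMES.db'")
    return [
        "anvi-gen-genomes-storage",
        "--external-genomes", external_genomes,
        "--output-file", output_db,
    ]
```

The returned list can be handed to `subprocess.run(..., check=True)` on a machine where anvi’o is installed, or joined with spaces and pasted into a terminal.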

Cool cool. What can I do with one?

With one of these, you can run anvi-pan-genome to get a pan-db. If a genomes storage database is the contigs-db of pangenomics, then a pan-db is the profile-db. It contains lots of information that is vital for analysis, and most programs will require both the pan-db and its genomes storage database as an input.
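
To make the pairing of the two databases concrete, here is a minimal sketch that assembles the commands for computing and then displaying a pangenome. It assumes that anvi-pan-genome writes its pan-db to `<output-dir>/<project-name>-PAN.db`; the helper name and every path are hypothetical, and the flags shown should be confirmed against each program's `--help` output.

```python
import os


def pan_genome_commands(genomes_storage, project_name, output_dir):
    """Return (run, display) argument lists for a pangenome analysis.

    Hypothetical helper, not part of anvi'o.
    """
    # Assumption: the pan-db is named after the project inside the output directory.
    pan_db = os.path.join(output_dir, f"{project_name}-PAN.db")
    run = [
        "anvi-pan-genome",
        "--genomes-storage", genomes_storage,
        "--project-name", project_name,
        "--output-dir", output_dir,
    ]
    # Downstream programs take both the pan-db and the genomes storage database.
    display = [
        "anvi-display-pan",
        "--pan-db", pan_db,
        "--genomes-storage", genomes_storage,
    ]
    return run, display
```

Note that the genomes storage database appears in both commands: it is an input when the pan-db is created and again when the pan-db is used.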
