contigs-db [artifact]

DB

A DB-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).


Provided by

anvi-gen-contigs-database

Required or used by

anvi-cluster-contigs anvi-compute-completeness anvi-db-info anvi-delete-functions anvi-delete-hmms anvi-display-contigs-stats anvi-display-metabolism anvi-display-structure anvi-estimate-genome-completeness anvi-estimate-metabolism anvi-estimate-scg-taxonomy anvi-estimate-trna-taxonomy anvi-export-contigs anvi-export-functions anvi-export-gene-calls anvi-export-gene-coverage-and-detection anvi-export-locus anvi-export-misc-data anvi-export-splits-and-coverages anvi-export-splits-taxonomy anvi-gen-fixation-index-matrix anvi-gen-gene-consensus-sequences anvi-gen-gene-level-stats-databases anvi-gen-structure-database anvi-gen-variability-profile anvi-get-aa-counts anvi-get-codon-frequencies anvi-get-pn-ps-ratio anvi-get-sequences-for-gene-calls anvi-get-sequences-for-hmm-hits anvi-get-short-reads-from-bam anvi-get-short-reads-mapping-to-a-gene anvi-get-split-coverages anvi-import-collection anvi-import-functions anvi-import-misc-data anvi-import-taxonomy-for-genes anvi-inspect anvi-interactive anvi-merge anvi-migrate anvi-profile anvi-profile-blitz anvi-refine anvi-rename-bins anvi-report-inversions anvi-run-hmms anvi-run-interacdome anvi-run-kegg-kofams anvi-run-ncbi-cogs anvi-run-pfams anvi-run-scg-taxonomy anvi-run-trna-taxonomy anvi-scan-trnas anvi-search-functions anvi-search-palindromes anvi-search-sequence-motifs anvi-show-misc-data anvi-split anvi-summarize anvi-summarize-blitz anvi-update-db-description anvi-update-structure-database anvi-script-add-default-collection anvi-script-filter-hmm-hits-table anvi-script-gen-distribution-of-genes-in-a-bin anvi-script-gen-genomes-file anvi-script-gen_stats_for_single_copy_genes.py anvi-script-get-hmm-hits-per-gene-call anvi-script-merge-collections anvi-script-permute-trnaseq-seeds

Description

A contigs database is an anvi’o database that contains key information associated with your sequences.

In a way, an anvi’o contigs database is a modern, more talented form of a FASTA file: it stores additional information about your sequences, which others can then query and use. Information is primarily stored and accessed by anvi’o programs, but this can also be done through the command line interface or programmatically.
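As a small illustration of programmatic access: a contigs database is an SQLite file, so Python's built-in sqlite3 module can open it. The sketch below is not part of the anvi’o API, and the table name `self` with its `key` and `value` columns is an assumption about the database layout that you should check against your own file (the table listing helps with that). It builds a tiny stand-in database so that it runs without anvi’o installed. For everyday use, anvi-db-info gives a similar overview and is the safer route.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def read_key_value_table(db_path, table="self"):
    """Read a two-column key/value table into a dict.

    The default table name and the column names are assumptions.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(f'SELECT key, value FROM "{table}"').fetchall()
    return dict(rows)


# Stand-in database with a made-up layout, only so the sketch is runnable.
toy_path = os.path.join(tempfile.mkdtemp(), "toy-contigs.db")
with closing(sqlite3.connect(toy_path)) as conn:
    conn.execute('CREATE TABLE "self" (key TEXT, value TEXT)')
    conn.execute('INSERT INTO "self" VALUES (?, ?)', ("project_name", "toy"))
    conn.commit()

print(list_tables(toy_path))
print(read_key_value_table(toy_path))
```

Only read from a real contigs database this way, and preferably from a copy, since anvi’o programs expect to manage its contents themselves.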

The information a contigs database contains about its sequences can include the positions of open reading frames, tetra-nucleotide frequencies, functional and taxonomic annotations, information on individual nucleotide or amino acid positions, and more.
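To make "tetra-nucleotide frequencies" concrete, here is a minimal sketch that counts every 4-mer in a sequence. It is an illustration only: the function name is hypothetical, it counts the sequence as given on the forward strand, and it makes no claim to match how anvi’o computes or stores these values.

```python
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Count occurrences of every A/C/G/T k-mer in a sequence (k=4: tetranucleotides)."""
    counts = {"".join(kmer): 0 for kmer in product("ACGT", repeat=k)}
    sequence = sequence.upper()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    return counts


freqs = kmer_frequencies("ACGTACGTNACGT")
print({kmer: n for kmer, n in freqs.items() if n > 0})
```

The result always has 256 entries for k=4, one per possible tetranucleotide, which is what makes such profiles comparable between contigs.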

Another (less computation-heavy) way of thinking about it

When working in anvi’o, you’ll need to be able to access previous analyses done on a genome or transcriptome. To do this, anvi’o uses tools like contigs databases instead of regular FASTA files. So, you’ll want to convert the data that you have into a contigs database (using anvi-gen-contigs-database) to use other anvi’o programs. As seen on the page for metagenomes, you can then use this contigs database instead of your FASTA file for all of your anvi’o needs.

In short, to get the most out of your data in anvi’o, you’ll want to use your data (which was probably originally in a FASTA file) to create both a contigs-db and a profile-db. That way, anvi’o is able to keep track of many different kinds of analyses and you can easily interact with other anvi’o programs.

Usage Information

Creating and populating a contigs database

A contigs database is initialized by running anvi-gen-contigs-database on a contigs-fasta. This computes the k-mer frequencies for each contig, soft-splits your contigs, and identifies open reading frames. To populate a contigs database with more information, you can then run various other programs.
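The initialization step can be sketched from Python as follows. The file names and project name are placeholders, and both helper functions are hypothetical conveniences rather than anything anvi’o provides. The flags shown (`-f` for the input FASTA, `-o` for the output database, `-n` for the project name) are the commonly used ones, but it is worth confirming them with `anvi-gen-contigs-database --help` for your installed version.

```python
import shlex
import shutil
import subprocess


def gen_contigs_db_command(fasta_path, db_path, project_name):
    """Build the argument list for anvi-gen-contigs-database."""
    return [
        "anvi-gen-contigs-database",
        "-f", fasta_path,    # input contigs-fasta
        "-o", db_path,       # output contigs-db
        "-n", project_name,  # project name stored in the database
    ]


def run_if_available(command):
    """Run the command only if the program is on PATH; return True if it ran."""
    if shutil.which(command[0]) is None:
        print("not installed, would run:", shlex.join(command))
        return False
    subprocess.run(command, check=True)
    return True


cmd = gen_contigs_db_command("contigs.fa", "contigs.db", "my_project")
print(shlex.join(cmd))
```

Calling `run_if_available(cmd)` would execute the step on a machine where anvi’o is installed. The sketch only prints the command so that it stays safe to run anywhere.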

Key programs that populate an anvi’o contigs database with essential information include,

Once an anvi’o contigs database is generated and populated with information, it is always a good idea to run anvi-display-contigs-stats to see a numerical summary of its contents.
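A populate-then-summarize sequence might look like the sketch below, which uses anvi-run-hmms as one example of a populating program and then anvi-display-contigs-stats. The database path, the thread count, and the helper function are illustrative assumptions. The sketch only prints the commands, and the flags are best checked against each program's `--help` output.

```python
import shlex


def populate_and_summarize_commands(db_path, num_threads=1):
    """Return commands that add HMM hits to a contigs-db, then summarize it."""
    return [
        # annotate genes using the HMM sources available to anvi-run-hmms
        ["anvi-run-hmms", "-c", db_path, "-T", str(num_threads)],
        # show a numerical summary of the database contents
        ["anvi-display-contigs-stats", db_path],
    ]


for command in populate_and_summarize_commands("contigs.db", num_threads=4):
    print(shlex.join(command))
```

Running the summary last means it reflects whatever the populating programs have added.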

Other programs you can run to populate a contigs database with functions include,

Analysis on a populated contigs database

Other essential programs that read from a contigs database and yield key information include anvi-estimate-genome-completeness, anvi-get-sequences-for-hmm-hits, and anvi-estimate-scg-taxonomy.

If you wish to run programs like anvi-cluster-contigs, anvi-estimate-metabolism, and anvi-gen-gene-level-stats-databases, or view your database with anvi-interactive, you’ll need to first use your contigs database to create a profile-db.

Variants

Contigs databases, like profile-dbs, are allowed to have different variants, though the only currently implemented variant, the trnaseq-contigs-db, is for tRNA transcripts from tRNA-seq experiments. The default variant stored for “standard” contigs databases is unknown. Variants should indicate that substantially different information is stored in the database. For instance, open reading frames are applicable to protein-coding genes but not tRNA transcripts, so ORF data is not recorded for the trnaseq variant. The trnaseq-workflow generates trnaseq-contigs-dbs using a very different approach to anvi-gen-contigs-database.
