anvi-gen-contigs-database

Generate a new anvi'o contigs database.

Can consume

contigs-fasta external-gene-calls

Can provide

contigs-db

Usage

The input for this program is a contigs-fasta, which should contain one or more sequences. These sequences may belong to a single genome or could be contigs obtained from an assembly.

Make sure the input file matches the requirements of a contigs-fasta. If you are planning to use the resulting contigs-db with anvi-profile, it is essential that you convert your FASTA file to a properly formatted contigs-fasta before you perform the read recruitment.

An anvi’o contigs database will keep all the information related to your sequences: positions of open reading frames, k-mer frequencies for each contig, functional and taxonomic annotation of genes, etc. The contigs database is one of the most essential components of anvi’o.

When run on a contigs-fasta, this program will:

  • Compute k-mer frequencies for each contig (the default k-mer size is 4, but you can change it with the --kmer-size parameter if you feel adventurous).

  • Soft-split contigs longer than 20,000 bp into smaller ones (you can change the split size using the --split-length flag). Unless gene calling is skipped, splitting takes gene positions into account and avoids cutting genes in the middle. For very large assemblies this step can take a while; you can skip it with the --skip-mindful-splitting flag.

  • Identify open reading frames using Prodigal, unless (1) you have used the --skip-gene-calling flag (no gene calls will be made) or (2) you have provided external-gene-calls.
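
To make the first two steps above more concrete, here is a minimal Python sketch of what counting k-mers and soft-splitting a contig can look like. It is an illustration under simplifying assumptions, not anvi'o's implementation: the function names `kmer_frequencies` and `naive_split_ranges` are hypothetical, the split logic ignores gene positions entirely (so it corresponds to splitting without the mindful step), and anvi'o may treat reverse complements, ambiguous bases, and short trailing pieces differently.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Count every overlapping k-mer made only of A, C, G and T.

    Hypothetical helper for illustration; not part of anvi'o.
    """
    sequence = sequence.upper()
    # Start every possible k-mer at zero so all contigs share the same keys.
    counts = Counter({"".join(p): 0 for p in product("ACGT", repeat=k)})
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skips k-mers containing N or other ambiguity codes
            counts[kmer] += 1
    return counts


def naive_split_ranges(contig_length, split_length=20000):
    """Return (start, end) ranges of at most split_length bp covering a contig.

    Assumption: gene positions are ignored, so a gene may be cut in the middle.
    """
    if contig_length <= split_length:
        return [(0, contig_length)]
    starts = range(0, contig_length, split_length)
    return [(s, min(s + split_length, contig_length)) for s in starts]


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTACGTTTGA")
    print(len(freqs), "possible 4-mers tracked")
    print(naive_split_ranges(45000))
```

The splits here are only coordinate ranges over the original contig, which is the sense of "soft" splitting: the contig sequence itself is not cut into separate records.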

This program can work with compressed input FASTA files (i.e., files whose names end with the .gz extension).
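
If you want to inspect a FASTA file yourself before handing it to the program, the same "plain or .gz" convention is easy to follow in Python. The sketch below is only an illustration of that convention, with hypothetical helper names (`open_fasta`, `read_fasta`); it does not reflect how anvi'o reads its input and performs none of the contigs-fasta format checks.

```python
import gzip


def open_fasta(path):
    """Open a FASTA file as text, decompressing it when the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(path):
    """Return a dict mapping each sequence name to its full sequence."""
    sequences = {}
    name = None
    with open_fasta(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The header line (without '>') is used as the sequence name.
                name = line[1:].strip()
                sequences[name] = []
            elif name is not None:
                sequences[name].append(line)
    return {n: "".join(parts) for n, parts in sequences.items()}
```

Calling `read_fasta` on a plain file and on its gzip-compressed copy should give the same dictionary, which is a quick way to confirm a compressed file is intact.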

Create a contigs database from a FASTA file

anvi-gen-contigs-database -f contigs-fasta \
                          -o contigs-db

Create a contigs database with external gene calls

anvi-gen-contigs-database -f contigs-fasta \
                          -o contigs-db \
                          --external-gene-calls external-gene-calls

See external-gene-calls for the description and formatting requirements of this file.

If user-provided or anvi’o-calculated amino acid sequences contain internal stop codons, anvi’o will stop with an error. Adding the --ignore-internal-stop-codons flag, as in the following command, lets the program continue despite such sequences:

anvi-gen-contigs-database -f contigs-fasta \
                          -o contigs-db \
                          --external-gene-calls external-gene-calls \
                          --ignore-internal-stop-codons
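
Before deciding to ignore internal stop codons, it can help to see which gene calls are affected. The following Python sketch shows one way to check translated sequences. It assumes stop codons are written as `*` in the amino acid sequence and that sequences are held in a dictionary keyed by gene call ID; both are assumptions made for this example, and the function names are hypothetical rather than part of anvi'o.

```python
def has_internal_stop(aa_sequence, stop_symbol="*"):
    """True if a stop symbol occurs anywhere before the final residue.

    Assumption: stops are encoded as '*'; a single trailing stop is not
    considered internal.
    """
    return stop_symbol in aa_sequence[:-1]


def genes_with_internal_stops(aa_sequences):
    """Return the IDs of sequences that contain an internal stop symbol."""
    return [gene_id for gene_id, seq in aa_sequences.items()
            if has_internal_stop(seq)]


if __name__ == "__main__":
    example = {1: "MKTLV*", 2: "MKT*LV", 3: "MKTLV"}
    print(genes_with_internal_stops(example))
```

A non-empty result suggests reviewing the external gene calls (for instance their coordinates or reading frame) before falling back on --ignore-internal-stop-codons.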
