This script takes a GenBank file and outputs a FASTA file, along with two TAB-delimited files for external gene calls and gene functions that can be used with the programs anvi-gen-contigs-database and anvi-import-functions. It processes CDS, tRNA, and rRNA features by default, and reclassifies pseudogenes and CDS features with internal stops as non-coding to ensure compatibility with anvi'o.
contigs-fasta
external-gene-calls
functions-txt
This program processes a genbank-file and converts it into anvi’o-friendly artifacts: a contigs-fasta, an external-gene-calls, and a functions-txt.
The contigs-fasta and external-gene-calls can be given to anvi-gen-contigs-database to create a contigs-db, and then you can use anvi-import-functions to bring the function data (in the functions-txt) into the database. Then you’ll have all of the data in your genbank-file converted into a single contigs-db, which you can use for a variety of anvi’o analyses.
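The three steps described above can be sketched as a small Python helper that assembles the commands in the order they would run. This is only an illustration: the file names and project name are placeholders, and the flag names are an assumption based on the programs' usual help output, so confirm them with `--help` before relying on them.

```python
import shlex


def build_commands(genbank, fasta, gene_calls, functions, contigs_db, project_name):
    """Return the three commands as argument lists, in the order they should run.

    Flag names are assumptions; verify them against each program's --help.
    """
    return [
        # 1. Convert the GenBank file into the three anvi'o artifacts.
        ["anvi-script-process-genbank", "-i", genbank,
         "--output-fasta", fasta,
         "--output-gene-calls", gene_calls,
         "--output-functions", functions],
        # 2. Build the contigs-db from the FASTA and the external gene calls.
        ["anvi-gen-contigs-database", "-f", fasta,
         "--external-gene-calls", gene_calls,
         "-o", contigs_db, "-n", project_name],
        # 3. Import the functional annotations into the contigs-db.
        ["anvi-import-functions", "-c", contigs_db, "-i", functions],
    ]


# Placeholder file names, chosen only for this example.
commands = build_commands("genome.gb", "contigs.fa", "gene-calls.txt",
                          "functions.txt", "contigs.db", "my_genome")
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without anvi’o installed; each list could be passed to `subprocess.run(cmd, check=True)` once the flags are confirmed.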
By default, anvi-script-process-genbank processes CDS, tRNA, and rRNA features:
CDS features are mapped to the anvi’o CODING gene call type.
tRNA and rRNA features are mapped to the NONCODING gene call type.
Genomic data often contains features that anvi’o may find difficult to process using standard workflows, such as gene calls with internal stop codons or frameshifts. This script identifies such features and reclassifies them as NONCODING:
A CDS explicitly marked as a /pseudogene or having /pseudo in its GenBank qualifiers is reclassified as NONCODING.
A CDS with notes indicating internal stops or frameshifts (based on common NCBI PGAP terms) is also reclassified as NONCODING.
This approach ensures that these features are preserved in your contigs-db without triggering translation errors during database creation.
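The classification rules above can be sketched in a few lines of Python. This is not the script's actual implementation: the function name is hypothetical, the qualifiers are assumed to be a dictionary of lists (the shape Biopython uses), and the keyword list is an illustrative guess at the kind of PGAP terms involved.

```python
# Assumed, illustrative keywords; the script's actual list of PGAP terms may differ.
PROBLEM_NOTE_TERMS = ("internal stop", "frameshift")


def classify_feature(feature_type, qualifiers):
    """Return 'CODING', 'NONCODING', or None for feature types that are skipped."""
    if feature_type in ("tRNA", "rRNA"):
        return "NONCODING"
    if feature_type != "CDS":
        return None  # other feature types are not processed by default
    # A CDS flagged as a pseudogene is kept, but as a non-coding call.
    if "pseudo" in qualifiers or "pseudogene" in qualifiers:
        return "NONCODING"
    # A CDS whose notes mention internal stops or frameshifts is also non-coding.
    notes = " ".join(qualifiers.get("note", [])).lower()
    if any(term in notes for term in PROBLEM_NOTE_TERMS):
        return "NONCODING"
    return "CODING"


print(classify_feature("CDS", {"product": ["DNA polymerase"]}))
print(classify_feature("CDS", {"pseudo": [""]}))
print(classify_feature("CDS", {"note": ["frameshifted; internal stop"]}))
```

Checking the pseudogene qualifiers before the notes means a feature is reclassified as soon as any one of the conditions holds, which matches the "preserve but do not translate" intent described above.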
The parameters of this program entirely deal with the outputs. Besides telling the program where to put them, you can also give the function annotation source (in the functions-txt) a custom name.
One important note about this conversion is the following: During the conversion of GenBank entries, anvi’o will assign a new gene call id to each entry, breaking the link between locus tags defined in the GenBank file and the gene entries that will later appear in the anvi’o contigs-db. One way to avoid this is to use the flag --include-locus-tags-as-functions, which will instruct anvi’o to add a new ‘function’ source for each gene in the output file for functional annotations so that the user can trace back a given gene call to the original locus tag.
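To illustrate why the link to locus tags is lost and how the extra function source restores it, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name, the source names `GenBank` and `locus_tag`, the numbering of gene calls from 0, and the row layout (gene call id, source, accession, function, e-value) may all differ from what the script actually writes.

```python
def functions_rows(features, source="GenBank", locus_tag_source="locus_tag",
                   include_locus_tags=False):
    """Build illustrative functions-txt rows for a list of GenBank features.

    Each row is (gene_callers_id, source, accession, function, e_value).
    The ids are newly assigned, so they carry no trace of the locus tag
    unless the locus tag is written out as an extra 'function'.
    """
    rows = []
    for gene_callers_id, feature in enumerate(features):  # assumed to start at 0
        rows.append((gene_callers_id, source, "", feature["product"], 0))
        if include_locus_tags:
            rows.append((gene_callers_id, locus_tag_source, "", feature["locus_tag"], 0))
    return rows


# Hypothetical features, made up for this example.
features = [
    {"locus_tag": "ABC_0001", "product": "DNA polymerase"},
    {"locus_tag": "ABC_0002", "product": "hypothetical protein"},
]
rows = functions_rows(features, include_locus_tags=True)

# Trace each new gene call id back to its original locus tag.
id_to_locus_tag = {row[0]: row[3] for row in rows if row[1] == "locus_tag"}
print(id_to_locus_tag)
```

Without `include_locus_tags=True`, the rows would contain only the product annotations, and nothing in them would identify which GenBank entry a given gene call came from.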