This script takes the 'metadata' output of the program ncbi-genome-download (see https://github.com/kblin/ncbi-genome-download for details) and processes each GenBank file listed in the metadata file to generate a FASTA file, as well as gene calls and functions files, for each entry. Plus, it automatically generates a fasta-txt file for anvi'o snakemake workflows. So it is a multi-talented program like that.
This program seems to know what it’s doing. It needs no input material from its user. Good program.
It can provide the following artifacts: contigs-fasta, functions-txt, and external-gene-calls.
Suppose you have downloaded some genomes from NCBI (using the incredibly useful ncbi-genome-download) and you have a metadata table describing those genomes. This program will convert that metadata table into some useful files, namely: a FASTA file of contig sequences, an external gene calls file, and an external functions file for each genome you have downloaded, as well as a single tab-delimited fasta-txt file describing the path to each of these files for all downloaded genomes (which you can pass directly to a snakemake workflow if you need to). Yay.
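To give a feel for the GenBank-to-FASTA part of that conversion, here is a minimal Python sketch that pulls the locus name and the ORIGIN sequence out of each record in GenBank-formatted text and writes them out as FASTA. This is an illustration under our own assumptions, not the script's actual implementation: it ignores the FEATURES table entirely (so it produces no gene calls or functions files), and the function names iter_genbank_records and genbank_to_fasta are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple


def iter_genbank_records(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (locus_name, sequence) pairs from GenBank-formatted text.

    Simplified sketch (assumption, not anvi'o's implementation): only the
    LOCUS line and the ORIGIN block are read; FEATURES are ignored.
    """
    name: Optional[str] = None
    chunks: List[str] = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            # The locus name is the second whitespace-separated field.
            name, chunks, in_origin = line.split()[1], [], False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # "//" closes a record: emit it and reset the state.
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks, in_origin = None, [], False
        elif in_origin:
            # Sequence lines look like "        1 acgtacgtac gt": keep letters only.
            chunks.append("".join(ch for ch in line if ch.isalpha()))


def genbank_to_fasta(text: str, width: int = 60) -> str:
    """Convert GenBank text to FASTA text, wrapping sequences at `width`."""
    lines: List[str] = []
    for name, seq in iter_genbank_records(text):
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""
```

A real pipeline would more likely rely on a dedicated parser such as Biopython's SeqIO, which also exposes the CDS features that gene calls and functional annotations would be built from.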
The prerequisite for running this program is to have a tab-delimited metadata file containing information about each of the genomes you downloaded from NCBI. Let’s say your download command started like this:
ncbi-genome-download --metadata-table ncbi_metadata.txt -t ...
So for the purposes of this usage tutorial, your metadata file is called ncbi_metadata.txt.
In case you are wondering, that file should have a header that looks something like this:
assembly_accession bioproject biosample wgs_master excluded_from_refseq refseq_category relation_to_type_material taxid species_taxid organism_name infraspecific_name isolate version_status assembly_level release_type genome_rep seq_rel_date asm_name submitter gbrs_paired_asm paired_asm_comp ftp_path local_filename
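Since the metadata table is plain tab-delimited text, reading it takes only the Python standard library. The sketch below, an illustration rather than what the script does internally, maps each assembly_accession to its local_filename, which (assuming ncbi-genome-download filled that column in) is the path of the downloaded GenBank file for that genome. The function name accession_to_local_file is hypothetical.

```python
import csv
from typing import Dict, Iterable


def accession_to_local_file(lines: Iterable[str]) -> Dict[str, str]:
    """Map assembly_accession -> local_filename from the metadata table.

    `lines` may be an open file handle or any iterable of text lines.
    Illustrative sketch only; the function name is hypothetical.
    """
    # QUOTE_NONE: treat quote characters in organism names as plain text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Reading .fieldnames consumes the header row.
    header = set(reader.fieldnames or [])
    missing = {"assembly_accession", "local_filename"} - header
    if missing:
        raise ValueError(f"metadata table lacks column(s): {sorted(missing)}")
    return {row["assembly_accession"]: row["local_filename"] for row in reader}
```

With the file from this tutorial you could call it as accession_to_local_file(open("ncbi_metadata.txt", newline="")); opening with newline="" is what the csv module documentation recommends.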
If you run this, all the output files will show up in your current working directory.
anvi-script-process-genbank-metadata -m ncbi_metadata.txt
Alternatively, you can specify a directory in which to generate the output:
anvi-script-process-genbank-metadata -m ncbi_metadata.txt -o DOWNLOADED_GENOMES
The default name for the fasta-txt file is fasta-input.txt, but you can change that with the --output-fasta-txt parameter:
anvi-script-process-genbank-metadata -m ncbi_metadata.txt --output-fasta-txt ncbi_fasta.txt
The default columns in the fasta-txt file are:
name path external_gene_calls gene_functional_annotation
But sometimes you don’t want your downstream snakemake workflow to use those external gene calls or functional annotation files. To skip adding those columns to the fasta-txt file, use the -E flag:
anvi-script-process-genbank-metadata -m ncbi_metadata.txt --output-fasta-txt ncbi_fasta.txt -E
Then the fasta-txt file will only contain a name column and a path column.
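The two layouts of the fasta-txt file can be sketched in a few lines of Python. This is a hypothetical helper, not part of anvi'o: GenomeFiles and format_fasta_txt are names made up for illustration, and the per-genome paths are whatever your own output files are called. It emits the four default columns, or only name and path when exclude_gene_calls is set, which corresponds to what the -E flag does.

```python
import csv
import io
from typing import Iterable, NamedTuple


class GenomeFiles(NamedTuple):
    """Hypothetical record holding the output paths produced for one genome."""
    name: str
    path: str
    external_gene_calls: str
    gene_functional_annotation: str


def format_fasta_txt(genomes: Iterable[GenomeFiles],
                     exclude_gene_calls: bool = False) -> str:
    """Return the text of a tab-delimited fasta-txt table.

    With exclude_gene_calls=True, only the `name` and `path` columns are
    emitted, mirroring the layout described for the -E flag.
    """
    columns = ["name", "path"]
    if not exclude_gene_calls:
        columns += ["external_gene_calls", "gene_functional_annotation"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for genome in genomes:
        # NamedTuple fields share the column names, so getattr picks the right ones.
        writer.writerow([getattr(genome, column) for column in columns])
    return buffer.getvalue()
```

For instance, writing format_fasta_txt(genomes, exclude_gene_calls=True) to ncbi_fasta.txt would give a table with just the name and path columns.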