Split an anvi'o pan or profile database into smaller, self-contained projects. Black magic.
Can consume: profile-db, contigs-db, genomes-storage-db, pan-db, collection
Creates individual, self-contained anvi’o projects for one or more bins stored in an anvi’o collection. This program may be useful if you would like to share a subset of an anvi’o project with the community or a collaborator, or focus on a particular aspect of your data without having to initialize very large files. Altogether, anvi-split promotes reproducibility, openness, and collaboration.
The program can generate split-bins from metagenomes or pangenomes. To split bins, you can provide the program anvi-split with a contigs-db and profile-db pair. To split gene clusters, you can provide it with a genomes-storage-db and pan-db pair. In both cases you will also need a collection. If you don’t provide any bin names, the program will create individual directories for each bin that is found in your collection. You can also limit the output to a single bin. Each of the resulting directories in your output folder will contain a stand-alone anvi’o project that can be shared without sharing any of the larger dataset.
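To make the two input pairs concrete, here is a minimal sketch of a hypothetical shell helper (anvi_split_cmd is not part of anvi'o) that prints the matching anvi-split command for each mode. It assumes that anvi-split takes the contigs-db through -c, the genomes-storage-db through -g, and the profile-db or pan-db through -p; confirm this with anvi-split --help before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of anvi'o): print the anvi-split command for
# one of the two supported input pairs.
# ASSUMPTION: contigs-db is passed with -c, genomes-storage-db with -g,
# and the profile-db or pan-db with -p.
anvi_split_cmd() {
    mode=$1        # "metagenome" or "pangenome"
    first_db=$2    # contigs-db (metagenome) or genomes-storage-db (pangenome)
    second_db=$3   # profile-db (metagenome) or pan-db (pangenome)
    collection=$4  # collection name; required in both modes
    output_dir=$5
    case "$mode" in
        metagenome) pair_flag=-c ;;
        pangenome)  pair_flag=-g ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "anvi-split -p $second_db $pair_flag $first_db -C $collection -o $output_dir"
}

# Print one command per mode (illustrative file and collection names)
anvi_split_cmd metagenome CONTIGS.db PROFILE.db my_collection OUTPUT
anvi_split_cmd pangenome GENOMES.db PAN.db my_collection OUTPUT
```

Printing the command instead of executing it lets you inspect it first; once it looks right, run it directly, quoting any paths that contain spaces.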
Assume you have a profile-db that has a collection with three bins, which are (very creatively) called BIN_1, BIN_2, and BIN_3.
If you ran the following code:
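anvi-split -p profile-db \
           -c contigs-db \
           -C collection \
           -o OUTPUT
the directory OUTPUT would contain three subdirectories, BIN_1, BIN_2, and BIN_3, each holding its own smaller contigs-db and profile-db pair.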
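A quick way to check what a split produced is to walk the output directory. This is a minimal POSIX shell sketch; list_split_bins is a hypothetical helper, and it makes no assumptions about the file names anvi-split writes inside each bin directory.

```shell
#!/bin/sh
# Hypothetical helper: list each split bin under an anvi-split output
# directory, followed by the files it contains (indented by four spaces).
list_split_bins() {
    output_dir=$1
    for bin_dir in "$output_dir"/*/; do
        # Skip the literal, unexpanded pattern when there are no subdirectories
        [ -d "$bin_dir" ] || continue
        basename "$bin_dir"
        ls "$bin_dir" | sed 's/^/    /'
    done
}

# Inspect the OUTPUT directory from the example above (prints nothing if absent)
list_split_bins OUTPUT
```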
Alternatively, you can specify a bin name to limit the output to that single bin:
anvi-split -p profile-db \
           -c contigs-db \
           -C collection \
           --bin-id BIN_1 \
           -o OUTPUT
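If you need a handful of bins rather than all of them, one option is to call the command once per bin with --bin-id. The sketch below gives each call its own output directory; the OUTPUT_<bin> naming is an illustrative choice rather than an anvi'o convention, and split_selected_bins and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Hypothetical wrapper: split several named bins one at a time with --bin-id,
# writing each into its own output directory (OUTPUT_<bin>).
# Set DRY_RUN=1 to print the commands instead of running them.
split_selected_bins() {
    profile_db=$1
    contigs_db=$2
    collection=$3
    shift 3
    # Remaining arguments are the bin names to split
    for bin_id in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "anvi-split -p $profile_db -c $contigs_db -C $collection --bin-id $bin_id -o OUTPUT_$bin_id"
        else
            anvi-split -p "$profile_db" -c "$contigs_db" -C "$collection" \
                --bin-id "$bin_id" -o "OUTPUT_$bin_id" || return 1
        fi
    done
}

# Preview the commands for two of the three bins
DRY_RUN=1 split_selected_bins PROFILE.db CONTIGS.db my_collection BIN_1 BIN_3
```

Stopping at the first failed split (return 1) avoids silently producing an incomplete set of bins.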
Similarly, if you provide a genomes-storage-db and pan-db pair, the directories will contain their own smaller genomes-storage-db and pan-db pairs.
You can always use the program anvi-show-collections-and-bins to list the collection and bin names available in a given profile-db or pan-db.
For extremely large datasets, splitting bins may be difficult. For metagenomics projects, you can use --skip-variability-tables to NOT report single-nucleotide variants or single-amino acid variants in your split bins (these can reach hundreds of millions of lines of information for large and complex metagenomes), and/or --compress-auxiliary-data to save space.
Compressing the auxiliary data is a great option for data that is meant to be stored long-term and shared with the community, but the compressed file must be manually decompressed by the end user before the split bin can be used.
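For the end user who receives split bins made with --compress-auxiliary-data, the decompression step could look like the sketch below. It assumes the auxiliary data was compressed with gzip and carries a .gz suffix, which you should verify against the files you actually received; decompress_split_bins is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper for split bins created with --compress-auxiliary-data.
# ASSUMPTION: compressed files are gzip archives ending in .gz.
# The split itself might have been run as, for example:
#   anvi-split -p profile-db -c contigs-db -C collection \
#       --skip-variability-tables --compress-auxiliary-data -o OUTPUT
decompress_split_bins() {
    output_dir=$1
    # gunzip replaces each FILE.gz with the decompressed FILE in place
    find "$output_dir" -type f -name '*.gz' -exec gunzip {} +
}

# Usage: decompress_split_bins OUTPUT
```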