anvi-split

Split an anvi'o pan or profile database into smaller, self-contained projects. Black magic.


Authors

Can consume

profile-db contigs-db genomes-storage-db pan-db collection

Can provide

split-bins

Usage

Creates individual, self-contained anvi’o projects for one or more bins stored in an anvi’o collection. This program may be useful if you would like to share a subset of an anvi’o project with the community or a collaborator, or to focus on a particular aspect of your data without having to load very large files. Altogether, anvi-split promotes reproducibility, openness, and collaboration.

The program can generate split-bins from metagenomes or pangenomes. To split bins from a metagenome, provide anvi-split with a contigs-db and profile-db pair; to split gene clusters from a pangenome, provide it with a genomes-storage-db and pan-db pair. In both cases you will also need a collection. If you don’t provide any bin names, the program will create an individual directory for each bin found in your collection; alternatively, you can limit the output to a single bin. Each resulting directory in your output folder will contain a stand-alone anvi’o project that can be shared without sharing the rest of the larger dataset.

An example run

Assume your profile-db has a collection with three bins, which are (very creatively) called BIN_1, BIN_2, and BIN_3.

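If you ran the following code:

anvi-split -p profile-db \
           -c contigs-db \
           -C collection \
           -o OUTPUT

the directory OUTPUT would contain three subdirectories, BIN_1, BIN_2, and BIN_3, each of which is a self-contained anvi’o project with its own, smaller contigs-db and profile-db pair.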
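After a full-collection run like the one above, a quick sanity check is to confirm that every expected bin got its own directory. The sketch below is a hypothetical Bash helper (check_split_dirs is not part of anvi’o); it only tests that the per-bin directories exist and does not inspect what is inside them.

```bash
# Hypothetical helper (not an anvi'o program): check that a split produced
# one directory per expected bin.
# Usage: check_split_dirs OUTPUT_DIR BIN_NAME [BIN_NAME ...]
check_split_dirs() {
    local out_dir=$1
    shift
    local missing=0
    local bin
    for bin in "$@"; do
        if [[ -d "$out_dir/$bin" ]]; then
            echo "found:   $out_dir/$bin"
        else
            echo "missing: $out_dir/$bin" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero if any expected bin directory is absent.
    [[ $missing -eq 0 ]]
}
```

For the example above, you would call check_split_dirs OUTPUT BIN_1 BIN_2 BIN_3; a non-zero exit status means at least one bin directory is missing.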

Alternatively, you can specify a bin name to limit the output to that single bin:

anvi-split -p profile-db \
           -c contigs-db \
           -C collection \
           --bin-id BIN_1 \
           -o OUTPUT

Similarly, if you provide a genomes-storage-db and pan-db pair, the directories will contain their own smaller genomes-storage-db and pan-db pairs.
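The only difference between the two modes is which database accompanies the profile-db or pan-db. The Bash sketch below prints, without running, the command for either case. build_split_cmd is a hypothetical helper, the file names in the usage example (PROFILE.db, CONTIGS.db, PAN.db, GENOMES.db) are placeholders, and the -g flag for the genomes-storage-db is an assumption based on the convention other anvi’o programs use, so check anvi-split --help before relying on it.

```bash
# Hypothetical helper: print (do not run) an anvi-split command line.
# ASSUMPTION: the genomes-storage-db is passed with -g, as in other anvi'o programs.
# Usage: build_split_cmd meta|pan MAIN_DB PAIRED_DB COLLECTION OUTPUT [BIN_ID]
build_split_cmd() {
    local mode=$1 main_db=$2 paired_db=$3 collection=$4 out_dir=$5 bin_id=${6:-}
    local -a cmd=(anvi-split -p "$main_db")
    case "$mode" in
        meta) cmd+=(-c "$paired_db") ;;   # profile-db + contigs-db pair
        pan)  cmd+=(-g "$paired_db") ;;   # pan-db + genomes-storage-db pair
        *)    echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    cmd+=(-C "$collection")
    # Without a bin name, anvi-split writes one directory per bin in the collection.
    if [[ -n "$bin_id" ]]; then
        cmd+=(--bin-id "$bin_id")
    fi
    cmd+=(-o "$out_dir")
    echo "${cmd[*]}"
}
```

For instance, build_split_cmd pan PAN.db GENOMES.db COLLECTION OUTPUT BIN_1 prints a pangenome split limited to BIN_1. Printing the command rather than executing it makes it easy to review before launching a long run.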

You can always use the program anvi-show-collections-and-bins to list the collection and bin names available in a given profile-db or pan-db.

Performance

For extremely large datasets, splitting bins may be difficult. For metagenomics projects, you can:

  • Use the flag --skip-variability-tables to NOT report single-nucleotide variants or single-amino acid variants in your split bins (which can reach hundreds of millions of lines of information for large and complex metagenomes), and/or,
  • Use the flag --compress-auxiliary-data to save space. While this is a great option for data that are meant to be stored long-term and shared with the community, end users will need to decompress the auxiliary data manually before they can use the split bin.
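For large metagenomes, the two flags above can be combined in a single call. The wrapper below is a hypothetical sketch: split_large and the SKIP_VAR, COMPRESS, and DRY_RUN environment variables are illustration-only names, not anvi’o options; only --skip-variability-tables and --compress-auxiliary-data come from anvi-split itself.

```bash
# Hypothetical wrapper for splitting a large metagenomics project.
# SKIP_VAR=1 and COMPRESS=1 toggle the two anvi-split performance flags;
# DRY_RUN=1 prints the command instead of executing it.
split_large() {
    local profile_db=$1 contigs_db=$2 collection=$3 out_dir=$4
    local -a cmd=(anvi-split -p "$profile_db" -c "$contigs_db" -C "$collection" -o "$out_dir")
    # Leave out SNV/SAAV tables, which can be enormous for complex metagenomes.
    if [[ "${SKIP_VAR:-0}" == 1 ]]; then
        cmd+=(--skip-variability-tables)
    fi
    # Compress auxiliary data; recipients must decompress it before use.
    if [[ "${COMPRESS:-0}" == 1 ]]; then
        cmd+=(--compress-auxiliary-data)
    fi
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Running SKIP_VAR=1 COMPRESS=1 split_large PROFILE.db CONTIGS.db COLLECTION OUTPUT would use both options (file names are placeholders); adding DRY_RUN=1 prints the command instead of running it.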

