anvi-script-gen-distribution-of-genes-in-a-bin [program]

Quantify the detection of genes in genomes in metagenomes to identify the environmental core. This is a helper script for the anvi'o metapangenomic workflow.


Can provide

view-data misc-data-items-txt

Can consume

contigs-db profile-db collection bin

Usage

This program computes the detection of genes (provided as a bin) across your samples, so that you can visualize them in the interactive interface.
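For reference, anvi'o defines the detection of a gene in a sample as the fraction of its nucleotides covered by at least one read, and its coverage as the mean read depth across its nucleotides. The sketch below computes both for one gene in one sample. It assumes you already have per-nucleotide read depths as a plain list; the function names and the input data are hypothetical and not part of anvi'o.

```python
from typing import Sequence

def gene_coverage(depths: Sequence[int]) -> float:
    """Mean read depth across all nucleotide positions of the gene."""
    return sum(depths) / len(depths) if depths else 0.0

def gene_detection(depths: Sequence[int]) -> float:
    """Fraction of the gene's nucleotide positions covered by at least one read."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d > 0)
    return covered / len(depths)

# Hypothetical per-nucleotide depths for a 10-nt gene in a single sample.
depths = [0, 0, 3, 4, 5, 5, 4, 3, 0, 0]
print(gene_coverage(depths))   # 2.4
print(gene_detection(depths))  # 0.6
```

Note that the two numbers answer different questions: a gene can have high coverage concentrated in a few positions (low detection), or be thinly but evenly covered (high detection, low coverage).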

In the metapangenomic workflow, this program is run on genes with metagenomes as samples, so that you can visually identify the environmental core genes and the environmental accessory genes.

Inputs

Essentially, you provide a contigs-db and profile-db pair, as well as the bin you want to look at, and this program will compute the coverage and detection of each gene in your bin across the samples in your profile-db:

anvi-script-gen-distribution-of-genes-in-a-bin -c contigs-db \
                                               -p profile-db \
                                               -C collection \
                                               -b bin

There are two other parameters that you can set to narrow down the genes you are looking at:

  • The minimum detection required for a gene to be included (by default, a gene must have a detection value of at least 0.5 in at least one of your samples).
  • The minimum coverage required for a gene to be included (by default, a gene's total coverage must be at least 0.25 times the mean total coverage in your data).
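The two filters can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes a gene must pass both filters, that a gene's "total coverage" is the sum of its coverage values across samples, and that the "mean total coverage" is the mean of those totals over all genes in the bin. The function and parameter names are hypothetical.

```python
from typing import Dict, List

def select_genes(
    detection: Dict[str, List[float]],
    coverage: Dict[str, List[float]],
    min_detection: float = 0.5,
    min_coverage_fraction: float = 0.25,
) -> List[str]:
    """Return the genes passing both filters (hypothetical interpretation)."""
    # Assumption: total coverage of a gene = sum of its coverage over all samples.
    total_cov = {gene: sum(covs) for gene, covs in coverage.items()}
    # Assumption: "mean total coverage" = mean of the per-gene totals.
    mean_total = sum(total_cov.values()) / len(total_cov)
    cutoff = min_coverage_fraction * mean_total

    kept = []
    for gene in coverage:
        # Filter 1: detected at or above the threshold in at least one sample.
        detected_somewhere = max(detection[gene]) >= min_detection
        # Filter 2: total coverage at or above the fraction of the mean total.
        enough_coverage = total_cov[gene] >= cutoff
        if detected_somewhere and enough_coverage:
            kept.append(gene)
    return kept

# Hypothetical bin with three genes across three samples.
detection = {
    "gene_1": [0.9, 0.8, 0.7],
    "gene_2": [0.1, 0.2, 0.3],  # never reaches 0.5 detection
    "gene_3": [0.6, 0.0, 0.0],  # detected once, but see its coverage below
}
coverage = {
    "gene_1": [10.0, 12.0, 8.0],  # total 30
    "gene_2": [5.0, 5.0, 5.0],    # total 15
    "gene_3": [1.0, 0.0, 0.0],    # total 1, below 0.25 * (46 / 3)
}
print(select_genes(detection, coverage))  # ['gene_1']
```

Here gene_2 fails the detection filter and gene_3 fails the coverage filter, so only gene_1 is kept.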

Outputs

This program will produce two outputs:

  1. [your bin name]-GENE-COVs.txt, which is a view-data artifact. This is a matrix where each row represents a gene, each column represents one of your samples, and the cells each contain a coverage value.
  2. [your bin name]-ENV-DETECTION.txt, which is a misc-data-items-txt artifact. It is a two-column file in which each row is a gene and the second column describes whether or not that gene is systematically detected in your samples. You can therefore add it to the interface as an additional layer that shows which genes are detected in your samples (as an example, see the outermost layer here).
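To make the shape of the two outputs concrete, the sketch below writes both as tab-separated text: a gene-by-sample coverage matrix and a two-column gene-to-label file. The header names ("gene", "detection") and the label values ("core", "accessory") are placeholders chosen for illustration; the exact headers and values anvi'o writes may differ, so check the view-data and misc-data-items-txt artifact pages before relying on them.

```python
import csv
import io
from typing import Dict, List

def view_data_tsv(samples: List[str], coverage: Dict[str, List[float]]) -> str:
    """Gene-by-sample matrix: one row per gene, one column per sample."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + samples)  # hypothetical header for the item column
    for gene, covs in coverage.items():
        writer.writerow([gene] + covs)
    return buf.getvalue()

def detection_layer_tsv(labels: Dict[str, str]) -> str:
    """Two-column items file: gene name, then its detection label."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "detection"])  # hypothetical column names
    for gene, label in labels.items():
        writer.writerow([gene, label])
    return buf.getvalue()

samples = ["sample_A", "sample_B"]
coverage = {"gene_1": [10.0, 12.0], "gene_2": [0.0, 3.5]}
labels = {"gene_1": "core", "gene_2": "accessory"}  # hypothetical label values
print(view_data_tsv(samples, coverage), end="")
print(detection_layer_tsv(labels), end="")
```

Because both files share the same first column of gene names, the interface can match each row of the detection file to the corresponding row of the coverage matrix.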

Thus, after running this program on a bin named BIN_NAME, you can run

anvi-interactive -d BIN_NAME-GENE-COVs.txt \
                 -A BIN_NAME-ENV-DETECTION.txt \
                 --manual \
                 -p profile-db

This will show the coverage and detection of your genes across your samples in the interactive interface (similar to this figure).

