anvi-get-short-reads-mapping-to-a-gene [program]

Recover short reads from BAM files that were mapped to genes you are interested in. It is possible to work with a single gene call, or a bunch of them. Similarly, you can get short reads from a single BAM file, or from many of them.

Can provide

short-reads-fasta

Can consume

contigs-db bam-file


This program finds all short reads in a bam-file that align to a specific gene and returns them as a short-reads-fasta.

If instead you want to extract these short reads from a FASTQ file, get your gene sequence with anvi-export-gene-calls and take a look at anvi-script-get-primer-matches.

To run this program, specify the BAM files you’re looking at and the gene of interest. To identify the gene, name the contigs-db that contains it and give its gene caller ID (either directly through the parameter --gene-caller-id or through a file). Here is an example:

anvi-get-short-reads-mapping-to-a-gene -c contigs-db \
                                       --gene-caller-id 2 \
                                       -i BAM_FILE_ONE.bam \
                                       -O GENE_2_MATCHES

The output will be a file named GENE_2_MATCHES_BAM_FILE_ONE.fasta (prefix + BAM file name), which will contain all short reads that aligned to gene 2 with more than 100 nucleotides.

You also have the option to provide multiple BAM files; in this case, there will be one output file for each BAM file provided.
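The naming rule described above (output prefix, an underscore, then the BAM file name) can be sketched in shell to predict which files a run over several BAM files should produce. The helper name expected_output is hypothetical, and the pattern is inferred from the single example above, so treat it as an assumption and check it against your actual output directory.

```shell
# Hypothetical helper: predict the output name for one BAM file,
# assuming the pattern is <prefix>_<BAM name without .bam>.fasta.
expected_output() {
    printf '%s_%s.fasta\n' "$1" "$(basename "$2" .bam)"
}

# List the files expected for a run over two BAM files.
for bam in Bam_file_one.bam Bam_file_two.bam; do
    expected_output GENE_2_MATCHES "$bam"
done
```

Comparing this list with the files actually written is a quick way to notice a BAM file that produced no output.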

Additionally, you can change the minimum number of nucleotides of a short read that must map to the gene for the read to be reported. For example, to broaden your search, you could decrease the required mapping length to 50 nucleotides:

anvi-get-short-reads-mapping-to-a-gene -c contigs-db \
                                       --gene-caller-id 2 \
                                       -i Bam_file_one.bam Bam_file_two.bam \
                                       -O GENE_2_MATCHES \
                                       --leeway 50
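To see how a change in the required mapping length affects the result, it can help to count the reads in each output file. The following is a minimal sketch that uses only standard shell tools, not an anvi'o feature; the helper name count_reads is hypothetical, and it assumes the outputs are ordinary FASTA files in which every read has one header line starting with '>'.

```shell
# Hypothetical helper: count FASTA records in a file by counting
# header lines (lines that begin with '>').
count_reads() {
    grep -c '^>' "$1"
}

# Report the number of reads in every output file sharing the prefix.
for fasta in GENE_2_MATCHES_*.fasta; do
    [ -e "$fasta" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s\n' "$fasta" "$(count_reads "$fasta")"
done
```

Running this once after a default run and once after a run with a different mapping length gives a direct comparison, provided each run uses its own output prefix so the files do not overwrite each other.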

