Get short reads back from a BAM file with options for compression, splitting of forward and reverse reads, etc.
The purpose of this program is not to replace more efficient tools for recovering short reads from BAM files, such as samtools. Because it was designed to address much more subtle needs, this program may have a large memory footprint when run on very large or numerous BAM files.
Using this program you can,
In addition, you can use the previously-defined fetch filters via the --fetch-filter parameter to get only the short reads that satisfy a particular set of criteria (e.g., those that are in forward-forward or reverse-reverse orientation, those that have a template length longer than 1,000 nucleotides, and so on). For the complete set of fetch filters you can use, please see the help menu of the program.
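To make the kinds of criteria mentioned above more concrete, here is a minimal sketch of how they map onto standard SAM fields (FLAG in column 2, TLEN in column 9). It is not how anvi'o implements its fetch filters, and it does not use anvi'o's filter names. The function names and the three toy records are hypothetical and made up for illustration.

```bash
#!/usr/bin/env bash
# Toy SAM-like records (tab-separated, first 9 SAM columns only).
# These records are invented for illustration; they come from no real BAM file.
sam_records() {
  printf 'read_a\t65\tcontig_1\t100\t60\t50M\t=\t1400\t1350\n'
  printf 'read_b\t99\tcontig_1\t200\t60\t50M\t=\t450\t300\n'
  printf 'read_c\t113\tcontig_1\t900\t60\t50M\t=\t300\t-650\n'
}

# Print names of paired reads in forward-forward or reverse-reverse orientation,
# i.e. reads whose own strand bit (0x10) equals the strand bit of the mate (0x20).
same_orientation() {
  awk -F '\t' '{
    paired   = $2 % 2             # FLAG bit 0x1
    read_rev = int($2 / 16) % 2   # FLAG bit 0x10
    mate_rev = int($2 / 32) % 2   # FLAG bit 0x20
    if (paired && read_rev == mate_rev) print $1
  }'
}

# Print names of reads whose absolute template length (TLEN) exceeds a threshold.
long_template() {
  awk -F '\t' -v min_len="$1" '{
    tlen = ($9 < 0) ? -$9 : $9
    if (tlen > min_len) print $1
  }'
}

sam_records | same_orientation
sam_records | long_template 1000
```

With the toy records above, the first pipeline keeps read_a (forward-forward) and read_c (reverse-reverse), and the second keeps only read_a, whose template length is 1,350.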
The program can report all reads in a single file, or you can ask for reads to be split into R1 and R2 files for mapping results of paired-end sequences using the flag --split-R1-and-R2. In this case, reads that are not paired will be reported in a separate file.
Reads reported in a FASTA file will contain the necessary information in their deflines to recover which BAM file, contig, and sample they come from, along with explicit start/stop positions on the contig to which they matched.
A basic run of this program is as follows:
anvi-get-short-reads-from-bam BAM_FILE_1.bam BAM_FILE_2.bam (…) \
    --output-file OUTPUT.fa
This will report all short reads found in the BAM files BAM_FILE_1.bam and BAM_FILE_2.bam and store them in a single file. You can use as many BAM files as you wish.
You can choose to only return the short reads that are contained within a collection:
Or in a bin that is described in a collection:
You can get all reads mapped to a contig:
anvi-get-short-reads-from-bam BAM_FILE_1.bam BAM_FILE_2.bam \
    --target-contig CONTIG_NAME \
    --output-file OUTPUT.fa
Or define explicit start/stop positions on it:
anvi-get-short-reads-from-bam BAM_FILE_1.bam BAM_FILE_2.bam \
    --target-contig CONTIG_NAME \
    --target-region-start 100 \
    --target-region-end 1000 \
    --output-file OUTPUT.fa
In this mode, the program will fetch any read that includes a nucleotide that maps anywhere in the region defined by the user. This means that if the user sets --target-region-start to 100 and --target-region-end to 101, all reads that have a nucleotide mapping to the 100th position will be returned.
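The region logic described above amounts to an interval overlap test. The following is a minimal sketch of that test, not anvi'o's code. It assumes half-open coordinates, so a region with start 100 and end 101 covers a single position; the function name `read_overlaps_region` and the read coordinates are hypothetical.

```bash
#!/usr/bin/env bash
# Succeed (exit status 0) if a read aligned to [read_start, read_end) shares at
# least one position with the target region [region_start, region_end).
# Half-open coordinates are an assumption of this sketch.
read_overlaps_region() {
  local read_start=$1 read_end=$2 region_start=$3 region_end=$4
  [ "$read_start" -lt "$region_end" ] && [ "$read_end" -gt "$region_start" ]
}

# A read covering positions 60..109 includes position 100, so it is returned.
if read_overlaps_region 60 110 100 101; then echo "returned"; else echo "skipped"; fi

# A read covering positions 50..99 ends just before position 100, so it is skipped.
if read_overlaps_region 50 100 100 101; then echo "returned"; else echo "skipped"; fi
```

A read does not need to lie entirely inside the region to be reported; a single shared position is enough.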
You can split the output based on the directionality of paired-end reads. Adding the flag --split-R1-and-R2 causes the program to create three separate output files: one for R1 (sequences in the forward direction), one for R2 (sequences in the reverse direction, i.e., the reverse complement of R1 sequences), and one for unpaired reads. When doing this, you can name these three files with a prefix by using the flag -O:
anvi-get-short-reads-from-bam -o path/to/output \
    --split-R1-and-R2 \
    -O BAM_1_and_BAM_2 \
    BAM_FILE_1.bam BAM_FILE_2.bam
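As a rough illustration of a three-way split, the sketch below sorts reads into R1, R2, and unpaired groups by their SAM FLAG value. It assumes that pair membership follows the standard FLAG bits (0x1 paired, 0x40 first in pair, 0x80 second in pair), which this page does not confirm for anvi'o. The function name `classify_read` and the labels it prints are hypothetical and are not the program's output file names.

```bash
#!/usr/bin/env bash
# Classify a read by its decimal SAM FLAG value.
# Assumed FLAG bits: 0x1 = paired, 0x40 = first in pair, 0x80 = second in pair.
classify_read() {
  local flag=$1
  if (( (flag & 0x1) == 0 )); then
    echo "UNPAIRED"          # read was not sequenced as part of a pair
  elif (( flag & 0x40 )); then
    echo "R1"                # first read of the pair
  elif (( flag & 0x80 )); then
    echo "R2"                # second read of the pair
  else
    echo "UNPAIRED"          # paired bit set but no mate number recorded
  fi
}

# Example FLAG values: 99 (paired, first), 147 (paired, second), 0 (unpaired).
for flag in 99 147 0; do
  printf '%s\t%s\n' "$flag" "$(classify_read "$flag")"
done
```

The loop prints one line per FLAG value, labeling 99 as R1, 147 as R2, and 0 as UNPAIRED.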
You can also compress the output by adding the flag