Filter FASTA file according to BLAST table (remove sequences with bad BLAST alignment).
This program takes a contigs-fasta and blast-table and removes sequences without BLAST hits of a certain level of confidence.
For example, you could use this program to filter out sequences that do not have high-confidence taxonomy assignments before running a phylogenomic analysis.
To run this program, you’ll need to provide the contigs-fasta that you’re planning to filter, the blast-table, a list of the column headers in your blast-table (as given to BLAST by -outfmt), and a proper_pident threshold at which to remove the sequences. Sequences whose hits fall below this threshold are removed, where proper_pident is the percent of the query amino acids that were identical to the corresponding matched amino acids. Note that this differs from the pident BLAST parameter, which is computed over the aligned region only and so does not account for unaligned parts of the query.
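To make the difference from pident concrete, here is a minimal Python sketch of how such a value could be derived from one row of a BLAST table. The function name and the formula (identical residues approximated as pident × length / 100, then divided by qlen) are assumptions for illustration, not anvi’o’s actual implementation.

```python
def proper_pident(row):
    """Percent of the query's residues that are identical in the alignment.

    Assumption: the number of identical residues is approximated as
    pident / 100 * alignment length. This is an illustrative formula,
    not code taken from anvi'o.
    """
    identical = float(row["pident"]) / 100.0 * int(row["length"])
    return 100.0 * identical / int(row["qlen"])


# A hit that is 90% identical but covers only 50 of the query's 200 residues
hit = {"pident": "90.0", "length": "50", "qlen": "200"}
print(proper_pident(hit))
```

Under this assumption, a short but highly identical alignment scores much lower than its pident suggests, because the unaligned part of the query counts against it.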
For example, if you ran
anvi-script-filter-fasta-by-blast -f contigs-fasta \
                                  -o path/to/contigs-fasta \
                                  -b blast-table \
                                  -s qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen \
                                  -t 30
Then the output file would be a contigs-fasta that contains only the sequences in your input file that have a hit in your blast-table meeting the proper_pident threshold of 30 percent.
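The filtering step described here can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the program’s source: the function names are hypothetical, the table is assumed to be tab-separated with no header row, the threshold is treated as inclusive, and proper_pident is approximated as pident × length / qlen.

```python
import csv
import io

# Column names in the same order they were given to BLAST via -outfmt
COLUMNS = ("qseqid sseqid pident length mismatch gapopen qstart qend "
           "sstart send evalue bitscore qlen slen").split()


def passing_queries(blast_text, columns, threshold):
    """Return the query IDs that have at least one hit meeting the threshold."""
    keep = set()
    for fields in csv.reader(io.StringIO(blast_text), delimiter="\t"):
        if not fields:
            continue
        row = dict(zip(columns, fields))
        # Assumed formula: identical residues relative to the full query length
        value = float(row["pident"]) * int(row["length"]) / int(row["qlen"])
        if value >= threshold:  # inclusive comparison is an assumption
            keep.add(row["qseqid"])
    return keep


def filter_fasta(fasta_text, keep):
    """Keep only records whose ID (first word of the header line) is in keep."""
    out, writing = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            writing = bool(header) and header[0] in keep
        if writing:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")


blast = (
    "seq1\tref1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200\t100\t120\n"
    "seq2\tref2\t80.0\t20\t4\t0\t1\t20\t1\t20\t1e-5\t40\t100\t90\n"
)
fasta = ">seq1\nMKV\n>seq2\nMAA\n>seq3\nMTT\n"
print(filter_fasta(fasta, passing_queries(blast, COLUMNS, 30)), end="")
```

In this toy input, seq2 is dropped because its only hit covers too little of the query, and seq3 is dropped because it has no hit at all.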