A program to find palindromes in sequences.
This program finds palindromes in any DNA sequence. It searches for palindromes that match criteria set by the user (e.g., minimum length of the palindromic sequences, maximum number of mismatches, and minimum distance between the two palindromic regions).
The program will print out its findings (and tribulations) and will optionally report the search results as a palindromes-txt.
Please note that this program can find both ‘in-place’ palindromes (i.e., the identity and order of nucleotides on one strand match those on the complementary strand), which look like this in the genomic context:
0        1
1234567890
...TCGA...
It can also find ‘distant palindromes’ (i.e., special cases of palindromes that form hairpins), which look like this in the genomic context:
0        1
12345678901234567
...ATCC...GGAT...
In this example, the ‘distance’ for the in-place palindrome will be 0, and the ‘distance’ for the distant palindrome will be 3. You can set the --min-distance parameter to anything greater than 0 to only report distant palindromes and eliminate all in-place palindromes from your results.
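The two definitions above can be illustrated with a few lines of plain Python. This is only a minimal sketch of the concepts, not how anvi-search-palindromes searches for palindromes; the helper names (`reverse_complement`, `is_in_place_palindrome`, `distance_between`) are hypothetical, and the dots from the diagrams are replaced with `N` characters.

```python
# Complement table for unambiguous DNA bases; 'N' maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_in_place_palindrome(seq: str) -> bool:
    """An in-place palindrome is identical to its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def distance_between(sequence: str, first: str) -> int:
    """Number of nucleotides between `first` and the next occurrence of its
    reverse complement in `sequence`. Returns -1 if no such pair exists."""
    start = sequence.find(first)
    if start == -1:
        return -1
    first_end = start + len(first)
    second_start = sequence.find(reverse_complement(first), first_end)
    if second_start == -1:
        return -1
    return second_start - first_end


print(is_in_place_palindrome("TCGA"))                 # True
print(reverse_complement("ATCC"))                     # GGAT
print(distance_between("NNNATCCNNNGGATNNN", "ATCC"))  # 3
```

The last line reproduces the distance of 3 from the distant palindrome example.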
The speed of the algorithm will depend on the minimum palindrome length parameter. The shorter the palindrome length, the longer the processing time. Searching for palindromes longer than 50 nts in a 10,000,000 nts long sequence takes about 4 seconds on a laptop.
anvi-search-palindromes can use multiple different sequence sources.
When given a contigs-db, anvi-search-palindromes will go through every contig sequence in it.
anvi-search-palindromes -c contigs-db \
                        --output-file palindromes-txt
Alternatively, you can use a fasta file as input.
anvi-search-palindromes --fasta-file fasta \
                        --output-file palindromes-txt
Those who are lazy can also pass a DNA sequence for quick searches:
anvi-search-palindromes --dna-sequence (.. A DNA SEQUENCE OF ANY LENGTH ..)
If you provide an --output-file parameter, your results will be stored in a palindromes-txt file for downstream analyses. If you do not provide an output file, or if you explicitly ask for verbose output with the --verbose flag, you will see all your palindromes listed on your screen.
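For downstream analyses, the output file can be loaded with the Python standard library. The sketch below assumes the report is TAB-delimited with a header row, makes no assumption about the column names, and uses a hypothetical helper name (`load_palindromes_txt`) and a hypothetical file path.

```python
import csv
import os


def load_palindromes_txt(path):
    """Read a TAB-delimited report with a header row into a list of dicts.

    Assumption: the first line of the file holds the column names.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# 'palindromes.txt' is a placeholder path; adjust it to your --output-file.
if os.path.exists("palindromes.txt"):
    rows = load_palindromes_txt("palindromes.txt")
    columns = list(rows[0]) if rows else []
    print(f"{len(rows)} palindromes, columns: {columns}")
```

All values are returned as strings, so numeric columns need to be converted before sorting or filtering on them.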
Here is an example with a single sequence and no output file path:
anvi-search-palindromes --dna-sequence CATTGACGTTGACGGCGACCGGTCGGTGATCACCGACCGGTCGCCGTCAACGTCAATG
SEARCH SETTINGS
===============================================
Minimum palindrome length ....................: 10
Number of mismatches allowed .................: 0
Minimum gap length ...........................: 0
Be verbose? ..................................: Yes
Number of threads for BLAST ..................: 1
BLAST word size ..............................: 10
58 nts palindrome
===============================================
Method .......................................: BLAST
1st sequence [start:stop] ....................: [0:58]
2nd sequence [start:stop] ....................: [0:58]
Number of mismatches .........................: 0
Distance between .............................: 0
1st sequence .................................: CATTGACGTTGACGGCGACCGGTCGGTGATCACCGACCGGTCGCCGTCAACGTCAATG
ALN ..........................................: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2nd sequence .................................: CATTGACGTTGACGGCGACCGGTCGGTGATCACCGACCGGTCGCCGTCAACGTCAATG
SEARCH RESULTS
===============================================
Total number of sequences processed ..........: 1
Total number of palindromes found ............: 1
Longest palindrome ...........................: 58
Most distant palindrome ......................: 0
Here is another example with a contigs-db, an output file path, and the --verbose flag:
anvi-search-palindromes -c CONTIGS.db \
                        --min-palindrome-length 50 \
                        --max-num-mismatches 1 \
                        --output-file palindromes.txt \
                        --verbose
SEARCH SETTINGS
===============================================
Minimum palindrome length ....................: 50
Number of mismatches allowed .................: 1
Minimum gap length ...........................: 0
Be verbose? ..................................: Yes
147 nts palindrome
===============================================
1st sequence [start:stop] ....................: [268872:269019]
2nd sequence [start:stop] ....................: [269631:269778]
Number of mismatches .........................: 1
Distance between .............................: 759
1st sequence .................................: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAAGCTAGAAAAA
ALN ..........................................: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||x|||||||||
2nd sequence .................................: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAAACTAGAAAAA
SEARCH RESULTS
===============================================
Total number of sequences processed ..........: 11
Total number of palindromes found ............: 1
Longest palindrome ...........................: 147
Most distant palindrome ......................: 759
Output file ..................................: palindromes.txt
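The ALN line in the reports above marks matching positions with `|` and mismatching positions with `x`. A minimal sketch of how such a line can be derived from two equal-length sequences is given below; the function name `midline` is hypothetical, and the two short inputs are fragments taken from the end of the 147 nts palindrome shown above.

```python
def midline(first: str, second: str) -> str:
    """Build an alignment line: '|' where bases match, 'x' where they differ."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    return "".join("|" if a == b else "x" for a, b in zip(first, second))


first = "CGAAAAAGCTAG"
second = "CGAAAAAACTAG"
aln = midline(first, second)

print(first)
print(aln)             # |||||||x||||
print(second)
print(aln.count("x"))  # 1
```

Counting the `x` characters gives the number of mismatches, which is the quantity limited by --max-num-mismatches.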
Just like everything else in anvi’o, you can access the functionality the program anvi-search-palindromes offers without using the program itself, by creating an instance of the Palindromes class and using it in your own Python scripts.
Here is an example, first with an input file and then an ad hoc sequence. Starting with the file (i.e., an anvi’o contigs-db):
# import argparse to pass arguments to the class
import argparse
# `Palindromes` is the class we need
from anvio.sequencefeatures import Palindromes
# we also import `Progress` and `Run` helper classes from the terminal
# module to ask the class to print no output messages to our workspace
# (this is obviously optional)
from anvio.terminal import Progress, Run
# get an instance for the case of a contigs database, and process everything in it.
# this example is with an anvi'o contigs db, but you can also pass a FASTA file
# via `fasta_file='FILE.fa'` instead of `contigs_db='CONTIGS.db'`:
p = Palindromes(argparse.Namespace(contigs_db='CONTIGS.db', min_palindrome_length=50), run=Run(verbose=False), progress=Progress(verbose=False))
p.process()
Once the processing is done, the palindromes are stored in a member dictionary, which contains a key for each sequence:
print(p.palindromes)
>>> {'Day17a_QCcontig1' : [],
'Day17a_QCcontig2' : [],
'Day17a_QCcontig4' : [<anvio.sequencefeatures.Palindrome object at 0x7f8d6072f278>],
'Day17a_QCcontig6' : [],
'Day17a_QCcontig10': [],
'Day17a_QCcontig16': [],
'Day17a_QCcontig23': [],
'Day17a_QCcontig24': [],
'Day17a_QCcontig45': [],
'Day17a_QCcontig54': [],
'Day17a_QCcontig97': []}
Non-empty lists hold the palindromes found in a given sequence, each described by an instance of the class Palindrome, which is defined as follows:
class Palindrome:
    def __init__(self, run=terminal.Run()):
        self.run = run
        self.first_start = None
        self.first_end = None
        self.first_sequence = None
        self.second_start = None
        self.second_end = None
        self.second_sequence = None
        self.num_mismatches = None
        self.length = None
        self.distance = None
        self.midline = ''
Not only can you access each member variable directly, you can also easily display the contents of a palindrome using the display() function:
palindrome = p.palindromes['Day17a_QCcontig4'][0]
print(palindrome)
>>> TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAA (268872:269009) :: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAA (269631:269768)
palindrome.display()
>>> 137 nts palindrome
>>> ===============================================
>>> 1st sequence [start:stop] ....................: [268872:269009]
>>> 2nd sequence [start:stop] ....................: [269631:269768]
>>> Number of mismatches .........................: 0
>>> Distance between .............................: 759
>>> 1st sequence .................................: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAA
>>> ALN ..........................................: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> 2nd sequence .................................: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAA
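Because the results are a plain dictionary of lists, they are easy to flatten into a table. The sketch below uses `SimpleNamespace` stand-ins instead of real anvi’o objects so that it runs without anvi’o installed; it assumes only the attribute names listed in the class definition above (`length`, `distance`, `num_mismatches`), and the function name `summarize` is hypothetical.

```python
from types import SimpleNamespace


def summarize(palindromes_by_sequence):
    """Flatten {sequence_name: [palindrome, ...]} into a list of tuples,
    longest palindromes first."""
    rows = []
    for name, hits in palindromes_by_sequence.items():
        for hit in hits:
            rows.append((name, hit.length, hit.distance, hit.num_mismatches))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stand-in data mimicking the structure of `p.palindromes`.
demo = {
    "Day17a_QCcontig1": [],
    "Day17a_QCcontig4": [
        SimpleNamespace(length=137, distance=759, num_mismatches=0)
    ],
}

for row in summarize(demo):
    print(*row, sep="\t")
```

With a real run, passing `p.palindromes` in place of `demo` should work the same way as long as the attribute names match.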
Alternatively, you can process an ad hoc sequence without any input files:
p = Palindromes()
# let's set some values for fun,
p.min_palindrome_length = 14
p.max_num_mismatches = 1
# to go through some sequences of your liking:
some_sequences = {'a_sequence': 'CATTGACGTTGACGGCGACCGGTCGGTGATCACCGACCGGTCGCCGTCAACGTCAATG',
                  'another_sequence': 'AAATCGGCCGATTT',
                  'sequence_with_no_palindrome': 'AAAAAAAAAAAAAA'}
# in this case (where there are no input files) you can call the function `find`,
# rather than `process`, to populate the `p.palindromes` dictionary:
for sequence_name in some_sequences:
    p.find(some_sequences[sequence_name], sequence_name=sequence_name)
# tadaaa:
print(p.palindromes)
>>> {'a_sequence': [<anvio.sequencefeatures.Palindrome object at 0x7fce807ddb00>],
     'another_sequence': [<anvio.sequencefeatures.Palindrome object at 0x7fce807ddc88>],
     'sequence_with_no_palindrome': []}
If you are a programmer and need more from this module, please let us know.