anvi-search-palindromes [program]

A program to find palindromes in sequences.

Authors

A. Murat Eren (Meren)

Can consume

dna-sequence fasta contigs-db

Can provide

palindromes-txt

Usage

This program finds palindromes in any DNA sequence. It will search for palindromes that match criteria set by the user (i.e., minimum length of the palindromic sequences, maximum number of mismatches, and minimum distance between the two palindromic regions). The program will print out its findings (and tribulations) and will optionally report the search results as a palindromes-txt.

Please note that this program can find both perfect palindromes (i.e., the identity and order of nucleotides on one strand match those on the complementary strand) and special cases of palindromes that form hairpins. You can use the minimum distance parameter to target any group of palindromes (e.g., a minimum distance of 0 will report only perfect palindromes).
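To make the definition of a perfect palindrome concrete, here is a minimal Python sketch that checks whether a sequence is identical to its own reverse complement. The helpers `reverse_complement` and `is_perfect_palindrome` are made up for this illustration and are not part of anvi’o; the search the program actually performs (with mismatches and gaps) is more involved.

```python
# Illustration only: these helpers are hypothetical and not part of anvi'o.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def is_perfect_palindrome(sequence):
    """True if the sequence reads the same 5' to 3' on both strands."""
    return sequence == reverse_complement(sequence)


print(is_perfect_palindrome("AAATCGGCCGATTT"))  # True
print(is_perfect_palindrome("AAAAAAAAAAAAAA"))  # False
```

A run of a single nucleotide fails the check because its reverse complement consists entirely of the complementary base.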

The speed of the algorithm will depend on the minimum palindrome length parameter. The shorter the palindrome length, the longer the processing time. Searching for palindromes longer than 50 nts in a sequence that is 10,000,000 nts long takes about 4 seconds on a laptop.

Sequence input sources

anvi-search-palindromes can use multiple different sequence sources.

Contigs database

In this mode anvi-search-palindromes will go through every contig sequence in a given contigs-db.

anvi-search-palindromes -c contigs-db \
                        --output-file palindromes-txt

FASTA file

Alternatively, you can use a fasta file as input.

anvi-search-palindromes --fasta-file fasta \
                        --output-file palindromes-txt

DNA sequence

Those who are lazy can also pass a DNA sequence for quick searches:

anvi-search-palindromes --dna-sequence (.. A DNA SEQUENCE OF ANY LENGTH ..)

Verbose output

If you provide an --output-file parameter, your results will be stored in a palindromes-txt file for downstream analyses. If you do not provide an output file, or explicitly ask for verbose output with the flag --verbose, you will see all your palindromes listed on your screen.
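For downstream analyses, the output file can be loaded with a few lines of Python. The sketch below assumes the report is a TAB-delimited text file with a header row; that assumption, the helper name `read_report`, and the column names in the stand-in data are all hypothetical, so check them against your own file before relying on them.

```python
import csv
import io


def read_report(handle):
    """Parse a TAB-delimited report with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Stand-in content so the sketch runs on its own. The column names are
# hypothetical examples, not a description of the real file.
example = io.StringIO(
    "sequence_name\tlength\tdistance\n"
    "contig_1\t58\t0\n"
)

rows = read_report(example)
print(rows[0]["sequence_name"], rows[0]["length"])
```

With a real file you would pass `open('palindromes.txt', newline='')` instead of the in-memory stand-in.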

Here is an example with a single sequence and no output file path:

anvi-search-palindromes --dna-sequence CATTGACGTTGACGGCGACCGGTCGGTGATCACCGACCGGTCGCCGTCAACGTCAATG

SEARCH SETTINGS
===============================================
Minimum palindrome length ....................: 10
Number of mismatches allowed .................: 0
Minimum gap length ...........................: 0
Be verbose? ..................................: Yes


58 nts palindrome"
===============================================
1st sequence [start:stop] ....................: [0:58]
2nd sequence [start:stop] ....................: [0:58]
Number of mismatches .........................: 0
Distance between .............................: 0
1st sequence .................................: CATTGACGTTGACGGCGACCGGTCGGTGATCACCGACCGGTCGCCGTCAACGTCAATG
ALN ..........................................: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2nd sequence .................................: CATTGACGTTGACGGCGACCGGTCGGTGATCACCGACCGGTCGCCGTCAACGTCAATG

SEARCH RESULTS
===============================================
Total number of sequences processed ..........: 1
Total number of palindromes found ............: 1
Longest palindrome ...........................: 58
Most distant palindrome ......................: 0

Here is another example with a contigs-db, an output file path, and the --verbose flag:

anvi-search-palindromes -c CONTIGS.db \
                        --min-palindrome-length 50 \
                        --max-num-mismatches 1 \
                        --output-file palindromes.txt \
                        --verbose

SEARCH SETTINGS
===============================================
Minimum palindrome length ....................: 50
Number of mismatches allowed .................: 1
Minimum gap length ...........................: 0
Be verbose? ..................................: Yes

147 nts palindrome"
===============================================
1st sequence [start:stop] ....................: [268872:269019]
2nd sequence [start:stop] ....................: [269631:269778]
Number of mismatches .........................: 1
Distance between .............................: 759
1st sequence .................................: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAAGCTAGAAAAA
ALN ..........................................: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||x|||||||||
2nd sequence .................................: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAAACTAGAAAAA

SEARCH RESULTS
===============================================
Total number of sequences processed ..........: 11
Total number of palindromes found ............: 1
Longest palindrome ...........................: 147
Most distant palindrome ......................: 759

Output file ..................................: palindromes.txt
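The ALN line in the reports above marks matching positions with `|` and mismatching positions with `x`. As a minimal sketch of how such a line relates to the two displayed sequences, the hypothetical helper `build_midline` below compares them position by position; it is an illustration, not the function anvi’o uses internally.

```python
def build_midline(first, second):
    """Mark matches with '|' and mismatches with 'x', position by position."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    return "".join("|" if a == b else "x" for a, b in zip(first, second))


# The last 17 nts of the two sequences from the 147 nts example above.
first = "CGAAAAAGCTAGAAAAA"
second = "CGAAAAAACTAGAAAAA"

midline = build_midline(first, second)
print(midline)             # |||||||x|||||||||
print(midline.count("x"))  # 1
```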

Programmer access

Just like everything else in anvi’o, you can access the functionality that anvi-search-palindromes offers without using the program itself, by creating an instance of the Palindromes class and using it in your own Python scripts.

Here is an example, first with an input file and then an ad hoc sequence. Starting with the file (i.e., an anvi’o contigs-db):

# import argparse to pass arguments to the class
import argparse

# `Palindromes` is the class we need
from anvio.sequencefeatures import Palindromes

# we also import `Progress` and `Run` helper classes from the terminal
# module to ask the class to print no output messages to our workspace
# (this is obviously optional)
from anvio.terminal import Progress, Run

# get an instance for the case of a contigs database, and process everything in it.
# this example is with an anvi'o contigs db, but you can also pass a FASTA file
# via `fasta_file='FILE.fa'` instead of `contigs_db='CONTIGS.db'`:
p = Palindromes(argparse.Namespace(contigs_db='CONTIGS.db', min_palindrome_length=50),
                run=Run(verbose=False),
                progress=Progress(verbose=False))
p.process()

Once the processing is done, the palindromes are stored in a member dictionary, which contains a key for each sequence:

print(p.palindromes)

>>> {'Day17a_QCcontig1' : [],
     'Day17a_QCcontig2' : [],
     'Day17a_QCcontig4' : [<anvio.sequencefeatures.Palindrome object at 0x7f8d6072f278>],
     'Day17a_QCcontig6' : [],
     'Day17a_QCcontig10': [], 
     'Day17a_QCcontig16': [],
     'Day17a_QCcontig23': [],
     'Day17a_QCcontig24': [],
     'Day17a_QCcontig45': [],
     'Day17a_QCcontig54': [],
     'Day17a_QCcontig97': []}

Non-empty lists hold the palindromes found in a given sequence, each described by an instance of the class Palindrome, which is defined as follows:

class Palindrome:
    def __init__(self, run=terminal.Run()):
        self.run=run
        self.first_start = None
        self.first_end = None
        self.first_sequence = None
        self.second_start = None
        self.second_end = None
        self.second_sequence = None
        self.num_mismatches = None
        self.length = None
        self.distance = None
        self.midline = ''

Not only can you access each member variable directly, you can also display the contents of a palindrome using the display() function:

palindrome = p.palindromes['Day17a_QCcontig4'][0]
print(palindrome)

>>> TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAA (268872:269009) :: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAA (269631:269768)

palindrome.display()

>>> 137 nts palindrome"
>>> ===============================================
>>> 1st sequence [start:stop] ....................: [268872:269009]
>>> 2nd sequence [start:stop] ....................: [269631:269768]
>>> Number of mismatches .........................: 0
>>> Distance between .............................: 759
>>> 1st sequence .................................: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAA
>>> ALN ..........................................: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> 2nd sequence .................................: TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAA
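Since `p.palindromes` maps each sequence name to a list of objects, it is easy to flatten it into rows for your own reports. The sketch below is one way to do that using the member variables listed in the class definition above. The helper `summarize` is hypothetical, and `SimpleNamespace` objects stand in for real Palindrome instances so that the example runs without anvi’o installed.

```python
from types import SimpleNamespace


def summarize(palindromes):
    """Flatten {sequence_name: [palindrome, ...]} into a list of row dicts."""
    rows = []
    for sequence_name, hits in palindromes.items():
        for hit in hits:
            rows.append({
                "sequence_name": sequence_name,
                "length": hit.length,
                "num_mismatches": hit.num_mismatches,
                "distance": hit.distance,
                "first_start": hit.first_start,
                "second_start": hit.second_start,
            })
    return rows


# Stand-in for `p.palindromes`, with values taken from the example above.
fake_palindromes = {
    "Day17a_QCcontig1": [],
    "Day17a_QCcontig4": [SimpleNamespace(length=137, num_mismatches=0, distance=759,
                                         first_start=268872, second_start=269631)],
}

for row in summarize(fake_palindromes):
    print(row)
```

Sequences with no palindromes contribute no rows, so the result contains one entry per palindrome found.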

Alternatively, you can process an ad hoc sequence without any input files:

p = Palindromes()

# let's set some values for fun,
p.min_palindrome_length = 14
p.max_num_mismatches = 1

# to go through some sequences of your liking:
some_sequences = {'a_sequence': 'CATTGACGTTGACGGCGACCGGTCGGTGATCACCGACCGGTCGCCGTCAACGTCAATG',
                  'another_sequence': 'AAATCGGCCGATTT',
                  'sequence_with_no_palindrome': 'AAAAAAAAAAAAAA'}

# in this case (where there are no input files) you can call the function `find`,
# rather than `process`, to populate the `p.palindromes` dictionary:
for sequence_name in some_sequences:
    p.find(some_sequences[sequence_name], sequence_name=sequence_name)

# tadaaa:
print(p.palindromes)

>>> {'a_sequence': [<anvio.sequencefeatures.Palindrome object at 0x7fce807ddb00>],
     'another_sequence': [<anvio.sequencefeatures.Palindrome object at 0x7fce807ddc88>],
     'sequence_with_no_palindrome': []}

If you are a programmer and need more from this module, please let us know.
