palindromes-txt

TXT

A TXT-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).


Provided by

anvi-search-palindromes

Required or used by

There are no anvi’o tools that use or require this artifact directly, which means it is most likely an end-product for the user.

Description

A TAB-delimited file of palindromic sequences reported by anvi-search-palindromes.

The following example shows the output of the command below, run on the contigs-db of the Infant Gut Dataset:

anvi-search-palindromes -c CONTIGS.db \
                        --min-palindrome-length 50 \
                        --max-num-mismatches 1 \
                        --output-file palindromes.txt

sequence_name length distance num_mismatches first_start first_end first_sequence second_start second_end second_sequence midline
Day17a_QCcontig1 48 0 0 195100 195148 AAGAGAAGAGGAGAAGTTCATCCATGGATGAACTTCTCCTCTTCTCTT 195100 195148 AAGAGAAGAGGAGAAGTTCATCCATGGATGAACTTCTCCTCTTCTCTT ||||||||||||||||||||||||||||||||||||||||||||||||
Day17a_QCcontig4 147 759 1 268872 269019 TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAAGCTAGAAAAA 269631 269778 TTTCGTAATACTTTTTTGCAGTAGGCATCAAATTGGTGTTGTATAGATTTCTCATTATAATTTTGTTGCATGATAATATGCTCCTTTTTCCCCTTTCCACTAATACAACAATCAGAGAGCCCCTTTTTTTCGAAAAAACTAGAAAAA |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||x|||||||||
Day17a_QCcontig4 53 1956 1 268237 268290 CAGCTGCTTTTGTCAAAAGCACATAGGAATTTCACCTCTCCCCAAGTTTACGG 270193 270246 CAGCTGCTTTTGTCAAAAGCACATAGGAATTTCACCTCTCTCCAAGTTTACGG ||||||||||||||||||||||||||||||||||||||||x||||||||||||
Day17a_QCcontig4 66 1956 1 268325 268391 ATCATCACTTTTTATTGACTATAAAAATTATTTTAGAATATTTATCGCTCCTTCTTTACGATAAGA 270281 270347 ATCATCACTTTTTATTGACTATAAAAATTATTTTAGAATGTTTATCGCTCCTTCTTTACGATAAGA |||||||||||||||||||||||||||||||||||||||x||||||||||||||||||||||||||
Day17a_QCcontig4 60 98694 1 16368 16428 AGAACAATTTTCGGAAATTCCTTCTTATTTCTCGGAGTTAAACGCTTCTGTCCCGACCTC 115062 115122 AGAACAATTTTCGGAAATTCCTTCTTATTTCTCGGAGTTAAACACTTCTGTCCCGACCTC |||||||||||||||||||||||||||||||||||||||||||x||||||||||||||||
Day17a_QCcontig16 42 0 0 105735 105777 AAAAAGAACGCTCTTTTGCTTAAGCAAAAGAGCGTTCTTTTT 105735 105777 AAAAAGAACGCTCTTTTGCTTAAGCAAAAGAGCGTTCTTTTT ||||||||||||||||||||||||||||||||||||||||||
Day17a_QCcontig23 50 0 0 51287 51337 ATAAATAAACAGAGGCCTTAGAAATATTTCTAAGGCCTCTGTTTATTTAT 51287 51337 ATAAATAAACAGAGGCCTTAGAAATATTTCTAAGGCCTCTGTTTATTTAT ||||||||||||||||||||||||||||||||||||||||||||||||||

In this output:

  • sequence_name is the name of the sequence on which a given palindrome was found.
  • length is the length of the palindrome.
  • distance is the number of nucleotides between the start positions of the two palindromic sequences in the larger sequence (in the example above, second_start minus first_start).
  • num_mismatches is the number of nucleotides in the palindrome that do not match their counterpart when the sequence is reverse-complemented.
  • first_start is the start position of the first palindrome in the reference sequence.
  • first_end is the end position of the first palindrome.
  • second_start and second_end are just like first_start and first_end but for the second sequence. For perfect palindromes (i.e., palindromes with zero distance), these values will be identical to their counterparts in the first sequence.
  • first_sequence and second_sequence are the actual nucleotide sequences of both. They will be identical if the number of mismatches is zero. Please note that only the reverse complement of second_sequence will be found in the reference sequence.
  • midline is a string of | and x characters that shows where the matching and mismatching nucleotides were (if any).
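The midline and num_mismatches columns can be reproduced from first_sequence and second_sequence with a position-by-position comparison. The sketch below assumes, as the example rows suggest, that the two sequences have the same length and are already aligned position by position; `build_midline` is a hypothetical helper for illustration, not part of anvi’o.

```python
from __future__ import annotations


def build_midline(first_sequence: str, second_sequence: str) -> tuple[str, int]:
    """Return a midline of '|' (match) and 'x' (mismatch) characters and the mismatch count.

    Hypothetical helper: assumes both sequences are aligned position by position
    and have the same length.
    """
    if len(first_sequence) != len(second_sequence):
        raise ValueError("sequences must have the same length")
    midline = "".join("|" if a == b else "x" for a, b in zip(first_sequence, second_sequence))
    return midline, midline.count("x")


# first_sequence and second_sequence from the Day17a_QCcontig4 row of length 53
first = "CAGCTGCTTTTGTCAAAAGCACATAGGAATTTCACCTCTCCCCAAGTTTACGG"
second = "CAGCTGCTTTTGTCAAAAGCACATAGGAATTTCACCTCTCTCCAAGTTTACGG"

midline, num_mismatches = build_midline(first, second)
print(midline)
print(num_mismatches)  # 1, matching the num_mismatches column of that row
```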

Please note that the sequence_name column may not have unique sequence names if multiple palindromes are found on the same sequence (which will almost certainly be the case for most searches on circular genomes).
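Because the file is TAB-delimited with a header line, it can be read with Python's standard csv module, and repeated sequence names can be tallied rather than used as a unique key. This is a minimal sketch: the embedded data is a copy of the example rows trimmed to a few columns for brevity, and `per_sequence` is an illustrative name only.

```python
import csv
import io
from collections import Counter

# A trimmed copy of the example output above (only a few columns are kept).
rows = [
    ("sequence_name", "length", "distance", "first_start", "first_end"),
    ("Day17a_QCcontig1", "48", "0", "195100", "195148"),
    ("Day17a_QCcontig4", "147", "759", "268872", "269019"),
    ("Day17a_QCcontig4", "53", "1956", "268237", "268290"),
    ("Day17a_QCcontig4", "66", "1956", "268325", "268391"),
    ("Day17a_QCcontig4", "60", "98694", "16368", "16428"),
    ("Day17a_QCcontig16", "42", "0", "105735", "105777"),
    ("Day17a_QCcontig23", "50", "0", "51287", "51337"),
]
tsv_text = "\n".join("\t".join(row) for row in rows) + "\n"

# Parse the TAB-delimited text; each row becomes a dict keyed by column name.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
palindromes = list(reader)

# sequence_name is not unique: count how many palindromes each sequence carries.
per_sequence = Counter(p["sequence_name"] for p in palindromes)
print(per_sequence.most_common())

# In the example rows, the palindrome length equals first_end - first_start.
for p in palindromes:
    assert int(p["first_end"]) - int(p["first_start"]) == int(p["length"])
```

To read the actual output file, replace `io.StringIO(tsv_text)` with `open("palindromes.txt", newline="")`.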

Please also note that the start and end positions are 0-indexed, which means (1) the first nucleotide in the sequence is counted as the zeroth element, and (2) the end position is exclusive, so slicing the contig sequence in Python with the first_start and first_end values of the first row in the example above returns the matching palindrome from its larger sequence context:

>>> # contig_sequences: a dictionary of contig names to contig sequences
>>> contig_sequences['Day17a_QCcontig1'][195100:195148]
'AAGAGAAGAGGAGAAGTTCATCCATGGATGAACTTCTCCTCTTCTCTT'
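The coordinate convention can be checked on a small, made-up contig. This sketch assumes, as the example rows suggest (first_end - first_start equals length), that end positions are exclusive. The flanking sequence and the helpers `reverse_complement` and `to_one_based_inclusive` are hypothetical and only for illustration; the latter shows how to convert positions for tools that expect 1-based, end-inclusive coordinates.

```python
from __future__ import annotations

# Complement table for the four nucleotides (plus N for ambiguous bases).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence: str) -> str:
    """Reverse-complement a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def to_one_based_inclusive(start: int, end: int) -> tuple[int, int]:
    """Convert 0-based, end-exclusive coordinates to 1-based, end-inclusive ones."""
    return start + 1, end


# The palindrome from the Day17a_QCcontig1 row of the example.
palindrome = "AAGAGAAGAGGAGAAGTTCATCCATGGATGAACTTCTCCTCTTCTCTT"

# A made-up contig: hypothetical flanking sequence around the palindrome.
contig = "GGGCC" + palindrome + "TTAAC"
start, end = 5, 5 + len(palindrome)

# 0-based, end-exclusive coordinates slice out exactly the palindrome.
assert contig[start:end] == palindrome

# A perfect (zero-distance, zero-mismatch) palindrome is its own reverse complement.
assert reverse_complement(palindrome) == palindrome

print(to_one_based_inclusive(start, end))  # (6, 53)
```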
