anvi-merge-trnaseq

This program processes one or more anvi'o tRNA-seq databases produced by anvi-trnaseq and outputs anvi'o contigs and merged profile databases accessible to other tools in the anvi'o ecosystem. Final tRNA "seed sequences" are determined from a set of samples. Each sample yields a set of tRNA predictions stored in a tRNA-seq database, and these tRNAs may be shared among the samples. A tRNA predicted in one sample may be a 3' fragment, and thereby a subsequence, of a longer tRNA from another sample; the longer tRNAs would become seeds. The profile database produced by this program records the coverages of seeds in each sample. This program finalizes predicted nucleotide modification sites using tunable substitution rate parameters.



Can consume

trnaseq-db

Can provide

trnaseq-contigs-db trnaseq-profile-db

Usage

This program finds tRNA seed sequences from a set of tRNA-seq samples.

This program follows anvi-trnaseq in the trnaseq-workflow. anvi-trnaseq is run on each tRNA-seq sample, producing sample trnaseq-dbs. A tRNA-seq database contains predictions of tRNA sequences, structures, and modification sites in the sample. anvi-merge-trnaseq takes as input the tRNA-seq databases from a set of samples. It compares tRNAs predicted from the samples, finding those in common and calculating their sample coverages. The final tRNA sequences predicted from all samples are called tRNA seeds and function like contigs in metagenomic experiments. Seeds are stored in a trnaseq-contigs-db and sample coverages are stored in a trnaseq-profile-db. These databases are variants of normal contigs-dbs and profile-dbs, performing similar functions in the anvi’o ecosystem but containing somewhat different information.
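The idea of seeds shared among samples, with 3' fragments folded into longer tRNAs, can be illustrated with a toy sketch. This is not the algorithm used by anvi-merge-trnaseq: the function name `find_seeds`, the input layout, and the use of plain read counts in place of nucleotide-level coverage are all assumptions made for illustration. A sequence is treated as a 3' fragment when it is a suffix of a longer sequence found in any sample.

```python
def find_seeds(samples):
    """Toy seed finding (hypothetical helper, not part of anvi'o).

    samples: dict mapping sample name -> dict mapping tRNA sequence -> read count.
    Returns a dict mapping seed sequence -> dict of sample name -> summed count.
    """
    # Pool the distinct sequences predicted across all samples.
    all_seqs = set()
    for seq_counts in samples.values():
        all_seqs.update(seq_counts)

    # Visit longest sequences first, so every potential seed is seen
    # before any of its 3' fragments.
    ordered = sorted(all_seqs, key=lambda s: (-len(s), s))

    seeds = []
    seed_of = {}
    for seq in ordered:
        # A 3' fragment is a suffix of a longer sequence. If several seeds
        # match, this toy version simply takes the first one.
        parent = next((seed for seed in seeds if seed.endswith(seq)), None)
        if parent is None:
            seeds.append(seq)
            parent = seq
        seed_of[seq] = parent

    # Sum the counts of each seed and its fragments per sample.
    coverage = {seed: {name: 0 for name in samples} for seed in seeds}
    for name, seq_counts in samples.items():
        for seq, count in seq_counts.items():
            coverage[seed_of[seq]][name] += count
    return coverage


# Made-up sequences and counts, for illustration only.
samples = {
    "sample_1": {"GCCGAGGTCCA": 5, "TTCGAATCCCCA": 2},
    "sample_2": {"GGTCCA": 3, "AATTCGAATCCCCA": 4},
}
seed_coverage = find_seeds(samples)
```

In this made-up input, each sample contributes one full-length sequence and one 3' fragment of the other sample's sequence, so two seeds result, each with a count in both samples.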

Most of the heavy computational work in the trnaseq-workflow is performed by anvi-trnaseq. anvi-merge-trnaseq is meant to run relatively quickly, allowing its parameters to be tuned to fit the dataset.

The anvi-merge-trnaseq --help menu provides detailed explanations of the parameters controlling the multifaceted analyses performed by the program.

Key parameters

Number of reported seeds

One key parameter is the number of reported tRNA seed sequences (--max-reported-trna-seeds). The default value of 10,000 seeds is more appropriate for a complex microbial community than for a pure culture of a bacterial isolate, which should yield a number of tRNA seeds equal to the number of expressed tRNAs, say ~30. With a high value like 10,000, sequence artifacts may be reported in addition to the ~30 actual tRNAs. Artifacts are relatively common despite intensive screening by anvi-trnaseq and anvi-merge-trnaseq, because reverse transcription introduces nontemplated nucleotides and modification-induced mutations into tRNA-seq reads. In practice, artifacts are easy to distinguish from true tRNA seeds by analyzing seed coverage in anvi-interactive and checking seed homology to reference databases, among other measures.

Modification filters

Other key parameters, --min-variation and --min-third-fourth-nt, determine the coverage cutoffs that distinguish predicted positions of modified nucleotides from single nucleotide variants (SNVs). Compared to SNVs, modifications typically produce higher nucleotide variability, with three or four different nucleotides observed at the position. However, modification-induced mutations are often highly skewed to one other nucleotide rather than spread across all three mutant nucleotides. Furthermore, the high coverage of seeds in many tRNA-seq libraries can uncover SNVs with a low-frequency third nucleotide rather than the expected two. Some SNVs that are wrongly called modifications can be easily spotted in anvi-interactive and the output of anvi-plot-trnaseq due to covariation at two positions in the seed as a result of base pairing. In other words, SNV frequencies are equivalent at the two base-paired positions in every sample, whereas modification artifacts have no effect on nucleotide variability at another position across the molecule.
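The reasoning above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the logic implemented by anvi-merge-trnaseq: the function names are hypothetical, the thresholds are placeholder values rather than the program's defaults, and the interpretation of the two cutoffs (fraction of coverage from all but the most abundant nucleotide, and from all but the two most abundant nucleotides) is an assumption that should be checked against the --help menu.

```python
def classify_position(nt_counts, min_variation, min_third_fourth_nt):
    """Heuristic call for one seed position (hypothetical helper).

    nt_counts: dict mapping nucleotide -> coverage at the position.
    min_variation: assumed to be the minimum fraction of coverage that
        must come from nucleotides other than the most abundant one.
    min_third_fourth_nt: assumed to be the minimum fraction of coverage
        that must come from nucleotides beyond the two most abundant.
    """
    total = sum(nt_counts.values())
    if total == 0:
        return "no coverage"
    ranked = sorted(nt_counts.values(), reverse=True)
    variation = (total - ranked[0]) / total
    third_fourth = sum(ranked[2:]) / total
    if variation < min_variation:
        return "invariant"
    if third_fourth >= min_third_fourth_nt:
        return "candidate modification"
    return "candidate SNV"


def covary(freqs_a, freqs_b, tolerance=0.05):
    """Check whether variant frequencies at two positions match in every sample.

    freqs_a, freqs_b: dict mapping sample name -> variant frequency at each
    of two base-paired positions. The tolerance is a placeholder value.
    """
    if freqs_a.keys() != freqs_b.keys():
        raise ValueError("both positions must be measured in the same samples")
    return all(abs(freqs_a[s] - freqs_b[s]) <= tolerance for s in freqs_a)


# Made-up coverages: three minor nucleotides suggest a modification.
call_1 = classify_position({"A": 900, "G": 60, "C": 25, "T": 15}, 0.01, 0.002)
# Only two nucleotides are present, as expected for an SNV.
call_2 = classify_position({"A": 500, "G": 500}, 0.01, 0.002)
# Matching frequencies at paired positions in both samples point to an SNV.
paired = covary({"s1": 0.30, "s2": 0.10}, {"s1": 0.31, "s2": 0.09})
```

As the text notes, neither signal is decisive on its own: a modification skewed to a single mutant nucleotide would look like an SNV to the first function, which is why the covariation check on base-paired positions is a useful second look.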

Examples

Merge two samples.

anvi-merge-trnaseq trnaseq_database_1 trnaseq_database_2 (…) \
                   -o OUTPUT_DIRECTORY \
                   -n PROJECT_NAME

Merge two samples with and without demethylase treatment, giving priority to the demethylase split in calling the underlying nucleotide at modified positions.

anvi-merge-trnaseq untreated_trnaseq_database demethylase_trnaseq_database (…) \
                   -o OUTPUT_DIRECTORY \
                   -n PROJECT_NAME \
                   --preferred-treatment demethylase

