The anvi'o 'trnaseq' workflow

Process transfer RNA transcripts from tRNA-seq datasets

The trnaseq workflow takes in raw paired-end sequencing data generated from tRNA-seq libraries (i.e., the direct sequencing of transfer RNA transcripts from cultures or environmental samples) and processes these data to identify tRNA sequences and their structural features, predict chemical modification sites and modification fractions across samples, assign taxonomy to tRNA transcript seeds, and generate tables and summary data for downstream analyses. The tRNA-seq resources in anvi'o are operational but still experimental. If you have datasets that are suitable for analysis, please consider getting in touch with us first.

Authors

Samuel Miller

Artifacts accepted

The trnaseq workflow can typically be initiated with the following artifacts:

workflow-config, samples-txt

Artifacts produced

The trnaseq workflow typically produces the following anvi’o artifacts:

trnaseq-db, trnaseq-contigs-db, trnaseq-profile-db, trnaseq-seed-txt, modifications-txt

Third party programs

This is a list of programs that may be used by the trnaseq workflow depending on the user settings in the workflow-config:

illumina-utils (QC and merging of tRNA transcripts)
Bowtie2 (Mapping transcripts to tRNA seeds)

An anvi’o installation that follows the recommendations on the installation page will include all these programs. Please check which of them your settings actually use, and cite those tools in your methods section.

Workflow description and usage

The tRNA-seq workflow is a Snakemake workflow run by anvi-run-workflow.

The workflow can run the following programs in order:

Illumina-utils, for merging paired-end reads and quality control
anvi-script-reformat-fasta, for making FASTA deflines anvi’o-compliant
anvi-trnaseq, for predicting tRNA sequences, structures, and modification sites in each sample
anvi-merge-trnaseq, for predicting tRNA seed sequences and their modification sites from the set of samples
anvi-run-trna-taxonomy, for assigning taxonomy to tRNA seeds
anvi-tabulate-trnaseq, for generating tables of seed and modification information that are easily manipulated

Input

The tRNA-seq workflow requires two files to run: a workflow-config file and a samples-txt file. You can obtain a ‘default’ config file for this workflow, which you can then edit, using the following command.

anvi-run-workflow -w trnaseq --get-default-config config.json

Different “rules,” or steps, of the workflow can be turned on and off as needed in the config file. The workflow can be restarted at intermediate rules without rerunning prior rules that have already completed.

samples-txt contains a list of FASTQ or FASTA files and associated information on each library. FASTQ files contain unmerged paired-end tRNA-seq reads, which are merged in the workflow by Illumina-utils. FASTA files contain reads that have already been merged, so the initial read-merging steps in the workflow are skipped for them.

Here is an example tRNA-seq samples file with FASTQ inputs.

sample	treatment	r1	r2	r1_prefix	r2_prefix
ecoli_A1_noDM	untreated	FASTQ/ecoli_A1_noDM.r1.fq.gz	FASTQ/ecoli_A1_noDM.r2.fq.gz	NNNNNN	TTCCAGT
ecoli_A1_DM	demethylase	FASTQ/ecoli_A1_DM.r1.fq.gz	FASTQ/ecoli_A1_DM.r2.fq.gz	NNNNNN	TCTGAGT
ecoli_B1_noDM	untreated	FASTQ/ecoli_B1_noDM.r1.fq.gz	FASTQ/ecoli_B1_noDM.r2.fq.gz	NNNNNN	TGGTAGT
ecoli_B1_DM	demethylase	FASTQ/ecoli_B1_DM.r1.fq.gz	FASTQ/ecoli_B1_DM.r2.fq.gz	NNNNNN	CTGAAGT

The treatment column is optional. A treatment indicates a chemical application, such as demethylase, and can influence seed sequence determination in anvi-merge-trnaseq. In the absence of a treatment column, all samples are assigned the same treatment, which can be specified in the anvi_trnaseq section of the workflow config file and defaults to untreated.
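The default-treatment behavior described above can be illustrated with a minimal Python sketch. This is not the workflow's own code: the function name `load_samples` and the constant `DEFAULT_TREATMENT` are hypothetical, and the sketch only assumes that samples-txt is tab-delimited with a header row, as in the example.

```python
import csv
import io

# Default stated in the text; the name of this constant is hypothetical.
DEFAULT_TREATMENT = "untreated"

def load_samples(handle, default_treatment=DEFAULT_TREATMENT):
    """Read a tab-delimited samples table and fill in a missing treatment column."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    for row in rows:
        # Only applies when the file has no treatment column at all;
        # values already present in the file are left untouched.
        row.setdefault("treatment", default_treatment)
    return rows

# A samples table without a treatment column.
example = io.StringIO(
    "sample\tfasta\n"
    "ecoli_A1_noDM\tFASTA/ecoli_A1_noDM.fa.gz\n"
    "ecoli_B1_noDM\tFASTA/ecoli_B1_noDM.fa.gz\n"
)
samples = load_samples(example)
for row in samples:
    print(row["sample"], row["treatment"])
```

Passing a different `default_treatment` mimics changing the value in the anvi_trnaseq section of the config file.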

The read 1 and read 2 prefix columns are also optional. These represent sequences that Illumina-utils should identify and trim from the start of each read. In the example, the read 1 prefix is a unique molecular identifier (UMI) of 6 random nucleotides, and the read 2 prefix is a sample barcode. Illumina-utils will discard the paired-end read if a prefix is not found. In the example, the read 1 UMI will always be found, but the read 2 barcode must match exactly.
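To make the prefix logic concrete, here is a small Python sketch of the behavior described above. It is an illustration only, not how Illumina-utils is implemented: the function names are hypothetical, and treating `N` in the prefix as a wildcard for any base is an assumption drawn from the statement that the UMI "will always be found".

```python
def matches_prefix(read, prefix):
    """Return True if the read starts with the prefix, with N in the prefix as a wildcard."""
    if len(read) < len(prefix):
        return False
    # zip stops at the end of the prefix, so only the leading bases are compared.
    return all(p == "N" or p == base for p, base in zip(prefix, read))

def trim_prefix(read, prefix):
    """Return the read without its prefix, or None if the prefix is not found."""
    if not matches_prefix(read, prefix):
        return None
    return read[len(prefix):]

def trim_pair(r1, r2, r1_prefix, r2_prefix):
    """Trim both reads of a pair; the whole pair is discarded (None) if either prefix is missing."""
    trimmed_r1 = trim_prefix(r1, r1_prefix)
    trimmed_r2 = trim_prefix(r2, r2_prefix)
    if trimmed_r1 is None or trimmed_r2 is None:
        return None
    return trimmed_r1, trimmed_r2

# Toy reads (made up for illustration), using the prefixes of the first example sample.
kept = trim_pair("ACGTACGGGG", "TTCCAGTAAAA", "NNNNNN", "TTCCAGT")
discarded = trim_pair("ACGTACGGGG", "TTCCAGAAAAA", "NNNNNN", "TTCCAGT")
print(kept, discarded)
```

The second pair is dropped because its read 2 starts with `TTCCAGA`, which differs from the barcode at the last position.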

Here is an equivalent tRNA-seq samples file with FASTA inputs.

sample	treatment	fasta
ecoli_A1_noDM	untreated	FASTA/ecoli_A1_noDM.fa.gz
ecoli_A1_DM	demethylase	FASTA/ecoli_A1_DM.fa.gz
ecoli_B1_noDM	untreated	FASTA/ecoli_B1_noDM.fa.gz
ecoli_B1_DM	demethylase	FASTA/ecoli_B1_DM.fa.gz

Note that barcodes and other sequence prefixes should already be trimmed from FASTA sequences.
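As a quick sanity check before starting the workflow, the two samples-txt layouts shown above can be told apart by their header. The following Python sketch is a hypothetical helper, not part of anvi'o. It assumes the column names from the examples (`sample`, `r1`, `r2`, `fasta`), and it treats a header with both or neither kind of input column as an error, which is an assumption of this sketch rather than documented behavior.

```python
def input_type(header_line):
    """Classify a samples-txt header as 'fastq' (reads get merged) or 'fasta' (merging skipped)."""
    columns = set(header_line.rstrip("\n").split("\t"))
    if "sample" not in columns:
        raise ValueError("samples-txt header needs a 'sample' column")
    has_fastq = {"r1", "r2"} <= columns
    has_fasta = "fasta" in columns
    # Assumption of this sketch: exactly one of the two layouts must be present.
    if has_fastq == has_fasta:
        raise ValueError("expected either 'r1' and 'r2' columns or a 'fasta' column")
    return "fastq" if has_fastq else "fasta"

print(input_type("sample\ttreatment\tr1\tr2\tr1_prefix\tr2_prefix"))
print(input_type("sample\ttreatment\tfasta"))
```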