anvi-compute-trnaseq-functional-affinity

Relate changes in tRNA-seq seed abundances to the codon usage of gene functions.

🔙 To the main page of anvi’o programs and artifacts.

Authors

Can consume

trnaseq-contigs-db contigs-db

Can provide

This program does not seem to provide any artifacts. Such programs usually print out some information for you to see or alter some anvi’o artifacts without producing any immediate outputs.

Usage

This program relates changes in tRNA-seq seed abundances to the codon usage of gene functions. The rate of translational elongation generally increases when the dynamic tRNA pool is better at translating the set of codons in an mRNA transcript: “affinity” means the correlation between the pool of measured tRNAs and codon usage of genes or functional groups of genes.

Input data

Seeds, or predicted tRNA transcripts, are the end product of the anvi’o trnaseq workflow. Seeds are matched with underlying tRNA genes in paired (meta)genomic samples by the program anvi-integrate-trnaseq, storing trna-gene-hits in the trnaseq-contigs-db. Seed coverage data across tRNA-seq samples are stored in a consolidated, easily parsable seeds-specific-txt table that is produced by the program anvi-tabulate-trnaseq (“specific” as opposed to “nonspecific” coverage is explained in the trnaseq-profile-db artifact). anvi-compute-trnaseq-functional-affinity requires the trnaseq-contigs-db, the corresponding seeds-specific-txt, a (meta)genomic contigs-db, and a codon-frequencies-txt generated from the (meta)genomic contigs-db by anvi-get-codon-frequencies.

The (meta)genomic contigs-db must be annotated with KEGG functions by the program anvi-run-kegg-kofams, and the modules-db that is used must include KEGG BRITE annotations. If the modules-db is outdated, it can be recreated by rerunning anvi-setup-kegg-data. BRITE is a comprehensive hierarchical classification of gene functions. Translational affinity is calculated not only for individual genes but also for every possible grouping of genes in BRITE.

When analyzing samples with multiple populations (a metagenome rather than a single genome), a collection of bins will be inferred from the trnaseq-contigs-db, as a collection should have been specified in running anvi-integrate-trnaseq to link tRNA transcripts to genes in each population. This collection likewise constrains the functional genes analyzed by the present program in each genome.

If the (meta)genomic contigs-db does not either represent a single genome or a metagenome with collection specified when running anvi-integrate-trnaseq, this program will produce irrational results.

Translational affinity metric

Wobble weights

Translational affinity is measured by a metric similar in principle to the tRNA Adaptation Index (tAI) (dos Reis et al., 2004), based on tRNA gene copy number, and the derivative tRNA tpm (Wei et al., 2019), based on transcript abundances from well-characterized cultures. tAI relies on empirical estimates of “selective constraints on the efficiency of codon-anticodon coupling” – what I will call translational efficiencies. Watson-Crick base pairs (A-U, C-G) have 100% translational efficiency, whereas wobble pairs have lower efficiencies, e.g., G-U has an efficiency of ~40% (Sabi and Tuller, 2017).

Due to significant uncertainties in the estimation of wobble weights, it is valuable to understand the sensitivity (or insensitivity) of our translational affinity measurements to this parameterization. The program provides an option to use the wobble weight estimates of Sabi and Tuller, 2017 (the default), to set wobble weights all equal to 1 (maximum decoding efficiency), or to set them all equal to 0 (no wobble decoding).

For the curious, wobble weights are inferred by optimizing the correlation between tAI and codon usage bias (CUB), assuming that highly expressed genes are better adapted to the tRNA pool and have higher CUB, or a more skewed distribution toward codons that are decoded fastest. Ribosomal, glycolytic, and other core genes required for growth generally have the highest CUB in the genome, though this does not mean that genes with other functions have a random distribution of codons. Instead, the differential use of rarer codons in other genes enables adaptation of the tRNA pool to those genes in different circumstances, as can be measured using tRNA-seq.

Reference sample requirement

The nature of tRNA-seq data requires comparison between samples to measure translational affinity. The user must provide two or more tRNA-seq samples in the trnaseq-contigs-db/seeds-specific-txt, and specify one sample as the reference to which the other samples are compared. One example is a time series of tRNA-seq samples, with the first sample used as the reference. Another example is a stress response experiment in which cultures of an organism are subjected to different stresses, with an unstressed control sample used as the reference. By default, all other samples present in the inputs are compared to the reference, but the user can instead use a subset of samples, e.g., the program can be run twice with a subset of demethylase-treated samples and a subset of untreated samples though all of these samples were used in determining the tRNA-seq seeds.

The need for sample comparison is due to biases in measurement of different tRNA species, especially biases introduced in reverse transcription of tRNA to cDNA. Biases prevent the quantification of tRNA relative abundances in a single sample: tRNA-Ala-AGC, tRNA-Ala-UGC, and tRNA-Tyr-GUA isoacceptor relative abundances of 20%, 0.1%, and 3%, respectively, in a sample may have little relation to reality. One source of bias is the response of reverse transcriptase to different modifications, with some modifications impeding the progress of the enzyme, causing it to stall or fall off the tRNA template, and other modifications efficiently read through by the enzyme or read through with a nucleotide error introduced into the sequence. (Such errors are informative, pointing to the existence of nucleotide modifications, and are exploited by the anvi’o trnaseq workflow.) Another source of bias beside modification impediments may be the bimodal length distribution of tRNAs, with some tRNAs having ~15 more nucleotides in the variable loop than the others, giving reverse transcriptase more chances to fall off the template before processing the anticodon arm from the 3’ end, the default minimum span for a tRNA read to be analyzed and retained by the anvi’o trnaseq workflow. We assume that the biases are similar across samples in an experiment despite potential changes in the levels of certain modifications. Therefore, we measure changes in tRNA isoacceptor abundances relative to the reference sample to understand tRNA pool dynamics. Translational affinity of a gene must be interpreted in this light: the gene is favored or disfavored for elongation by changes in the tRNA pool relative to the reference sample.

Example: Sample 1 (reference) has tRNA-Ala-AGC, tRNA-Ala-UGC, and tRNA-Tyr-GUA relative abundances of 20%, 0.10%, and 3.0%, respectively; Sample 2 has relative abundances of 30%, 0.08%, and 2.5%; and Sample 3 has relative abundances of 15%, 0.13%, and 3.8%. The fold changes in abundance for Sample 2 vs. 1 are 1.5, 0.80, and 0.83, and the changes in Sample 3 vs. 1 are 0.75, 1.6, and 1.3.

Calculation

tRNA dynamics are compared to the the (static) codon composition of genes using vector arithmetic. The fold changes in measured isoacceptor relative abundances are stored in vector T. In our example, T = [1.5, 0.80, 0.83] for Sample 2 (vs. 1), and T = [0.75, 1.6, and 1.3] for Sample 3 (vs. 1). The weighted frequencies of codons translated by these isoacceptors are stored in vector C. Consider the three isoacceptors in our example: Ala-AGC, Ala-UGC, and Tyr-GUA. The AGC anticodon is fully modified to IGC in almost all organisms. Ala-IGC translates GCU with an empirical efficiency (weight) of 1, GCC with an efficiency of ~0.58, and GCA with an efficiency of ~0.24. Take the weighted sum of the relative frequencies of these codons in the gene or functional group of genes: say that Gene A in Bin (MAG) M contains 1,000 nucleotides, and the weighted sum of the three codons, respectively, is (34/1000)1 + (21/1000)0.58 + (25/1000)*0.24 = 0.052. Say that the values for Ala-UGC and Tyr-GUA are 0.010 and 0.035, making C = [0.052, 0.010, 0.035]. The dot product of T and C defines affinity, A. For Sample 2 (vs. 1) A = T • C = 1.5 * 0.052 + 0.80 * 0.010 + 0.83 * 0.035 = 0.12, and for Sample 3 (vs. 1) A = 0.75 * 0.052 + 1.6 * 0.010 + 1.3 * 0.035 = 0.10. The tRNA pool of Sample 2 has a higher affinity to Gene A than the tRNA pool of Sample 3 – again, all relative to reference Sample 1.

Note that A for Sample 2 vs. 1, taking these as arbitrary samples, can also be calculated by taking the dot product of Sample 2 isoacceptor abundances (not relative to Sample 1) and Gene A codon usage, and then dividing by the dot product of Sample 1 isoacceptor abundances and Gene A codon usage. In other words, it does not matter whether normalization to the reference sample occurs before or after isoacceptor abundances are dotted by codon usage.

Robustness

Unfortunately, tRNA-seq experiments, especially from complex samples containing many populations, rarely yield a complete profile of the tRNA species in any organism, which is why we do not use the tRNA tpm metric of Wei et al., 2019. The more tRNAs measured, the more robustly affinity estimates the preference of the tRNA pool for the gene. In anvi-compute-trnaseq-functional-affinity, the user can change the threshold number of detected tRNA isoacceptors required to calculate affinity for a genome (default of 4). Example: The threshold is set to 3 isoacceptors and multiple MAGs were reconstructed from the metagenome, including MAG L. tRNA-seq reference Sample 1 detected 3 isoacceptors in MAG L, Sample 2 detected 3 isoacceptors, one of which is not found in Sample 1, and Sample 3 detected 5 isoacceptors, including the 3 isoacceptors of Sample 1. Affinity could not be measured in Sample 2 given only 2 isoacceptors are shared with Sample 1, and affinity would be measured in Sample 3 using the 3 isoacceptors in common with Sample 1.

Furthermore, a coverage threshold for detection of a tRNA isoacceptor is set to exclude very low abundance isoacceptors, with the default value being a specific coverage of 10 reads at the 3’ end (discriminator nucleotide) of the isoacceptor seeds (“specific” as opposed to “nonspecific” coverage is explained in the trnaseq-profile-db artifact). The detection threshold must be satisfied in both the sample and reference sample for an isoacceptor to be considered.

To estimate the robustness of affinity to tRNA pool incompleteness, the metric can be recalculated leaving out isoacceptors. In the example, affinity could be recalculated with combinations of two isoacceptors or one isoacceptor rather than the three isoacceptors isoacceptors measured. “Rarefied” affinities can be plotted against the number of retained isoacceptors – ideally, with numerous detected isoacceptors, affinity would approach an asymptote as the number of subsampled isoacceptors increased. In anvi-compute-trnaseq-functional-affinity, the user can trigger rarefaction subsampling.

Standardization

It is useful to compare translational affinities across genes (or functional groups) to understand which genes are most favored or disfavored for translation given the change in the tRNA pool (relative to the reference sample). Sample A may have a similar or different ordering of genes by affinity than Sample B. It can be more informative to standardize affinities within a sample before comparing to another sample: the affinity metric, for example, is affected by the set of isoacceptors measured for a given sample, and so the magnitude of affinities may be generally greater or lesser for one sample than another though the order of genes by affinity is similar. Standardization of affinity on a scale of zero to one highlights differences between genes and between samples. We call standardized affinity S = (A - A(min gene or function)) / (A(max gene or function) - A(min gene or function)).

Edit this file to update this information.

Additional Resources

Are you aware of resources that may help users better understand the utility of this program? Please feel free to edit this file on GitHub. If you are not sure how to do that, find the __resources__ tag in this file to see an example.