anvi-run-globdb-functions

Annotate genes with gene family functions derived from GlobDB, using gene family-level cutoffs determined by empirical Local Alignment Score Ratio (LASR) thresholds.


Authors

Can consume

globdb-data contigs-db

Can provide

functions

Usage

This program annotates genes with functions from the GlobDB gene family database. It requires a globdb-data artifact produced by anvi-setup-globdb-functions.

GlobDB is a curated database of microbial genomes. It also offers a set of curated gene families, each accompanied by a gene-family-level Local Alignment Score Ratio (LASR) cutoff. anvi-run-globdb-functions uses these cutoffs to decide whether each hit is a genuine annotation or noise. Hits that pass the threshold are stored in your contigs database under the function annotation source GlobDB.

Annotate a contigs database

anvi-run-globdb-functions -c contigs-db

Use --num-threads for faster DIAMOND searches on multi-core systems.

Annotate a FASTA file of amino acid sequences

anvi-run-globdb-functions --fasta-file my_proteins.faa \
                          --output-file my_annotations.txt

Custom data directory

If your globdb-data lives in a non-default location, point anvi’o to it:

anvi-run-globdb-functions -c contigs-db \
                          --globdb-data-dir /path/to/globdb/data

Or set the environment variable ANVIO_GLOBDB_DATA_DIR once and omit the flag.

How the cutoffs work

For each DIAMOND hit, anvi’o computes a Local Alignment Score Ratio (LASR): the ratio of the raw DIAMOND alignment score to the theoretical maximum self-alignment score of the query sequence. The self-alignment score is computed from BLOSUM45 diagonal values, following the implementation by the GlobDB developers (Daan Speth et al.). The hit is then evaluated against the LASR threshold (lasr), selfmin, and selfmax values stored in the gene family’s YAML entry:

  • A query whose self-alignment score falls in the expected range (selfmin–selfmax) for the family is classified as correct_length if the LASR passes the threshold.
  • Queries shorter or longer than expected are classified as too_short or too_long respectively; these still pass and are annotated, but the classification is noted in the function description.
  • Hits that do not reach the LASR threshold are labeled below_cutoff and are silently discarded.
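
The classification rules above can be sketched in Python. This is an illustrative reading of the description, not the code anvi’o actually runs. The function names (self_alignment_score, classify_hit) and parameter names are hypothetical. The sketch also makes several assumptions: residues outside the 20 standard amino acids contribute 0 to the self-alignment score, a LASR equal to the threshold passes, and "shorter" or "longer" than expected means a self-alignment score below selfmin or above selfmax. The diagonal values are transcribed from the standard BLOSUM45 matrix and are worth checking against your own copy.

```python
# BLOSUM45 diagonal (self-match) scores for the 20 standard amino acids,
# transcribed from the standard matrix; verify against your own copy.
BLOSUM45_DIAGONAL = {
    "A": 5, "R": 7, "N": 6, "D": 7, "C": 12, "Q": 6, "E": 6, "G": 7,
    "H": 10, "I": 5, "L": 5, "K": 5, "M": 6, "F": 8, "P": 9, "S": 4,
    "T": 5, "W": 15, "Y": 8, "V": 5,
}


def self_alignment_score(sequence):
    """Theoretical maximum score of a sequence aligned to itself.

    Assumption: residues not in the table (e.g. X) contribute 0.
    """
    return sum(BLOSUM45_DIAGONAL.get(aa, 0) for aa in sequence.upper())


def classify_hit(raw_score, query_sequence, lasr_cutoff, selfmin, selfmax):
    """Return (label, lasr) for one hit against one gene family.

    lasr_cutoff, selfmin and selfmax stand in for the values stored in
    the gene family's YAML entry (hypothetical parameter names).
    """
    self_score = self_alignment_score(query_sequence)
    if self_score <= 0:
        # No scorable residues: treat as failing (assumption).
        return "below_cutoff", 0.0

    lasr = raw_score / self_score
    if lasr < lasr_cutoff:  # assumption: a LASR equal to the cutoff passes
        return "below_cutoff", lasr
    if self_score < selfmin:
        return "too_short", lasr
    if self_score > selfmax:
        return "too_long", lasr
    return "correct_length", lasr


# Example with made-up numbers: a four-residue query and a raw score of 30.
label, lasr = classify_hit(30, "ACDW", lasr_cutoff=0.5, selfmin=30, selfmax=50)
print(label, round(lasr, 3))
```

Only hits labeled below_cutoff would be dropped in this sketch. The other three labels all count as annotations, matching the behavior described in the list above.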

This is, broadly speaking, how the cutoffs work. We will update this section as soon as there is a resource to cite.

