ecophylo-workflow

WORKFLOW

A WORKFLOW-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).

Provided by

anvi-run-workflow

Required or used by

There are no anvi’o tools that use or require this artifact directly, which means it is most likely an end-product for the user.

Description

The ecophylo-workflow explores the ECOlogical and PHYlogenetic relationships of proteins across genomes and metagenomes. It is a Snakemake workflow run by the anvi’o script anvi-run-workflow. Briefly, the workflow extracts a target protein from any number of assemblies listed in metagenomes and/or external-genomes using a user-designated HMM from hmm-list, then clusters the sequences and selects representatives from each cluster. Next, the workflow post-processes the representative sequences using phylogenetics and metagenomic read recruitment to produce an anvi’o interactive interface for exploring their phylogenetic distances and co-occurrence across metagenomes (by visualizing a phylogenetic tree alongside the read recruitment results). The workflow can use any protein-based HMM, including single-copy core genes to taxonomically profile metagenomes, or any functional protein to explore variants across samples.

The ecophylo-workflow has two modes, tree-mode and profile-mode, which are selected in the workflow-config by changing the input files that are provided. In tree-mode, the sequences are used to calculate a phylogenetic tree. In profile-mode, the sequences are used to calculate a phylogenetic tree and are also profiled via read recruitment across user-provided metagenomes.
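As an illustration of how the input files determine the mode, here is a minimal Python sketch. It assumes, based on the config examples on this page, that an empty "samples_txt" entry corresponds to tree-mode and a non-empty one to profile-mode. The function name infer_ecophylo_mode is hypothetical and is not part of anvi’o; the workflow makes this decision internally.

```python
def infer_ecophylo_mode(config: dict) -> str:
    """Guess the EcoPhylo mode from a workflow-config dictionary.

    Assumption (taken from the config examples on this page): a non-empty
    "samples_txt" entry means profile-mode, anything else means tree-mode.
    This helper is illustrative only and is not part of anvi'o.
    """
    samples_txt = (config.get("samples_txt") or "").strip()
    return "profile-mode" if samples_txt else "tree-mode"


# The same input files, with and without a samples-txt path.
tree_config = {
    "metagenomes": "metagenomes.txt",
    "external_genomes": "external-genomes.txt",
    "hmm_list": "hmm_list.txt",
    "samples_txt": "",
}
profile_config = dict(tree_config, samples_txt="samples.txt")

print(infer_ecophylo_mode(tree_config))
print(infer_ecophylo_mode(profile_config))
```

A check like this could be run on a config file before launching the workflow to confirm that it will run in the intended mode.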

Required input

The ecophylo-workflow requires the following files:

  • workflow-config: This allows you to customize the workflow step by step. Here is how you can generate the default version:

anvi-run-workflow -w ecophylo --get-default-config config.json

Here is a tutorial walking through more details regarding the EcoPhylo workflow-config file: coming soon!

Want to explore phylogenetic relationships of proteins across assemblies? Tree-mode

This is the simplest implementation of EcoPhylo. The workflow will extract the target protein from input assemblies, cluster the sequences and pick representatives, then calculate a phylogenetic tree based on the amino acid version of the representative sequences. There are two sub-modes of tree-mode, depending on how you pick representative sequences: NT-mode and AA-mode.

NT-mode

This is the default version of tree-mode. Target protein sequences are clustered based on the nucleotide sequences of the proteins. This is done to prepare for profile-mode, where there must be adequate NT sequence distance between proteins to prevent non-specific read recruitment.

Here is what the start of the EcoPhylo workflow-config should look like if you want to run tree-mode:

{
    "metagenomes": "metagenomes.txt",
    "external_genomes": "external-genomes.txt",
    "hmm_list": "hmm_list.txt",
    "samples_txt": ""
}

AA-mode

This is a sub-version of tree-mode where the sequences used to calculate the tree are drawn from amino acid cluster representatives rather than nucleotide cluster representatives. If you are only interested in protein phylogenetics, this is the way to go.

To initialize AA-mode, go to the rule cluster_X_percent_sim_mmseqs in the EcoPhylo workflow-config and set “AA_mode” to true:

{
    "cluster_X_percent_sim_mmseqs": {
        "threads": 5,
        "--min-seq-id": 0.94,
        "clustering_threshold_for_OTUs": [
            0.99
        ],
        "AA_mode": true
    }
}
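If you prefer to make this change programmatically rather than by hand, the following is a minimal Python sketch. The helper name enable_aa_mode is hypothetical and not part of anvi’o, and the starting value of "AA_mode" in the sample dictionary is an assumption made for illustration. Note that JSON spells booleans in lowercase, so Python's json module writes True as true.

```python
import json


def enable_aa_mode(config: dict) -> dict:
    """Return a copy of a workflow-config dict with AA_mode switched on.

    Illustrative helper, not part of anvi'o. The rule and key names are
    taken from the config fragment shown on this page.
    """
    # A JSON round-trip gives a deep copy, so the input dict is untouched.
    updated = json.loads(json.dumps(config))
    rule = updated.setdefault("cluster_X_percent_sim_mmseqs", {})
    rule["AA_mode"] = True  # serialized as lowercase `true` in JSON
    return updated


# Sample rule settings; the initial AA_mode value here is an assumption.
config = {
    "cluster_X_percent_sim_mmseqs": {
        "threads": 5,
        "--min-seq-id": 0.94,
        "clustering_threshold_for_OTUs": [0.99],
        "AA_mode": False,
    }
}

aa_config = enable_aa_mode(config)
print(json.dumps(aa_config, indent=4))
```

The printed output is valid JSON and can be written back to the config file with json.dump.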

Want to track proteins across metagenomic samples via read recruitment? Profile-mode

Profile-mode is an extension of the default tree-mode in which the NT sequence representatives are profiled with metagenomic reads from user-provided metagenomic samples. This allows for the simultaneous visualization of phylogenetic and ecological relationships of proteins across metagenomic datasets.

Additional required files:

  • samples-txt

To initialize profile-mode, add the path to your samples-txt to your EcoPhylo workflow-config:

{
    "metagenomes": "metagenomes.txt",
    "external_genomes": "external-genomes.txt",
    "hmm_list": "hmm_list.txt",
    "samples_txt": "samples.txt"
}
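The switch to profile-mode can also be scripted. The sketch below is a hypothetical helper (enable_profile_mode is not an anvi’o function) that checks that the samples-txt file exists before writing its path into the config, on the assumption that catching a missing file early is preferable to discovering it when the workflow starts.

```python
import json
from pathlib import Path


def enable_profile_mode(config_path, samples_txt_path):
    """Write a samples-txt path into an EcoPhylo workflow-config file.

    Illustrative helper, not part of anvi'o. It only checks that the
    samples-txt file exists; it does not validate the file's contents.
    """
    samples = Path(samples_txt_path)
    if not samples.is_file():
        raise FileNotFoundError(f"samples-txt not found: {samples}")

    config_file = Path(config_path)
    config = json.loads(config_file.read_text())
    config["samples_txt"] = str(samples)  # all other keys are left as-is
    config_file.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

For example, calling enable_profile_mode("config.json", "samples.txt") would update an existing config.json in place and return the updated dictionary.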
