Installing anvio-dev on Linux

This page is for users who want to install the development version of anvi’o, anvio-dev, on personal computers running a Linux operating system.

Following the active development of anvi’o (you’re a wizard, arry)

This section is not quite meant to be followed by those who would define themselves as end users in a conventional sense. But we are not the kinds of people who would dare to tell anyone what they can and cannot do. FWIW, our experience suggests that if you are doing microbiology, you will do computers no problem if you find dem computers exciting.

If you follow these steps, you will have anvi’o set up on your system in such a way that every time you initialize your anvi’o environment, you will get the very latest state of the anvi’o code. Plus, you can have both the stable and the active development versions of anvi’o on the same computer.

Nevertheless, it is important to keep in mind that there are multiple advantages and disadvantages to working with the active development branch. Advantages are obvious and include,

  • Full access to all new features and bug fixes in real-time, without having to wait for stable releases to be announced.

  • A working system to hack anvi’o and/or add new features to the code (this strategy is exactly how we develop anvi’o and use it for our science at the same time at our lab).

In contrast, disadvantages include,

  • Unstable intermediate states may frustrate you with bugs, and in extremely rare instances loss of data (this happened only once so far during the last five years, and required one of our users to re-generate their contigs databases).

  • Difficulty in citing the anvi’o version in a paper, although this is easily solved by sharing the cryptographic hash of the last commit, rather than the anvi’o version number, for reproducibility. If you ever struggle with this, please let us know and we will help you.
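For the reproducibility point above, here is a minimal sketch of how one could print the hash of the last commit once the codebase is cloned as described later on this page. The function name anvio_commit_hash is hypothetical, and the default path assumes the ~/github/anvio layout suggested in these instructions:

```shell
# Sketch: print the full commit hash of a local anvi'o clone.
# anvio_commit_hash is a hypothetical helper name; the default path
# assumes the ~/github/anvio layout suggested on this page.
anvio_commit_hash() {
    repo_dir="${1:-$HOME/github/anvio}"
    git -C "$repo_dir" rev-parse HEAD
}

# usage, once the codebase is cloned:
#   anvio_commit_hash
#   anvio_commit_hash /some/other/path/anvio
```

The hash printed this way identifies the exact state of the code you were running, which is what a reader would need to check out the same version.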

If you are still here, let’s start.


(1) Things you need before you start

You will need to run the installation commands from a terminal. Since your system is using Linux, you should be good to go. :)

You also need miniconda to be installed on your system. If you don’t already have it, please follow their installation instructions.

(2) Setting up the conda environment

Please note that we recently switched from Python 3.7 to Python 3.10 in our active development branch. Thus, the way we set up the conda environment for the active development branch now differs from the way we do it for the latest stable version. There may be hiccups since these changes required many adjustments in the anvi’o code, and some bugs were likely missed. If you are reading these lines, please keep us posted if you run into an issue.

First, make sure you are not in any environment by running conda deactivate. Then, make sure you don’t have an environment called anvio-dev (as in anvi’o development):

conda env remove --name anvio-dev
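If you would rather check whether such an environment exists before removing it, something along these lines may help. This is only a sketch: has_env is a hypothetical helper name, and it assumes the environment name is the first column of the conda env list output:

```shell
# Sketch: succeed if an environment named $1 appears in the output of
# `conda env list`, read from stdin. has_env is a hypothetical helper
# name, and it assumes the environment name is the first column.
has_env() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# usage:
#   if conda env list | has_env anvio-dev; then
#       conda env remove --name anvio-dev
#   fi
```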

Create a new conda environment:

conda create -y --name anvio-dev python=3.10

And activate it:

conda activate anvio-dev

Install mamba for fast dependency resolving:

conda install -y -c conda-forge mamba

If the mamba installation somehow doesn’t work, that is OK. It is also OK if some of the commands below that start with mamba don’t work. In either case, you only need to replace every instance of mamba with conda, and everything should work smoothly (but with slightly longer wait times). It would be extremely helpful to the community if you were to ping us in the case of a mamba failure, so we can better understand under what circumstances this solution fails.
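The mamba-or-conda fallback described above can also be sketched as a tiny snippet, so you do not have to edit the commands by hand. INSTALLER is a hypothetical variable name used only for illustration; it is not something anvi’o or conda defines:

```shell
# Sketch: pick mamba when it is available, otherwise fall back to conda.
# INSTALLER is a hypothetical variable name, not something anvi'o defines.
if command -v mamba >/dev/null 2>&1; then
    INSTALLER=mamba
else
    INSTALLER=conda
fi
echo "Using ${INSTALLER} for package installation"

# usage (same arguments as the install commands on this page):
#   "$INSTALLER" install -y -c bioconda fastani
```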

Now you are in a pristine environment, in which you will install all conda packages that anvi’o will need to work properly. This looks scary, but it will work if you just copy-paste it and press ENTER:

mamba install -y -c conda-forge -c bioconda python=3.10 \
        sqlite prodigal idba mcl muscle=3.8.1551 famsa hmmer diamond \
        blast megahit spades bowtie2 bwa graphviz "samtools>=1.9" \
        trimal iqtree trnascan-se fasttree vmatch r-base r-tidyverse \
        r-optparse r-stringi r-magrittr bioconductor-qvalue meme ghostscript

# try this, if it doesn't install, don't worry (it is sad, but OK):
mamba install -y -c bioconda fastani

If you see any error messages in the output indicating that a package failed to install, you should check the ‘Common problems’ section below or search for it in the anvi’o issues page (make sure to check the ‘Closed’ issues as well) to see if we already found a solution for the error.
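If the output scrolled by too fast to spot failures, a quick check of what actually landed on your PATH may be useful. The following is a sketch with a hypothetical helper name, missing_tools; the executable names in the usage line are the ones a few of the packages above are known to provide, and the list is not exhaustive:

```shell
# Sketch: print the name of every executable that is NOT on PATH in the
# currently active environment. missing_tools is a hypothetical helper.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# usage (a non-exhaustive list of executables from the packages above):
#   missing_tools prodigal hmmsearch diamond blastn samtools bowtie2 megahit
```

If this prints nothing, all the listed programs were found.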

(3) Setting up the local copy of the anvi’o codebase

If you are here, it means you have a conda environment with everything except anvi’o itself. We will make sure this environment has anvi’o by getting a copy of the anvi’o codebase from GitHub.

Here I will suggest ~/github/ as the base directory to keep the code, but you can change it to something else if you want (in which case you must remember to apply that change to all the following commands, of course). Set up the code directory:

mkdir -p ~/github && cd ~/github/

Get the anvi’o code:

If you only plan to follow the development branch, and not make changes to the codebase, you can skip this message. But if you are not an official anvi’o developer yet intend to change anvi’o and send us pull requests to reflect those changes in the official repository, you may want to clone anvi’o from your own fork rather than using the following URL. Thank you very much in advance and we are looking forward to seeing your PR!

git clone --recursive https://github.com/merenlab/anvio.git
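For those who clone from their own fork, one common arrangement (an assumption on our part, not an anvi’o requirement) is to register the official repository as a second remote, so updates from it remain within reach. In this sketch, add_anvio_upstream is a hypothetical helper name, the remote name upstream is just a convention, and YOUR_USERNAME is a placeholder:

```shell
# Sketch: after cloning from your own fork, register the official
# repository as a second remote. The remote name "upstream" is a common
# convention, and add_anvio_upstream is a hypothetical helper name.
add_anvio_upstream() {
    repo_dir="${1:-$HOME/github/anvio}"
    git -C "$repo_dir" remote add upstream https://github.com/merenlab/anvio.git
}

# usage (YOUR_USERNAME is a placeholder for your GitHub account):
#   git clone --recursive https://github.com/YOUR_USERNAME/anvio.git
#   add_anvio_upstream
#   git -C ~/github/anvio remote -v
```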

(4) Installing the Python dependencies

Some packages in requirements.txt may need to be installed with a more up-to-date C compiler on Mac OSX. Hence, we suggest that all Mac users run the following commands before starting the pip install command (on Linux you can skip these two lines):

export CC=/usr/bin/clang
export CXX=/usr/bin/clang++

Finally, to install the Python dependencies of anvi’o, please run the following command:

cd ~/github/anvio/
pip install -r requirements.txt

If pysam is causing you trouble during this step, you may want to try to install it with conda first by running mamba install -y -c bioconda pysam and then try the pip install command again.

You might see errors during the pip installation that include a line like Building wheel for XXXXXX did not run successfully. and also a line like error: command 'gcc' failed: No such file or directory. If this is the case, the problem is that your Linux installation does not include the GCC compiler. On Debian- or Ubuntu-based systems, you can fix that by running the following commands to upgrade your system and install the compiler: sudo apt update, followed by sudo apt full-upgrade, and finally sudo apt install gcc (other distributions will need the equivalent commands for their own package manager). Once those are complete, please retry the pip install command.

Now you have the latest copy of the anvi’o codebase, and all of its dependencies are in place.

(5) Linking conda environment and the codebase

Now we have the codebase and we have the conda environment, but they don’t know about each other.

Here we will set up your conda environment in such a way that every time you activate it, you will get the very latest updates from the main anvi’o repository. While you are still in the anvi’o environment, copy-paste these lines into your terminal:

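# make sure the directory for activation scripts exists
mkdir -p "${CONDA_PREFIX}/etc/conda/activate.d/"
cat <<EOF >"${CONDA_PREFIX}/etc/conda/activate.d/anvio.sh"
# creating an activation script for the conda environment for anvi'o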
# development branch so (1) Python knows where to find anvi'o libraries,
# (2) the shell knows where to find anvi'o programs, and (3) every time
# the environment is activated it synchronizes with the latest code from
# active GitHub repository:
export PYTHONPATH=\$PYTHONPATH:~/github/anvio/
export PATH=\$PATH:~/github/anvio/bin:~/github/anvio/sandbox
echo -e "\033[1;34mUpdating from anvi'o GitHub \033[0;31m(press CTRL+C to cancel)\033[0m ..."
cd ~/github/anvio && git pull && cd -
EOF

If you are using zsh by default, these may not work. If you run into trouble here, or especially if you figure out a way to make it work for both zsh and bash, please let us know. To use bash to make the above command work, first run exec bash, then re-run the command above. To go back to zsh, you can run exec zsh.

If everything worked, you should be able to type the following commands in a new terminal and see similar outputs:

meren ~ $ conda activate anvio-dev
Updating from anvi'o GitHub (press CTRL+C to cancel) ...

(anvio-dev) meren ~ $ which anvi-self-test
/Users/meren/github/anvio/bin/anvi-self-test

(anvio-dev) meren ~ $ anvi-self-test -v
Anvi'o .......................................: hope (v7.1-dev)
Python .......................................: 3.10.13

Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

(anvio-dev) meren ~ $

If that is the case, you’re all set.

Every change you make in the anvi’o codebase will immediately be reflected when you run anvi’o tools (but if you change the code and do not revert your changes, git will stop updating your branch from the upstream).
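If updates ever seem to have stopped arriving, it may be worth checking whether the clone carries local modifications. The following is a minimal sketch under the same ~/github/anvio assumption used throughout this page; has_local_changes is a hypothetical helper name, and it only looks at files git already tracks:

```shell
# Sketch: succeed if the clone has uncommitted changes to tracked files.
# has_local_changes is a hypothetical helper name; untracked files are
# ignored on purpose.
has_local_changes() {
    repo_dir="${1:-$HOME/github/anvio}"
    [ -n "$(git -C "$repo_dir" status --porcelain --untracked-files=no)" ]
}

# usage:
#   if has_local_changes; then
#       git -C ~/github/anvio status --short
#   fi
```

The usage lines list the modified files, so you can decide whether to keep, commit, or revert them.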

If you followed these instructions, every time you open a terminal you will have to run the following command to activate your anvi’o environment:

conda activate anvio-dev

If you are here, you can now jump to “Check your installation” to see if things worked for you using anvi-self-test, but don’t forget to take a look at the bonus chapter below, especially if you are using bash.

(6) Check your installation

If you are here, you are ready to check if everything is working on your system. This section will help you finalize your installation so you are prepared for anything.

The easiest way to check your installation is to run the anvi’o program anvi-self-test:

anvi-self-test --suite mini

If you don’t want anvi’o to show you a browser window at the end, and would rather have it quietly finish testing if everything is OK, add the --no-interactive flag to the command above. Another note: here anvi-self-test is run in --suite mini mode, which tests the absolute minimal features of your anvi’o installation. If you run it without any parameters, it will test many more things.

If everything goes smoothly, your browser should pop-up and show you an anvi’o interactive interface that looks something like this once anvi-self-test is done running:

The screenshot above is from 2015 and will be vastly different from the interactive interface you should see in your browser. It is still here so we remember where we came from 😇

If you are seeing the interactive interface, it means you now have a computer that can run anvi’o! In theory you can leave this page at this moment, but there are a few more details that would be best to attend to now. So please bear with this tutorial just a little longer.

Don’t forget to come say hi to us on anvi’o Discord.


(6.1) Setup key resources

This is to further prepare your anvi’o installation for things you may need later, such as databases for taxonomic annotation of your genomes or functional annotation of your genes. This is an up-to-date list of programs that you should run in your terminal to have everything ready:

(6.2) Install an automated binning algorithm in your anvi’o environment

You can skip this section if you are not interested in reconstructing genomes from metagenomes using anvi’o.

Anvi’o offers a powerful interactive environment to reconstruct genomes from metagenomes where you have full control over subtle decisions. For small assemblies (i.e., where you have fewer than 25,000 contigs), you do not need additional binning software to reconstruct genomes from metagenomes. But for larger metagenomes, you have two options:

  • Use the program anvi-cluster-contigs with an automatic binning software that is already installed on your system.
  • Perform automatic binning outside of anvi’o, and import the binning results as a collection into anvi’o using the program anvi-import-collection to further refine those results.

The following recipe will help you install CONCOCT on your system just so there is an automatic binning algorithm ready on your system that you can use with anvi-cluster-contigs:

# setup a place to download CONCOCT source code
mkdir -p ~/github/ && cd ~/github/

# get a clone of the CONCOCT codebase from the fork
# that is tailored for the anvi'o conda environment
git clone https://github.com/merenlab/CONCOCT.git

# build and install
cd CONCOCT
python setup.py build
python setup.py install

Please note that you may encounter an error when running CONCOCT due to a TypeError. Please see the report #2154 for more information regarding this issue. If you run into this issue, you may be able to resolve it by running the following command in your anvi’o conda environment: pip install scikit-learn==1.1.0. This solution was developed and tested, and confirmed to work at least for v8. But please let us know if this fix breaks any other part of anvi’o :)

If everything worked, when you type the following command,

anvi-cluster-contigs -h

You should see this output (where CONCOCT is found):

If you are a developer of an automatic binning algorithm and would like to see it in anvi’o, please get in touch with us. Anvi’o can pass any information about sequences (their coverages across samples, tetranucleotide frequencies, genes, functions, and whatever else you would like to have about them) to any program to run it on user data and import the results into anvi’o databases seamlessly through simple Python wrappers. Here are some examples of such wrappers for CONCOCT, for BinSanity, and for MaxBin2. If you wish to create one but are not sure how to test it, please start a GitHub issue.

(6.3) Troubleshooting

If your browser didn’t show up, or testing stopped with errors, please take a look at the common problems others have reported and try these solutions. Please remember you can always come to anvi’o Discord to ask for help if things are not working for you and the answers you find here are of no use.

I see a lot of warning messages

It is absolutely normal to see ‘warning’ messages. In general anvi’o is talkative as it would like to keep you informed. In an ideal world you should keep a careful eye on those warning messages, but in most cases they will not require action.

If anvi-self-test fails with an error message that looks something like this,

libcrypto.so.1.0.0: cannot open shared object file: no such file or directory

it is likely that the pysam module installation failed. To fix this you should revisit the installation instructions, especially the part that says “Issues related to samtools”, and then come back to testing.

My browser didn’t show up

If your browser does not show up, or does show up but can’t show anything due to a ‘network problem’, you may want to visit http://localhost:8080 by manually entering this address into your browser’s address bar, which should work on your local computer. On some systems, the default network interface anvi’o uses to connect to its own server causes issues. You may also find the help page for anvi-interactive useful for future reference.

If your browser does not show up while you are connected to a remote computer, it is quite normal. In some cases a text-based browser may show up instead of your graphical browser, too. This is because you are running anvi’o on another computer, and it tries to open a browser there. You can set things up for anvi’o to use your local browser to access an anvi’o interactive interface running remotely. For that, you can read this article (or ask your systems administrator to read it) to learn how you can forward displays from servers to your personal computer.
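One way people often reach a web interface running on a remote machine is SSH local port forwarding. Please note that this is a different technique from the display forwarding mentioned above, and we show it only as a sketch. The helper name tunnel_cmd and the host user@remote.example.org are placeholders, and the port follows the localhost:8080 address mentioned earlier on this page:

```shell
# Sketch: print an SSH command that forwards a local port to the same
# port on a remote machine. tunnel_cmd is a hypothetical helper name and
# user@remote.example.org is a placeholder host.
tunnel_cmd() {
    remote="$1"
    port="${2:-8080}"
    printf 'ssh -L %s:localhost:%s %s\n' "$port" "$port" "$remote"
}

# print the command for the placeholder host, using the default port
tunnel_cmd user@remote.example.org
```

If you run the printed command on your personal computer and keep that connection open, visiting http://localhost:8080 in your local browser should reach whatever is being served on that port of the remote machine.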

Browser shows up, but anvi’o complains about Chrome

If you are not using Chrome as your default browser, anvi’o will complain about it :/ We hate the idea of asking you to change your browser preferences for anvi’o :( But currently, Chrome maintains the most efficient SVG engine among all browsers we tested as of 2021. For instance, Safari can run the anvi’o interactive interface, however it takes orders of magnitude more time and memory compared to Chrome. Firefox, on the other hand, doesn’t even bother drawing anything at all. Long story short, the anvi’o interactive interface will not perform optimally with anything but Chrome. So you need Chrome. Moreover, if Chrome is not your default browser, every time the interactive interface pops up, you will need to copy-paste the address into a Chrome window.

You can learn what your default browser is by running this command in your terminal:

python -c 'import webbrowser as w; w.open_new("http://")'

Everything is fine, but I can’t find anvi’o commands in a new terminal

If you open a new terminal and get a command not found error when you run anvi’o commands, it means you need to activate the anvi’o conda environment by running the following command (assuming that you named your conda environment anvio-dev as in the instructions above, but you can always list your conda environments by running conda env list):

conda activate anvio-dev

If you are getting an error that goes like,

Config Error: Something went wrong during the functional enrichment analysis :( We don't know
              what happened, but this log file could contain some clues: (...)

it often means that the R libraries that are needed to run functional enrichment analyses are not installed properly through conda :/ Luckily, you can try to install them directly through R, as Marco Gabrielli shared on anvi’o Discord. For this, try running this command in your terminal:

Rscript -e 'install.packages(c("stringi", "tidyverse", "magrittr", "optparse"), repos="https://cloud.r-project.org")'

Rscript exits on its own once the installation is finished, so there is no R terminal to quit. Once it is done, you can run this command to see if everything runs smoothly:

Rscript -e "library('tidyverse')"

In some cases the problem is the qvalue package, which can be a pain to install. If you are having a hard time with that one, you can try this and see if that solves it:

Rscript -e 'install.packages("BiocManager", repos="https://cran.rstudio.com"); BiocManager::install("qvalue")'

Now you can take a look at some anvi’o resources, or join anvi’o Discord to be a part of our growing community.

Bonus: An alternative BASH profile setup

This section is written by Meren and reflects his setup on a Mac system that runs miniconda where bash is set up as the default shell. If you are using another shell and would like to share your solution, please send a PR!

This is all a matter of personal taste, and the details may need to change from computer to computer, but I added the following lines at the end of my ~/.bash_profile to easily switch between different versions of anvi’o on my Mac system:

# This is where my miniconda base is, you can find out
# where is yours by running this in your terminal:
#
#    conda env list | grep base
#
export MY_MINICONDA_BASE="/Users/$USER/miniconda3"

init_anvio_7 () {
    deactivate &> /dev/null
    conda deactivate &> /dev/null
    export PATH="$MY_MINICONDA_BASE/bin:$PATH"
    . $MY_MINICONDA_BASE/etc/profile.d/conda.sh
    conda activate anvio-7.1
    export PS1="\[\e[0m\e[47m\e[1;30m\] :: anvi'o v7.1 :: \[\e[0m\e[0m \[\e[1;32m\]\]\w\[\e[m\] \[\e[1;31m\]>>>\[\e[m\] \[\e[0m\]"
}


init_anvio_dev () {
    deactivate &> /dev/null
    conda deactivate &> /dev/null
    export PATH="$MY_MINICONDA_BASE/bin:$PATH"
    . $MY_MINICONDA_BASE/etc/profile.d/conda.sh
    conda activate anvio-dev
    export PS1="\[\e[0m\e[40m\e[1;30m\] :: anvi'o v7.1 dev :: \[\e[0m\e[0m \[\e[1;34m\]\]\w\[\e[m\] \[\e[1;31m\]>>>\[\e[m\] \[\e[0m\]"
}

alias anvio-7.1=init_anvio_7
alias anvio-dev=init_anvio_dev

You can either open a new terminal window or run source ~/.bash_profile to make sure these changes take effect. Now you should be able to type anvio-7.1 to initialize the stable anvi’o, and anvio-dev to initialize the development branch of the codebase.

Here is what I see in my terminal for anvio-7.1:

meren ~ $ anvi-self-test -v
-bash: anvi-self-test: command not found

meren ~ $ anvio-7.1

:: anvi'o v7.1 :: ~ >>>

:: anvi'o v7.1 :: ~ >>> anvi-self-test -v
Anvi'o .......................................: hope (v7.1)

Profile database .............................: 38
Contigs database .............................: 20
Pan database .................................: 15
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 2

Or for anvio-dev:

meren ~ $ anvi-self-test -v
-bash: anvi-self-test: command not found

:: anvi'o v7.1 :: ~ >>> anvio-dev

:: anvi'o v7.1 dev :: ~ >>>

:: anvi'o v7.1 dev :: ~ >>> anvi-self-test -v
Anvi'o .......................................: hope (v7.1-dev)
Python .......................................: 3.10.12

Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

But please note that both aliases run deactivate and conda deactivate first, and they may not work for you especially if you have a fancy setup.

