Conda package manager#

Content from this lesson has been inspired by and adapted from a number of sources.

Introduction#

Conda is an open source package management and environment management system that runs on multiple operating systems (Windows, Linux, macOS). Its features include:

  • Quickly installing, running and updating packages and their dependencies.

  • Easily creating, saving, loading and switching between environments on your local computer.

While it was created for Python programs, it can package and distribute software for any language. It is a tool that helps you find and install packages, but it also lets you manage different software environments in which you can install different configurations of packages. For example, this enables you to install different versions of Python in two separate environments without creating incompatibilities in either of those projects.

Conda, Miniconda, Miniforge and Anaconda

It’s common to be confused when confronted with Conda, Miniconda, Miniforge, Anaconda.org, and Anaconda.com. Conda is specifically the package and environment manager tool itself, and is open source. Miniconda and Anaconda are distributions provided by Anaconda.com which require a commercial license for use except in certain cases, while Anaconda.org is a repository of packages available for download. The default channel on the Anaconda repository is also covered by the commercial license; however, the conda-forge channel and other community channels are available outside this license. Miniforge is an open-source distribution of Conda that is separate from Anaconda.

Adapted from Introduction to Conda for Data Scientists

That’s a lot of information, so here are the key parts you need to know:

  • Conda is the name of the software.

  • We recommend you install Conda with Miniforge, a fast and open-source distribution.

Conda is widely used across scientific computing and data science domains due to its well-populated package ecosystem and environment management capabilities.

  • Conda installs prebuilt packages, which allows complicated packages to be installed in one step, because someone else has already built the tool with the right compilers and libraries.

  • The cross-platform nature of Conda allows users to share environments more easily. This helps researchers share their computational environment alongside their data and analysis, improving the reproducibility of their research.

  • Conda also provides access to widely used machine learning and data science libraries such as TensorFlow, SciPy and NumPy, which are available as pre-configured, hardware-specific packages (such as GPU-enabled TensorFlow), allowing code to be as performant as possible.

Installing Conda#

On Aire#

We provide a module for miniforge, meaning you don’t need to install it yourself.

Using conda on Aire

On another system#

You can install Conda from a number of sources. To ensure you are using an open-source distribution and the conda-forge channel by default, we recommend you install the Miniforge distribution. Installers are available for Windows, macOS and Linux. You do not need administrative rights to install Conda on a machine.

If you have questions or issues installing Conda locally please get in touch via the Research Computing Contact form.

Conda environments#

As well as managing packages, Conda also allows you to create and manage environments. A Conda environment is a directory that contains a specific set of installed packages and tools. This allows you to separate the dependencies of different projects cleanly: for example, you can use Python 3.7 in one Conda environment to reproduce a collaborator’s results, but use Python 3.10 in your own projects without any hassle. Conda makes it easy to switch between different environments and allows you to create and delete them as required. Conda environments also make it easier to share your environment setup between machines and with collaborators, as you can export an environment into a text file.
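For example, the two side-by-side setups described above could be created like this (a sketch; the environment names are illustrative):

```shell
# Two isolated environments with different Python versions
conda create --name collab-repro python=3.7   # to reproduce a collaborator's results
conda create --name my-project python=3.10    # for your own work
```

Each environment gets its own Python and its own packages, so neither can interfere with the other.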

If you want to find out more about good dependency management practices in general, please read our documentation; we used this material to inform this session, but take a more trial-and-error approach here.

The base environment

By default Conda includes the base environment. This contains a starting installation of Python and the dependencies of the Conda tool itself.

Therefore, it’s best practice not to install packages into the base environment, but instead to create your own environments into which you install the tools you need.

Installing into the base environment can lead to dependency conflicts and prevent you from being able to swap between different versions of packages for different projects.

General guidelines for handling environments#

While following the steps below to build, experiment with, and then create a reproducible environment, you will hopefully notice the following key principles:

  • In general, environments should be treated as disposable and rebuildable: you should be able to tear down and rebuild your environment quickly and easily (of course, some larger environments with complex installations will be an exception to this rule). Ideally, you won’t have to rebuild, but being able to will save you an awful lot of heartbreak if and when something goes wrong. We’ll see how we can use an environment.yaml file to do this.

  • Export your exact environment as metadata for analysis results: it is useful to save a snapshot of your environment to store alongside any results or outputs produced in that specific environment.

  • Environments must be stored in your home directory and all research output must be stored in /mnt/scratch/users: misuse of the system can affect performance for all users and will lead to your jobs being stopped.

Creating environments#

There are two main ways to create a fresh Conda environment:

  1. Creating directly from the command line with a list of required packages;

  2. Creating from an environment.yaml file that lists required packages.

We will step through examples of both, and compare both techniques.

1. On the fly creation#

If you have come across Conda before, this is likely the method of creating environments that you’ve encountered.

You can create an environment with Conda with the subcommand conda create. When creating an environment we need to give it a name; we recommend giving it a name related to the project you’re building it to support. In this example, we use the (unimaginative) name py39-env, as we’re going to be using Python 3.9; you can imagine that if you’re working with multiple different versions of Python, it could be useful to record this in the environment name, and prefix it with the project title.

$ conda create --name py39-env python=3.9

The above command will prompt Conda to create a new environment called py39-env and install Python 3.9 into it. We can specify multiple packages when creating a Conda environment by separating each package name with a space.

$ conda create --name data-sci-env pandas=1.4.2 matplotlib=3.5.1 scikit-learn

With the above command we create a new environment but don’t explicitly ask for Python to be installed. However, because we’ve specified Python packages, which depend on Python to run, Conda will install the highest version of Python suitable for these packages.

2. Creation from an environment.yaml file#

Instead of providing a list of packages as arguments to the Conda command, you can instead point Conda to a file that lists your dependencies.

First, you need to create an environment file with the dependencies required, saved with the file extension .yaml or .yml (usually called environment.yaml, but it doesn’t have to be):

name: data-sci-env
dependencies:
- scikit-learn
- matplotlib=3.5.1
- pandas=1.4.2

You’ll note that this list has the same dependencies as our on-the-fly example previously (conda create --name data-sci-env pandas=1.4.2 matplotlib=3.5.1 scikit-learn). This file should be saved in the project directory.
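You can also make the channel and Python version explicit in the same file; for example (a sketch, assuming the conda-forge channel and Python 3.9 as used elsewhere in this lesson):

```yaml
name: data-sci-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - scikit-learn
  - matplotlib=3.5.1
  - pandas=1.4.2
```

Pinning the Python version and channel in the file makes the environment more predictable when it is rebuilt on another machine.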

Then, we can create a new environment by simply pointing Conda at the environment file:

$ conda env create -f environment.yaml


As with the on-the-fly example, we haven’t specified Python explicitly, but Conda will install a suitable version because the listed packages depend on it.

Activating environments#

Regardless of the method you used to create the environment, in order to use a Conda environment we need to activate it. Activating an environment performs a number of steps that set up the terminal we’re using so that it can see all of the packages installed in the environment, making it ready for use.

$ conda activate data-sci-env

(data-sci-env)$

You use the subcommand conda activate ENVNAME for environment activation, where ENVNAME is the name of the environment you wish to activate. You can see the environment has been successfully activated when your prompt returns with the environment name prepended in brackets.

Deactivating environments#

You can deactivate your current environment with another simple subcommand conda deactivate.

(data-sci-env)$ conda deactivate

Listing current environments#

If you ever want to see the list of current environments on your machine, you can use the subcommand conda env list. This will return a list of the available Conda environments and each environment’s location in your filesystem.

$ conda env list
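The output looks something like the following sketch (the install paths here are illustrative); the asterisk marks the currently active environment:

```text
# conda environments:
#
base                     /home/username/miniforge3
data-sci-env          *  /home/username/miniforge3/envs/data-sci-env
py39-env                 /home/username/miniforge3/envs/py39-env
```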

Updating a Conda environment and installing new packages#

It’s very likely that after creating an environment with a certain list of packages, you’ll want to add other packages, or potentially change what version of a package you have installed.

Earlier we created the data-sci-env and installed some useful data science packages. We’ve discovered we also need the statsmodels package for some extra work we want to do so we’ll look at how to install this package within our existing environment.

Searching for packages

Conda has a command-line search functionality that we describe below in the section Use Conda to search for a package; you can also use the conda-forge repository or bioconda repository to search for packages.

Once you have the name (and possibly version) of the package you want to install, again there are two different ways to add these packages, much like there were two ways to create the environment to begin with.

1. On the fly installation of new packages#

You can add new packages directly from the command line using the install subcommand with the format conda install PACKAGE, where PACKAGE is the name of the package you wish to install.

To install packages into an existing environment we need to activate it with the subcommand shown above.

$ conda activate data-sci-env

(data-sci-env)$ conda install statsmodels

Conda will always ask whether we’re happy to proceed with the installation, and lists all the other packages that will be installed or updated as requirements of our specified package. We confirm we wish to proceed by entering y and pressing Return.

This installs any packages that are not currently installed. (Conda caches packages locally in case they are required by other packages; this speeds up installs but uses more disk space to maintain the cache.)
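If this cache grows too large, Conda’s clean subcommand can reclaim the space; a sketch (the dry-run form previews what would be removed without deleting anything):

```shell
# Preview what would be removed from the package cache
conda clean --all --dry-run

# Remove cached package tarballs, index caches and unused packages
conda clean --all
```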

2. Updating from an environment.yaml file#

To update our environment using our environment file, we need to edit the environment.yaml to include the new packages:

name: data-sci-env
dependencies:
- scikit-learn
- matplotlib=3.5.1
- pandas=1.4.2
- statsmodels

Now, to update the environment from this file, we use the update subcommand:

$ conda env update --file environment.yaml --prune

Note that the environment does not need to be active to do this. You should pin any versions of libraries (such as matplotlib=3.5.1) that you don’t want to update.

Note that this command may print a FutureWarning; it can safely be ignored, as it is not intended to flag the use of environment.yaml files.

This ensures that we have an up-to-date record of what we have installed in our project folder.

The --prune argument here clears out old, unused libraries and is key to keeping your .conda folder a reasonable size. Please ensure you use the --prune flag to prevent environment bloat.

Removing a Conda environment#

It is also possible to delete a Conda environment through the remove subcommand. This command is outlined below in relation to removing specific packages but can also be used to delete an entire Conda environment.

To remove the py39-env we created earlier we use the command:

$ conda remove --name py39-env --all

Conda checks for user confirmation that we wish to proceed and outlines for us exactly which packages are being removed. On proceeding with removing the environment all associated environment files and packages are deleted.

Important

Using conda remove to delete an environment is irreversible. You cannot restore a deleted environment to the exact state it was in before deletion. However, if you have exported details of your environment, it is possible to recreate it.

Recording your Conda environments#

Recording dependencies is crucial for reproducibility. In order to record the exact versions of all dependencies used in your project (as opposed to the limited list you manually installed with your environment.yaml file), you can run the following export command from inside your active Conda environment:

$ conda activate data-sci-env

(data-sci-env)$ conda env export > env-record.yaml

This can be run as part of a batch job and included in your submission script, so that it’s saved out alongside your other output data files:

conda env export > $SCRATCH/env-record.yaml
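For example, a submission script might look like the following sketch (the scheduler directives, module name, and analysis script here are assumptions; check the Aire documentation for the exact values):

```shell
#!/bin/bash
#SBATCH --job-name=analysis    # assumed SLURM directives; adjust for your scheduler
#SBATCH --time=01:00:00

module load miniforge          # module name assumed from the section above
conda activate data-sci-env

python analysis.py             # hypothetical analysis script

# Save a snapshot of the exact environment alongside the job's outputs
conda env export > $SCRATCH/env-record.yaml
```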

This exported environment file is mainly useful as a record for the sake of reproducibility, not for reusability. Your environment.yaml file is a far better basis for rebuilding or sharing environments.

This record will include background library dependencies (libraries you did not explicitly install, but that were loaded automatically) and details of builds. This file, while technically an environment.yaml file, will likely not be able to rebuild your environment on a machine other than the one it was created on.
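To illustrate, a full conda env export record typically looks something like this (the build strings, extra dependencies and prefix below are illustrative, not real values):

```yaml
name: data-sci-env
channels:
  - conda-forge
dependencies:
  - matplotlib=3.5.1=py39hf3d152e_0   # illustrative build string
  - pandas=1.4.2=py39h1832856_1       # illustrative build string
  - numpy=1.22.3=py39hc58783e_2       # background dependency, pulled in automatically
  # ...many more background dependencies...
prefix: /home/username/.conda/envs/data-sci-env
```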

It’s important to consider the balance between reproducibility and portability: conda env export captures the exact specification of an environment, including all installed packages, their dependencies and package hashes. Sometimes this level of detail should be included to ensure maximum reproducibility of a project, for example when looking to validate results, but it’s also important to balance this against allowing people to reproduce your work on other systems. The next section discusses portability and reusability in more detail.

Sharing Conda environments#

The Conda environment.yaml file is the key to sharing conda environments across systems.

If you created your Conda environment from a .yaml file (and have kept it up-to-date by using it and the update command to install new packages), you can share this file with collaborators, and they can use the instructions above to create an environment from file.

If you instead used the on-the-fly creation method and don’t have an environment.yaml, it will take a little more work. As we stated in the last section, conda env export will export all installed packages, their dependencies, and package hashes, and is unlikely to install without error on a different system. So how can we produce a reusable environment.yaml file?

If you follow the above steps for building your Conda environment from a .yaml file, this step is not necessary. However, if you want to salvage, share, or back up an environment that you built using repeated conda install package-name commands, the following allows you to create an environment.yaml file.

Activate your environment and run a modified export:

$ conda activate data-sci-env

(data-sci-env)$ conda env export --from-history > environment_export.yaml

This will export a list of only the libraries that you explicitly installed (and not all the background dependencies), and only the pinned versions you requested. This is not useful as a record of your exact environment, but is a good backup for rebuilding or sharing it. Note that this will not include any pip dependencies; we won’t get into mixing in pip dependencies today, but please read our documentation for how to export a reusable environment file that includes them.
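For comparison with the full export, a --from-history export of our data-sci-env would contain only the specs we explicitly asked for, something like:

```yaml
name: data-sci-env
channels:
  - conda-forge
dependencies:
  - scikit-learn
  - matplotlib=3.5.1
  - pandas=1.4.2
  - statsmodels
```

Because it omits build strings and background dependencies, this file is much more likely to resolve cleanly on a different operating system.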

Using Conda to search for packages#

We can use the search command in Conda to find available package versions:

$ conda search python

This command searches for packages based on the argument provided. It searches in package repositories called Conda channels, which are remote websites to which built Conda packages have been uploaded. By default, Conda installed with Miniforge uses the conda-forge channel. If you are using a different install of Conda, you may need to specify this channel. Alternatively, you may need to point to the Bioconda channel.

$ conda search 'python[channel=conda-forge]'

You can also search for specific version requirements with conda search:

$ conda search 'python>=3.8'

You can combine the two conditions shown above (searching a specific channel and for a specific version):

$ conda search 'python[channel=conda-forge]>=3.8'

Removing packages#

Another crucial aspect of managing an environment involves removing packages. Conda includes the remove subcommand for this operation, which allows you to specify a list of packages you wish to remove. You can do this within an activated environment, or specify to Conda the environment from which you want to remove packages.

When creating our data-sci-env we installed pandas=1.4.2; let’s imagine we made a mistake here and wanted a different version. We could remove this version of pandas with the following command:

$ conda remove -n data-sci-env pandas

When removing packages, as with installing them, Conda will ask for confirmation before proceeding. Removing one package may also lead to the removal of additional packages that depend on it, and can cause other packages to update.

With these changes made we can now install a newer version of pandas using conda install.

Of course, this can also be easily done by updating our environment.yaml file to remove the package, and running the update command shown above with the flag --prune.

Updating a package#

The above example is slightly artificial, as removing a package to install a more recent version is a long-winded way of doing things with Conda. If we want to update a package to a more recent version, Conda provides the update subcommand to achieve this. Crucially, conda update will update a package to its most recent version and can’t be used to specify a particular version.

Let’s say we wanted to update the matplotlib library to the most recent version in our data-sci-env.

$ conda activate data-sci-env

(data-sci-env)$ conda update matplotlib

When requesting to update a package, Conda will also update the other dependencies of that package, and can potentially install new packages that are required.

Again, this can also be easily done by updating our environment.yaml file to change the version of a specific package, and running the update command shown above with the flag --prune.

Summary#

Important