Table of contents
  1. What is conda?
    1. What is a package?
    2. Why do we need something to manage packages?
      1. Simple dependencies
      2. Project specific dependencies
      3. Reproducible research
    3. Next: Environments

What is conda?

At its core, conda is a programming-language-agnostic package manager. In the following, I will try to explain what packages are, why we use them, and why we need someone or something to manage them.

What is a package?

A package is anything you install using your package manager. In particular, a conda package is a compressed tarball that contains two things:

  1. a module that can be installed
  2. a recipe on ‘how to install the module’

Why do we need something to manage packages?

Simple dependencies

Not all packages were developed to work on all machines or with all versions of other software. Syntax changes in Python are a good example: newer Python versions may drop support for older syntax, and older versions cannot run newer syntax (and not only across the 2 -> 3 transition). For example, going from 3.9 to 3.10 brought changes to the typing syntax. If you update your code to the new way of defining input types, older versions of Python can no longer run it. Conversely, you may have a dependency that cannot run on a newer Python version. In such cases, you need to find versions of all packages whose requirements overlap, so that they work with each other. A package manager such as conda takes care of that for you at installation time (this can take a very long time if you have a large number of packages installed).
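To make the 3.9 vs. 3.10 point concrete, here is a minimal sketch based on one such change: the `X | Y` union syntax for type hints, which I am confident arrived with Python 3.10 (PEP 604). The function names `half_old` and `supports_union_operator` are made up for this illustration. The sketch runs on old and new interpreters alike and simply reports which spelling the current interpreter accepts.

```python
import sys
from typing import Optional


def half_old(x: Optional[int] = None) -> Optional[float]:
    """Pre-3.10 spelling of an optional argument; runs on old and new Pythons."""
    return None if x is None else x / 2


def supports_union_operator() -> bool:
    """Check whether the newer `int | None` spelling can be evaluated here."""
    try:
        int | None  # accepted from Python 3.10 on, TypeError before that
    except TypeError:
        return False
    return True


print("Python", sys.version_info[:2], "accepts `int | None`:", supports_union_operator())
```

Writing `def half_new(x: int | None = None)` directly would fail on Python 3.9 as soon as the function is defined, because annotations are evaluated at definition time by default. That is exactly the kind of incompatibility a package manager has to account for when it picks a Python version for you.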

Project specific dependencies

Furthermore, you may have two or more projects that require different versions of the same software. Say we are working on Project 1 and Project 2. Project 1 depends on Software A and Software B, and Project 2 depends on Software B and Software C. Now, Software A depends on a specific version of Software B, but Software C depends on a different version of Software B. That’s not exactly ideal: installing either version of Software B will break the other project. conda lets us easily create and manage separate environments to avoid such version conflicts (see the figure below). Additionally, conda automatically checks each package that we are trying to install against all packages already installed in the environment and their versions. That way, nothing breaks, and we can keep both versions of Software B, one in each environment.

DISCLAIMER: when your environments become large and complicated, this process can take a very long time…

Figure showing the interproject dependencies and how conda takes care of them. On the left-hand side, we have a single umbrella environment that contains all package dependencies. On the right-hand side, using an environment manager (e.g. conda), we divide the projects into two different environments. Now, package B no longer suffers from a version conflict.
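The Project 1 / Project 2 situation can be mimicked with a toy model. This is only a sketch: the version numbers are invented, and intersecting sets of allowed versions is a deliberate simplification, not how conda's solver actually works. It shows that no single version of Software B satisfies both Software A and Software C, while each project on its own has a valid choice.

```python
# Hypothetical requirements: which versions of Software B each package accepts.
REQUIRES_B = {
    "Software A": {"1.0", "1.1"},  # A only works with B 1.x (invented numbers)
    "Software C": {"2.0"},         # C only works with B 2.0 (invented number)
}

# Which package in each project constrains Software B.
PROJECTS = {
    "Project 1": ["Software A"],
    "Project 2": ["Software C"],
}


def compatible_b_versions(packages):
    """Return the versions of Software B accepted by every package listed."""
    allowed = None
    for package in packages:
        versions = REQUIRES_B[package]
        allowed = set(versions) if allowed is None else allowed & versions
    return allowed if allowed is not None else set()


# One umbrella environment must satisfy A and C at the same time.
shared = compatible_b_versions(["Software A", "Software C"])

# One environment per project only has to satisfy that project's packages.
per_project = {name: compatible_b_versions(pkgs) for name, pkgs in PROJECTS.items()}

print("umbrella environment:", shared or "no compatible version of Software B")
for name, versions in per_project.items():
    print(name, "->", sorted(versions))
```

The empty intersection corresponds to the left-hand side of the figure, and the two non-empty per-project results correspond to the separate environments on the right-hand side.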

Reproducible research

The third motivation is that environments make your research reproducible. conda keeps track of the versions of the programs that you install. This information can be exported to a file, often named environment.yml or env.yml, from which the environment your research was computed in can be recreated. This protects you against changes in future versions of, e.g., numpy or scipy. For example, at some point numpy changed how you access the fft modules, breaking many signal processing applications. With a version-tracked environment, your code keeps running against the versions it was written for.
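In practice, the export is done with `conda env export > environment.yml`, and the environment is recreated with `conda env create -f environment.yml`. To give an idea of what such a file contains, the sketch below builds the text of a minimal one by hand. The helper `render_environment_yml`, the environment name, and the version pins are all hypothetical and chosen only for illustration. A real export lists every installed package.

```python
def render_environment_yml(name, channels, pinned):
    """Return the text of a minimal environment.yml with version pins."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}={version}" for package, version in pinned.items()]
    return "\n".join(lines) + "\n"


# Hypothetical environment name and pins, not taken from a real project.
text = render_environment_yml(
    name="my-analysis",
    channels=["conda-forge"],
    pinned={"python": "3.10", "numpy": "1.26.4", "scipy": "1.11.4"},
)
print(text)
```

Because every dependency carries a version, recreating the environment later installs the same numpy and scipy releases, rather than whatever happens to be newest at that time.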

Next: Environments