Conda Tutorial

This tutorial guides users through leveraging Conda's powerful features while emphasizing the benefits of maintaining isolated environments. It covers:

  • Setting up virtual environments
  • Effectively utilizing environments
  • Integrating with SLURM for HPC workflows
  • Managing and maintaining Conda environments

What is Conda?

Conda is a package and environment manager widely used in data science, and it is particularly useful for Linux and macOS users without administrative privileges. It lets users search for, install, upgrade, and manage a wide range of open-source software packages, including programming languages such as Python, R, Perl, Java, and Julia. This makes it well suited to data scientists, developers, and researchers working across multiple systems. Conda also provides dependency resolution, extensive cross-platform support, and easy creation of and switching between environments for testing purposes.

What is a Conda environment?

A Conda environment is an isolated, self-contained directory that contains a specific collection of software packages, dependencies, and their versions. This isolation ensures that projects can have their own unique configurations without interfering with the global system environment or other projects. Separate environments also let you test how your project and code run with newer versions and different tools, without having to start from scratch.

Why use Conda?

Conda offers several compelling benefits:

  1. By installing packages in a user-specified directory, Conda bypasses the need for system-wide installation permissions.
  2. It allows users to create separate environments for different projects, each with its own specific versions of software and libraries.
  3. It makes projects reproducible: anyone with access to your environment configuration can replicate the same setup, avoiding compatibility issues.
  4. It resolves dependencies, ensuring that compatible versions of all required libraries are installed and changing installed versions where needed to prevent version conflicts.

These features make Conda especially valuable for use on high-performance computing (HPC) systems, where managing dependencies and maintaining consistent environments are critical.

Steps to create a Conda environment on WAVE

  1. Open a terminal session on one of the WAVE cluster login nodes using your user account. See WAVE HPC User Guide - Accessing the HPC if you need help accessing the HPC.
  2. To check whether Conda is available, run the command:

module avail Anaconda3

If no version numbers or descriptions are listed, you need to install it (refer to the link below).

  3. Load the Anaconda distribution module using the command:

module load Anaconda3

See Installing Software for more information on loading modules.
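After loading the module, a small check like the following can confirm that conda is actually on your PATH. This is only a sketch; have_command is a hypothetical helper name introduced here.

```shell
#!/bin/bash
# have_command is a hypothetical helper: it succeeds if the named
# command can be found in the current shell.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command conda; then
    # Print the version of conda provided by the loaded module
    conda --version
else
    echo "conda not found: run 'module load Anaconda3' first"
fi
```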

    1. If it is your first time using Anaconda on WAVE, run the command: 

conda init

This sets up and initializes your shell for Conda usage. The command only needs to be run once, the first time you use Conda. Close and reopen your terminal session afterwards so that the change takes effect.

2. To create an environment, use the command: 

conda create --name <my-env>

You can also specify where the environment is located by using the --prefix option in place of --name (the two options cannot be used together): 

conda create --prefix </absolute/path/to/target/directory>

When asked for confirmation to proceed, type ‘y’.

3. To create an environment with specific packages, use the command: 

conda create --name <my-env> <package1>=<version1> <package2>=<version2>

Specifying a version is optional. It is highly recommended to let Conda choose the versions, as this avoids many future package compatibility issues. Pin a version only if you are sure which version you must use.

4. To install packages after creating the environment, use the command: 

conda install --name <my-env> <package1>=<version1> <package2>=<version2>
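As a concrete illustration of steps 2 to 4, the sketch below creates an environment, adds a package later, and lists the result. The environment name demo-env and the packages python, numpy, and pandas are examples only. The run helper and the DRY_RUN variable are hypothetical names: by default they print each command instead of executing it, so the sequence can be reviewed first. The --yes flag answers the confirmation prompt automatically.

```shell
#!/bin/bash
# Print commands by default; set DRY_RUN=0 to actually execute them.
# run and DRY_RUN are hypothetical names used only in this sketch.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create an environment with a pinned Python and an unpinned numpy
run conda create --name demo-env --yes python=3.11 numpy
# Add another package to the existing environment later
run conda install --name demo-env --yes pandas
# List what ended up installed
run conda list --name demo-env
```

Leaving numpy and pandas unpinned follows the recommendation above to let Conda pick compatible versions.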

Steps to Use a Conda Environment

1. Load the Anaconda module, if it is not already loaded, using the command: 

module load Anaconda3

2. To activate the created environment, use the command: 

conda activate <my-env>

Once activated, the name of the active environment appears in your terminal prompt, making it easy to identify which environment you are currently using.

3. To view your environments, run the command: 

conda info --envs

4. Navigate to your project's directory and run your program in the way appropriate for its file type. (For more help on running your project, see the corresponding tutorial.)

5. To deactivate the environment, use the command: 

conda deactivate
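To confirm from a script which environment is active, one option is to read the CONDA_DEFAULT_ENV variable, which conda activate sets. The sketch below assumes nothing else about your setup; active_env is a hypothetical helper name.

```shell
#!/bin/bash
# active_env is a hypothetical helper: it prints the name of the active
# Conda environment, or "none" if CONDA_DEFAULT_ENV is not set.
active_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

echo "Active Conda environment: $(active_env)"
```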

Steps to use pip install within a Conda Environment

1. Some packages, or their latest versions, are not available through Conda. In those cases, use pip inside the environment to install them.

2. Activate the environment where you want to install the package using the command: 

conda activate <my-env>

3. Install pip in the environment (if not already installed) using the command: 

conda install pip

4. Use pip to install the desired package within the active environment using the command: 

pip install <package-name>

5. Verify the installation using the command: 

pip show <package-name>
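Before running pip install, it can be worth checking that the pip on your PATH belongs to the active environment rather than to the system or another installation. The sketch below relies on CONDA_PREFIX, which conda activate sets to the active environment's directory; pip_in_active_env is a hypothetical helper name.

```shell
#!/bin/bash
# pip_in_active_env is a hypothetical helper: it succeeds only if the
# pip found on PATH lives inside the directory named by CONDA_PREFIX.
pip_in_active_env() {
    pip_path="$(command -v pip 2>/dev/null)" || return 1
    case "$pip_path" in
        "${CONDA_PREFIX:-/nonexistent}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

if pip_in_active_env; then
    echo "pip belongs to the active Conda environment"
else
    echo "pip is missing or outside the active Conda environment"
fi
```

If the check fails, activating the environment and running conda install pip, as in the steps above, should place pip inside it.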

Steps for Using Conda in a SLURM job

Create a Conda environment and install the necessary packages.
In the SLURM job script, load the Anaconda distribution module and activate your environment as shown below. 

#!/bin/bash 
# 
#SBATCH --job-name=someapp 
#SBATCH --output=somejob-%j.out 
# 
#SBATCH --partition=cmp
#SBATCH --nodes=1 
#SBATCH --ntasks=1 
#SBATCH --time=00:40:00 
#
#SBATCH --mail-user=<user_account@scu.edu>
#SBATCH --mail-type=END
# Load Anaconda to run the program 
module load Anaconda3
# Activate the environment
conda activate <my-env>
# Run the Program
python ./somepythonprog -o outdir -a param1 -b param2 ...

Submit the job using the command: 

sbatch <script-name>.sh
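Batch jobs run in a non-interactive shell, so depending on how your shell startup files are set up, conda activate may not be defined inside the job. One common workaround, sketched here as an assumption rather than a WAVE-specific requirement, is to load Conda's shell hook explicitly before activating. The helper name activate_env and the environment name my-env are hypothetical.

```shell
#!/bin/bash
# activate_env is a hypothetical helper, not part of Conda or SLURM.
# Usage: activate_env <env-name>
activate_env() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; did 'module load Anaconda3' run?" >&2
        return 1
    fi
    # Define the 'conda activate' shell function in this shell
    eval "$(conda shell.bash hook)"
    conda activate "$1"
}

# In a job script, after 'module load Anaconda3' (my-env is a placeholder):
#   activate_env my-env || exit 1
#   python ./somepythonprog
```

Exiting when activation fails keeps the job from silently running the program with the wrong Python.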