Session 4: Modules and Software

Managing Software Environments on HPC Systems

Session content

Session aims

By the end of this session, you will be able to:

  • Understand the module system and its benefits for HPC environments
  • Use basic module commands to list, load, and unload software
  • Create scripts that load modules and run software
  • Request new software installations through proper channels
  • Explore alternative software management approaches (Spack, containers)
  • Apply best practices for reproducible software environments

In this session we will learn about the software available on Aire and how to access it through the module system. We will also discuss some alternative ways to install software yourself on the system.

View Interactive Slides: Module System on Aire

What are Modules?

Modules are a way to manage different software environments on HPC systems:

  • They allow users to load and unload software packages dynamically
  • This helps in managing different versions of software and their dependencies
  • Simplifies the user environment and avoids conflicts between software versions
  • Provides a consistent and reproducible environment
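To make the idea of loading and unloading concrete, the sketch below imitates what a module does under the hood: it edits environment variables such as PATH so the shell can find one particular installation. The tool name `mytool` and the temporary directory are hypothetical stand-ins, not real software on Aire, and a real module may set further variables as well.

```shell
#!/bin/bash
# Conceptual sketch only: imitate "module load" / "module unload" by editing PATH.
# "mytool" and the directory layout are hypothetical.

demo_root="$(mktemp -d)"
mkdir -p "$demo_root/mytool/1.0/bin"

# A stand-in executable for an installed software package
cat > "$demo_root/mytool/1.0/bin/mytool" << 'EOF'
#!/bin/sh
echo "mytool 1.0"
EOF
chmod +x "$demo_root/mytool/1.0/bin/mytool"

old_path="$PATH"

# "Load": prepend the package's bin directory to PATH
PATH="$demo_root/mytool/1.0/bin:$PATH"
loaded_output="$(mytool)"
echo "After load: $loaded_output"

# "Unload": restore the previous PATH
PATH="$old_path"
if command -v mytool > /dev/null 2>&1; then
    unloaded_state="still found"
else
    unloaded_state="not found"
fi
echo "After unload: mytool $unloaded_state"

# Remove the temporary demo directory
rm -rf "$demo_root"
```

The module system automates this bookkeeping, so you never have to edit PATH by hand.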

Why Use Modules?

Benefits:

  • Clean separation of software environments
  • Easy to switch between environments
  • Optimized builds for HPC hardware

Without Modules:

  • Software conflicts
  • Path management nightmares
  • Inconsistent environments
  • Difficult reproducibility

Basic Module Commands

Listing Available Modules

module avail              # List all available modules
module avail python       # List all Python modules
module avail gcc          # List all GCC modules
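On many module implementations the listing is written to standard error rather than standard output, so piping `module avail` straight into `grep` may not filter anything. A minimal sketch, assuming that behaviour and that your module command accepts the terse `-t` switch; `search_modules` is a hypothetical helper name, not a module subcommand:

```shell
# Search available module names case-insensitively.
search_modules() {
    local pattern="$1"
    if ! type module > /dev/null 2>&1; then
        echo "module command not found (are you on the HPC system?)" >&2
        return 1
    fi
    # -t requests terse, one-name-per-line output; 2>&1 merges stderr into
    # the pipe because listings are often printed to stderr
    module -t avail 2>&1 | grep -i -- "$pattern"
}

search_modules python || echo "No matching modules found"
```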

Loading and Unloading Modules

# Load a module
module load python/3.13.0

# Load without specifying version (uses default)
module load python

# Unload a module
module unload python/3.13.0

# List currently loaded modules
module list

# Unload all modules
module purge
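After loading a module it is worth confirming that the shell now resolves the command to the module's installation. A small sketch using the shell's built-in `command -v`; `which_tool` is a hypothetical helper, not part of the module system:

```shell
# Print the full path that a command name currently resolves to.
which_tool() {
    local name="$1"
    local resolved
    if resolved="$(command -v "$name")"; then
        echo "$name -> $resolved"
    else
        echo "$name is not on PATH" >&2
        return 1
    fi
}

# Example with a command that is always present:
which_tool sh

# On Aire you might compare the result before and after loading a module:
#   which_tool python
#   module load python/3.13.0
#   which_tool python
```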

Using Modules in Scripts

Let’s say we have a Python file called hello_world.py:

print("hello world!")

How would we write a bash script that loads the Python module and runs the Python script?

Creating a Module Script

Create a file called python_test.sh:

#!/bin/bash

module load python/3.13.0
python hello_world.py

Make it executable and run it:

chmod +x python_test.sh    # Add executable permissions
ls -F                      # Check it's executable (shows *)
./python_test.sh           # Run the script

Best Practice for Python

When running Python jobs, we recommend using the Miniforge module to create a conda environment instead of using the basic Python install. Read our documentation on dependency management.
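A minimal sketch of that recommendation, assuming the `miniforge/24.3.0` module used later in this session is the one installed on Aire. The environment name `my-env` and the package list are only illustrations, and `create_conda_env` is a hypothetical helper; check the dependency management documentation for the recommended approach.

```shell
# Create a conda environment once, then reuse it in later sessions and jobs.
create_conda_env() {
    local env_name="$1"
    if ! type module > /dev/null 2>&1; then
        echo "module command not found (run this on the HPC system)" >&2
        return 1
    fi
    module load miniforge/24.3.0 || return 1
    # -y answers "yes" to the confirmation prompt
    conda create -y -n "$env_name" python=3.13 numpy
}

create_conda_env my-env || echo "Environment was not created"

# Later, to use the environment:
#   module load miniforge/24.3.0
#   conda activate my-env
```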

Requesting New Software

Centralized Management

  • Popular software is centrally installed by the Research Computing team
  • Ensures optimized performance and avoids conflicts
  • Regularly updated with new versions

How to Request New Software

If software you need isn’t available:

  1. Submit a Research Computing Query with details about the software
  2. Include the software name, version, and a brief justification for its use in your research
  3. The team will evaluate and install if appropriate

Alternative Software Management

You can also manage your own software on Aire through several routes. Many users won’t need this, but it may be necessary if you want fine-grained control or need older versions of software.

Package Managers

Spack

  • Spack: Flexible package manager for HPC systems
  • Allows users to install software without admin privileges
  • Supports complex dependency management

module load spack
spack install htop
spack load htop
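Before installing, it can help to preview what Spack would build, and afterwards to confirm what was installed. A sketch using the `spack spec` and `spack find` subcommands, assuming Spack has already been made available (for example with `module load spack` as above); `spack_install_checked` is a hypothetical helper:

```shell
# Preview, install, and then verify a Spack package.
spack_install_checked() {
    local pkg="$1"
    if ! type spack > /dev/null 2>&1; then
        echo "spack not found (try 'module load spack' first)" >&2
        return 1
    fi
    spack spec "$pkg" || return 1      # Preview the resolved dependencies
    spack install "$pkg" || return 1   # Build and install the package
    spack find "$pkg"                  # Confirm the package is now installed
}

spack_install_checked htop || echo "htop was not installed"
```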

EasyBuild

  • EasyBuild: Framework for building and installing software on HPC
  • Automates the build process using configuration files
  • Good for complex scientific software

Other Options

Manual Building

  • Download and compile software yourself
  • Requires knowledge of build systems and dependencies
  • Most control but most work
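Many source packages follow the configure, make, make install pattern, although build steps differ between projects, so always read the package's own instructions. A sketch under that assumption, installing into a directory you own; `build_from_source` and the paths in the usage comments are hypothetical:

```shell
# Build and install a source tree into a user-writable prefix.
build_from_source() {
    local src_dir="$1"
    local prefix="$2"
    (
        # The subshell keeps your current directory unchanged afterwards
        cd "$src_dir" || exit 1
        ./configure --prefix="$prefix" || exit 1   # Target your own directory
        make -j4 || exit 1                         # Compile with 4 parallel jobs
        make install                               # Copy files into the prefix
    )
}

# Hypothetical usage:
#   build_from_source "$HOME/src/mytool-1.0" "$HOME/software/mytool/1.0"
#   export PATH="$HOME/software/mytool/1.0/bin:$PATH"
```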

Containers

  • Encapsulate software environments using Apptainer
  • Ensures portability and consistency across systems
  • Great for complex software stacks
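As a rough sketch of the container route, the commands below pull a public Python image and run a command inside it. This assumes the `apptainer` command is on your PATH and that the machine can reach Docker Hub; the image `docker://python:3.13-slim`, the file name `python.sif`, and the helper `run_in_container` are illustrative choices.

```shell
# Run a command inside a container image, downloading the image if needed.
run_in_container() {
    local image_file="python.sif"
    if ! type apptainer > /dev/null 2>&1; then
        echo "apptainer not found on PATH" >&2
        return 1
    fi
    # Download the image once and store it as a single .sif file
    if [ ! -f "$image_file" ]; then
        apptainer pull "$image_file" docker://python:3.13-slim || return 1
    fi
    # Execute the given command inside the container
    apptainer exec "$image_file" "$@"
}

run_in_container python --version || echo "Container command did not run"
```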

Best Practices

Environment Management

  • Use modules to manage software environments effectively
  • Unload modules when no longer needed to avoid conflicts
  • For R or Python, use the Miniforge module to create conda environments

Reproducibility

Key Principles

  1. Always specify versions: use module load gcc/14.2.0, not module load gcc
  2. Document everything: Keep track of modules and versions used
  3. Use scripts: Automate your module loading in job scripts
  4. Version control: Keep your workflow scripts in version control
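One way to follow the "document everything" principle is to save the list of loaded modules next to your results. A sketch, assuming `module list` accepts the terse `-t` switch and, as on many systems, prints to standard error; the helper `record_modules` and the file name `modules_used.txt` are arbitrary choices:

```shell
# Write a timestamped list of the currently loaded modules to a file.
record_modules() {
    local outfile="${1:-modules_used.txt}"
    if ! type module > /dev/null 2>&1; then
        echo "module command not found" >&2
        return 1
    fi
    {
        echo "# Modules loaded on $(date '+%Y-%m-%d %H:%M:%S')"
        module -t list 2>&1
    } > "$outfile"
}

record_modules modules_used.txt || echo "Module list was not recorded"
```

Calling this near the top of a job script, after the module load lines, leaves a record of the exact versions behind each run.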

Collaboration

  • Share module load commands with collaborators
  • Use version-controlled scripts to manage workflows
  • Consider containers for complex environments

Common Module Workflows

Data Analysis Workflow

#!/bin/bash
module load python/3.13.0
module load scipy/1.11.3
module load matplotlib/3.7.2

python analysis.py

Compilation Workflow

#!/bin/bash
module load gcc/14.2.0
module load cmake/3.24.2
module load openmpi/4.1.4

cmake .
make -j8

Machine Learning Workflow

#!/bin/bash
module load miniforge/24.3.0
conda activate ml-env
python train_model.py
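The workflow scripts above carry on even if a module fails to load, which can produce a confusing error several lines later. A defensive sketch, assuming your module command returns a non-zero exit status on failure (worth checking on your system); `load_or_fail` is a hypothetical helper:

```shell
# Load each named module in turn and stop at the first failure.
load_or_fail() {
    local mod
    for mod in "$@"; do
        if ! module load "$mod"; then
            echo "ERROR: could not load module '$mod'" >&2
            return 1
        fi
    done
}

# In a workflow script, stop before running the analysis if a load fails:
#   load_or_fail python/3.13.0 || exit 1
#   python analysis.py
```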

Exercises

Work through these exercises to practice using the module system on Aire.

Exercise 1: Explore Available Software

Get familiar with the available software on Aire:

# List all available modules
module avail

# Search for specific software
module avail python
module avail gcc
module avail cmake

# Look for software you might need for your research
module avail R
module avail matlab

Questions to consider:

  • How many versions of Python are available?
  • What’s the default version when multiple versions exist?
  • Can you find software relevant to your research area?

Exercise 2: Practice Loading and Managing Modules

Learn to load, check, and unload modules:

# Load a module and check it's loaded
module load gcc
module list

# Load multiple modules
module load python/3.13.0
module load cmake/3.24.2
module list

# Try loading without specifying version
module unload gcc
module load gcc
module list

# Clean up - unload all modules
module purge
module list

Key Learning Points:

  • Always specify versions for reproducibility: module load gcc/14.2.0
  • Use module list to see what’s currently loaded
  • Use module purge to start with a clean environment

Exercise 3: Create and Test a Module Script

Create a script that uses modules to run software:

# Create a simple Python script first
cat > hello_modules.py << 'EOF'
import sys
print(f"Hello from Python {sys.version}")
print(f"Python executable: {sys.executable}")
EOF

# Create a bash script that loads modules and runs Python
cat > test_modules.sh << 'EOF'
#!/bin/bash

echo "Starting with clean environment..."
module purge
module list

echo "Loading Python module..."
module load python/3.13.0
module list

echo "Running Python script..."
python hello_modules.py

echo "Script completed!"
EOF

# Make executable and test
chmod +x test_modules.sh
./test_modules.sh

Exercise 4: Create a Project Setup Script

Create a reusable script for a typical research project:

# Create a comprehensive project setup script
cat > project_setup.sh << 'EOF'
#!/bin/bash
# Project Setup Script
# Description: Loads all necessary modules for data analysis project

echo "Setting up research environment..."

# Start with clean environment
module purge

# Load essential tools
module load gcc/14.2.0        # Compiler
module load python/3.13.0     # Python
module load cmake/3.24.2      # Build system

# Optional: Load domain-specific software
# module load r/4.3.1          # For R users
# module load matlab/2023b     # For MATLAB users

echo "Loaded modules:"
module list

echo "Environment ready!"
echo "Python version: $(python --version)"
echo "GCC version: $(gcc --version | head -n1)"

# Optional: Activate conda environment
# echo "Activating conda environment..."
# conda activate myproject
EOF

chmod +x project_setup.sh
./project_setup.sh

Exercise 5: Explore Software Request Process

Practice finding and understanding the software request process:

  1. Find the request form: Navigate to the Research Computing Query form
  2. Identify software needs: Think of software you need that isn’t available
  3. Draft a request: Write a brief justification for a software package you might need

Example request template:

Software Name: [e.g., TensorFlow 2.14]
Version: [specific version if needed]
Research Purpose: [brief description of how it will be used]
Justification: [why this specific version/software is needed]

What You’ve Accomplished

  • ✅ Explored available software using module avail
  • ✅ Practiced loading and unloading modules
  • ✅ Created scripts that use modules effectively
  • ✅ Built a reusable project setup script
  • ✅ Understood the software request process
  • ✅ Applied best practices for reproducible environments

Summary

Key Takeaways

  • Modules provide clean software environments without conflicts
  • Always specify versions for reproducible research
  • Use scripts to automate and document your module usage
  • Request new software through Research Computing queries
  • Consider alternatives like Spack or containers for special requirements
  • Follow best practices for collaboration and reproducibility

Next Steps

Now you know how to manage software on Aire! Let’s move on to Session 5: Job Scheduling and Submission to learn how to run your code on the compute nodes.

Additional Resources