Session 4: Modules and Software
Managing Software Environments on HPC Systems
Session content
Session aims
By the end of this session, you will be able to:
- Understand the module system and its benefits for HPC environments
- Use basic module commands to list, load, and unload software
- Create scripts that load modules and run software
- Request new software installations through proper channels
- Explore alternative software management approaches (Spack, containers)
- Apply best practices for reproducible software environments
In this session we will learn about the software available on Aire and how to access it through the module system. We will also discuss some alternative ways to install software yourself on the system.
What are Modules?
Modules are a way to manage different software environments on HPC systems:
- They allow users to load and unload software packages dynamically
- This helps in managing different versions of software and their dependencies
- Simplifies the user environment and avoids conflicts between software versions
- Provides a consistent and reproducible environment
Why Use Modules?
Benefits:
- Clean separation of software environments
- Easy to switch between environments
- Optimized builds for HPC hardware
Without Modules:
- Software conflicts
- Path management nightmares
- Inconsistent environments
- Difficult reproducibility
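To make the path management problem concrete, here is a minimal sketch that imitates what loading a module roughly does: it puts a version-specific bin directory at the front of PATH. It does not call the real module command, and mytool with its two versions is a hypothetical stand-in created in a temporary directory.

```shell
#!/bin/bash
# Create two fake versions of a hypothetical tool called "mytool"
demo_root=$(mktemp -d)
mkdir -p "$demo_root/mytool-1.0/bin" "$demo_root/mytool-2.0/bin"
printf '#!/bin/sh\necho "mytool 1.0"\n' > "$demo_root/mytool-1.0/bin/mytool"
printf '#!/bin/sh\necho "mytool 2.0"\n' > "$demo_root/mytool-2.0/bin/mytool"
chmod +x "$demo_root"/mytool-*/bin/mytool

original_path=$PATH

# Roughly what "module load mytool/1.0" would do: prepend its bin directory
PATH="$demo_root/mytool-1.0/bin:$original_path"
v1_output=$(mytool)
echo "$v1_output"

# Roughly what unloading 1.0 and loading 2.0 would do
PATH="$demo_root/mytool-2.0/bin:$original_path"
v2_output=$(mytool)
echo "$v2_output"

# Restore the original environment and clean up
PATH=$original_path
rm -rf "$demo_root"
```

Doing this by hand for every package, along with library and include paths, is what the module system automates.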
Basic Module Commands
Listing Available Modules
module avail # List all available modules
module avail python # List all Python modules
module avail gcc # List all GCC modules
Loading and Unloading Modules
# Load a module
module load python/3.13.0
# Load without specifying version (uses default)
module load python
# Unload a module
module unload python/3.13.0
# List currently loaded modules
module list
# Unload all modules
module purge
Using Modules in Scripts
Let’s say we have a Python file called hello_world.py:
print("hello world!")
How would we write a bash script that loads the Python module and runs the Python script?
Creating a Module Script
Create a file called python_test.sh:
#!/bin/bash
module load python/3.13.0
python hello_world.py
Make it executable and run it:
chmod +x python_test.sh # Add executable permissions
ls -F # Check it's executable (shows *)
./python_test.sh # Run the script
When running Python jobs, we recommend using the Miniforge module to create a conda environment instead of using the basic Python install. Read our documentation on dependency management.
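A script like python_test.sh keeps running even if the module fails to load, which can produce confusing errors later on. The sketch below shows one way to stop early instead. The helper name load_module is hypothetical, and the sketch assumes that module load returns a non-zero exit status on failure, which is worth verifying on Aire.

```shell
#!/bin/bash
# load_module is a hypothetical helper, not part of the module system.
# Assumption: "module load" returns a non-zero status when it fails.
load_module() {
    local name=$1
    # The module command is usually a shell function, so check with "type"
    if ! type module >/dev/null 2>&1; then
        echo "Error: the module command is not available in this shell" >&2
        return 2
    fi
    if ! module load "$name"; then
        echo "Error: could not load module $name" >&2
        return 1
    fi
    echo "Loaded $name"
}
```

In python_test.sh, the plain module load line could then become load_module python/3.13.0 || exit 1, so the Python script only runs when the requested version is actually loaded.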
Requesting New Software
Centralized Management
- Popular software is centrally installed by the Research Computing team
- Ensures optimized performance and avoids conflicts
- Regularly updated with new versions
How to Request New Software
If software you need isn’t available:
- Submit a Research Computing Query with details about the software
- Include: software name, version, and brief justification for use in your research
- The team will evaluate and install if appropriate
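Before submitting a request, it is worth checking whether the software is already installed. The sketch below is a rough check only: has_module is a hypothetical helper name, tensorflow is just an example search term, and the sketch assumes that module avail prints matching module names and prints nothing containing the search term when there is no match.

```shell
#!/bin/bash
# has_module is a hypothetical helper name.
# "module avail" commonly writes to stderr, so 2>&1 merges it into the pipe.
has_module() {
    module avail "$1" 2>&1 | grep -q -i -F -- "$1"
}

software=tensorflow   # example name only; replace with what you need

# Only run the check where the module command exists
if type module >/dev/null 2>&1; then
    if has_module "$software"; then
        echo "$software appears to be available; run: module avail $software"
    else
        echo "$software not found; consider a Research Computing Query"
    fi
fi
```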
Alternative Software Management
You can also manage your own software on Aire through several routes. Many users won’t need this, but it may be necessary if you want fine-grained control or need older versions of software.
Package Managers
Spack
- Spack: Flexible package manager for HPC systems
- Allows users to install software without admin privileges
- Supports complex dependency management
module load spack
spack install htop
spack load htop
EasyBuild
- EasyBuild: Framework for building and installing software on HPC
- Automates the build process using configuration files
- Good for complex scientific software
Other Options
Manual Building
- Download and compile software yourself
- Requires knowledge of build systems and dependencies
- Most control but most work
Containers
- Encapsulate software environments using Apptainer
- Ensures portability and consistency across systems
- Great for complex software stacks
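As an illustration of the container route, here is a minimal sketch that pulls an image once and then runs a command inside it. The helper name run_in_container is hypothetical, and the image docker://python:3.13-slim in the usage note is only an example. Whether Apptainer needs a module load on Aire, and whether the node you are on can reach the image registry, are assumptions to check.

```shell
#!/bin/bash
# run_in_container is a hypothetical helper name.
# Arguments: <image file> <container source URI> <command to run...>
run_in_container() {
    local image=$1
    local source_uri=$2
    shift 2
    if ! command -v apptainer >/dev/null 2>&1; then
        echo "Error: apptainer not found" >&2
        return 127
    fi
    # Pull the image only if the local .sif file does not exist yet
    if [ ! -f "$image" ]; then
        apptainer pull "$image" "$source_uri" || return 1
    fi
    # Run the requested command inside the container
    apptainer exec "$image" "$@"
}
```

For example, run_in_container python.sif docker://python:3.13-slim python hello_world.py would run the earlier hello world script with the Python interpreter from the container rather than from a module.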
Best Practices
Environment Management
- Use modules to manage software environments effectively
- Unload modules when no longer needed to avoid conflicts
- For R or Python, use the Miniforge module to create conda environments
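A minimal sketch of the conda approach follows. It assumes the conda command is already available, for example after module load miniforge/24.3.0. The helper name ensure_conda_env is hypothetical, and the environment name and package list in the usage note are examples only.

```shell
#!/bin/bash
# ensure_conda_env is a hypothetical helper name.
# Assumes conda is already on PATH (e.g. after loading the Miniforge module).
# Arguments: <environment name> <package specs...>
ensure_conda_env() {
    local env_name=$1
    shift
    # "conda env list" prints one environment per line, name in column one
    if conda env list | awk '{print $1}' | grep -qx -- "$env_name"; then
        echo "Environment $env_name already exists"
    else
        conda create --yes --name "$env_name" "$@"
    fi
}
```

Calling ensure_conda_env ml-env python=3.12 numpy creates the environment the first time and does nothing on later runs, after which conda activate ml-env can be used as usual.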
Reproducibility
- Always specify versions: module load gcc/14.2.0, not module load gcc
- Document everything: Keep track of modules and versions used
- Use scripts: Automate your module loading in job scripts
- Version control: Keep your workflow scripts in version control
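One lightweight way to document the modules behind a result is to save the output of module list next to it. The sketch below does this with a hypothetical helper called record_modules; the default file name modules_used.txt is also just an example. Because module list often writes to stderr, the sketch captures both output streams.

```shell
#!/bin/bash
# record_modules is a hypothetical helper name.
# Writes a timestamp and the currently loaded modules to a log file.
record_modules() {
    local logfile=${1:-modules_used.txt}
    {
        echo "# Modules loaded on $(date '+%Y-%m-%d %H:%M:%S')"
        # 2>&1 because "module list" commonly prints to stderr
        module list 2>&1
    } > "$logfile"
}
```

Calling record_modules results/modules_used.txt at the start of a job script, after the module load lines, leaves a record that can be committed to version control alongside the workflow scripts.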
Collaboration
- Share module load commands with collaborators
- Use version-controlled scripts to manage workflows
- Consider containers for complex environments
Common Module Workflows
Data Analysis Workflow
#!/bin/bash
module load python/3.13.0
module load scipy/1.11.3
module load matplotlib/3.7.2
python analysis.py
Compilation Workflow
#!/bin/bash
module load gcc/14.2.0
module load cmake/3.24.2
module load openmpi/4.1.4
cmake .
make -j8
Machine Learning Workflow
#!/bin/bash
module load miniforge/24.3.0
conda activate ml-env
python train_model.py
Exercises
Work through these exercises to practice using the module system on Aire.
Exercise 1: Explore Available Software
Get familiar with the available software on Aire:
# List all available modules
module avail
# Search for specific software
module avail python
module avail gcc
module avail cmake
# Look for software you might need for your research
module avail R
module avail matlab
Questions to consider:
- How many versions of Python are available?
- What’s the default version when multiple versions exist?
- Can you find software relevant to your research area?
Exercise 2: Practice Loading and Managing Modules
Learn to load, check, and unload modules:
# Load a module and check it's loaded
module load gcc
module list
# Load multiple modules
module load python/3.13.0
module load cmake/3.24.2
module list
# Try loading without specifying version
module unload gcc
module load gcc
module list
# Clean up - unload all modules
module purge
module list
Key Learning Points:
- Always specify versions for reproducibility: module load gcc/14.2.0
- Use module list to see what’s currently loaded
- Use module purge to start with a clean environment
Exercise 3: Create and Test a Module Script
Create a script that uses modules to run software:
# Create a simple Python script first
cat > hello_modules.py << 'EOF'
import sys
print(f"Hello from Python {sys.version}")
print(f"Python executable: {sys.executable}")
EOF
# Create a bash script that loads modules and runs Python
cat > test_modules.sh << 'EOF'
#!/bin/bash
echo "Starting with clean environment..."
module purge
module list
echo "Loading Python module..."
module load python/3.13.0
module list
echo "Running Python script..."
python hello_modules.py
echo "Script completed!"
EOF
# Make executable and test
chmod +x test_modules.sh
./test_modules.sh
Exercise 4: Create a Project Setup Script
Create a reusable script for a typical research project:
# Create a comprehensive project setup script
cat > project_setup.sh << 'EOF'
#!/bin/bash
# Project Setup Script
# Description: Loads all necessary modules for data analysis project
echo "Setting up research environment..."
# Start with clean environment
module purge
# Load essential tools
module load gcc/14.2.0 # Compiler
module load python/3.13.0 # Python
module load cmake/3.24.2 # Build system
# Optional: Load domain-specific software
# module load r/4.3.1 # For R users
# module load matlab/2023b # For MATLAB users
echo "Loaded modules:"
module list
echo "Environment ready!"
echo "Python version: $(python --version)"
echo "GCC version: $(gcc --version | head -n1)"
# Optional: Activate conda environment
# echo "Activating conda environment..."
# conda activate myproject
EOF
chmod +x project_setup.sh
./project_setup.sh
Exercise 5: Explore Software Request Process
Practice finding and understanding the software request process:
- Find the request form: Navigate to the Research Computing Query form
- Identify software needs: Think of software you need that isn’t available
- Draft a request: Write a brief justification for a software package you might need
Example request template:
Software Name: [e.g., TensorFlow 2.14]
Version: [specific version if needed]
Research Purpose: [brief description of how it will be used]
Justification: [why this specific version/software is needed]
- ✅ Explored available software using module avail
- ✅ Practiced loading and unloading modules
- ✅ Created scripts that use modules effectively
- ✅ Built a reusable project setup script
- ✅ Understood the software request process
- ✅ Applied best practices for reproducible environments
Summary
- Modules provide clean software environments without conflicts
- Always specify versions for reproducible research
- Use scripts to automate and document your module usage
- Request new software through Research Computing queries
- Consider alternatives like Spack or containers for special requirements
- Follow best practices for collaboration and reproducibility
Next Steps
Now you know how to manage software on Aire! Let’s move on to Session 5: Job Scheduling and Submission to learn how to run your code on the compute nodes.