Session 5: Job Scheduling and Submission

Running Jobs on the Slurm Scheduler

Session content

Session aims

By the end of this session, you will be able to:

  • Understand what a job scheduler is and why it’s essential for HPC systems
  • Write and submit batch job scripts using Slurm
  • Monitor, manage, and cancel running jobs effectively
  • Request different types of compute resources (CPU, memory, GPUs, time)
  • Use job arrays to efficiently run multiple similar tasks
  • Apply best practices for job submission and resource allocation

View Interactive Slides: Job Scheduling and Submission

Background: What is a Job Scheduler?

High Performance Computing (HPC) systems are shared by many users, each submitting their own jobs — code they want to run using the cluster’s compute power.

A job scheduler is the system that:

  • Organizes when and where jobs run
  • Allocates the requested resources (CPU cores, memory, GPUs)
  • Ensures fair access to shared resources for all users

Why Do We Need a Scheduler?

Without a scheduler:

  • Manual coordination between users
  • Resource conflicts
  • Inefficient resource usage
  • Chaos with thousands of CPUs

With a scheduler:

  • Fair resource allocation
  • Automatic job management
  • Efficient resource utilization
  • Priority-based execution

Slurm: The Scheduler on Aire

At Leeds, the Slurm scheduler (Simple Linux Utility for Resource Management) manages all jobs on the Aire cluster.

Job Submission Workflow

  1. Write a job script describing your requirements
  2. Submit the job to Slurm with sbatch
  3. Queue - Slurm places your job in a queue
  4. Execute - When resources are available, Slurm starts your job
  5. Complete - Job finishes and resources are freed

The same workflow as a flowchart (Mermaid source):

flowchart LR
    A[Write Job Script] --> B[Submit with sbatch]
    B --> C[Job in Queue]
    C --> D[Resources Available?]
    D -->|Yes| E[Job Runs]
    D -->|No| C
    E --> F[Job Completes]
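
For a hypothetical script myjob.sh, the whole cycle boils down to a handful of commands (the job ID 12345 and output file name below are illustrative; slurm-<jobid>.out is the default output file when no --output is set):

sbatch myjob.sh        # submit; prints "Submitted batch job 12345"
squeue -u $USER        # watch it move from pending (PD) to running (R)
cat slurm-12345.out    # read the output once the job has finished
sacct -j 12345         # review the accounting record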

Basic Slurm Commands

Essential Commands

Command  | Purpose                | Example
sbatch   | Submit a job           | sbatch myjob.sh
squeue   | View job queue         | squeue -u $USER
scancel  | Cancel a job           | scancel 12345
sinfo    | View node information  | sinfo
sacct    | View job accounting    | sacct -j 12345

Checking the Queue

squeue                    # Show all jobs
squeue -u $USER          # Show only your jobs
squeue -u $USER --long   # Detailed view of your jobs

Example output:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
12345      test myjob.sh  user123  R    0:05:23      1 node001
12346      test analyze  user123 PD       0:00      2 (Resources)

Job States:

  • R  = Running
  • PD = Pending (waiting for resources)
  • CG = Completing
  • CD = Completed
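
If you only want to see jobs in a particular state, squeue accepts a state filter:

squeue -u $USER -t PENDING    # only your pending jobs
squeue -u $USER -t RUNNING    # only your running jobs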

Writing Job Scripts

A job script is a shell script with special Slurm directives that tell the scheduler what resources you need.

Basic Job Script Template

#!/bin/bash
#SBATCH --job-name=myjob          # Job name
#SBATCH --partition=test          # Partition to use
#SBATCH --time=01:00:00           # Time limit (1 hour)
#SBATCH --nodes=1                 # Number of nodes
#SBATCH --ntasks=1                # Number of tasks
#SBATCH --cpus-per-task=4         # CPUs per task
#SBATCH --mem=8G                  # Memory per node
#SBATCH --output=myjob_%j.out     # Output file (%j = job ID)
#SBATCH --error=myjob_%j.err      # Error file

# Load required modules
module load python/3.13.0

# Change to working directory
cd $SLURM_SUBMIT_DIR

# Run your program
echo "Job started at $(date)"
echo "Running on node: $(hostname)"
echo "Job ID: $SLURM_JOB_ID"

python my_script.py

echo "Job finished at $(date)"

Key SBATCH Directives

Directive        | Purpose           | Example
--job-name       | Name for your job | --job-name=analysis
--partition      | Queue to use      | --partition=standard
--time           | Maximum runtime   | --time=02:30:00
--nodes          | Number of nodes   | --nodes=2
--ntasks         | Number of tasks   | --ntasks=8
--cpus-per-task  | CPUs per task     | --cpus-per-task=4
--mem            | Memory per node   | --mem=16G
--output         | Output file       | --output=job_%j.out
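
Any of these can also be supplied (or overridden) on the sbatch command line, which is handy for one-off changes without editing the script:

sbatch --time=04:00:00 --mem=16G myjob.sh    # command-line options override the #SBATCH lines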

Submitting and Managing Jobs

Submit a Job

sbatch myjob.sh

Output: Submitted batch job 12345
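
If you want to reuse the job ID in follow-up commands, sbatch --parsable prints just the ID:

JOBID=$(sbatch --parsable myjob.sh)
squeue -j $JOBID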

Monitor Your Jobs

# Check if your job is running
squeue -u $USER

# Get detailed job information
scontrol show job 12345

# Check job history
sacct -j 12345
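
Note that scontrol show job only knows about jobs that are still in the queue or have finished very recently; for older jobs, use sacct, which queries the accounting database. By default sacct reports jobs started since midnight, so a quick summary of today's jobs is:

sacct -u $USER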

Cancel Jobs

# Cancel specific job
scancel 12345

# Cancel all your jobs
scancel -u $USER

# Cancel jobs by name
scancel --name=myjob

Resource Requests

CPU and Memory

# Single core job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G

# Multi-core job (8 cores, shared memory)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

# Multi-task job (MPI)
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2G

Time Limits

# Format: days-hours:minutes:seconds, HH:MM:SS, or MM:SS
#SBATCH --time=30:00        # 30 minutes
#SBATCH --time=2:00:00      # 2 hours
#SBATCH --time=1-12:00:00   # 1 day, 12 hours

GPU Resources

# Request 1 GPU
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1

# Request 2 GPUs of a specific type (type names vary by cluster)
#SBATCH --partition=gpu
#SBATCH --gres=gpu:v100:2
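
Inside a GPU job you can confirm what has actually been allocated. A minimal check, assuming NVIDIA GPUs and that Slurm sets CUDA_VISIBLE_DEVICES for the allocation (typical, but cluster-dependent):

echo "Allocated GPU(s): $CUDA_VISIBLE_DEVICES"   # device indices assigned to this job
nvidia-smi                                       # list the GPUs visible to the job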

High Memory Jobs

# Request high memory node
#SBATCH --partition=himem
#SBATCH --mem=500G

Common Job Patterns

Serial Job (Single Core)

#!/bin/bash
#SBATCH --job-name=serial_job
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G

module load python/3.13.0
python serial_script.py

Parallel Job (Shared Memory)

#!/bin/bash
#SBATCH --job-name=parallel_job
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

module load gcc/14.2.0
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_openmp_program

MPI Job (Distributed Memory)

#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --time=04:00:00
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G

module load openmpi/4.1.4
mpirun ./my_mpi_program

Job Arrays

Use job arrays to submit many similar jobs efficiently:

#!/bin/bash
#SBATCH --job-name=array_job
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --array=1-10          # Submit jobs 1 through 10
#SBATCH --output=job_%A_%a.out # %A = array job ID, %a = task ID

# Process different input files
INPUT_FILE="input_${SLURM_ARRAY_TASK_ID}.txt"
OUTPUT_FILE="output_${SLURM_ARRAY_TASK_ID}.txt"

module load python/3.13.0
python process_file.py $INPUT_FILE $OUTPUT_FILE
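
For large arrays you can limit how many tasks run at once by adding a % throttle to the --array specification, for example:

#SBATCH --array=1-100%10      # 100 tasks, at most 10 running at any one time

Individual tasks can also be cancelled by index, e.g. scancel 12345_7 cancels only task 7 of array job 12345.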

Best Practices

Resource Estimation

Right-Sizing Your Jobs
  • Start small: Test with minimal resources first
  • Monitor usage: Use sacct to check actual resource usage (see the example below)
  • Don’t over-request: Only ask for what you need
  • Time limits: Be realistic but add some buffer time
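
For example, to compare what a finished job actually used with what was requested (the job ID is illustrative; Elapsed, Timelimit, MaxRSS and ReqMem are standard sacct fields):

sacct -j 12345 --format=JobID,Elapsed,Timelimit,MaxRSS,ReqMem,State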

File Management

# Use scratch space for temporary files.
# Note: environment variables such as $SCRATCH are not expanded inside
# #SBATCH lines, so change directory in the script body instead.
cd $SCRATCH/myproject

# ... run your program here ...

# Copy results back to the submission directory
cd $SLURM_SUBMIT_DIR
cp $SCRATCH/myproject/results.txt ./

Error Handling

# Add error checking to your scripts
set -e  # Exit on any error

# Check if files exist before processing
if [ ! -f "input.txt" ]; then
    echo "Error: input.txt not found"
    exit 1
fi
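
A slightly stricter variant, if your script uses pipes or variables that might be unset, is the following sketch (adapt it to your own workflow):

set -euo pipefail                              # exit on errors, unset variables, and failed pipes
trap 'echo "Job failed at $(date)" >&2' ERR    # log a message if any command fails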

Troubleshooting Common Issues

Job Won’t Start

Symptoms: Job stays in PD (pending) state

Common causes:

  • Requesting too many resources
  • Wrong partition name
  • Resource limits exceeded
  • System maintenance

Solutions:

  • Check squeue -u $USER for reason codes
  • Use sinfo to see available resources
  • Reduce resource requests if appropriate
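
Two quick ways to dig into why a job is still pending (the format codes are standard squeue placeholders; %R prints the pending reason):

squeue -u $USER -t PENDING -o "%.10i %.9P %.15j %.2t %R"
squeue -u $USER --start       # estimated start times, where Slurm can predict them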

Job Fails Immediately

Symptoms: Job completes quickly with non-zero exit code

Common causes:

  • Module not loaded
  • Input files missing
  • Insufficient memory
  • Wrong file paths

Solutions:

  • Check error files (*.err)
  • Test script interactively first
  • Verify all paths and dependencies
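
One convenient way to test interactively first is to request a short interactive shell on a compute node (exact limits and partition names depend on the cluster, so check the Aire documentation):

srun --time=00:15:00 --ntasks=1 --cpus-per-task=1 --mem=4G --pty /bin/bash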

Out of Memory Errors

Symptoms: Job killed due to memory usage

Solutions:

  • Increase --mem or --mem-per-cpu
  • Use memory profiling tools
  • Consider algorithm optimizations

Exercise

Submit Your First Job

Create and submit a simple job script:

  1. Create the job script (first_job.sh):
#!/bin/bash
#SBATCH --job-name=first_job
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --output=first_job_%j.out

echo "Hello from $(hostname)!"
echo "Job ID: $SLURM_JOB_ID"
echo "Current time: $(date)"
sleep 60
echo "Job completed!"
  2. Submit the job:
sbatch first_job.sh
  3. Monitor the job:
squeue -u $USER
  4. Check the output when it completes:
cat first_job_*.out

Python Hello World Job

Now let’s create a more practical job that loads a module and runs Python code:

  1. Create a Python script (hello_world.py):
cat > hello_world.py << 'EOF'
#!/usr/bin/env python3
import sys
import datetime

print("Hello from Python on HPC!")
print(f"Python version: {sys.version}")
print(f"Script running at: {datetime.datetime.now()}")
print(f"Python executable: {sys.executable}")

# Do some simple computation
numbers = [1, 2, 3, 4, 5]
squared = [x**2 for x in numbers]
print(f"Original numbers: {numbers}")
print(f"Squared numbers: {squared}")
print(f"Sum of squares: {sum(squared)}")

print("Python job completed successfully!")
EOF
  2. Create the job script (python_job.sh):

You can check what versions of Python are available:

module avail python 

Tip: Usually, you will want to load the Miniforge module and activate a Conda environment instead of using the base system Python. See our documentation on dependency management.

cat > python_job.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=python_hello
#SBATCH --time=00:02:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --output=python_job_%j.out
#SBATCH --error=python_job_%j.err

# Load the Python module
echo "Loading Python module..."
module load python/3.13.0

# Show which Python we're using
echo "Using Python: $(which python)"
echo "Python version: $(python --version)"

# Run our Python script
echo "Running Python script..."
python hello_world.py

echo "Job completed at $(date)"
EOF
  3. Submit the job:
sbatch python_job.sh
  4. Monitor the job:
squeue -u $USER
  5. Check the output when it completes:
cat python_job_*.out

You should see output showing:

  • The Python module being loaded
  • Python version information
  • The hello world message and computation results
  • Confirmation that the job completed successfully

What This Exercise Demonstrates
  • Module loading: How to load software in job scripts
  • Python execution: Running Python code on compute nodes
  • Job monitoring: Using output files to verify successful execution
  • Resource specification: Appropriate resource requests for simple scripts

Summary

Key Takeaways
  • Slurm manages all jobs on Aire through a fair scheduling system
  • Job scripts define requirements using #SBATCH directives
  • Right-size your requests - don’t over-request resources
  • Monitor your jobs with squeue and sacct
  • Use appropriate partitions for different types of work
  • Test interactively first before submitting large jobs

Next Steps

Now you can submit and manage jobs on Aire! Let’s move on to Session 6: Best Practices and Troubleshooting to learn how to optimize your HPC workflows.

Additional Resources