# Running Jobs on the Slurm Scheduler
**Without a scheduler:** everyone would run programs on the same machines at once, competing for CPUs and memory.

**With a scheduler:** jobs wait in a queue and start only when the resources they asked for become available.

**Note:** A job scheduler organizes when and where jobs run, allocates resources, and ensures fair access for all users.
```mermaid
flowchart LR
    A[Write Job Script] --> B[Submit with sbatch]
    B --> C[Job in Queue]
    C --> D[Resources Available?]
    D -->|Yes| E[Job Runs]
    D -->|No| C
    E --> F[Job Completes]
```
| Command | Purpose | Example |
|---|---|---|
| `sbatch` | Submit a job | `sbatch myjob.sh` |
| `squeue` | View job queue | `squeue -u $USER` |
| `scancel` | Cancel a job | `scancel 12345` |
| `sinfo` | View node information | `sinfo` |
| `sacct` | View job accounting | `sacct -j 12345` |
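As a quick illustration, a typical submit-and-monitor cycle with these commands looks like the sketch below (the job ID `12345` and script name `myjob.sh` are placeholders):

```bash
# Submit the job script; sbatch prints the new job ID
sbatch myjob.sh
# Submitted batch job 12345

# Check where the job sits in the queue (ST column: PD = pending, R = running)
squeue -u $USER

# Cancel it if something is wrong
scancel 12345

# After it finishes, review what it actually used
sacct -j 12345
```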
```bash
#!/bin/bash
#SBATCH --job-name=myjob          # Job name
#SBATCH --partition=node          # Partition to use
#SBATCH --time=01:00:00           # Time limit (1 hour)
#SBATCH --nodes=1                 # Number of nodes
#SBATCH --ntasks=1                # Number of tasks
#SBATCH --cpus-per-task=4         # CPUs per task
#SBATCH --mem=8G                  # Memory per node
#SBATCH --output=myjob_%j.out     # Output file (%j = job ID)
#SBATCH --error=myjob_%j.err      # Error file

# Load required modules
module load python/3.13.0

# Run your program
echo "Job started at $(date)"
python my_script.py
echo "Job finished at $(date)"
```

| Directive | Purpose | Example |
|---|---|---|
| `--job-name` | Name for your job | `--job-name=analysis` |
| `--partition` | Queue to use | `--partition=himem` |
| `--time` | Maximum runtime | `--time=02:30:00` |
| `--nodes` | Number of nodes | `--nodes=2` |
| `--ntasks` | Number of tasks | `--ntasks=8` |
| `--cpus-per-task` | CPUs per task | `--cpus-per-task=4` |
| `--mem` | Memory per node | `--mem=16G` |
| `--output` | Output file | `--output=job_%j.out` |
**Tip:** Always specify realistic time limits and memory requirements!
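If you need to tweak a request for a single run, options passed on the `sbatch` command line override the matching `#SBATCH` directives inside the script; the values here are purely illustrative:

```bash
# Resubmit the same script with a longer time limit and more memory,
# without editing the #SBATCH lines inside it
sbatch --time=04:00:00 --mem=16G myjob.sh
```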
**Single Core Job:**
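A minimal sketch of a single-core job script, assuming the same partition and Python module as the example above (your site's names may differ):

```bash
#!/bin/bash
#SBATCH --job-name=single_core
#SBATCH --partition=node          # assumed partition name
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1         # one task, one CPU: a plain serial program
#SBATCH --mem=4G
#SBATCH --output=single_%j.out

module load python/3.13.0
python my_serial_script.py        # hypothetical serial script
```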
**Multi-core (Shared Memory):**
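A sketch of a shared-memory (multi-threaded) job: still one task, but several CPUs on the same node. Setting `OMP_NUM_THREADS` from Slurm's `SLURM_CPUS_PER_TASK` keeps the program's thread count in line with what was requested; the executable name is hypothetical:

```bash
#!/bin/bash
#SBATCH --job-name=multi_core
#SBATCH --partition=node          # assumed partition name
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8         # 8 CPUs for one multi-threaded task
#SBATCH --mem=16G
#SBATCH --output=multi_%j.out

# Tell an OpenMP (or similarly threaded) program how many threads to use
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_program             # hypothetical threaded executable
```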
```bash
# Different time limit formats
#SBATCH --time=30:00         # 30 minutes
#SBATCH --time=2:00:00       # 2 hours
#SBATCH --time=1-12:00:00    # 1 day, 12 hours
```

**Warning:** Format is `DD-HH:MM:SS` or `HH:MM:SS`. Be realistic with time requests: jobs are killed when the time limit is reached!
Submit many similar jobs efficiently:
```bash
#!/bin/bash
#SBATCH --job-name=array_job
#SBATCH --partition=test
#SBATCH --time=01:00:00
#SBATCH --array=1-10              # Submit jobs 1 through 10
#SBATCH --output=job_%A_%a.out    # %A = array job ID, %a = task ID

# Process different input files
INPUT_FILE="input_${SLURM_ARRAY_TASK_ID}.txt"
OUTPUT_FILE="output_${SLURM_ARRAY_TASK_ID}.txt"
python process_file.py $INPUT_FILE $OUTPUT_FILE
```

**Embarrassingly parallel:** Perfect for parameter sweeps or processing multiple datasets!
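For a parameter sweep, one common pattern is to store one parameter set per line of a text file and use the task ID to pick a line; `params.txt` and `run_model.py` below are hypothetical names used only for illustration:

```bash
#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --time=01:00:00
#SBATCH --array=1-20                   # one task per line of params.txt
#SBATCH --output=sweep_%A_%a.out

# Read line number SLURM_ARRAY_TASK_ID from the (hypothetical) parameter file
PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)

# Pass that parameter set to the (hypothetical) analysis script
python run_model.py $PARAMS
```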
**Check Queue Status:**
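For example (the job ID is a placeholder):

```bash
# All of your own jobs: the ST column shows PD (pending) or R (running)
squeue -u $USER

# A single job; the NODELIST(REASON) column explains why it is still pending
squeue -j 12345
```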
**Note:** Use `sacct` to see actual resource usage and optimize future jobs!
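A sketch of how you might query that usage; these are standard `sacct` fields, but run `sacct --helpformat` to see everything your site records:

```bash
# Elapsed time, peak memory (MaxRSS), CPU time and final state for job 12345
sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,TotalCPU,State
```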
## Right-Sizing Your Jobs

Use `sacct` to check actual resource usage.

**Good Practices:** start small, test scripts interactively first, and be realistic with time and memory requests.

**Resource Management:** after a job finishes, compare what you asked for with what `sacct` reports, and trim future requests to match.
**Job Won't Start (PD state):** check partition availability with `sinfo`.

**Job Fails Immediately:** check the `*.err` files.

**Out of Memory:** increase `--mem`.

**Tip:** Always check error files and test scripts interactively first!
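A few commands that help with this kind of diagnosis (the job ID and file name are placeholders):

```bash
# Why is the job still pending? The REASON column explains (e.g. Resources, Priority)
squeue -j 12345

# Full scheduler view of the job: pending reason, requested resources, limits
scontrol show job 12345

# Read the error output from a failed run
less myjob_12345.err
```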
## Key Points

- Use `#SBATCH` directives to specify your job's requirements.
- Use `squeue` and `sacct` to monitor jobs and optimize future requests.
- Remember: start small, test thoroughly, monitor usage, and optimize!
