Session 3: Storage on Aire
Understanding File Systems and Data Management
Session content
Session aims
By the end of this session, you will be able to:
- Distinguish between different storage areas and their purposes
- Navigate between storage locations using environment variables
- Monitor your disk usage and quotas
- Transfer files between storage areas and your local machine
View Interactive Slides: Storage Systems on Aire
Scroll to the bottom of this section for practical exercises.
Storage Areas Overview
The Aire HPC file system includes several special directories:
- Home directory (/users/<username>, env var $HOME) for personal files
- Scratch directory (/mnt/scratch/<username>, env var $SCRATCH) for large, temporary data
- Flash (NVMe) scratch ($TMP_SHARED, usually /mnt/flash/tmp/job.<JOB-ID>) for very fast I/O during jobs
- Node-local scratch ($TMP_LOCAL or $TMPDIR, typically /tmp/job.<JOB-ID>) for fast local storage on each compute node
- Home is backed up and not automatically purged
- Scratch and flash are not backed up, and the flash area is deleted after each job
- Always copy important results from temporary storage (flash or node-local) back to your home directory or another permanent area before the job ends (see the job-script sketch below)
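To make this concrete, here is a minimal sketch of a batch job that stages data through the flash area and copies results back before the job ends. It assumes a Slurm-style job script and that $TMP_SHARED is set inside the job; the input file, results file, and the my_analysis command are placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=flash-staging-example
#SBATCH --time=01:00:00

# Stage input data from scratch into the fast per-job flash area
cp "$SCRATCH/input/data.csv" "$TMP_SHARED/"

# Run the (placeholder) analysis in the flash area for fast I/O
cd "$TMP_SHARED"
my_analysis data.csv > results.out

# Copy results back to permanent storage BEFORE the job ends;
# the flash area is deleted automatically when the job finishes
cp results.out "$HOME/results/"
```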
Storage Types and Quotas
| Storage Type | Quota | Backup | Auto-Delete | Best For |
|---|---|---|---|---|
| Home ($HOME) | ~30 GB, 1M files | ✅ Yes | ❌ No | Scripts, configs, small files |
| Scratch ($SCRATCH) | ~1 TB, 500K files | ❌ No | ❌ No | Large datasets, job data |
| Flash ($TMP_SHARED) | ~1 TB per job | ❌ No | ✅ Yes | I/O-intensive tasks |
| Local ($TMP_LOCAL) | Node-specific | ❌ No | ✅ Yes | Single-node fast storage |
Data in scratch/flash areas is temporary. These are not backed up and may be purged. Always move important data to your home or external storage when done.
Checking Disk Usage
Using the quota command
Check your current usage and limits:
quota -s    # Human readable format

Example output:
Disk quotas for user yourusername (uid 12345):
Filesystem blocks quota limit grace files quota limit
/users 15000* 30000 33000 200000 1000000 1100000
/mnt/scratch 100000 1000000 1100000 500000 1500000 1650000
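Quota limits apply per user. To see the total size and remaining free space of the underlying file systems, df is a useful complement:

```bash
df -h $HOME       # file system holding your home directory
df -h $SCRATCH    # file system holding your scratch directory
```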
Using the du command
Check disk usage of directories:
du -hs * # Size of each directory
du -hs $HOME # Size of home directory
du -hs $SCRATCH   # Size of scratch directory

Run du -hs * in your scratch directory to see which subdirectories are taking up the most space. This can be slow for large directories!
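To spot the largest subdirectories quickly, you can sort the du output by size. A short example (again, expect this to take a while on large directory trees):

```bash
# Ten largest items in your scratch directory, biggest first
du -hs $SCRATCH/* | sort -hr | head -n 10
```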
File Transfer Methods
Between Storage Areas on Aire
# Copy TO scratch
cp data.txt $SCRATCH/
# Copy FROM scratch back to home
cp $SCRATCH/output.dat $HOME/results/
# Copy entire directory
cp -r $HOME/myproject $SCRATCH/
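rsync also works between storage areas on the same system and only copies files that have changed, which is convenient when re-syncing a large project directory (paths are illustrative):

```bash
# Sync a project from home into scratch; re-running copies only the changes
rsync -avh $HOME/myproject/ $SCRATCH/myproject/
```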
To/From Your Local Machine
Using scp (Secure Copy)
From local machine to Aire:
scp myfile.txt <username>@target-system:/mnt/scratch/<username>/

(Note: $SCRATCH is not set on your local machine, so give the full path on Aire.)

From Aire to local machine:
scp <username>@target-system:/path/results.txt .

Using rsync (Recommended for large transfers)
rsync -avh data/ <username>@target-system:path/data/

Benefits of rsync:
- Only transfers changed files
- Resumes interrupted transfers
- Progress indicators with --info=progress2
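For example, to watch overall progress during a large transfer:

```bash
rsync -avh --info=progress2 data/ <username>@target-system:path/data/
```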
Off-campus transfers (with jumphost)
# Using rsync
rsync -r --info=progress2 -e 'ssh -J username@jump-host' file.txt username@target-system:path/
# Using scp
scp -rq -J username@jump-host username@target-system:path/file.txt local-folder/
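If you transfer files through the jump host regularly, you can record it in your local OpenSSH client configuration so that -J is applied automatically. A minimal sketch of ~/.ssh/config on your local machine (host and user names are placeholders):

```
Host target-system
    HostName target-system
    User username
    ProxyJump username@jump-host
```

With this in place, plain ssh, scp, and rsync commands to target-system are routed through the jump host automatically.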
Download from Internet
wget https://example.com/data.zip
curl -O https://example.com/data.zip

GUI Tools
For those who prefer graphical interfaces:
- FileZilla (cross-platform SFTP client)
- WinSCP (Windows)
- Cyberduck (Mac/Windows)
Best Practices Summary
- Multiple storage areas: Use each storage type appropriately
- Regular cleanup: Run quota -s regularly and clean up old files (see the example after this list)
- Data transfer workflow: Transfer input to $SCRATCH before jobs, copy results back after
- Avoid data loss: Always back up important data from temporary storage
- Organization: Keep your home directory organized and place large data in scratch only when needed
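As an aid to regular cleanup, you can list files in scratch that have not been modified recently before deciding what to delete. A small example (the 30-day cut-off is arbitrary):

```bash
# List files in scratch not modified in the last 30 days
find $SCRATCH -type f -mtime +30 -ls
```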
Typical Workflow
- Prepare: Upload input data to $SCRATCH
- Process: Run jobs using scratch storage
- Preserve: Copy important results to backed-up research storage
- Cleanup: Remove temporary files from $SCRATCH
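In command form, this workflow might look something like the following (file and directory names are placeholders, and the job submission step depends on your job script and scheduler):

```bash
# 1. Prepare: upload input data to scratch (run from your local machine)
scp input.dat <username>@target-system:/mnt/scratch/<username>/myproject/

# 2. Process: run jobs that read from and write to scratch
#    (submit your job script with the scheduler's submit command)

# 3. Preserve: copy important results to permanent, backed-up storage
cp $SCRATCH/myproject/results.out $HOME/myproject/

# 4. Cleanup: remove temporary files from scratch
rm -r $SCRATCH/myproject/tmp/
```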
Exercises
Work through these hands-on exercises to practice storage management on Aire.
Exercise 1: Explore Your Storage Environment
Check your storage locations and current usage:
# Check your environment variables and location
echo "Home: $HOME"
echo "Scratch: $SCRATCH"
pwd
# Check your disk quotas
quota -s

Exercise 2: Create Organized Directory Structure
Set up a proper directory structure in both storage areas:
# Create project structure in home (for permanent files)
cd $HOME
mkdir hpc1-practice
cd hpc1-practice
mkdir scripts results
# Create working structure in scratch (for large/temporary data)
cd $SCRATCH
mkdir hpc1-data
cd hpc1-data
mkdir input output
ls -la

Exercise 3: Practice File Operations
Create sample files and practice moving data between storage areas:
# Create a sample script in home
cd $HOME/hpc1-practice/scripts
nano process_data.sh

Add this content to the script:
#!/bin/bash
echo "Processing data in: $(pwd)"
echo "Available space in scratch:"
df -h $SCRATCH

Save with Ctrl + O, Enter, Ctrl + X, then continue:
# Make executable and test
chmod +x process_data.sh
./process_data.sh
# Create sample data in scratch
echo "sample,value1,value2" > $SCRATCH/hpc1-data/input/data.csv
echo "exp1,10.5,20.3" >> $SCRATCH/hpc1-data/input/data.csv
echo "exp2,8.9,22.1" >> $SCRATCH/hpc1-data/input/data.csv
# Copy important results back to home (simulate job completion)
cp $SCRATCH/hpc1-data/input/data.csv $HOME/hpc1-practice/results/

Exercise 4: Monitor Usage and Clean Up
Practice disk usage monitoring and cleanup:
# Check sizes of your directories
du -hs $HOME/hpc1-practice
du -hs $SCRATCH/hpc1-data
# Check quota again to see any changes
quota -s
# Practice cleanup - remove temporary files from scratch
rm $SCRATCH/hpc1-data/input/data.csv
ls $SCRATCH/hpc1-data/input/
# Verify important data is safe in home
ls $HOME/hpc1-practice/results/

- ✅ Explored storage locations and quotas
- ✅ Created organized directory structures
- ✅ Practiced file creation and copying between storage areas
- ✅ Monitored disk usage and performed cleanup
- ✅ Followed the recommended workflow: work in scratch, save to home
Summary
- Multiple storage areas serve different purposes: home (permanent), scratch (temporary)
- Understand quotas and monitor usage with quota -s and du -hs
- Use environment variables like $HOME and $SCRATCH for navigation
- Follow the workflow: work in scratch, save important results to home
- Data management is critical: scratch areas are not backed up
- Transfer tools like rsync and scp help move data efficiently
Next Steps
Now you understand how to manage data on Aire! Let’s move on to Session 4: Modules and Software to learn how to access and use different software packages.