Session 3: Storage on Aire

Understanding File Systems and Data Management

Session content

Session aims

By the end of this session, you will be able to:

  • Distinguish between different storage areas and their purposes
  • Navigate between storage locations using environment variables
  • Monitor your disk usage and quotas
  • Transfer files between storage areas and your local machine

View Interactive Slides: Storage Systems on Aire

Scroll to the bottom of this section for practical exercises.

Storage Areas Overview

The Aire HPC file system includes several special directories:

  • Home directory (/users/<username>, env var $HOME) for personal files
  • Scratch directory (/mnt/scratch/<username>, env var $SCRATCH) for large, temporary data
  • Flash (NVMe) scratch ($TMP_SHARED, usually /mnt/flash/tmp/job.<JOB-ID>) for very fast I/O during jobs
  • Node-local scratch ($TMP_LOCAL or $TMPDIR, typically /tmp/job.<JOB-ID>) for fast local storage on each compute node
Critical Storage Rules
  • Home is backed up and not automatically purged
  • Scratch and flash are not backed up, and flash is deleted after each job
  • Always copy important results from temporary storage (flash or node-local) back to your home directory or another permanent area before the job ends

Storage Types and Quotas

Storage Type          Quota               Backup   Auto-Delete   Best For
Home ($HOME)          ~30 GB, 1M files    ✅ Yes   ❌ No         Scripts, configs, small files
Scratch ($SCRATCH)    ~1 TB, 500K files   ❌ No    ❌ No         Large datasets, job data
Flash ($TMP_SHARED)   ~1 TB per job       ❌ No    ✅ Yes        I/O-intensive tasks
Local ($TMP_LOCAL)    Node-specific       ❌ No    ✅ Yes        Single-node fast storage
Data Loss Risk

Data in scratch/flash areas is temporary. These are not backed up and may be purged. Always move important data to your home or external storage when done.
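
As a concrete illustration, a job script can stage data into fast temporary storage and copy results back to a permanent area before it finishes. The sketch below assumes a Slurm-style batch script; the scheduler options, the program name (my_program), and the paths are placeholders, so adapt them to your own job.

#!/bin/bash
#SBATCH --job-name=stage-example     # illustrative scheduler options only
#SBATCH --time=01:00:00

# Stage input from scratch into the per-job flash area
cp $SCRATCH/hpc1-data/input/data.csv $TMP_SHARED/

# Run the (hypothetical) program where I/O is fastest
cd $TMP_SHARED
./my_program data.csv > output.dat

# Copy results back to permanent storage BEFORE the job ends,
# because $TMP_SHARED is deleted when the job finishes
mkdir -p $HOME/hpc1-practice/results
cp output.dat $HOME/hpc1-practice/results/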

Checking Disk Usage

Using the quota command

Check your current usage and limits:

quota -s    # Human readable format

Example output:

Disk quotas for user yourusername (uid 12345):
     Filesystem   blocks   quota   limit   grace   files   quota   limit
      /users     15000*   30000    33000            200000 1000000 1100000
   /mnt/scratch 100000   1000000 1100000           500000 1500000 1650000

Using the du command

Check disk usage of directories:

du -hs *                    # Size of each directory
du -hs $HOME               # Size of home directory
du -hs $SCRATCH            # Size of scratch directory
Pro Tip

Run du -hs * in your scratch directory to see which subdirectories are taking up the most space. This can be slow for large directories!
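
If the plain listing is hard to scan, you can sort it by size (GNU sort, as found on most Linux systems, understands the human-readable suffixes that du -h prints):

cd $SCRATCH
du -hs * | sort -h           # largest directories listed last
du -hs * | sort -rh | head   # or show the ten largest first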

File Transfer Methods

Between Storage Areas on Aire

# Copy TO scratch
cp data.txt $SCRATCH/

# Copy FROM scratch back to home
cp $SCRATCH/output.dat $HOME/results/

# Copy entire directory
cp -r $HOME/myproject $SCRATCH/
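
For large directories, rsync is a useful alternative to cp -r: it shows progress and, if the copy is interrupted, re-running it skips files that already match. A sketch using the same example paths:

# Copy a whole project to scratch, skipping files that are already up to date
rsync -a --info=progress2 $HOME/myproject/ $SCRATCH/myproject/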

To/From Your Local Machine

Using scp (Secure Copy)

From local machine to Aire:

scp myfile.txt <username>@target-system:/mnt/scratch/<username>/

Note: write the remote path out in full. $SCRATCH is defined on Aire, so your local shell would expand it (usually to nothing) before scp runs.

From Aire to local machine:

scp <username>@target-system:/path/results.txt .

Off-campus transfers (with jumphost)

# Using rsync
rsync -r --info=progress2 -e 'ssh -J <username>@jump-host' file.txt <username>@target-system:path/

# Using scp
scp -rq -J <username>@jump-host <username>@target-system:path/file.txt local-folder/
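
If you transfer from off campus regularly, you can record the jump host in your local ~/.ssh/config so that plain scp and rsync commands work without the -J option. This is a sketch for an OpenSSH client; the Host alias (aire) and the host names are placeholders, so substitute the real addresses.

# ~/.ssh/config on your local machine (placeholder host names)
Host aire
    HostName target-system
    User <username>
    ProxyJump <username>@jump-host

# The transfer commands then simplify to, for example:
#   scp file.txt aire:/mnt/scratch/<username>/
#   rsync -r --info=progress2 file.txt aire:/mnt/scratch/<username>/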

Download from Internet

wget https://example.com/data.zip
curl -O https://example.com/data.zip
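
Large downloads are best placed straight into scratch rather than home, and it is worth verifying a checksum if the provider publishes one. A sketch using the example URL above:

cd $SCRATCH/hpc1-data/input
wget https://example.com/data.zip
sha256sum data.zip    # compare against the checksum published by the data provider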

GUI Tools

For those who prefer graphical interfaces:

  • FileZilla (cross-platform SFTP client)
  • WinSCP (Windows)
  • Cyberduck (Mac/Windows)

Best Practices Summary

Key Storage Principles
  1. Multiple storage areas: Use each storage type appropriately
  2. Regular cleanup: Run quota -s regularly and clean up old files
  3. Data transfer workflow: Transfer input to $SCRATCH before jobs, copy results back after
  4. Avoid data loss: Always backup important data from temporary storage
  5. Organization: Keep your home directory organized and place large data in scratch only when needed

Typical Workflow

  1. Prepare: Upload input data to $SCRATCH
  2. Process: Run jobs using scratch storage
  3. Preserve: Copy important results to your home directory or other backed-up research storage
  4. Cleanup: Remove temporary files from $SCRATCH
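
Put together, the workflow might look like the commands below, run partly on your local machine and partly on an Aire login node; the host name and file names are placeholders.

# 1. Prepare (run on your local machine): upload input data to scratch
scp input.dat <username>@target-system:/mnt/scratch/<username>/

# 2. Process (run on Aire): submit a job that reads and writes under $SCRATCH
#    (see the job-script sketch earlier in this session)

# 3. Preserve (run on Aire): copy the results somewhere permanent
cp $SCRATCH/output.dat $HOME/results/

# 4. Cleanup (run on Aire): remove files you no longer need from scratch
rm $SCRATCH/input.dat $SCRATCH/output.dat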

Exercises

Work through these hands-on exercises to practice storage management on Aire.

Exercise 1: Explore Your Storage Environment

Check your storage locations and current usage:

# Check your environment variables and location
echo "Home: $HOME"
echo "Scratch: $SCRATCH"
pwd

# Check your disk quotas
quota -s

Exercise 2: Create Organized Directory Structure

Set up a proper directory structure in both storage areas:

# Create project structure in home (for permanent files)
cd $HOME
mkdir hpc1-practice
cd hpc1-practice
mkdir scripts results

# Create working structure in scratch (for large/temporary data)
cd $SCRATCH  
mkdir hpc1-data
cd hpc1-data
mkdir input output
ls -la
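
As an aside, mkdir -p with brace expansion creates the same nested structure in one command per area, and does not complain if the directories already exist:

mkdir -p $HOME/hpc1-practice/{scripts,results}
mkdir -p $SCRATCH/hpc1-data/{input,output}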

Exercise 3: Practice File Operations

Create sample files and practice moving data between storage areas:

# Create a sample script in home
cd $HOME/hpc1-practice/scripts
nano process_data.sh

Add this content to the script:

#!/bin/bash
echo "Processing data in: $(pwd)"
echo "Available space in scratch:"
df -h $SCRATCH

Save with Ctrl + O, Enter, Ctrl + X, then continue:

# Make executable and test
chmod +x process_data.sh
./process_data.sh

# Create sample data in scratch
echo "sample,value1,value2" > $SCRATCH/hpc1-data/input/data.csv
echo "exp1,10.5,20.3" >> $SCRATCH/hpc1-data/input/data.csv
echo "exp2,8.9,22.1" >> $SCRATCH/hpc1-data/input/data.csv

# Copy important results back to home (simulate job completion)
cp $SCRATCH/hpc1-data/input/data.csv $HOME/hpc1-practice/results/
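
You can confirm the copy succeeded before deleting anything from scratch; diff prints nothing when the two files are identical:

# Verify the copy matches the original (no output means the files are identical)
diff $SCRATCH/hpc1-data/input/data.csv $HOME/hpc1-practice/results/data.csv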

Exercise 4: Monitor Usage and Clean Up

Practice disk usage monitoring and cleanup:

# Check sizes of your directories
du -hs $HOME/hpc1-practice
du -hs $SCRATCH/hpc1-data

# Check quota again to see any changes
quota -s

# Practice cleanup - remove temporary files from scratch
rm $SCRATCH/hpc1-data/input/data.csv
ls $SCRATCH/hpc1-data/input/

# Verify important data is safe in home
ls $HOME/hpc1-practice/results/
What You’ve Accomplished
  • ✅ Explored storage locations and quotas
  • ✅ Created organized directory structures
  • ✅ Practiced file creation and copying between storage areas
  • ✅ Monitored disk usage and performed cleanup
  • ✅ Followed the recommended workflow: work in scratch, save to home

Summary

Key Takeaways
  • Multiple storage areas serve different purposes: home (permanent), scratch (temporary)
  • Understand quotas and monitor usage with quota -s and du -hs
  • Use environment variables like $HOME and $SCRATCH for navigation
  • Follow the workflow: work in scratch, save important results to home
  • Data management is critical - scratch areas are not backed up
  • Transfer tools like rsync and scp help move data efficiently

Next Steps

Now you understand how to manage data on Aire! Let’s move on to Session 4: Modules and Software to learn how to access and use different software packages.

Additional Resources