1. Introduction
Version control: your digital lab notebook
In today’s session, we will jump between learning about version control conceptually, and testing out some basic git tasks on our computers.
The aims of this session are:
- To help you build a “mental model” of what git is and how it works
- To get you familiar with the basic git workflow
- To highlight the usefulness of version control in research
- To get you set up with a GitHub account and on GitHub Codespaces
The presentation linked below uses Reveal.js to build an HTML presentation. You can use the following keyboard shortcuts:
- Navigate slides: Use arrow keys (←/→) or space bar to advance
- Overview mode: Press O to see all slides at once
- Speaker notes: Press S to open speaker view
- Help menu: Press ? to see all keyboard shortcuts
- Fullscreen: Press F to toggle fullscreen mode
- Zoom: Press Alt + click to zoom in on slide content
- Print/PDF: Add ?print-pdf to the URL for a print-friendly version
You can also scroll down to see the presentation content printed on a single page below. Note that the content is written for a presentation rather than an article, so some points may be repeated. It is also designed for the light theme, so it may not display well in the dark theme.
Open introduction presentation ↗
Presentation content in a single page
About Research Computing
- Provide specialised guidance and advice for researchers
- Programming, dependency management, code project organisation
- Grant advice regarding computing needs
- Provide research computing training courses
- Provide consultancy work for research projects
- Contact us via the Request Form
Join our community
We’re trying to build a self-supporting community for researchers who do computational research
Visit arc.leeds.ac.uk/community/join/ to join us!
What is version control?

What can version control do for us?
- Record what changes were made to a file, when, and who by
- Travel back in time
- Manage multiple versions of the same set of files
- Work collaboratively without over-writing each other’s work
What is a version control system?
- Automated and structured way to version control your work
- Usually a piece of software that records a “snapshot” of your work
- Records:
  - Which files have changed and how they changed since the last snapshot
  - Who took the snapshot/made the changes
  - A date and time for the snapshot
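As a rough illustration of what such a snapshot holds, the sketch below uses git (the tool introduced later in this session) to record one snapshot in a throwaway folder and then print who took it, when, and which files changed. The file name notes.txt, the author details, and the temporary folder are assumptions made up for this example.

```shell
# Work in a throwaway folder so nothing else on your machine is affected
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Illustrative identity, set for this repository only
git config user.name "Example Researcher"
git config user.email "researcher@example.org"

# Create a file and record a snapshot (a "commit") of it
echo "First observation" > notes.txt
git add notes.txt
git commit -q -m "Add first observation"

# Show who took the snapshot, when, and which files changed
git log -1 --stat --format="Author:  %an%nDate:    %ad%nMessage: %s"
```

Each later commit adds another snapshot with the same three pieces of information, which is what makes it possible to look back through a project's history.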
Why is version control important?
Protect your work
Have you ever “lost” research work?
- Accidentally overwritten results?
- Forgotten where the file is saved/what it’s called?
- Had a file become corrupted or unopenable?
Why is version control important?
Reproducibility
Allows you to…
- Point to a set of results you produced, possibly years ago, and say “this is the version of the code that produced these results!”
- Tie certain versions of code to a preprint of a manuscript, and other versions to the final accepted version
- Associate a Digital Object Identifier (DOI) with a certain version of your code
Why is version control important?
Collaboration and open research
- Helps ensure contributions are recorded at all stages of a project
- Facilitates open-source development of research code
- Allows other researchers to accurately reproduce your methods
What is git?
- Free and open-source version control software
- The most widely used version control system
- Can be installed locally
- Tracks changes in any set of files (code, documents, images, etc.)
- Creates a complete history of your project in a hidden .git folder
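To see the hidden folder for yourself, a minimal sketch along these lines should work in any terminal with git installed. The temporary folder is an assumption chosen so the example does not touch your real projects.

```shell
# Create an empty throwaway folder and move into it
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Before initialising, the folder contains nothing
ls -a

# Turn the folder into a git repository
git init -q

# The hidden .git folder now appears (-a makes ls show hidden entries)
ls -a
```

Everything git knows about the project's history lives inside that folder, so deleting it removes the history but leaves your working files in place.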
What is GitHub?
- Git repository hosting platform
- Accessed through a browser
- Provides a remote ‘hub’ for your code
- Facilitates sharing and collaborating on code projects
How will we learn git on the course?
- We’ll use Codespaces, cloud-based development environments hosted on GitHub
- Codespaces allows us to practice using git without a local install
- Local install is recommended for a more complete git experience!
What can we use git for?
- git works for any files that are in plain text
- This includes: .py, .txt, .ipynb, .csv, .json, .c, .cc, .tex, .bib, .md
- Mainly used for code, but can also be used for notes, documentation, and certain data sets (we’ll discuss suitability and caveats later in the course)
What should we not use git for?
Generally speaking, we don’t use it for…
- Local configuration files
- Test outputs and logs
- Passwords and other sensitive data
- Large binary files (like videos and audio)
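One common way to keep such files out of a repository is a .gitignore file, which this introduction does not cover in detail. The sketch below is only an illustration: the patterns (.env, *.log, *.mp4, *.wav) and the file names are assumptions, and a real project would need its own list.

```shell
# Work in a throwaway repository
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Illustrative ignore patterns: adjust these to your own project
cat > .gitignore <<'EOF'
# Local configuration and secrets
.env
# Test outputs and logs
*.log
# Large binary files
*.mp4
*.wav
EOF

# Create one file that should be ignored and one that should be tracked
echo "run output" > run.log
echo "print('hello')" > analysis.py

# List the untracked files git would offer to add; run.log is not among them
git status --short
```

Note that ignoring a file only stops git from picking it up in future. It does not remove anything that has already been committed, so sensitive data is best kept out from the start.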
git terminology
Repo
- We are going to be working in a “repo”, short for “repository”
- This is simply the folder or directory that we want to track
- git tracks changes to an entire project folder instead of just a single file
git terminology
Also good to know…
- Local vs remote
- clone
- branch
We will discuss these in more detail later!
Help me visualise all this!!
Let’s see what this looks like in practice…