Version control with git and GitHub
About Research Computing
- Provide specialised guidance and advice for researchers
  - Programming, dependency management, code project organisation
  - Grant advice regarding computing needs
- Provide research computing training courses
- Provide consultancy work for research projects
- Contact us via the Request Form
What is version control?
![Cartoon of a researcher saving various different versions of a file under increasingly complicated and confusing file names.]()
One way of doing version control…
What can version control do for us?
- Record what changes were made to a file, when, and who by
- Travel back in time
- Manage multiple versions of the same set of files
- Work collaboratively without overwriting each other’s work
What is a version control system?
- Automated and structured way to version control your work
- Usually a piece of software that records a “snapshot” of your work
- Records:
  - Which files have changed, and how they changed, since the last snapshot
  - Who took the snapshot/made the changes
  - A date and time for the snapshot
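As a rough illustration of the three things a snapshot records, here is a toy sketch in Python. It is not how git is implemented; the names `Snapshot`, `take_snapshot` and `changed_files`, and the example files and authors, are made up for this illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    """Toy model of one snapshot (hypothetical, not git's real data structure)."""
    author: str          # who took the snapshot
    taken_at: datetime   # date and time of the snapshot
    files: dict          # file name -> fingerprint of that file's content


def take_snapshot(author, contents):
    """Record a fingerprint (SHA-256 hash) of every file's content."""
    fingerprints = {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in contents.items()
    }
    return Snapshot(author, datetime.now(timezone.utc), fingerprints)


def changed_files(old, new):
    """List files that were added, removed or modified between two snapshots."""
    names = set(old.files) | set(new.files)
    return sorted(n for n in names if old.files.get(n) != new.files.get(n))


first = take_snapshot("Alice", {"analysis.py": "print(1)", "notes.txt": "draft"})
second = take_snapshot("Bob", {"analysis.py": "print(2)", "notes.txt": "draft"})
print(changed_files(first, second))  # only analysis.py differs
```

Comparing fingerprints rather than whole files keeps the sketch short; a real version control system also stores enough information to show how each file changed.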
Why is version control important?
Protect your work
Have you ever “lost” research work?
- Accidentally overwritten results?
- Forgotten where the file is saved/what it’s called?
- Had a file become corrupted or unopenable?
Why is version control important?
Reproducibility
Allows you to…
- Point to a set of results you produced, possibly years ago, and say “this is the version of the code that produced these results!”
- Tie certain versions of code to a preprint of a manuscript, and other versions to the final accepted version
- Associate a Digital Object Identifier (DOI) with a certain version of your code
Why is version control important?
Collaboration and open research
- Helps ensure contributions are recorded at all stages of a project
- Facilitates open-source development of research code
- Allows other researchers to accurately reproduce your methods
What is git?
- Free and open-source version control software
- The most widely used version control system
- Can be installed locally
- Tracks changes in any set of files (code, documents, images, etc.)
- Creates a complete history of your project in a hidden .git folder
What is GitHub?
- Git repository hosting platform
- Accessed through a browser
- Provides a remote ‘hub’ for your code
- Facilitates sharing and collaborating on code projects
How will we learn git on the course?
- We’ll use GitHub Codespaces: cloud-based development environments hosted on GitHub
- Codespaces let us practise using git without a local install
- A local install is recommended for a more complete git experience!
What can we use git for?
git works best with files stored as plain text.
- This includes: .py, .txt, .ipynb, .csv, .json, .c, .cc, .tex, .bib, .md
- Mainly used for code, but can also be used for notes, documentation, and certain data sets (we’ll discuss suitability and caveats later in the course)
What should we not use git for?
Generally speaking, we don’t use it for…
- Local configuration files
- Test outputs and logs
- Passwords and other sensitive data
- Large binary files (like videos and audio)
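In practice, git is told which files to leave out through a `.gitignore` file kept in the repo. The sketch below only mimics that idea in Python using `fnmatch`; the patterns and file names are invented examples, and real `.gitignore` matching rules differ in detail.

```python
from fnmatch import fnmatch

# Hypothetical patterns for files we would not want to track:
# logs, large media files, a file holding secrets, local settings.
EXCLUDE_PATTERNS = ["*.log", "*.mp4", "*.wav", ".env", "local_settings.*"]


def should_track(filename):
    """Return True if the file name matches none of the exclude patterns."""
    return not any(fnmatch(filename, pattern) for pattern in EXCLUDE_PATTERNS)


candidates = ["analysis.py", "run.log", "talk.mp4", ".env", "README.md"]
tracked = [name for name in candidates if should_track(name)]
print(tracked)  # the code and documentation files remain
```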
Repo
- We are going to be working in a “repo”, short for “repository”
- This is simply the folder or directory that we want to track
git tracks changes to an entire project folder instead of just a single file
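Because the whole folder is tracked, every file and subfolder inside it belongs to the same repo. A minimal sketch of that idea, assuming the repo is marked by its hidden `.git` folder: walk upwards from any location until that folder is found. The function name `find_repo_root` and the folder layout are hypothetical, and no real git commands are run.

```python
import tempfile
from pathlib import Path


def find_repo_root(start):
    """Walk upwards from `start` until a folder containing `.git` is found."""
    current = Path(start).resolve()
    for folder in [current, *current.parents]:
        if (folder / ".git").exists():
            return folder
    return None  # not inside a repo


# Demonstration on a throwaway folder layout.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my_project"
    (project / ".git").mkdir(parents=True)           # stand-in for git's hidden folder
    (project / "src" / "utils").mkdir(parents=True)  # a nested subfolder
    root = find_repo_root(project / "src" / "utils")
    print(root.name)  # the nested folder resolves to the project folder
```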
git terminology
Also good to know…
- Local vs remote
- clone
- branch
We will discuss these in more detail later!
Help me visualise all this!!
Let’s see what this looks like in practice…
Time for the first practical…