1. Introduction

Version control: your digital lab notebook

In today’s session, we will alternate between learning about version control conceptually and trying out some basic git tasks on our computers.

The aim of this session is:

Tip: Reveal.js Presentation Controls

The presentation linked below is an HTML presentation built with Reveal.js. You can use the following keyboard shortcuts:

  • Navigate slides: Use arrow keys (←/→) or space bar to advance
  • Overview mode: Press O to see all slides at once
  • Speaker notes: Press S to open speaker view
  • Help menu: Press ? to see all keyboard shortcuts
  • Fullscreen: Press F to toggle fullscreen mode
  • Zoom: Press Alt + click to zoom in on slide content
  • Print/PDF: Add ?print-pdf to the URL for a print-friendly version

You can also scroll down to see the presentation content printed on a single page below. Note that the content was written for a presentation rather than an article and designed for the light theme, so some points may be repeated and the layout is not optimised for the dark theme.

Open introduction presentation ↗

Presentation content in a single page

About Research Computing

  • Provide specialised guidance and advice for researchers
    • Programming, dependency management, code project organisation
    • Grant advice regarding computing needs
  • Provide research computing training courses
  • Provide consultancy work for research projects
  • Contact us via the Request Form

Join our community

We’re trying to build a self-supporting community for researchers who do computational research

Visit arc.leeds.ac.uk/community/join/ to join us!

What is version control?

Cartoon of a researcher saving various different versions of a file under increasingly complicated and confusing file names.

One way of doing version control…

What can version control do for us?

  • Record what changes were made to a file, when, and who by
  • Travel back in time
  • Manage multiple versions of the same set of files
  • Work collaboratively without over-writing each other’s work

What is a version control system?

  • Automated and structured way to version control your work
  • Usually software that records a “snapshot” of your work
  • Records:
    • Which files have changed and how they changed since the last snapshot
    • Who took the snapshot/made the changes
    • A date and time for the snapshot
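
As a rough illustration of the snapshot idea above, the following Python sketch records a content hash for each file together with an author and a timestamp, then compares two snapshots. It is a toy model and not how git is implemented; the names `Snapshot`, `take_snapshot`, and `changed_files`, and the example file contents, are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    """A toy record of a project's state at one moment (hypothetical)."""
    author: str
    timestamp: datetime
    file_hashes: dict  # file name -> SHA-256 hex digest of its contents


def take_snapshot(files: dict, author: str) -> Snapshot:
    """Hash each file's text so that snapshots can be compared later."""
    hashes = {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in files.items()
    }
    return Snapshot(
        author=author,
        timestamp=datetime.now(timezone.utc),
        file_hashes=hashes,
    )


def changed_files(old: Snapshot, new: Snapshot) -> dict:
    """Report which files were added, removed, or modified between snapshots."""
    old_names, new_names = set(old.file_hashes), set(new.file_hashes)
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "modified": sorted(
            name
            for name in old_names & new_names
            if old.file_hashes[name] != new.file_hashes[name]
        ),
    }


# In-memory "files" stand in for a real project folder (made-up contents).
first = take_snapshot(
    {"analysis.py": "print(1)\n", "notes.md": "# Notes\n"}, author="alice"
)
second = take_snapshot(
    {"analysis.py": "print(2)\n", "README.md": "Hello\n"}, author="bob"
)
summary = changed_files(first, second)
print(summary)
```

Each snapshot answers the three questions in the list: what changed (by comparing hashes), who made the change, and when.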

Why is version control important?

Protect your work

Have you ever “lost” research work?

  • Accidentally overwritten results?
  • Forgotten where the file is saved/what it’s called?
  • Had a file become corrupted or unopenable?

Why is version control important?

Reproducibility

Allows you to…

  • Point to a set of results you produced, possibly years ago, and say “this is the version of the code that produced these results!”
  • Tie certain versions of code to a preprint of a manuscript, and other versions to the final accepted version
  • Associate a Digital Object Identifier (DOI) with a certain version of your code

Why is version control important?

Collaboration and open research

  • Helps ensure contributions are recorded at all stages of a project
  • Facilitates open-source development of research code
  • Allows other researchers to accurately reproduce your methods

What is git?

  • Free and open-source version control software
  • The most widely used version control system
  • Can be installed locally
  • Tracks changes in any set of files (code, documents, images, etc.)
  • Creates a complete history of your project in a hidden .git folder

What is GitHub?

  • Git repository hosting platform
  • Accessed through a browser
  • Provides a remote ‘hub’ for your code
  • Facilitates sharing and collaborating on code projects

How will we learn git on the course?

  • We’ll use Codespaces, cloud-based development environments hosted on GitHub
  • Codespaces allows us to practise using git without a local install
  • Local install is recommended for a more complete git experience!

What can we use git for?

  • git works best with files that are in plain text.
  • This includes:

.py, .txt, .ipynb, .csv, .json, .c, .cc, .tex, .bib, .md

  • Mainly used for code, but can also be used for notes, documentation, and certain data sets (we’ll discuss suitability and caveats later in the course)
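
To see why plain-text files suit line-based change tracking, here is a minimal sketch using Python’s standard `difflib` module to compare two versions of a small script. It is not git’s own diff machinery, and the file name and contents are made up for illustration.

```python
import difflib

# Two hypothetical versions of the same plain-text file, as lists of lines.
old_version = [
    "import numpy as np\n",
    "threshold = 0.5\n",
    "print(threshold)\n",
]
new_version = [
    "import numpy as np\n",
    "threshold = 0.75\n",
    "print(threshold)\n",
]

# unified_diff yields header lines, then unchanged lines prefixed with a
# space, removed lines prefixed with "-", and added lines prefixed with "+".
diff_lines = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="old/analysis.py",
        tofile="new/analysis.py",
    )
)
print("".join(diff_lines), end="")
```

Because the file is plain text, the change can be described as one removed line and one added line. A binary file such as a video has no meaningful lines, so a line-based comparison like this tells you very little.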

What should we not use git for?

Generally speaking, we don’t use it for…

  • Local configuration files
  • Test outputs and logs
  • Passwords and other sensitive data
  • Large binary files (like videos and audio)
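
The categories above can be expressed as file name patterns. The sketch below uses Python’s standard `fnmatch` module to filter a list of candidate files. The patterns, the file names, and the helper `should_track` are all hypothetical. In practice git reads its exclusion patterns from a `.gitignore` file, whose matching rules are richer than what `fnmatch` offers.

```python
from fnmatch import fnmatch

# Hypothetical patterns covering logs, media files, secrets, and local
# configuration; adjust these for your own project.
EXCLUDE_PATTERNS = ["*.log", "*.mp4", "*.wav", ".env", "local_settings.*"]


def should_track(file_name: str) -> bool:
    """Return False if the file name matches any exclusion pattern."""
    return not any(fnmatch(file_name, pattern) for pattern in EXCLUDE_PATTERNS)


candidates = ["analysis.py", "run.log", "talk.mp4", ".env", "paper.tex"]
tracked = [name for name in candidates if should_track(name)]
print(tracked)
```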

git terminology

Repo

  • We are going to be working in a “repo”, short for “repository”
    • This is simply the folder or directory that we want to track
    • git tracks changes to an entire project folder instead of just a single file

git terminology

Also good to know…

  • Local vs remote
  • clone
  • branch

We will discuss these in more detail later!

Help me visualise all this!!

Let’s see what this looks like in practice…

Time for the first practical…