
Introduction

Further materials

Links are provided throughout these notes to point you towards more detailed and advanced topics.

A great starting place is our detailed documentation for the two-day version of this course, available here

We will start this workshop off with a presentation that will introduce a range of topics, methods, approaches and tools at a theoretical level. The rest of the day will be spent putting these topics into practice.

Authors

Including content from SWD3: Software development practices for Research (2023) by

"Good enough" software engineering

While we will talk about some "good enough practices" in software engineering, remember that these are not best practices - there's so much more you can apply, but we are aware that you are first and foremost researchers, who likely have quite strict time and financial budgets. We have tried to intersperse further reading links and suggestions for those that are interested in pursuing this further.

This will come up again in the introduction talk, but for the absolute bare minimum, prevent your code from falling apart by using the DeReLiCT acronym. This stands for: recording Dependencies, putting your code in a Repository, adding a Licence to your repository, making it easy to Cite your code, and using Tests to verify that your code is working.

We will be using the notes here as a format for the course, but will also be referencing our documentation here for the longer two-day version of this course. That website is an excellent reference that goes into more depth on the topics we are going to cover today, so please bookmark it and come back to it to refresh your memory after the course!

If nothing else, apply the DeReLiCT acronym to your code

Science that is based on untested, undocumented code that is not publicly available is neither robust nor reproducible. While scientific publishers and journals are still catching up with the proliferation of code use in research, more and more journals now require open and shared code as part of the peer review process.

Reviewers who are more technically competent should be on the lookout for benchmarking, tests, and comparison of numerical methods to analytical solutions.

Read more about how to follow these criteria here, and see how the FAIR data principles are applied to software here.

Introduction presentation

If the embedded presentation below doesn't load, the presentation slides can also be accessed here.

Pythagoras: Introducing our case study

As part of a larger scale research project, you need to use a groundbreaking, novel mathematical theorem: the Pythagorean theorem (otherwise known as Pythagoras' theorem)! You are using this in a project analysing the mechanical load-bearing capacities of roofs under heavy snowfall.

Pythagorean theorem

In a right-angled triangle, the area of the square whose side is the hypotenuse (the side opposite the right angle) is equal to the sum of the areas of the squares on the other two sides.

\(a^2 + b^2 = c^2\) ; \(c = \sqrt{a^2 + b^2}\)

Visual representation of the Pythagorean theorem; Wikimedia.
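As a quick check of the formula, take the illustrative side lengths \(a = 3\) and \(b = 4\) (values chosen here for convenience, not taken from the case study):

\(c = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5\)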

Real world example

If only research software were as simple as implementing the Pythagorean theorem to calculate a hypotenuse!

To get an idea of the scope of work we are talking about, I'm going to use my research as an example. My overall research project was looking at metallic intrusions into planetesimal mantles after two planetesimals violently collided - the formation of Pallasite meteorites. I wanted to model the cooling and crystallisation of molten metal injected into a solid, colder crystalline mantle.

In order to do this, I implemented the Crank-Nicolson method in 3D using the fractional step method, and included a macro-scale crystallisation model. This complex mathematics essentially described a cube that contained two different materials, one of which was solid and cold, and the other of which was hot and initially molten. Because this has applications outside of researching the specifics of molten metal mixing with solid olivine crystals, instead of hard-coding parameters such as the heat capacity and density of the two materials, I let these be defined at a function level.

This meant that at a later point in my PhD, I was able to re-use my code to investigate melting of sulfides embedded in metal on the microscale without any rewriting of my code needed.
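To illustrate what defining parameters "at a function level" can look like, here is a minimal sketch. It is not the code from the research project described above: the function name `thermal_diffusivity` and all numerical values are hypothetical placeholders. It uses the standard relation that thermal diffusivity equals conductivity divided by the product of density and specific heat capacity.

```python
def thermal_diffusivity(conductivity, density, heat_capacity):
    """Return thermal diffusivity in m^2/s, i.e. k / (rho * c_p).

    Material properties are passed in as arguments rather than being
    hard-coded, so the same function works for any material.
    """
    if density <= 0 or heat_capacity <= 0:
        raise ValueError("density and heat_capacity must be positive")
    return conductivity / (density * heat_capacity)


# Illustrative placeholder values only, NOT measured material properties.
material_a = thermal_diffusivity(conductivity=3.0, density=3000.0, heat_capacity=1000.0)
material_b = thermal_diffusivity(conductivity=30.0, density=7500.0, heat_capacity=800.0)
```

Because nothing about a particular material is baked into the function, moving to a different pair of materials only means calling it with different arguments.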

While you need to write some Python code that will calculate the length of the hypotenuse of a right-angled triangle in a specific context for your research project, you know you will likely need to do similar calculations when producing data for your next paper; also, your colleagues often need to calculate this too, so it's probably useful for your code to be a little bit flexible so that it can be re-used easily.
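One possible starting point for such a re-usable calculation is sketched below. The function names `hypotenuse` and `hypotenuses`, and the choice to reject non-positive side lengths, are assumptions made for this illustration; your own design may well differ after the brainstorming exercise.

```python
import math


def hypotenuse(a, b):
    """Return the hypotenuse of a right-angled triangle with legs a and b."""
    # Assumption for this sketch: side lengths must be strictly positive.
    if a <= 0 or b <= 0:
        raise ValueError("side lengths must be positive")
    # math.hypot computes sqrt(a**2 + b**2).
    return math.hypot(a, b)


def hypotenuses(pairs):
    """Apply hypotenuse() to each (a, b) pair in an iterable."""
    return [hypotenuse(a, b) for a, b in pairs]


print(hypotenuse(3, 4))  # 5.0
```

Keeping the single calculation in its own small function means that a colleague can import it, or call `hypotenuses` on a whole list of measurements, without touching the code that is specific to your project.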

You also know that you want to run a bunch of preliminary calculations with this code on your laptop, and it would be great if you could get your collaborator/supervisor to run some calculations on their data too. Once you are happy with your initial calculations, you plan on running some more intensive calculations that will include this hypotenuse calculation on an HPC cluster (such as ARC).

Your task

Over the course of this workshop, you are going to:

  1. Brainstorm how you might set up the code to solve this problem, and write some pseudo-code
  2. Set up a reproducible coding environment
  3. Create a standard Python project folder structure in a git repository (publicly accessible, version controlled)
  4. Write some code with tests and comments/documentation, using linters/formatters to keep your code clean
  5. Create a citeable release on your GitHub repository
  6. Build a Python package, automated workflows, and a GitHub pages documentation webpage

You can take these steps in a different order; this is just a suggested workflow. We might not reach the final stages listed above during the day, but we have compiled detailed documentation for you to complete this in your own time.
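To give a flavour of the testing step in the list above, here is a minimal sketch of tests that compare a numerical result against known analytical answers. It assumes a hypothetical stand-in function called `hypotenuse`, and the tests are written as plain functions with `assert` statements in the style that pytest discovers; the names are illustrative only.

```python
import math


def hypotenuse(a, b):
    """Hypothetical stand-in for the function under test."""
    return math.sqrt(a**2 + b**2)


def test_known_triple():
    # The 3-4-5 triangle has an exact analytical answer.
    assert math.isclose(hypotenuse(3, 4), 5.0)


def test_symmetric_in_arguments():
    # Swapping the two legs must not change the result.
    assert math.isclose(hypotenuse(2.0, 7.0), hypotenuse(7.0, 2.0))


def test_scales_linearly():
    # Scaling both legs by 10 scales the hypotenuse by 10.
    assert math.isclose(hypotenuse(30, 40), 10 * hypotenuse(3, 4))
```

If these functions are saved in a file whose name starts with `test_`, running `pytest` in that folder will collect and run them.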