Session 2: Writing, testing, linting and formatting your code
So you now have the framework of directories for your code, you have version control set up, and you are working off a branch that isn't main.
Time for brainstorming, gathering requirements, and pseudocode, before we start writing our actual Python package. Once we have a strong idea of what we want our code to do and how it will look, we will write some draft functions and then build some tests.
Software Development Lifecycle
Implementing 'best practices'
Note that the steps suggested below are not the absolute "best practice" - we suggest a simplified, stripped-down version of the workflow and practices applied in software engineering, in order to make it doable as part of your research. We provide links to further reading throughout these materials.
These steps are the bare minimum
Software engineering uses a framework called the software development life cycle (SDLC) in order to plan, execute and maintain code and software. While this framework is useful to anyone developing code, whether as a software engineer or as a researcher, it does use a lot of jargon and can be difficult to map on to the usual research project trajectory. Fortunately, many people have thought about how to apply the same concepts, objectives, and theories behind frameworks such as SDLC to scientific and research work - one example is this useful article from Dr Carlos Costa, Developing Scientific Software.
While the entire article is a worthwhile read, we are going to pull out and adapt the modified software development cycle included.
Software dev. cycle for scientific computing
Further reading on these topics
SDLC general topics
Implementation
- SOLID Principles: Improve Object-Oriented Design in Python
- The K.I.S.S Principle in Programming
- You aren't gonna need it
Optimisation
- Profiling in Python: How to Find Performance Bottlenecks
- The Python profilers
- Code Refactoring Best Practices – with Python Examples
- Refactoring Python Applications for Simplicity
- Optimizing code
Maintenance
Adapted from Developing Scientific Software by Dr Carlos Costa. In this course, we will be focusing on the "Implementation cycle", as this builds a good base for later optimising and extending your code. The steps in the "Implementation cycle" are developed below, while links to relevant notes on the later stages are provided in the callout box on the right.
Implementation cycle
- Gather requirements
- Sketch the design
- Write initial code
- Write initial tests
- Iterative development
Optimisation cycle
- Profile your code
- Refactor and optimise
New methods and maintenance
- Add new methods
- Extend tests if needed
- Test against new versions of dependencies
Please read Developing Scientific Software for further information on the later steps not covered in this course.
Implementation cycle in more detail
- Gather requirements: What does the code need to be able to do? What sort of inputs and outputs do you expect? How will it be used? By whom?
- Sketch the design: Write some pseudocode, sketch a diagram, jot down how the code might work.
- Write initial code: Write your simple, first-step code, using as few imports as possible.
- Write initial tests: Create and run some simple tests to verify your first version of the code.
- Iterative development: Add new tests to ensure coverage of any code added.
Note that the order of steps three and four can be swapped if you prefer a test-driven development workflow.
Begin the process for your project
Gather requirements
Step 1: Gather your requirements
Jot down the answers to these questions. These can be kept in a file within your repository if you are happy with them being public.
- What does your code need to do?
- How will it be used?
- Who will be using it?
- What platform is it being run on?
In our example, the code needs to be able to calculate the hypotenuse of a right-angled triangle, given the opposite and adjacent side lengths. It will be used as part of a larger project, and might be used in different settings, for different subjects, so should be flexible. It might be used on desktop machines but also on a larger HPC platform.
While we are using Python as the example language for this project, if you are familiar or comfortable with multiple languages, the requirements gathering stage might help you to choose what language is suitable. For example, will the language be available on the platforms you plan on using? Is the language commonly used by people in the research area?
Sketch out your code design
It's time to figure out what functions your code will include, and what the expected input and output of these functions will be. How will the different tasks be split up? How will the calculations be performed? Are your requirements met?
Step 2: Write some pseudocode
Different people write pseudocode in different ways. Feel free to draw with arrows, boxes, or write code-like text.
Pseudocode is for you, to help you figure out how to write your code.
Different ways you might sketch out your code:
\(a^2 + b^2 = c^2 \rightarrow c = \sqrt{a^2 + b^2}\)
Opp, Adj -> (opp^2 + adj^2)^(0.5) -> hyp
def hyp func (opp, adj):
    hyp^2 = (opp**2) + (adj**2)
    hyp = (hyp^2)**0.5
    return hyp
Write the initial code
Once you have your pseudocode written, you should know exactly what functions you need, what arguments and returns to expect, and potentially whether you will require any external libraries. You can turn your pseudocode into code scaffolding and then fill it out. Say you have sketched a function that looks something like this:
arg1, arg2, arg3, arg4 \(\rightarrow\) some incredible maths using numpy \(\rightarrow\) ans1, ans2
We can write some function scaffolding like this:
def basic_function(arg1, arg2, arg3, arg4):
    return ans1, ans2
Then you can start filling out the function to get from input \(\rightarrow\) output.
When writing your code, it is often useful to leave lots of comments explaining what you want to do, why you have done something a certain way, or if something doesn't work as expected. You can clean these up later as you add more proper documentation.
# function needs to return ans1 and ans2
# raised an error when I used a value of x for arg3
def basic_function(arg1, arg2, arg3, arg4):
    return ans1, ans2
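For the hypotenuse example, the scaffolding stage might look like the sketch below. The function name calculate_hypotenuse is hypothetical and not part of the course template. Raising NotImplementedError is one possible way to keep the file importable while making it obvious that the body has not been written yet.

```python
def calculate_hypotenuse(opposite, adjacent):
    # Needs to return the hypotenuse length of a right-angled triangle.
    # Plan: square both side lengths, add them, then take the square root.
    # Placeholder so that calling the unfinished function fails loudly.
    raise NotImplementedError("calculate_hypotenuse is not written yet")
```

Any test that calls this scaffold will fail until the body is filled in, which gives a clear signal of what is still left to do.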
Step 3: Write initial code
Inside your src/example_package/source.py file, you can start adding actual functions.
Add documentation as you go; see the code snippets below for suggestions on how to format your functions and docstrings, and add in-code comments with #. Since this is a small package, put everything into source.py; feel free to change the name if desired. In a larger project, you can group different functions by purpose into sensibly named "modules". You shouldn't require any external packages for this example.
There are a number of different standards for docstring formatting; this brief article outlines some of the options. We will be using the "Google" docstring format in this tutorial as it is very commonly used, but please feel free to choose a format that suits you best. Consistency is key within a project, so stick to a style for all of your docstrings to ensure any automated tools you use work with them.
Your devcontainer comes with a few VSCode extensions preloaded, including autoDocstring - Python Docstring Generator which generates a docstring template for you inside functions. We will later introduce you to a tool that helps you to build a simple documentation website from your code, but this requires your docstrings to be formatted correctly so that it can read them and load them in. Before you choose to insert an automatically generated docstring template, make sure that all your arguments and returns are present so that the extension automatically captures them in the template.
def your_function_name(argument, default_argument="value"):
    """A one line summary of your code

    A more detailed description if you want to add context. This docstring
    uses the Google format. The numpy format is also very commonly used.

    Args:
        argument (integer): _description_
        default_argument (string, optional): _description_. Defaults to "value".

    Returns:
        argument (integer): _description_
    """
    return argument
The autoDocstring extension works best when you have already specified the arguments and returns in your function, but the generated docstring can always be edited by hand at a later point.
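Applied to the hypotenuse example, a first version with a Google-style docstring might look like the following sketch. The function and variable names are hypothetical, and this is only one way of writing it; it follows the pseudocode from Step 2 and uses no imports.

```python
def calculate_hypotenuse(opposite, adjacent):
    """Calculate the hypotenuse of a right-angled triangle.

    Args:
        opposite (float): Length of the opposite side.
        adjacent (float): Length of the adjacent side.

    Returns:
        float: Length of the hypotenuse.
    """
    # Pythagoras: c = sqrt(a^2 + b^2), using only built-in operators
    hypotenuse_squared = opposite**2 + adjacent**2
    return hypotenuse_squared**0.5


print(calculate_hypotenuse(3, 4))  # 5.0
```

Note that this version does no input checking, so deciding how negative or non-numeric side lengths should behave is a requirement you would still need to settle.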
Write initial tests
We are going to be using pytest, which is already installed in your packaging-env in your dev container. The pytest documentation suggests that each test has four parts:
- Arrange: you set the test up; you define variables/example data.
- Act: you run the functions you want to test.
- Assert: you check the answers to these functions are expected.
- Clean-up: you wipe the board clean and delete any variables or outputs.
These tests will go into your tests directory, in a Python file that begins with test_, and are essentially functions whose names also begin with test_ - this means that pytest will be able to find and identify them as tests. Whew, the word "test" has almost lost meaning by now.
In practice, a test might look like this:
def test_example():
    """Test for the example function"""
    # Arrange
    test_variable_1 = 0
    test_variable_2 = 1
    expected_output = 7
    # Act
    output = your_function(test_variable_1, test_variable_2)
    # Assert
    assert output == expected_output
    # No cleanup needed
You can see that testing in Python depends heavily on assert statements.
- You can use a basic assert statement to check if output is identical, e.g. assert one == 1.
- For floating point numbers or values where tolerance is required, you can use the pytest.approx() function -- see documentation here; remember that this will require an import statement like from pytest import approx at the beginning of your test script. You can define tolerance to suit your approach.
- The math library also includes an isclose() function -- see documentation here.
- The numpy.testing module contains many different assert statements for arrays -- see documentation here.
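To illustrate why a tolerance is sometimes needed, the short sketch below compares an exact check with math.isclose() from the standard library. The values are arbitrary and chosen only to show the effect of floating point rounding.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because of floating point rounding
exact_match = total == 0.3

# math.isclose compares within a relative tolerance (default rel_tol=1e-09)
close_match = math.isclose(total, 0.3)

# The tolerance can be loosened or tightened to suit your approach
loose_match = math.isclose(1.0, 1.001, rel_tol=1e-2)

print(exact_match, close_match, loose_match)  # False True True
```

Inside a pytest test, the same kind of check can be written as assert total == approx(0.3) after from pytest import approx.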
Once you've written your tests, you can run pytest from the conda env where it's installed, in the top-level directory (where your src/ and tests/ directories are). See details on running pytest here.
Step 4: Write and run initial tests
First, sketch pseudocode for your tests.
- If given a specific input, what specific output do you expect?
- What are some weird, edge cases that might trip your code up?
- How might you separate out code-testing vs. scientific validation?
- Are you matching integers with ==, or will you have to include tolerances?
- Do you need to import any external libraries into your test script, like pytest or numpy?
Run your tests. Can you break your code so a test fails?
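As one possible starting point, the sketch below shows what a small test file for the hypotenuse example might contain. The names calculate_hypotenuse and the test functions are hypothetical. The function is defined inline here only so that the snippet is self-contained; in your package you would import it from your source module instead.

```python
import math


def calculate_hypotenuse(opposite, adjacent):
    # Stand-in for the function under test; in the package this would be
    # imported from your source module instead of being defined here.
    return (opposite**2 + adjacent**2) ** 0.5


def test_hypotenuse_known_triangle():
    """A 3-4-5 triangle gives a known answer."""
    # Arrange
    opposite = 3
    adjacent = 4
    expected = 5.0
    # Act
    result = calculate_hypotenuse(opposite, adjacent)
    # Assert
    assert result == expected


def test_hypotenuse_non_integer_result():
    """Non-integer results are compared with a tolerance rather than ==."""
    result = calculate_hypotenuse(1, 1)
    assert math.isclose(result, math.sqrt(2))


def test_hypotenuse_zero_length_side():
    """Edge case: one side of length zero."""
    assert calculate_hypotenuse(0, 4) == 4.0
```

To see a failure, you could temporarily change the exponent in the function from 0.5 to something else and run pytest again.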
Linting and formatting your code
As you type your code, you will have noticed highlighting that acts a bit like the spellchecker in Word. This is because we loaded in the ruff linter and a code spell checker. These quickly catch any small mistakes you might make.
Additionally, we have included the black formatter, which will reformat your code to match PEP8. You can have a look at what black does using this online "playground". You can run black from the command line within your packaging-env conda environment:
black {source_file_or_directory}
NOTE: this will change the files to follow the black style guide. Please add and commit changes before you apply this formatter, so that you can roll back changes if you no longer want the formatted version. Run your tests immediately after formatting to ensure the code still passes. Use `git restore` to undo your changes.