Session 2: Writing, testing, linting and formatting your code

So you now have the framework of directories for your code, you have version control set up, and you are working off a branch that isn't main.

Time for brainstorming, gathering requirements, and pseudocode, before we start writing our actual Python package. Once we have a strong idea of what we want our code to do and how it will look, we will write some draft functions and then build some tests.

Software Development Lifecycle

Implementing 'best practices'

Note that the steps suggested below are not the absolute "best practice" - we suggest a simplified, stripped-down version of the workflow and practices applied in software engineering, in order to make it doable as part of your research. We provide links to further reading throughout these materials.

These steps are the bare minimum

Software engineering uses a framework called the software development life cycle (SDLC) in order to plan, execute and maintain code and software. While this framework is useful to anyone developing code, whether as a software engineer or as a researcher, it does use a lot of jargon and can be difficult to map on to the usual research project trajectory. Fortunately, many people have thought about how to apply the same concepts, objectives, and theories behind frameworks such as SDLC to scientific and research work - one example is this useful article from Dr Carlos Costa, Developing Scientific Software.

While the entire article is a worthwhile read, we are going to pull out and adapt the modified software development cycle included.

Software dev. cycle for scientific computing

Implementation cycle

Gather requirements
Sketch the design
Write initial code
Write initial tests
Iterative development

Optimisation cycle

Profile your code
Refactor and optimise

New methods and maintenance

Add new methods
Extend tests if needed
Test against new versions of dependencies

Please read Developing Scientific Software for further information on the later steps not covered in this course.

Implementation cycle in more detail

Gather requirements: What does the code need to be able to do? What sort of inputs and outputs do you expect? How will it be used? By whom?
Sketch the design: Write some pseudocode, sketch a diagram, jot down how the code might work.
Write initial code: Write your simple, first-step code, using as few imports as possible.
Write initial tests: Create and run some simple tests to verify your first version of the code.
Iterative development: Add new tests to ensure coverage of any code added.

Note that the order of steps three and four can be swapped if you prefer a test-driven development workflow.

Begin the process for your project

Gather requirements

Step 1: Gather your requirements

Jot down the answers to these questions. These can be kept in a file within your repository if you are happy with them being public.

What does your code need to do?
How will it be used?
Who will be using it?
What platform is it being run on?

In our example, the code needs to be able to calculate the hypotenuse of a right-angled triangle, given the opposite and adjacent side lengths. It will be used as part of a larger project, and might be used in different settings, for different subjects, so should be flexible. It might be used on desktop machines but also on a larger HPC platform.

While we are using Python as the example language for this project, if you are familiar or comfortable with multiple languages, the requirements gathering stage might help you to choose what language is suitable. For example, will the language be available on the platforms you plan on using? Is the language commonly used by people in the research area?

Sketch out your code design

It's time to figure out what functions your code will include, and what the expected input and output of these functions will be. How will the different tasks be split up? How will the calculations be performed? Are your requirements met?

Step 2: Write some pseudocode

Different people write pseudocode in different ways. Feel free to draw with arrows, boxes, or write code-like text.

Pseudocode is for you, to help you figure out how to write your code.

Different ways you might sketch out your code:

\(a^2 + b^2 = c^2 \rightarrow c = \sqrt{a^2 + b^2}\)

Opp, Adj -> (opp^2 + adj^2)^(0.5) -> hyp

def hyp func (opp, adj):

    hyp^2 = (opp**2) + (adj**2)

    hyp = (hyp^2)**0.5

    return hyp

Write the initial code

Once you have your pseudocode written, you should know exactly what functions you need, what arguments and returns to expect, and potentially whether you will require any external libraries. You can turn your pseudocode into code scaffolding and then fill it out. Say you have sketched a function that looks something like this:

arg1, arg2, arg3, arg4 \(\rightarrow\) some incredible maths using numpy \(\rightarrow\) ans1, ans2

We can write some function scaffolding like this:

def basic_function(arg1, arg2, arg3, arg4):
    # Placeholders until the real calculation is written
    ans1, ans2 = None, None
    return ans1, ans2

Then you can start filling out the function to get from input \(\rightarrow\) output.

When writing your code, it is often useful to leave lots of comments explaining what you want to do, why you have done something a certain way, or if something doesn't work as expected. You can clean these up later as you add more proper documentation.

# function needs to return ans1 and 2
# raised an error when I used a value of x for arg3
def basic_function(arg1, arg2, arg3, arg4):
    ans1, ans2 = None, None  # placeholders until the calculation is written
    return ans1, ans2

Step 3: Write initial code

Inside your src/example_package/source.py file, you can start adding actual functions. Add documentation as you go; see the code snippets below for suggestions on how to format your functions and docstrings, and add in-code comments with #. Since this is a small package, put everything into source.py; feel free to change the name if desired. In a larger project, you can group different functions by purpose into sensibly named "modules". You shouldn't require any external packages for this example.

There are a number of different standards for docstring formatting; this brief article outlines some of the options. We will be using the "Google" docstring format in this tutorial as it is very commonly used, but please feel free to choose a format that suits you best. Consistency is key within a project, so stick to a style for all of your docstrings to ensure any automated tools you use work with them.

Your devcontainer comes with a few VSCode extensions preloaded, including autoDocstring - Python Docstring Generator which generates a docstring template for you inside functions. We will later introduce you to a tool that helps you to build a simple documentation website from your code, but this requires your docstrings to be formatted correctly so that it can read them and load them in. Before you choose to insert an automatically generated docstring template, make sure that all your arguments and returns are present so that the extension automatically captures them in the template.

def your_function_name(argument, default_argument="value"):
    """A one line summary of your code

    A more detailed description if you want to add context. This docstring
    uses the Google format. The numpy format is also very commonly used.

    Args:
        argument (int): _description_
        default_argument (str, optional): _description_. Defaults to "value".

    Returns:
        int: _description_
    """
    return argument

The autoDocstring extension works best when you have already specified the arguments and returns in your function, but the generated docstring can always be edited by hand at a later point.
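
Applied to the hypotenuse example from the requirements step, a first version of source.py might look something like the sketch below. The function name hypotenuse and the argument names are assumptions chosen for illustration, not names required by the course. It uses ** 0.5 rather than a library call, in keeping with the suggestion to avoid external packages.

```python
def hypotenuse(opposite, adjacent):
    """Calculate the hypotenuse of a right-angled triangle.

    Args:
        opposite (float): Length of the opposite side.
        adjacent (float): Length of the adjacent side.

    Returns:
        float: Length of the hypotenuse, in the same units as the inputs.
    """
    # Pythagoras: c = sqrt(a^2 + b^2); ** 0.5 avoids needing an import
    hypotenuse_squared = opposite**2 + adjacent**2
    return hypotenuse_squared**0.5
```

For example, hypotenuse(3, 4) returns 5.0.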

Write initial tests

We are going to be using pytest, which is already installed in your packaging-env in your dev container. The pytest documentation suggests that each test has four parts:

Arrange: you set the test up; you define variables/example data.
Act: you run the functions you want to test.
Assert: you check the answers to these functions are expected.
Clean-up: you wipe the board clean and delete any variables or outputs.

These tests will go into your tests directory, in a Python file whose name begins with test_. The tests themselves are essentially functions whose names also begin with test_, which means that pytest will be able to find and identify them as tests. Whew, the word "test" has almost lost meaning by now.

In practice, a test might look like this:

def test_example():
    """Test for the example function"""

    # Arrange
    test_variable_1 = 0
    test_variable_2 = 1
    expected_output = 7

    # Act
    output = your_function(test_variable_1, test_variable_2)

    # Assert
    assert output == expected_output

    # No cleanup needed
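
The example above has nothing to clean up. As a minimal sketch of a test that does need the clean-up step, the one below writes a file and then removes it. The helper save_result is hypothetical and exists only to give the test something to tidy away.

```python
import os
import shutil
import tempfile


def save_result(value, path):
    """Hypothetical helper: write a number to a text file."""
    with open(path, "w") as handle:
        handle.write(str(value))


def test_save_result():
    """Test that writes a file, so it has something to clean up."""
    # Arrange
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "result.txt")

    try:
        # Act
        save_result(5.0, path)

        # Assert
        with open(path) as handle:
            assert handle.read() == "5.0"
    finally:
        # Clean-up: runs whether or not the assertion passed
        shutil.rmtree(directory)
```

pytest also provides a built-in tmp_path fixture that hands each test its own temporary directory, which can replace the manual set-up and clean-up shown here.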

You can see that testing in Python depends heavily on assert statements.

You can use a basic assert statement to check if output is identical, e.g. assert one == 1.
For floating point numbers or values where tolerance is required, you can use the pytest.approx() function -- see documentation here; remember that this will require an import statement like from pytest import approx at the beginning of your test script. You can define tolerance to suit your approach.
The math library also includes an isclose() function -- see documentation here.
The numpy.testing module contains many different assert statements for arrays -- see documentation here.
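
To illustrate why tolerances matter, here is a short sketch comparing an exact check with pytest.approx() and math.isclose(). The values are arbitrary and chosen only to show the rounding problem; the variable names are illustrative.

```python
import math

from pytest import approx

total = 0.1 + 0.2

# Exact comparison fails because of floating point rounding
exact_match = total == 0.3

# pytest.approx with its default tolerances
approx_match = total == approx(0.3)

# A tolerance chosen to suit your own problem (absolute, here)
loose_match = 2.05 == approx(2.0, abs=0.1)

# math.isclose from the standard library, with its default tolerance
isclose_match = math.isclose(total, 0.3)

assert not exact_match
assert approx_match
assert loose_match
assert isclose_match
```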

Once you've written your tests, you can run pytest from the conda env where it's installed, in the top-level directory (where your src/ and tests/ directories are). See details on running pytest here.

Step 4: Write and run initial tests

First, sketch pseudocode for your tests.

If given a specific input, what specific output do you expect?
What are some weird edge cases that might trip your code up?
How might you separate out code-testing vs. scientific validation?
Are you matching integers with ==, or will you have to include tolerances?
Do you need to import any external libraries into your test script, like pytest or numpy?

Run your tests. Can you break your code so a test fails?
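
As a starting point, tests for the hypotenuse example might look like the sketch below. It assumes a function called hypotenuse, which is a name chosen for illustration. A stand-in definition is included so the sketch runs on its own; in your project you would instead import the function from your package, and the exact import path depends on how your package is installed.

```python
from pytest import approx


def hypotenuse(opposite, adjacent):
    """Stand-in for the function under test so this sketch runs on its own."""
    return (opposite**2 + adjacent**2) ** 0.5


def test_hypotenuse_known_triangle():
    """A 3-4-5 triangle has a known answer."""
    # Arrange
    opposite, adjacent = 3, 4
    expected = 5.0

    # Act
    result = hypotenuse(opposite, adjacent)

    # Assert
    assert result == approx(expected)


def test_hypotenuse_zero_length_side():
    """Edge case: with one side of zero length, the other side is returned."""
    assert hypotenuse(0, 2.5) == approx(2.5)


def test_hypotenuse_non_integer_sides():
    """Non-integer results need a tolerance rather than ==."""
    assert hypotenuse(1.0, 1.0) == approx(2**0.5)
```

To see a failure, try changing ** 0.5 to ** 2 in the function and running pytest again.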

Linting and formatting your code

As you type your code, you will have noticed highlighting that acts a bit like the spellchecker in Word. This is because we loaded in the ruff linter and a code spell checker, which quickly catch any small mistakes you might make.

Additionally, we have included the black formatter, which will reformat your code to match PEP 8. You can have a look at what black does using this online "playground". You can run black from the command line within your packaging-env conda environment:

black {source_file_or_directory}

NOTE: this will change the files to follow the black style guide. Please add and commit changes before you apply this formatter, so that you can roll back changes if you no longer want the formatted version. Run your tests immediately after formatting to ensure the code still passes. Use git restore to undo your changes.