arctraining slides

SWD3

Software development practices for Research

Research Computing Team and Service

Here to support research(ers)
- Provide training
- Support users of Grid and Cloud Computing platforms
- Provide consultancy
  - To develop project proposals
  - To help recruit people with specialist skills
  - To work directly on research projects
For details, please see our website.
Contact us via the IT Service Desk

Useful Links

Software Development Life Cycle (SDLC)

What are we going to do?

Brainstorming
Research

How are we going to do it?

Some topics to help define requirements include:

final goal
project scope (how to reach the final goal)
what is feasible (and how)
what the priorities are
what resources are available
deadlines
potential risks

Warning: each person involved in the project may have different needs.

What is the software architecture?

When designing software, object-oriented programming is a common paradigm.

Object-oriented components:

Classes: A user-defined type
Object instances: A particular object instantiated from a class.
Methods: A function which is “built in” to a class
Constructor: A special method called when instantiating a new object

Some principles: abstraction, encapsulation, decomposition, generalisation
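A minimal Python sketch tying these terms together. The Particle class and its attributes are hypothetical, chosen only to illustrate class, constructor, method and instance.

```python
class Particle:
    """A class: a user-defined type describing a moving particle."""

    def __init__(self, position, velocity):
        # Constructor: called automatically when a new object is instantiated
        self.position = position
        self.velocity = velocity

    def step(self, dt):
        # Method: a function "built in" to the class that updates this object
        self.position += self.velocity * dt
        return self.position


# Object instance: a particular Particle created from the class
p = Particle(position=0.0, velocity=2.0)
p.step(0.5)
print(p.position)  # 1.0
```

Each instance keeps its own position and velocity, so creating a second Particle does not affect p.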

See more:

Is this where the fun begins?

Take your time

Development is usually the most time-consuming step in a Software Development Life Cycle.

Is this software good?

In this step, errors and failures are identified by exposing the code to an environment similar to the end-user experience.

There are several types of testing, some examples include:

Unit testing: are all components working?
Integration testing: are all components working when fitted together?
Performance testing: how does the software perform under different workloads? Is it fast? Stable?
Functional testing: does the software meet the Software Requirements Specification?
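A small sketch of how unit, integration and performance tests can differ in practice. The functions parse_numbers and mean, and the one-second time budget, are hypothetical. The tests use plain assert statements in functions whose names start with test_, which is the style pytest collects; they can also be called directly.

```python
import time

MAX_SECONDS = 1.0  # assumed time budget for the rough performance check


def parse_numbers(text):
    """Turn a comma-separated string such as "1,2,3" into a list of floats."""
    return [float(item) for item in text.split(",")]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Unit tests: check each component on its own
def test_parse_numbers():
    assert parse_numbers("1,2,3") == [1.0, 2.0, 3.0]


def test_mean():
    assert mean([1.0, 2.0, 3.0]) == 2.0


# Integration test: check the components work when fitted together
def test_mean_of_parsed_text():
    assert mean(parse_numbers("2,4,6")) == 4.0


# Rough performance test: time the whole pipeline on a larger workload
def test_pipeline_speed():
    text = ",".join(str(i) for i in range(100_000))
    start = time.perf_counter()
    mean(parse_numbers(text))
    assert time.perf_counter() - start < MAX_SECONDS


if __name__ == "__main__":
    for test in (test_parse_numbers, test_mean,
                 test_mean_of_parsed_text, test_pipeline_speed):
        test()
    print("all tests passed")
```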

Can other people use my code?

You can use platforms like GitHub to release your software.

Whether the software works depends on the operating system, on the versions of the packages it uses, and on any other software the project relies on.
Listing these specifications will help others to replicate the environment in which the software was developed.
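One way to capture these specifications from inside Python is sketched below, using only the standard library (platform and importlib.metadata, available from Python 3.8). The report file name environment-report.txt is an assumption; in practice, pip freeze > requirements.txt or conda env export > environment.yml produce the same kind of record.

```python
import platform
from importlib import metadata


def describe_environment():
    """Return lines describing the OS, the Python version and installed packages."""
    lines = [
        f"os: {platform.system()} {platform.release()}",
        f"python: {platform.python_version()}",
    ]
    # One "name==version" line per installed distribution, sorted by name
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    )
    lines.extend(packages)
    return lines


if __name__ == "__main__":
    # Hypothetical output file; commit it next to the code it describes
    with open("environment-report.txt", "w", encoding="utf-8") as fh:
        fh.write("\n".join(describe_environment()) + "\n")
```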

Is it over?

We can classify maintenance into a few categories:

Corrective: fix reported errors/failures.
Preventive: regular checks and fixes.
Perfective: optimize implemented features and add new features.
Adaptive: keep the software updated according to changes external to the project (new programming language version, new regulation, etc.).

Basic Structure Suggestion

# The most basic structure for a code project should look like:
my-model
├── README.md
├── requirements.txt
├── src                <- Source code for this project
└── tests              <- Test code for this project
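If you prefer to script this setup, a minimal sketch using pathlib is shown below. The project name my-model matches the tree above; the placeholder README heading and the empty requirements.txt are assumptions.

```python
from pathlib import Path


def create_basic_project(root):
    """Create README.md, requirements.txt, src/ and tests/ under root."""
    root = Path(root)
    (root / "src").mkdir(parents=True, exist_ok=True)  # also creates root itself
    (root / "tests").mkdir(exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(f"# {root.name}\n", encoding="utf-8")
    (root / "requirements.txt").touch()  # empty until dependencies are added
    return root


if __name__ == "__main__":
    create_basic_project("my-model")
```

Running it twice is safe: existing folders and an existing README are left untouched.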

Readme

A guide that gives users a detailed description of a project you have worked on.
It is the first file a person will see when they encounter your project, so it should be brief but informative.
See how to write a good README file in this freecodecamp post.

Requirements

A text file listing all the necessary additional libraries, modules, and packages.
Depending on your tools, this can be replaced by files such as environment.yml, pyproject.toml or setup.py.

Advanced Project Structure

Template based on the mkrapp/cookiecutter-reproducible-science GitHub repository

.
├── AUTHORS.md
├── LICENSE
├── README.md
├── bin                <- Your compiled model code can be stored here (not tracked by git)
├── config             <- Configuration files, e.g., for doxygen or for your model if needed
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
├── docs               <- Documentation, e.g., doxygen or scientific papers (not tracked by git)
├── notebooks          <- IPython or R notebooks
├── reports            <- For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports
│   └── figures        <- Figures for the manuscript or reports
├── src                <- Source code for this project
│   ├── data           <- scripts and programs to process data
│   ├── external       <- Any external source code, e.g., pull other git projects, or external libraries
│   ├── models         <- Source code for your own model
│   ├── tools          <- Any helper scripts go here
│   └── visualization  <- Scripts for visualisation of your results, e.g., matplotlib, ggplot2 related.
└── tests              <- Test code for this project

Virtual Environments

If application A needs version 1.0 of a particular module but application B needs version 2.0, then the requirements are in conflict and installing either version 1.0 or 2.0 will leave one application unable to run.

The solution to this problem is to create a virtual environment: a self-contained directory tree that contains installations of particular versions of software/packages.
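Python ships a venv module for this. The usual command is python -m venv .venv; the sketch below does the same programmatically. Choosing with_pip=False by default is an assumption made only to keep it fast (pass True to also install pip into the new environment), and the directory name .venv is a common convention, not a requirement.

```python
import sys
import venv
from pathlib import Path


def make_env(env_dir, with_pip=False):
    """Create a self-contained virtual environment and return its interpreter path."""
    venv.create(env_dir, with_pip=with_pip)
    # Executables live in Scripts/ on Windows and in bin/ elsewhere
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return Path(env_dir) / bin_dir / exe


if __name__ == "__main__":
    python = make_env(".venv")
    print(f"Environment ready; packages installed with {python} stay inside .venv")
```

Each application can then get its own environment, so application A can keep version 1.0 of a module while application B uses 2.0.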

Conda

Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux.
It offers dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, Fortran, and more.
Easy to install via Anaconda.

Code formatting

# myscript.py:
x = {  'a':37,'b':42,
'c':927}
y = 'hello '+       'world'
class foo  (     object  ):
   def f    (self   ):
       return       y **2
   def g(self, x :int,
       y : int=42
       ) -> int:
       return x--y
def f  (   a ) :
   return      37+-a[42-a :  y*3]

Coding conventions

If your language or project has a standard policy, use that. For example:

Python: PEP8
R: Google’s guide for R, tidyverse style guide
C++: Google’s style guide
Julia: Official style guide

Linters

Linters are automated tools which enforce coding conventions and check for common mistakes. For example:

Python:
- flake8 (flags syntax errors and style violations)
- black (automatically reformats code to a consistent style)
- isort (sorts imports alphabetically and groups them by section)

Example: Flake8 Linter

$ conda install flake8
$ flake8 myscript.py
myscript.py:2:6: E201 whitespace after '{'
myscript.py:2:11: E231 missing whitespace after ':'
myscript.py:2:14: E231 missing whitespace after ','
myscript.py:2:18: E231 missing whitespace after ':'
myscript.py:3:1: E128 continuation line under-indented for visual indent
myscript.py:3:4: E231 missing whitespace after ':'
myscript.py:4:13: E225 missing whitespace around operator
myscript.py:4:14: E222 multiple spaces after operator
myscript.py:5:1: E302 expected 2 blank lines, found 0
myscript.py:5:13: E201 whitespace after '('
myscript.py:5:25: E202 whitespace before ')'
myscript.py:6:4: E111 indentation is not a multiple of 4
myscript.py:6:9: E211 whitespace before '('
myscript.py:6:20: E202 whitespace before ')'
myscript.py:7:8: E111 indentation is not a multiple of 4
myscript.py:7:14: E271 multiple spaces after keyword
myscript.py:7:25: E225 missing whitespace around operator
myscript.py:8:4: E301 expected 1 blank line, found 0
myscript.py:8:4: E111 indentation is not a multiple of 4
myscript.py:8:17: E203 whitespace before ':'
myscript.py:8:18: E231 missing whitespace after ':'
myscript.py:9:8: E128 continuation line under-indented for visual indent
myscript.py:9:9: E203 whitespace before ':'
myscript.py:9:15: E252 missing whitespace around parameter equals
myscript.py:9:16: E252 missing whitespace around parameter equals
myscript.py:10:8: E124 closing bracket does not match visual indentation
myscript.py:10:8: E125 continuation line with same indent as next logical line
myscript.py:11:8: E111 indentation is not a multiple of 4
myscript.py:12:1: E302 expected 2 blank lines, found 0
myscript.py:12:6: E211 whitespace before '('
myscript.py:12:9: E201 whitespace after '('
myscript.py:12:13: E202 whitespace before ')'
myscript.py:12:15: E203 whitespace before ':'
myscript.py:13:4: E111 indentation is not a multiple of 4
myscript.py:13:10: E271 multiple spaces after keyword
myscript.py:13:26: E203 whitespace before ':'
myscript.py:13:34: W291 trailing whitespace

Example: Black Code Formatter

Install and run Black

$ conda install black
$ black myscript.py

Check the file!

# myscript.py:
x = {"a": 37, "b": 42, "c": 927}
y = "hello " + "world"


class foo(object):
    def f(self):
        return y**2

    def g(self, x: int, y: int = 42) -> int:
        return x - -y


def f(a):
    return 37 + -a[42 - a : y * 3]

IDE

Using an integrated development environment (IDE) can save you a lot of time, but its advantages go beyond that. Some of them are:

Syntax highlighting
Text autocompletion
Refactoring options
Easy importing of libraries
Build, compile, or run

Visual Studio Code

To install VS Code follow the instructions here.

VSC Example: automatically using black

Configure VSC to use Black: Code (or File) > Preferences > Settings

Search for python formatting provider and choose black
Search for format on save and check the box to enable

Select interpreter: View > Command Palette... (or Ctrl+Shift+P)

Search for Python: Select Interpreter
Choose the correct environment

Black will now reformat your code every time you save a Python file.

Version Control

Piled Higher and Deeper by Jorge Cham

Test-driven development

For example, suppose we need to divide one number by another:

Naive solution

Write a function a_div_b.
Call it interactively on two or three different inputs.
If it produces the wrong answer, fix the function and re-run that test.

This clearly works (after all, thousands of scientists are doing it right now), but there’s a better way.

TDD solution

Write a short function for each test.
Write the a_div_b function so that it passes those tests.
If a_div_b produces any wrong answers, fix it and re-run the test functions.

Writing the tests before writing the function they exercise is called test-driven development (TDD). Its advocates believe it produces better code faster because:

If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors.
Writing tests helps programmers figure out what the function is actually supposed to do.

Possible tests: `a_div_b` example

Let’s think about the possible scenarios for this problem and how we could test them.

Bigger by smaller
Smaller by bigger
Negative numbers

Using 4 and 2, the answer should be 2.

assert a_div_b(4, 2) == 2

Or… the answer should be larger than 1.

assert a_div_b(8, 7) > 1

Using 2 and 4, the answer should be 0.5.

assert a_div_b(2, 4) == 0.5

Or… the answer should be smaller than 1.

assert a_div_b(7, 8) < 1

Using -4 and -2, the answer should be 2.

assert a_div_b(-4, -2) == 2

Or… the answer should be positive.

assert a_div_b(-4, -2) > 0

Bringing it all together
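Put together, the scenarios above could become a single test module. This is a sketch: the test function names are hypothetical, and a_div_b is implemented here with plain true division (/), which is one implementation that passes these tests. Following TDD, the tests come first in the file and the function is written afterwards to make them pass. With pytest installed, running pytest collects every function whose name starts with test_; the __main__ block also lets the file run without pytest.

```python
# Tests, written first

# Bigger by smaller
def test_bigger_by_smaller():
    assert a_div_b(4, 2) == 2
    assert a_div_b(8, 7) > 1


# Smaller by bigger
def test_smaller_by_bigger():
    assert a_div_b(2, 4) == 0.5
    assert a_div_b(7, 8) < 1


# Negative numbers
def test_negative_numbers():
    assert a_div_b(-4, -2) == 2
    assert a_div_b(-4, -2) > 0


# Function under test, written afterwards to make the tests pass
def a_div_b(a, b):
    """Return a divided by b (true division, so the result may be a float)."""
    return a / b


if __name__ == "__main__":
    test_bigger_by_smaller()
    test_smaller_by_bigger()
    test_negative_numbers()
    print("all tests passed")
```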

The Hypotenuse Problem

Calculating the hypotenuse

\[ c = \sqrt{a^2 + b^2} \]
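As a quick worked example, a right triangle with legs a = 3 and b = 4 gives:

\[ c = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5 \]

Simple cases like this, whose answer is known exactly, make convenient test inputs.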

General Design

1 squared function
1 sum function
1 square root function
1 hypotenuse function that uses the other functions
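A minimal sketch of this design as a calc.py module in the src folder. The names squared, addition and sqroot match the import used in the Local Installation step; the name hypotenuse for the combined function is an assumption.

```python
import math


def squared(x):
    """Return x squared."""
    return x * x


def addition(x, y):
    """Return the sum of x and y."""
    return x + y


def sqroot(x):
    """Return the square root of a non-negative number x."""
    return math.sqrt(x)


def hypotenuse(a, b):
    """Return c = sqrt(a**2 + b**2), built only from the helper functions above."""
    return sqroot(addition(squared(a), squared(b)))


if __name__ == "__main__":
    print(hypotenuse(3, 4))  # 5.0
```

Keeping each step in its own small function means each one can be unit tested separately, while a test of hypotenuse acts as an integration test.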

Workflow

Install Git, Anaconda and VS Code
Create a GitHub repository + Licence + .gitignore + README
Set up a GitHub Action for testing (Python application)
Clone the GitHub repository to your local machine
Create the project structure (source and test folders)
Set up tests (test file and function names start with test_)
Develop the code
Add docstrings (you can use the autoDocstring - Python Docstring Generator extension in VS Code)
Lint the code and tests
Push to GitHub
EXTRA: Create Sphinx documentation
EXTRA: Create a setup file and install locally
EXTRA: Create a GitHub release

Extra: Sphinx documentation

Create a docstring for every function
Install Sphinx
Start the basic structure using: $ sphinx-quickstart docs
Use sphinx-apidoc to generate documentation pages from the docstrings: $ sphinx-apidoc -o docs .
Edit files:

conf.py
index.rst
dependencies.rst
usage.rst
functions.rst

add extensions: 'sphinx.ext.todo', 'sphinx.ext.viewcode', 'sphinx.ext.autodoc'.
change the theme to sphinx_rtd_theme (installed separately, e.g. with pip install sphinx-rtd-theme)
add the src folder to the path (change the folder name as necessary!):

 import os
 import sys
 sys.path.insert(0, os.path.abspath('../src'))
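Taken together, the conf.py changes above might look like the fragment below. It assumes conf.py sits in docs/ next to a src/ folder and that the theme package is already installed; the rest of the file generated by sphinx-quickstart stays as it is.

```python
# docs/conf.py (only the lines changed from the sphinx-quickstart defaults)
import os
import sys

# Let autodoc import the modules in ../src
sys.path.insert(0, os.path.abspath("../src"))

extensions = [
    "sphinx.ext.todo",      # support for .. todo:: directives
    "sphinx.ext.viewcode",  # links from the docs to highlighted source code
    "sphinx.ext.autodoc",   # pull documentation from docstrings
]

html_theme = "sphinx_rtd_theme"
```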

In index.rst, add the extra files to the toctree under the Contents caption:

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   dependencies
   usage
   functions

In dependencies.rst, list all your dependencies:

Dependencies
============

- python
- pytest
- flake8
- black
- sphinx

In usage.rst, explain how to use your software:

Usage Guide
============

To start working with this repository you need to clone it onto your local
machine: ::

    $ git clone https://github.com/...


Next ...

Create functions.rst with the following content:

API reference
=============

.. automodule:: calc
   :members:
   :undoc-members:
   :show-inheritance:

Extra: documentation Action

Create a new GitHub Action to build a website from your documentation.

The action is available here
You may need to update the GitHub Actions permissions to allow write access
After a successful documentation action, select the gh-pages branch (in the repository's Pages settings) to activate your website

Extra: Setup file

Create a setup.py file like:

import setuptools

with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()

setuptools.setup(
    name="hypot",
    version="0.1.0",
    author="Patricia Ternes",
    author_email="p.ternesdallagnollo@leeds.ac.uk",
    description="The hypot SWD3 demo package",
    long_description=long_description,  # README text shown on the package page
    long_description_content_type="text/markdown",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3.9",
        "Intended Audience :: Science/Research",
        "Intended Audience :: Education",
    ],
    python_requires=">=3.9",
)

Local Installation

Install: install the hypot package into the environment by running, from the folder that contains setup.py:

$ pip install .

(The older python setup.py install command is deprecated.)

Usage: if you want to create a personalised script, you can import the hypot modules as follows:

from hypot.calc import squared, addition, sqroot

Remove: If you want to remove your package, use pip:

$ pip uninstall hypot

Release

Releases on GitHub are based on tags with the following structure:

v0.5.2

Part     Change     In v0.5.2
Major    Breaking   0
Minor    Feature    5
Patch    Fix        2
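The rule in the table can be sketched as a small helper that computes the next tag. It follows the usual semantic-versioning convention that bumping one part resets the parts to its right to zero; the function name bump is hypothetical.

```python
def bump(tag, part):
    """Return the next release tag, e.g. bump("v0.5.2", "minor") -> "v0.6.0".

    part is "major" (breaking change), "minor" (new feature) or "patch" (fix).
    """
    major, minor, patch = (int(n) for n in tag.lstrip("v").split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part: {part!r}")
    return f"v{major}.{minor}.{patch}"


if __name__ == "__main__":
    for part in ("major", "minor", "patch"):
        print(part, bump("v0.5.2", part))
```

Starting from v0.5.2, this prints v1.0.0 for a breaking change, v0.6.0 for a new feature and v0.5.3 for a fix.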