Practical 4: Installing and using Conda with Miniforge, and writing some code

Activity Overview

Duration: 50 minutes
Goal: Get set up with Conda on your virtual machine and create a Conda environment to run Python scripts. Write one function, and create a docstring for this function.

Step 1: Installing Conda via Miniforge

You can follow the instructions here:

wget -O Miniforge3.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"

bash Miniforge3.sh -b -p "${HOME}/conda"

source "${HOME}/conda/etc/profile.d/conda.sh"

Now try a git status: we don’t want the Conda installer saved in our repo, so let’s create a .gitignore file (if we weren’t using this cloud machine we would put this installer somewhere else):

code .gitignore

Add the following line to the .gitignore:

*.sh

This will prevent any files ending in .sh (essentially the equivalent of a Windows .exe) from being tracked with version control
Try your local git cycle again to add this change to the repository

Step 2: Install the Python extension for VSCode

This isn’t essential, but makes working with Python a bit easier, with syntax highlighting. You can find the extensions menu in the left sidebar.

Step 3: Creating a development environment

Next, we need to add libraries to our conda environment:

code environment.yml

You will see the following template that we generated back during the folder organisation step:

name: my-env-name

channels:
  - conda-forge
  - nodefaults

dependencies:
  - python=3.13
  - pytest
  - blackd
  - isort

  # remove/modify these as needed
  - numpy
  - pandas

  # keep this to install your Python package locally
  - pip
  - pip:
    - --editable .

We need to modify this to work for our project.
Add any additional dependencies you might need: don’t worry, we can edit this later.

What does pip: --editable . mean?

The final line in the conda environment file above means to install the Python package in the local directory (. means this directory) as an editable package. This means your code will be available in this environment, but also you can continue to edit your code.

The __init__.py file inside the package, along with the pyproject.toml file we created during the folder set up, make this possible.

Why install your local Python package?

Why bother with installing your Python package rather than just importing scripts?

Installing your package decouples it from the file structure:
- If you have scripts in various different folders in the overall project folder, it can become difficult to keep track of relative filepaths in order to import things; this keeps your scripts much tidier.
- If you move a file that relies on filepaths to your Python modules, suddenly things can break; this can become especially annoying when trying to migrate your work to a HPC system.
It makes it essentially no extra work to allow others (or yourself) to install it from GitHub, making your code much more reusable.

The pyproject.toml file

During the folder structure organisation stage, we created a pyproject.toml file. This provides the information pip needs to install our Python package:

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "amazing_project"
version = "0.1.0"
description = "Brief description"
authors = [{name = "Your Name"}]
requires-python = ">=3.13"

[tool.setuptools.packages.find]
where = ["src"]

For now, we’re not going to touch this; we just need to know it’s there.

Now we just need to create the Conda environment from the file:

conda env create -f environment.yml

Step 4: Activate and test the environment

After the environment has been created, you’ll need to activate it:

conda activate <your-env-name>

replacing <your-env-name> with the name you provided in the environment.yml file.

You may need to launch a new terminal. Test that the environment works by launching python and trying to import one of the installed libraries:

python

and then

import numpy as np

When you install your package locally, Conda will build the required files: you’ll see something weird in your src/ folder called something like <your-package-name>.egg-info. You can add a line to your .gitignore file to keep this binary out of your version control:

*.egg-info

You can also keep any Python cache files out with this line:

src/<your-package-name>/__pycache__/*

Step 5: Update the environment

Next, we’re going to add an additional library to the environment to see hwo that works:

name: my-env-name

channels:
  - conda-forge
  - nodefaults

dependencies:
  - python=3.13
  - pytest
  - blackd
  - isort

  # remove/modify these as needed
  - numpy
  - pandas

  # keep this to install your Python package locally
  - pip
  - pip:
    - marimo # add this line
    - --editable .

This is a notebook tool similar to Jupyter that we will use for some quick prototyping. You could also install Jupyter with Jupytext (to improve version controlling of your notebooks) if you prefer that interface.

To update the environment with this new library, we run:

conda env update --file environment.yaml --prune

Step 6: Set VSCode to recognise your environment

Use the VSCode shortcut Ctrl Shift P to open the command prompt. Start typing select interpreter and it should autosuggest the “Python: Select Interpreter” option. Choose your Conda environment from the list. Now you’ll be able to run Python files from the VSCode interface using the correct environment.

Step 7: Start turning pseudocode into code

Turn your comments on step-by-step processes into function stubs:

# function to process data
def process_data():
    # TODO: Implement data loading
    pass

Then, start adding your inputs and outputs:

# function to process data
def process_data(filename):
    # TODO: Implement data loading
    return dataframe

You can flesh out your pseudocode as you work:

# function to process data
def process_data(filename):
    # TODO: Implement data loading (csv file)
    # TODO: Create dataframe
    # TODO: Check for any invalid data
    # TODO: Replace NaN values with zero
    # TODO: Check format of headers: replace " " with "_"
    return dataframe

Replace pseudocode with code:

import pandas as pd

# function to process data
def process_data(filename):
    df = pd.read_csv(filename, sep=',')

    # TODO: Check for any invalid data
    # TODO: Replace NaN values with zero
    # TODO: Check format of headers: replace " " with "_"
    return dataframe

And continue until you have a full function.

Step 8: Write DocStrings for your code

Like everything in Python, there are multiple different ways of doing things, and multiple different formats that documentation can be written in. We will discuss your README.md in more detail later, but for now we will focus on function and module documentation.

What is a docstring?

A docstring is the very first piece of text inside a module or function (or class, or method) that acts as documentation for the following code.

1. One-liners

For very simple modules or functions, the docstring might only be a single line:

def very_basic_function():
    """Here's a single line docstring"""
    pass

The single line should succinctly explain what the function does, and be wrapped in triple quotes (""")

2. Multi-liners

Many modules or functions will be more complex and need more detailed docstrings.

def a_more_complicated_function(input_1=0.0, input_2="string"):
    """A single line summery followed by a blank line

    Keyword arguments:
    input_1 -- explanation (default 0.0)
    input_1 -- explanation (default "string")
    """
    pass

For more information, please read the official docstring guidance.

3. Style and formatting guides

There are a number of different ways to organise multi-line comments; one very popular way is the Google Style Guide.

They provide the following example of a module-level docstring (so at the top of a .py file):

"""A one-line summary of the module or program, terminated by a period.

Leave one blank line.  The rest of this docstring should contain an
overall description of the module or program.  Optionally, it may also
contain a brief description of exported classes and functions and/or usage
examples.

Typical usage example:

  foo = ClassFoo()
  bar = foo.function_bar()
"""

and this example of a complex multi-line docstring for a function:

def fetch_smalltable_rows(
    table_handle: smalltable.Table,
    keys: Sequence[bytes | str],
    require_all_keys: bool = False,
) -> Mapping[bytes, tuple[str, ...]]:
    """Fetches rows from a Smalltable.

    Retrieves rows pertaining to the given keys from the Table instance
    represented by table_handle.  String keys will be UTF-8 encoded.

    Args:
        table_handle: An open smalltable.Table instance.
        keys: A sequence of strings representing the key of each table
          row to fetch.  String keys will be UTF-8 encoded.
        require_all_keys: If True only rows with values set for all keys will be
          returned.

    Returns:
        A dict mapping keys to the corresponding table row data
        fetched. Each row is represented as a tuple of strings. For
        example:

        {b'Serak': ('Rigel VII', 'Preparer'),
         b'Zim': ('Irk', 'Invader'),
         b'Lrrr': ('Omicron Persei 8', 'Emperor')}

        Returned keys are always bytes.  If a key from the keys argument is
        missing from the dictionary, then that row was not found in the
        table (and require_all_keys must have been False).
    """

4. Using a plugin to help speed up docstring creation

Extension like autoDocstring help you to write better and more consistent documentation by providing you with a preformatted template to fill in, that by default follows the Google DocString guide. You can read the autoDocstring guidance here.