Practical 4: Installing and using Conda with Miniforge, and writing some code
Activity Overview
Duration: 50 minutes
Goal: Get set up with Conda on your virtual machine and create a Conda environment to run Python scripts. Write one function, and create a docstring for this function.
Step 1: Installing Conda via Miniforge
You can follow the instructions here:
wget -O Miniforge3.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3.sh -b -p "${HOME}/conda"
source "${HOME}/conda/etc/profile.d/conda.sh"
Now try a git status: we don’t want the Conda installer saved in our repo, so let’s create a .gitignore file (if we weren’t using this cloud machine we would put this installer somewhere else):
code .gitignore
Add the following line to the .gitignore:
*.sh
- This will prevent any files ending in .sh (essentially the equivalent of a Windows .exe) from being tracked with version control.
- Try your local git cycle again to add this change to the repository (a sketch of the cycle is given below).
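One possible version of that cycle (the commit message is just an example):
git status
git add .gitignore
git commit -m "Ignore shell scripts such as the Miniforge installer"
git push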
Step 2: Install the Python extension for VSCode
This isn’t essential, but it makes working with Python a bit easier by providing syntax highlighting. You can find the Extensions menu in the left sidebar.
Step 3: Creating a development environment
Next, we need to list the libraries we want in our Conda environment by editing the environment.yml file:
code environment.yml
You will see the following template that we generated back during the folder organisation step:
name: my-env-name
channels:
- conda-forge
- nodefaults
dependencies:
- python=3.13
- pytest
- blackd
- isort
# remove/modify these as needed
- numpy
- pandas
# keep this to install your Python package locally
- pip
- pip:
- --editable .
- We need to modify this to work for our project.
- Add any additional dependencies you might need: don’t worry, we can edit this later.
What does pip: --editable . mean?
The final line in the conda environment file above tells pip to install the Python package in the local directory (. means this directory) as an editable package. This means your code will be available in this environment, but you can also continue to edit it.
The __init__.py file inside the package, along with the pyproject.toml file we created during the folder set up, make this possible.
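As a quick illustration, once the environment is active you can import your package from any working directory, and changes you make under src/ take effect without reinstalling (this assumes the package is called amazing_project, matching the pyproject.toml example below):
import amazing_project  # resolves to src/amazing_project; edits there are picked up immediately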
Why bother with installing your Python package rather than just importing scripts?
- Installing your package decouples it from the file structure:
- If you have scripts in various different folders in the overall project folder, it can become difficult to keep track of relative filepaths in order to import things; this keeps your scripts much tidier.
- If you move a file that relies on filepaths to your Python modules, suddenly things can break; this can become especially annoying when trying to migrate your work to a HPC system.
- It makes it essentially no extra work to allow others (or yourself) to install it from GitHub, making your code much more reusable.
The pyproject.toml file
During the folder structure organisation stage, we created a pyproject.toml
file. This provides the information pip needs to install our Python package:
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "amazing_project"
version = "0.1.0"
description = "Brief description"
authors = [{name = "Your Name"}]
requires-python = ">=3.13"
[tool.setuptools.packages.find]
where = ["src"]
For now, we’re not going to touch this; we just need to know it’s there.
Now we just need to create the Conda environment from the file:
conda env create -f environment.yml
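Once that finishes, you can check the environment was created by listing the environments Conda knows about:
conda env list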
Step 4: Activate and test the environment
After the environment has been created, you’ll need to activate it:
conda activate <your-env-name>
replacing <your-env-name> with the name you provided in the environment.yml file.
You may need to launch a new terminal. Test that the environment works by launching python and trying to import one of the installed libraries:
python
and then
import numpy as np
When you install your package locally, Conda will build the required files: you’ll see something new in your src/ folder called something like <your-package-name>.egg-info. You can add a line to your .gitignore file to keep these build artifacts out of your version control:
*.egg-info
You can also keep any Python cache files out with this line:
src/<your-package-name>/__pycache__/*
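At this point your .gitignore might look something like this (the package name here is just a placeholder):
*.sh
*.egg-info
src/<your-package-name>/__pycache__/*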
Step 5: Update the environment
Next, we’re going to add an additional library to the environment to see how that works:
name: my-env-name
channels:
- conda-forge
- nodefaults
dependencies:
- python=3.13
- pytest
- blackd
- isort
# remove/modify these as needed
- numpy
- pandas
# keep this to install your Python package locally
- pip
- pip:
- marimo # add this line
- --editable .
This is a notebook tool similar to Jupyter that we will use for some quick prototyping. You could also install Jupyter with Jupytext (to improve version controlling of your notebooks) if you prefer that interface.
To update the environment with this new library, we run:
conda env update --file environment.yml --prune
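With the environment activated, you can quickly check that the new library is importable, for example:
python -c "import marimo"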
Step 6: Set VSCode to recognise your environment
Use the VSCode shortcut Ctrl Shift P to open the Command Palette. Start typing “select interpreter” and it should suggest the “Python: Select Interpreter” option. Choose your Conda environment from the list. Now you’ll be able to run Python files from the VSCode interface using the correct environment.
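If you’d like to double-check which Python the integrated terminal is using (with your environment activated), you can run:
which python
The path should point inside your Conda environment, for example somewhere under ${HOME}/conda/envs/.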
Step 7: Start turning pseudocode into code
Turn your comments on step-by-step processes into function stubs:
# function to process data
def process_data():
    # TODO: Implement data loading
    pass
Then, start adding your inputs and outputs:
# function to process data
def process_data(filename):
    # TODO: Implement data loading
    return dataframe
You can flesh out your pseudocode as you work:
# function to process data
def process_data(filename):
    # TODO: Implement data loading (csv file)
    # TODO: Create dataframe
    # TODO: Check for any invalid data
    # TODO: Replace NaN values with zero
    # TODO: Check format of headers: replace " " with "_"
    return dataframe
Replace pseudocode with code:
import pandas as pd

# function to process data
def process_data(filename):
    df = pd.read_csv(filename, sep=',')
    # TODO: Check for any invalid data
    # TODO: Replace NaN values with zero
    # TODO: Check format of headers: replace " " with "_"
    return df
And continue until you have a full function.
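As a rough sketch of where this might end up (one possible implementation of the steps listed above, assuming a comma-separated file):
import pandas as pd

# function to process data
def process_data(filename):
    df = pd.read_csv(filename, sep=",")
    # Replace NaN values with zero
    df = df.fillna(0)
    # Tidy the headers: replace " " with "_"
    df.columns = [col.replace(" ", "_") for col in df.columns]
    return df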
Step 8: Write DocStrings for your code
Like everything in Python, there are multiple different ways of doing things, and multiple different formats that documentation can be written in. We will discuss your README.md in more detail later, but for now we will focus on function and module documentation.
What is a docstring?
A docstring is the very first piece of text inside a module or function (or class, or method) that acts as documentation for the following code.
1. One-liners
For very simple modules or functions, the docstring might only be a single line:
def very_basic_function():
    """Here's a single line docstring"""
    pass
The single line should succinctly explain what the function does, and be wrapped in triple quotes (""").
2. Multi-liners
Many modules or functions will be more complex and need more detailed docstrings.
def a_more_complicated_function(input_1=0.0, input_2="string"):
    """A single line summary followed by a blank line.

    Keyword arguments:
    input_1 -- explanation (default 0.0)
    input_2 -- explanation (default "string")
    """
    pass
For more information, please read the official docstring guidance.
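Docstrings aren’t just comments: Python attaches them to the object, so you (and anyone using your code) can read them interactively, for example:
help(a_more_complicated_function)
# or
print(a_more_complicated_function.__doc__)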
3. Style and formatting guides
There are a number of different ways to organise multi-line comments; one very popular way is the Google Style Guide.
They provide the following example of a module-level docstring (so at the top of a .py file):
"""A one-line summary of the module or program, terminated by a period.
Leave one blank line. The rest of this docstring should contain an
overall description of the module or program. Optionally, it may also
contain a brief description of exported classes and functions and/or usage
examples.
Typical usage example:
foo = ClassFoo()
bar = foo.function_bar()
"""
and this example of a complex multi-line docstring for a function:
def fetch_smalltable_rows(
    table_handle: smalltable.Table,
    keys: Sequence[bytes | str],
    require_all_keys: bool = False,
) -> Mapping[bytes, tuple[str, ...]]:
    """Fetches rows from a Smalltable.

    Retrieves rows pertaining to the given keys from the Table instance
    represented by table_handle. String keys will be UTF-8 encoded.

    Args:
        table_handle: An open smalltable.Table instance.
        keys: A sequence of strings representing the key of each table
          row to fetch. String keys will be UTF-8 encoded.
        require_all_keys: If True only rows with values set for all keys will be
          returned.

    Returns:
        A dict mapping keys to the corresponding table row data
        fetched. Each row is represented as a tuple of strings. For
        example:

        {b'Serak': ('Rigel VII', 'Preparer'),
         b'Zim': ('Irk', 'Invader'),
         b'Lrrr': ('Omicron Persei 8', 'Emperor')}

        Returned keys are always bytes. If a key from the keys argument is
        missing from the dictionary, then that row was not found in the
        table (and require_all_keys must have been False).
    """
4. Using a plugin to help speed up docstring creation
Extensions like autoDocstring help you write better and more consistent documentation by providing a preformatted template to fill in, which by default follows the Google docstring style. You can read the autoDocstring guidance here.
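To tie things together, here is how the process_data function from Step 7 might look with a Google-style docstring (a sketch; adapt the wording to your own project):
def process_data(filename):
    """Loads a CSV file and returns a cleaned DataFrame.

    Args:
        filename: Path to a comma-separated data file.

    Returns:
        A pandas DataFrame with NaN values replaced by zero and with
        spaces in the column headers replaced by underscores.
    """
    ...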