Software Delivery for Scientific Python Projects
Testing, data validation, workflow automation, and Linux packaging practices for making scientific Python projects reliable, reproducible, and shareable.
Author
This course focuses on the practical work that turns research code into software other people can run, trust, and extend. Learners start by testing Python code with pytest and organizing project folders, then add validation boundaries for scientific data, build reproducible workflows with Snakemake, and finish by packaging projects for Linux using Lima, Apptainer, container registries, and GitHub Actions.
The course is designed around hands-on notebook exercises and project-structure examples. By the end, learners should be able to write tests, validate inputs, connect analysis steps into workflows, and prepare computational projects for reproducible execution beyond a single development machine.
Credits
Installation
To run the course materials on your own machine:
- Install VS Code as your editor
- Install pixi or, alternatively, conda to create virtual Python environments (see the lessons on environment and package management)
- Download the materials for a lesson using the "Download Materials" button
- Extract the zip file and open the notebook in VS Code
- In VS Code, open a new terminal and install the environment:
    pixi install

or, with conda:

    conda env create -f environment.yml
    conda activate software-delivery

To run the Python-focused lessons, create the course environment from pixi.toml and install the notebook kernel:

    pixi install
    pixi run install-kernel

The Linux packaging unit also uses platform tools such as Homebrew, Lima, Apptainer, GHCR, and GitHub Actions. Those tools are installed or accessed during the relevant lessons rather than managed entirely by Pixi.
Course Contents
Testing Code
Unit Testing with pytest
Practice writing, fixing, parameterizing, and organizing automated tests with pytest.
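As a rough illustration of the testing style this lesson practises, the sketch below tests a small hypothetical function, `celsius_to_kelvin`, which is not taken from the course materials. The tests are plain functions with bare `assert` statements, the form pytest collects from files named `test_*.py`. To keep the sketch runnable without pytest installed, the case table is looped over by hand; the comments show where `pytest.mark.parametrize` and `pytest.raises` would normally be used.

```python
import math


def celsius_to_kelvin(temp_c: float) -> float:
    """Convert Celsius to Kelvin, rejecting values below absolute zero."""
    if temp_c < -273.15:
        raise ValueError(f"{temp_c} degrees C is below absolute zero")
    return temp_c + 273.15


# One row per case: (input in Celsius, expected result in Kelvin).
CASES = [(0.0, 273.15), (-273.15, 0.0), (100.0, 373.15)]


# With pytest installed, the loop below could instead be written as:
#   @pytest.mark.parametrize("temp_c, expected", CASES)
#   def test_known_values(temp_c, expected): ...
def test_known_values():
    for temp_c, expected in CASES:
        assert math.isclose(celsius_to_kelvin(temp_c), expected, abs_tol=1e-9)


def test_below_absolute_zero_is_rejected():
    # pytest users would typically write `with pytest.raises(ValueError):`.
    try:
        celsius_to_kelvin(-300.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

Saved as a file such as `test_temperature.py` (a hypothetical name), these functions would be picked up by running `pytest` in the same folder.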
Python Project Structure
Organize computational science projects so data, code, environments, documentation, and collaboration files are easy to find and maintain.
Validating Data
Data Validation Patterns
Practice guard clauses, dataclass hooks, and Pydantic validators that fail fast on invalid scientific inputs.
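The first two patterns named above can be sketched with the standard library alone. The function, the class, and the 0 to 14 pH range below are illustrative assumptions rather than course code. Pydantic validators follow the same fail-fast idea but need the third-party package, so they are left out here.

```python
import math
from dataclasses import dataclass


def mean_concentration(values: list[float]) -> float:
    """Average a list of concentrations, failing fast on bad input."""
    # Guard clauses: reject invalid input before doing any real work.
    if not values:
        raise ValueError("values must not be empty")
    if any(math.isnan(v) or v < 0 for v in values):
        raise ValueError("concentrations must be non-negative numbers")
    return sum(values) / len(values)


@dataclass(frozen=True)
class Sample:
    sample_id: str
    ph: float

    def __post_init__(self) -> None:
        # Dataclass hook: runs right after the generated __init__,
        # so an invalid Sample can never be constructed.
        if not self.sample_id:
            raise ValueError("sample_id must not be empty")
        if not 0.0 <= self.ph <= 14.0:
            raise ValueError(f"ph must be between 0 and 14, got {self.ph}")
```

Both checks raise at the boundary where data enters the program, so later analysis code can assume the values are sane.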
Run-Time Data Validation Frameworks in Python
Compare beartype, attrs, and Pydantic for enforcing Python type and value constraints at runtime.
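To give a feel for what run-time type enforcement means, here is a deliberately tiny, standard-library-only decorator. It is a toy under stated assumptions: `check_types` and `scale` are hypothetical names, only plain class annotations are checked, and it does not reflect how beartype, attrs, or Pydantic are actually implemented.

```python
import functools
import inspect
from typing import get_type_hints


def check_types(func):
    """Toy decorator: check call arguments against plain class annotations."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only ordinary classes such as int or str are handled;
            # any other kind of annotation is skipped in this sketch.
            if type(expected) is type and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def scale(reading: float, factor: int) -> float:
    return reading * factor
```

Calling `scale(1.5, 2)` succeeds, while `scale(1.5, "2")` raises `TypeError` at call time instead of failing somewhere deeper in the analysis.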
Specialized Data Validation Frameworks
Apply validation ideas to DataFrames, LLM inputs, JSON messages, and command-line interfaces.
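For the command-line case, one way to apply the same idea is to validate arguments while they are parsed. The sketch below uses only `argparse` from the standard library; the option names `--sampling-rate` and `--method` are hypothetical and are not taken from the lesson.

```python
import argparse
import math


def positive_float(text: str) -> float:
    """argparse `type=` callable: parse a float and require it to be > 0."""
    try:
        value = float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not a number") from None
    if not (math.isfinite(value) and value > 0):
        raise argparse.ArgumentTypeError(f"{text!r} must be a finite value above zero")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical analysis CLI")
    # Value constraint enforced by the custom type callable above.
    parser.add_argument("--sampling-rate", type=positive_float, required=True)
    # Closed set of allowed values enforced by argparse itself.
    parser.add_argument("--method", choices=["mean", "median"], default="mean")
    return parser


# Parse an explicit argument list so the sketch runs outside a real shell.
args = build_parser().parse_args(["--sampling-rate", "250", "--method", "median"])
```

With an invalid value such as `--sampling-rate 0`, argparse prints a usage message and exits before any analysis code runs.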
Integrating Workflows with Snakemake
Static Workflows with Snakemake
Build Snakemake rules that run code, declare outputs, and connect inputs into simple reproducible workflows.
Generalizing Workflows with Wildcards
Use wildcards, expand(), and glob_wildcards() to scale Snakemake workflows across repeated filenames and connected rules.
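The idea behind these helpers can be imitated in plain Python. The two functions below are hypothetical stand-ins written for illustration and are not Snakemake's implementation. In particular, Snakemake's `glob_wildcards()` scans the filesystem, whereas this sketch takes a list of names so that it runs anywhere.

```python
import re
from itertools import product


def expand_pattern(pattern: str, **wildcards: list[str]) -> list[str]:
    """Fill `{name}` placeholders with every combination of the given values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


def match_wildcard(pattern: str, filenames: list[str]) -> list[str]:
    """Recover the values of a single `{sample}` placeholder from filenames."""
    prefix, suffix = pattern.split("{sample}")
    regex = re.compile(re.escape(prefix) + r"(.+)" + re.escape(suffix))
    return [m.group(1) for name in filenames if (m := regex.fullmatch(name))]


# Discover sample names from existing inputs, then list every wanted output.
samples = match_wildcard("data/{sample}.csv", ["data/a.csv", "data/b.csv", "notes.txt"])
targets = expand_pattern("results/{sample}_{kind}.png", sample=samples, kind=["raw", "clean"])
```

Here `samples` is `["a", "b"]` and `targets` lists four output paths, one per sample and kind. In a workflow, a list of this sort is what a final target rule would typically request as its inputs.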