High-Performance Python for Scientists
How to identify performance bottlenecks, optimize numerical workflows, streamline data handling, and extend analyses beyond the limits of a single machine
Author
Modern neuroscience produces large, complex datasets that place heavy demands on analysis code. This three-day workshop will focus on practical ways to make Python analyses more efficient and scalable. Participants will learn how to identify performance bottlenecks, optimize numerical workflows, streamline data handling, and extend analyses beyond the limits of a single machine.
The course emphasizes approaches that are both rigorous and usable in day-to-day research, with the aim of helping you write code that can keep pace with the scale of today’s neuroscience.
Credits
Installation
To run the course materials on your own machine:
- Install VSCode as your editor
- Install pixi or alternatively conda to create virtual Python environments (see the lessons on environment and package management)
- Download the materials for a lesson using the "Download Materials" button
- Extract the zip file and open the notebook in VSCode
- In VSCode, open a new terminal and install the environment:
pixi install
Or, if you use conda instead of pixi:
conda env create -f environment.yml
conda activate performance_python
Course Contents
Measuring and Profiling Performance in Python Code
Measuring and Interpreting Time in Python
This notebook focuses on building an intuition for careful time measurement rather than blindly trusting a single timing result.
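As a minimal sketch of why a single timing result can mislead, the example below repeats a measurement with the standard-library `timeit` module and compares the best, median, and spread. The function `sum_of_squares` and the repeat counts are hypothetical choices made for illustration, not taken from the course notebook.

```python
import statistics
import timeit


def sum_of_squares(n):
    """Hypothetical workload used only for illustration."""
    return sum(i * i for i in range(n))


CALLS_PER_RUN = 20

# Each entry in `runs` is the total time for CALLS_PER_RUN calls.
runs = timeit.repeat(lambda: sum_of_squares(10_000), repeat=5, number=CALLS_PER_RUN)
per_call = [t / CALLS_PER_RUN for t in runs]

best = min(per_call)                      # least disturbed by other processes
median = statistics.median(per_call)      # a typical run
spread = max(per_call) - min(per_call)    # how much the runs disagree

print(f"best:   {best * 1e6:.1f} µs per call")
print(f"median: {median * 1e6:.1f} µs per call")
print(f"spread: {spread * 1e6:.1f} µs per call")
```

A spread that is large relative to the best time suggests that background activity is affecting the measurement, and that no single run should be trusted on its own.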
Micro-Benchmarking: Measuring How Code Scales
This notebook focuses on measuring and visualizing how performance changes as data grows, giving us a stronger understanding of our code.
Profiling Code: Which Lines and Functions Take the Most Time?
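One way to measure how code scales is to time the same operation at several input sizes and estimate the growth rate from a log-log fit. The sketch below assumes NumPy is installed and uses `np.sort` as a stand-in workload; the sizes and repeat counts are arbitrary illustrative choices.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
sizes = [1_000, 10_000, 100_000]
timings = []

for n in sizes:
    data = rng.random(n)
    # Take the best of several repeats to reduce noise from other processes.
    total = min(timeit.repeat(lambda: np.sort(data), repeat=3, number=5))
    timings.append(total / 5)

# The slope of a straight-line fit in log-log space estimates the exponent k
# in "time grows roughly like n**k".
slope, intercept = np.polyfit(np.log(sizes), np.log(timings), 1)

for n, t in zip(sizes, timings):
    print(f"n={n:>7}: {t * 1e6:10.1f} µs")
print(f"estimated scaling exponent: {slope:.2f}")
```

Plotting `sizes` against `timings` on logarithmic axes shows the same information visually. With only three sizes the exponent is a rough estimate, so it is best read as a trend rather than a precise figure.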
This notebook introduces a disciplined workflow for improving runtime without breaking correctness: first lock in behavior, then measure where time is actually spent, and only then optimize.
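The three steps described above can be sketched with the standard-library `cProfile` and `pstats` modules. The functions `slow_mean` and `fast_mean` are hypothetical examples, and the notebook may well use different profiling tools.

```python
import cProfile
import io
import math
import pstats


def slow_mean(values):
    """Hypothetical baseline implementation."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def fast_mean(values):
    """Hypothetical optimized implementation."""
    return sum(values) / len(values)


data = [float(i) for i in range(100_000)]

# Step 1: lock in behavior by recording the result of the current code.
expected = slow_mean(data)

# Step 2: measure where the time is actually spent.
profiler = cProfile.Profile()
profiler.enable()
slow_mean(data)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)

# Step 3: optimize, then confirm the result still matches the recorded one.
assert math.isclose(fast_mean(data), expected, rel_tol=1e-9)
```

Recording `expected` before touching the code is what makes the final check meaningful: the optimized version is compared against the original behavior, not against itself.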
Effective Memory Management
Understanding and Controlling Memory Usage in NumPy
In this session, we explore how NumPy uses memory: we calculate how much space our data really takes, examine how arrays are created, and investigate when memory is copied, reused, or temporarily expanded.
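A short sketch of these ideas, assuming NumPy is installed: `nbytes` reports the size of an array's data, `np.shares_memory` reveals whether an operation returned a view or a copy, and in-place operators avoid allocating temporary arrays. The array shapes are arbitrary illustrative choices.

```python
import numpy as np

# One million float64 values at 8 bytes each.
a = np.zeros((1000, 1000), dtype=np.float64)
print(a.nbytes)  # 8000000

# Basic slicing returns a view that reuses the original memory.
view = a[:500]
print(np.shares_memory(a, view))  # True

# Indexing with a list of integers returns an independent copy.
copy = a[[0, 1, 2]]
print(np.shares_memory(a, copy))  # False

# `b = a * 2 + 1` would allocate a temporary array for `a * 2`.
# In-place operators modify the existing buffer instead.
a *= 2
a += 1

# Converting to float32 halves the memory needed, at reduced precision.
small = a.astype(np.float32)
print(small.nbytes)  # 4000000
```

Because `view` shares memory with `a`, the in-place updates are visible through it as well, which is convenient when intended and a source of bugs when not.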
Data Representation and Disk IO: Performance Beyond RAM
In this session, we measure what happens when we write arrays to disk, compare text and binary formats, and explore how data types determine both memory usage and file size.
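The comparison can be sketched as follows, assuming NumPy is installed. The example writes the same array as text, as binary float64, and as binary float32 into a temporary directory, then reports the file sizes. The file names and array length are hypothetical.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
data = rng.random(10_000)  # float64 by default

with tempfile.TemporaryDirectory() as tmp:
    paths = {
        "text": os.path.join(tmp, "data.txt"),
        "npy_float64": os.path.join(tmp, "data64.npy"),
        "npy_float32": os.path.join(tmp, "data32.npy"),
    }

    np.savetxt(paths["text"], data)                        # one number per line
    np.save(paths["npy_float64"], data)                    # raw bytes plus a small header
    np.save(paths["npy_float32"], data.astype(np.float32)) # half the bytes per value

    sizes = {name: os.path.getsize(path) for name, path in paths.items()}

print(f"in memory: {data.nbytes} bytes")
for name, size in sizes.items():
    print(f"{name:>12}: {size} bytes")
```

The binary float64 file is only slightly larger than the array in memory, while the text file is several times larger because every value is written out as a long decimal string.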
Structured Scientific Data with HDF5: Design, Access, and Compression
In this notebook, we use the h5py library to explore how HDF5, one of the most widely used scientific data formats, works at a practical level and how it can reduce memory pressure.
Compiling Code for High-Performance Computing
Let the Compiler Help Us Compute: Practical Compiled Code for Neuroscience
In this notebook, we look at two powerful tools for compiling processing-heavy code so that more of the computational work happens outside Python's runtime: NumExpr and Numba.
Working with the GPU in Python: Practical CuPy
In this notebook, we learn how to use **CuPy**, a NumPy-compatible GPU array library, to do operations on the GPU, avoiding common performance traps along the way.