iBOTS Learn: High-Performance Python for Scientists

High-Performance Python for Scientists

How to identify performance bottlenecks, optimize numerical workflows, streamline data handling, and extend analyses beyond the limits of a single machine

Author

Dr. Nicholas Del Grosso

Modern neuroscience produces large, complex datasets that place heavy demands on analysis code. This three-day workshop will focus on practical ways to make Python analyses more efficient and scalable. Participants will learn how to identify performance bottlenecks, optimize numerical workflows, streamline data handling, and extend analyses beyond the limits of a single machine.

The course emphasizes approaches that are both rigorous and usable in day-to-day research, with the aim of helping you write code that can keep pace with the scale of today’s neuroscience.

Credits

Dr. Nicholas Del Grosso

Installation

To run the course materials on your own machine:

Install VSCode as your editor
Install pixi (or, alternatively, conda) to create virtual Python environments (see the lessons on environment and package management)
Download the materials for a lesson using the "Download Materials" button
Extract the zip file and open the notebook in VSCode
In VSCode, open a new terminal and install the environment. With pixi:

pixi install

Or, if you are using conda:

conda env create -f environment.yml
conda activate performance_python

Course Contents

Measuring and Profiling Performance in Python Code

Measuring and Interpreting Time in Python

This notebook focuses on building an intuition for careful time measurement rather than blindly trusting a single timing result.
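As a rough illustration of why a single timing result can mislead, the sketch below repeats a measurement with the standard-library `timeit` module and reports the spread. The `sum_of_squares` workload is a hypothetical stand-in, not code from the notebook.

```python
import statistics
import timeit


def sum_of_squares(n):
    """Toy workload (hypothetical) standing in for an analysis step."""
    return sum(i * i for i in range(n))


NUMBER = 20  # calls per repeat, so each measurement is long enough to time reliably

# Five independent measurements; each entry is the total time for NUMBER calls.
runs = timeit.repeat(lambda: sum_of_squares(10_000), repeat=5, number=NUMBER)
per_call = [total / NUMBER for total in runs]

print(f"min:    {min(per_call) * 1e3:.3f} ms per call")
print(f"median: {statistics.median(per_call) * 1e3:.3f} ms per call")
print(f"max:    {max(per_call) * 1e3:.3f} ms per call")
```

The minimum is usually the least noisy estimate of how fast the code can run, while the gap between minimum and maximum gives a feel for how much other activity on the machine disturbs the measurement.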

Micro-Benchmarking: Measuring How Code Scales

This notebook focuses on measuring and visualizing how performance changes as data grows, giving us a stronger understanding of how our code scales.
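One possible way to measure scaling, sketched here with only the standard library, is to time the same task at several input sizes. The two duplicate-detection functions are hypothetical examples chosen because one grows roughly quadratically and the other roughly linearly.

```python
import timeit


def has_duplicates_quadratic(values):
    """Compare every pair of elements: work grows roughly with n**2."""
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            if a == b:
                return True
    return False


def has_duplicates_linear(values):
    """Use a set: work grows roughly with n."""
    return len(set(values)) != len(values)


sizes = [100, 200, 400, 800]
results = {}
for n in sizes:
    data = list(range(n))  # no duplicates, so both functions scan everything
    for func in (has_duplicates_quadratic, has_duplicates_linear):
        # Best of 3 repeats, 5 calls each, converted to seconds per call.
        best = min(timeit.repeat(lambda: func(data), repeat=3, number=5)) / 5
        results[(func.__name__, n)] = best

for n in sizes:
    quad = results[("has_duplicates_quadratic", n)]
    lin = results[("has_duplicates_linear", n)]
    print(f"n={n:4d}  quadratic={quad * 1e6:10.1f} us  linear={lin * 1e6:8.1f} us")
```

Doubling `n` should roughly quadruple the quadratic timings and roughly double the linear ones; plotting `results` on log-log axes makes the difference in slope easy to see.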

Profiling Code: Which lines and functions are taking the most time?

This notebook introduces a disciplined workflow for improving runtime without breaking correctness: first lock in behavior, then measure where time is actually spent, and only then optimize.
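The three steps of that workflow can be sketched with the standard-library `cProfile` and `pstats` modules. The function names and data below are hypothetical, and the notebook may well use different tools.

```python
import cProfile
import io
import pstats


def slow_mean_of_squares(values):
    """Deliberately wasteful version (hypothetical) that builds a full list first."""
    squares = []
    for v in values:
        squares.append(v ** 2)
    return sum(squares) / len(squares)


def analysis(values):
    return slow_mean_of_squares(values)


data = list(range(1, 50_001))

# Step 1: lock in behavior by recording a reference result.
expected = analysis(data)

# Step 2: measure where the time is actually spent.
profiler = cProfile.Profile()
profiler.runcall(analysis, data)
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)


# Step 3: only now optimize, and check against the reference.
def fast_mean_of_squares(values):
    return sum(v * v for v in values) / len(values)


assert fast_mean_of_squares(data) == expected
```

The final assertion is what keeps the optimization honest: if the faster version ever changes the result, the check fails before the change reaches an analysis.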

Effective Memory Management

Understanding and Controlling Memory Usage in NumPy

In this session, we explore how NumPy uses memory: we calculate how much space our data really takes, examine how arrays are created, and investigate when memory is copied, reused, or temporarily expanded.
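A short sketch of these ideas, assuming a hypothetical 1000 x 64 array of samples: `nbytes` reports the size of the data buffer, basic slicing returns a view that reuses memory, and fancy indexing or `astype` produces a copy.

```python
import numpy as np

# 1000 x 64 values at 8 bytes each (float64).
samples = np.zeros((1000, 64), dtype=np.float64)
print(samples.nbytes)  # 512000 bytes

# Converting the dtype allocates a new, smaller array.
as_float32 = samples.astype(np.float32)
print(as_float32.nbytes)  # 256000 bytes

# Basic slicing returns a view: no data is copied.
window = samples[:100]
print(np.shares_memory(samples, window))  # True

# Fancy (integer-list) indexing returns a copy.
picked = samples[[0, 2, 4]]
print(np.shares_memory(samples, picked))  # False

# `samples * 2 + 1` would allocate temporary arrays for the intermediate results;
# in-place operators reuse the existing buffer instead.
samples *= 2
samples += 1
```

Because `window` is a view, the in-place updates at the end are visible through it as well, which is convenient for saving memory but easy to overlook.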

Data Representation and Disk IO: Performance Beyond RAM

In this session, we measure what happens when we write arrays to disk, compare text and binary formats, and explore how data types determine both memory usage and file size.
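The comparison can be sketched as follows, using a hypothetical random array written to a temporary directory. The file names and array shape are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=(1000, 8))  # float64 by default

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "signal.csv")
    f64_path = os.path.join(tmp, "signal_f64.npy")
    f32_path = os.path.join(tmp, "signal_f32.npy")

    np.savetxt(text_path, signal, delimiter=",")   # text: one string per value
    np.save(f64_path, signal)                      # binary: 8 bytes per value
    np.save(f32_path, signal.astype(np.float32))   # binary: 4 bytes per value

    sizes["text"] = os.path.getsize(text_path)
    sizes["float64"] = os.path.getsize(f64_path)
    sizes["float32"] = os.path.getsize(f32_path)

print(f"in memory: {signal.nbytes} bytes")
for name, size in sizes.items():
    print(f"{name:8s} {size} bytes on disk")
```

The binary `.npy` files are only slightly larger than the in-memory array (a small header is added), while the text file is several times larger because every number is written out as a string of digits. Halving the precision to `float32` halves the binary file size, at the cost of reduced precision.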

Structured Scientific Data with HDF5: Design, Access, and Compression

In this notebook, we use the h5py library to explore how HDF5, one of the most widely used scientific data formats, works at a practical level, and how it can reduce memory pressure.
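A minimal sketch of the idea, assuming h5py is installed: a chunked, compressed dataset is written once, and later only a small slice is read back instead of the whole array. The dataset name `ephys/raw`, the attribute, and the array shape are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

rng = np.random.default_rng(0)
recording = rng.normal(size=(10_000, 32)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session.h5")

    # Write: chunks define the unit of storage and compression on disk.
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "ephys/raw",
            data=recording,
            chunks=(1000, 32),
            compression="gzip",
        )
        dset.attrs["sampling_rate_hz"] = 30_000  # metadata stored next to the data

    # Read: slicing a dataset loads only the requested region into memory.
    with h5py.File(path, "r") as f:
        dset = f["ephys/raw"]
        first_block = dset[:1000, :4]
        chunk_shape = dset.chunks
        rate = int(dset.attrs["sampling_rate_hz"])

print(first_block.shape, chunk_shape, rate)
```

Choosing a chunk shape that matches the typical access pattern (here, blocks of 1000 time samples across all channels) tends to matter more for read speed than the compression setting.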

Compiling Code for High-Performance Computing

Let the Compiler Help Us Compute: Practical Compiled Code for Neuroscience

In this notebook, we'll look at two powerful tools for compiling processing-heavy code so that more of the computational work happens outside the Python interpreter: NumExpr and Numba.
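To give a flavor of the Numba side, the sketch below compiles an explicit loop with `numba.njit`. The threshold-crossing function and the sine-wave test signal are hypothetical, and the `try`/`except` fallback is an assumption added only so the example still runs (uncompiled) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: fall back to plain Python if Numba is missing.
    def njit(func):
        return func


@njit
def count_threshold_crossings(signal, threshold):
    """Count upward crossings of `threshold` with an explicit loop."""
    count = 0
    for i in range(1, signal.shape[0]):
        if signal[i - 1] < threshold and signal[i] >= threshold:
            count += 1
    return count


# Ten full periods of a sine wave as a stand-in signal.
signal = np.sin(np.linspace(0, 20 * np.pi, 10_000))

crossings = count_threshold_crossings(signal, 0.5)

# Vectorized NumPy version of the same computation, used as a cross-check.
reference = int(np.count_nonzero((signal[:-1] < 0.5) & (signal[1:] >= 0.5)))
print(crossings, reference)
```

With Numba present, the first call includes compilation time, so timing comparisons should be made on the second and later calls.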

Working with the GPU in Python: Practical CuPy

In this notebook, we learn how to use CuPy, a NumPy-compatible GPU array library, to run array operations on the GPU while avoiding common performance traps.