Developing Reusable Code in Jupyter Notebooks

Courses

Building Modular and Automated Analysis Workflows

Authors

Dr. Sangeetha Nandakumar | Dr. Nicholas Del Grosso

In research projects, the analytical process often involves multiple, interconnected steps, such as data cleaning, model building, and visualization, where each step depends on the previous one. Managing these dependencies efficiently is crucial for maintaining organized and reproducible research. To streamline such workflows, developing reusable code in Jupyter notebooks is essential. By breaking down complex processes into reusable components—such as functions, configuration settings, or entire notebooks—researchers can ensure consistency, reduce redundancy, and focus on the analysis itself rather than rewriting code for each step. In this notebook, we will explore key techniques for developing reusable code, including the use of magic commands to manage notebook execution, running and sequencing notebooks from one another, and importing code from one notebook to another to treat them like modular scripts.

We start by looking at magic commands for managing notebooks, then use the notebook's interactive nature to build functions, run a sequence of notebooks from another notebook, and finally import the contents of a notebook as if it were a script.

Setup

Download Data

import os
import owncloud

# Ensure the 'run_section' directory exists
if not os.path.exists('run_section'):
    print('Creating directory for data')
    os.mkdir('run_section')

# Download hello.py
if not os.path.exists('run_section/hello.py'):
    oc = owncloud.Client.from_public_link('https://uni-bonn.sciebo.de/s/4PZ3gTgRWYnyPfP')
    oc.get_file('/', 'run_section/hello.py')

# Download hello.txt
if not os.path.exists('run_section/hello.txt'):
    oc = owncloud.Client.from_public_link('https://uni-bonn.sciebo.de/s/3Bg1TVfGkDXUVYg')
    oc.get_file('/', 'run_section/hello.txt')

# Download hello_nb.ipynb
if not os.path.exists('run_section/hello_nb.ipynb'):
    oc = owncloud.Client.from_public_link('https://uni-bonn.sciebo.de/s/PUkIdpQSEnyzIsT')
    oc.get_file('/', 'run_section/hello_nb.ipynb')

Creating directory for data

Section 1: Line Magic Commands

Magic commands in Jupyter notebooks are special commands that provide a convenient way to perform certain tasks and control the behavior of the Jupyter environment. They start with a %. These commands are designed to facilitate tasks such as running scripts, timing execution, and integrating with other environments or languages.

When we develop code as researchers, we are mainly concerned with how long an analysis runs and how to store and load results. In this section, let's focus on some of the Jupyter magic commands that let us do just that.

Code	Description
`%timeit sum(range(100))`	Measures the execution time of the `sum(range(100))` expression using multiple runs.
`%timeit -n 1000 sum(range(100))`	Measures the execution time of `sum(range(100))` using exactly 1000 loops per run.
`%load magic_commands/text_config.txt`	Loads the contents of `text_config.txt` from the specified file into the current code cell.
`project_name = 'Mice Visual Cortex Analysis'`	Assigns the string `'Mice Visual Cortex Analysis'` to the variable `project_name`.
`%store project_name`	Saves the value of `project_name` for later use in another session or notebook.
`%store -r project_id`	Restores the previously stored variable `project_id` into the current session.
`project_id`	Outputs the restored value of `project_id`.

Exercises

Example: Measure how long it takes to sum numbers from 1 to 100

%timeit sum(range(100))

1.01 μs ± 11 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

You might be surprised that the cell took longer to finish than running the code once would. That is because %timeit runs the line of code several times, with many loops in each run, to get a reliable estimate of the execution time. The output will look something like this:

651 ns ± 11.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Here’s what it means:

Number/Value	Explanation
`651 ns`	The mean (average) execution time of the code per loop. The unit here is nanoseconds (ns), meaning it took an average of 651 nanoseconds per loop.
`± 11.4 ns`	The standard deviation of the execution time. It indicates the variability in the execution time across different runs. In this case, it varies by 11.4 nanoseconds.
`7 runs`	The number of times `%timeit` ran the test. In this case, the test was executed 7 times to gather the statistics for the mean and standard deviation.
`1,000,000 loops each`	The number of iterations (loops) that were executed per run. Here, the code was repeated 1,000,000 times in each of the 7 runs to get a more accurate timing.
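
To make these statistics concrete, here is a minimal sketch that uses Python's standard timeit and statistics modules instead of the %timeit magic. The values chosen for runs and loops are assumptions for illustration, and %timeit may differ in its details.

```python
import statistics
import timeit

runs = 7       # number of runs (the "7 runs" in the output)
loops = 1000   # loops per run (the "loops each" in the output)

# Each entry is the total time in seconds for `loops` executions of the statement.
totals = timeit.repeat("sum(range(100))", repeat=runs, number=loops)

# Convert each run's total into a per-loop time, then summarise across runs.
per_loop = [total / loops for total in totals]
mean = statistics.mean(per_loop)
std = statistics.pstdev(per_loop)

print(f"{mean * 1e9:.0f} ns ± {std * 1e9:.1f} ns per loop "
      f"(mean ± std. dev. of {runs} runs, {loops} loops each)")
```

The printed numbers depend on the machine, so they will not match the values shown above.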

Exercise: Measure how long it takes to sum numbers from 1 to 1000

Solution

%timeit sum(range(1000))

17 μs ± 109 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Exercise: Measure how long it takes to sum numbers from 1 to 100000

Solution

%timeit sum(range(100000))

1.95 ms ± 10.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Did you notice that the number of loops went down as the code took longer to execute? This is because %timeit automatically adjusts the number of loops, based on how quickly the code being tested runs, to ensure an accurate measurement.
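
The standard library offers a similar automatic adjustment through timeit.Timer.autorange(). The sketch below uses it as an analogy. It is not the code %timeit runs, and the exact loop counts will differ from those in the outputs above.

```python
import timeit

fast = timeit.Timer("sum(range(100))")
slow = timeit.Timer("sum(range(100000))")

# autorange() increases the loop count until the total time is at least 0.2 s
# and returns (number_of_loops, total_time_in_seconds).
fast_loops, fast_total = fast.autorange()
slow_loops, slow_total = slow.autorange()

print(f"fast statement: {fast_loops} loops in {fast_total:.3f} s")
print(f"slow statement: {slow_loops} loops in {slow_total:.3f} s")
```

The faster statement needs many more loops to reach a measurable total time, which is why the loop count shrinks as the code gets slower.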

Example: Measure how long it takes to sum numbers from 1 to 100 with 1000 loops per iteration

%timeit -n 1000 sum(range(100))

1.07 μs ± 210 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Exercise: Measure how long it takes to sum numbers from 1 to 100 with 10000 loops per iteration

Solution

%timeit -n 10000 sum(range(100))

999 ns ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Exercise: Measure how long it takes to sum numbers from 1 to 100 with 5 iterations. Hint: Use -r to set number of iterations

Solution

%timeit -r 5 sum(range(100))

993 ns ± 3.34 ns per loop (mean ± std. dev. of 5 runs, 1,000,000 loops each)

Exercise: Measure how long it takes to sum numbers from 1 to 100 with 5 iterations and 1000 loops per iteration

Solution

%timeit -r 5 -n 1000 sum(range(100))

1.19 μs ± 311 ns per loop (mean ± std. dev. of 5 runs, 1,000 loops each)

In the below exercises, let’s practice loading contents of another file.

Example: Load the contents of the magic_commands/text_config.txt file

# %load magic_commands/text_config.txt
num_exp=10 # number of experiments
scientist="John Doe" # name of the scientist

As soon as you execute the cell with %load, the following happens:

The %load command itself is turned into a comment.
Below the comment, the contents of the file are loaded.
You can then edit the loaded code and execute the cell again.

Exercise: Load the contents of the magic_commands/python_config.py file

Solution

# %load magic_commands/python_config.py

%load also works with notebook files. It loads the whole notebook.

Exercise: Load contents of magic_commands/notebook_config.ipynb

Solution

# %load magic_commands/notebook_config.ipynb

Example: Store “Mice Visual Cortex Analysis” as project_name so any notebook can access it

project_name = 'Mice Visual Cortex Analysis'
%store project_name

Stored 'project_name' (str)

This variable is now stored on disk in ~/.ipython. It will be available to all Jupyter notebooks, as long as they are run in the same environment.
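
The idea of persisting a variable on disk so that another session can read it back can be sketched in plain Python with pickle. This only illustrates the concept and is not how %store is implemented. The file name project_name.pkl and the temporary directory are assumptions made for this example.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical storage location; %store manages its own location.
store_dir = Path(tempfile.mkdtemp())
store_file = store_dir / "project_name.pkl"

# "Store": serialise the value to disk.
project_name = "Mice Visual Cortex Analysis"
with store_file.open("wb") as f:
    pickle.dump(project_name, f)

# "Restore": a different session would read the same file back.
with store_file.open("rb") as f:
    restored = pickle.load(f)

print(restored)
```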

Exercise: Store 123456 as project_id so any notebook can access it

Solution

project_id = 123456
%store project_id

Stored 'project_id' (int)

Exercise: Store “Genius lab” as lab_name so any notebook can access it.

Solution

lab_name = "Genius lab"
%store lab_name

Stored 'lab_name' (str)

Example: In another notebook, retrieve project_name

%store -r project_name
project_name

Exercise: In another notebook, retrieve lab_name

Solution

%store -r lab_name
lab_name

'Genius lab'

Exercise: In another notebook, retrieve project_id

Solution

%store -r project_id
project_id

Section 2: Cell Magic Commands

Line magic commands act on a single line of code. For tasks such as writing the contents of a cell to a file or timing a whole block of code, Jupyter provides cell magic commands. They start with %% and act on the whole cell.

Let’s look at some of these cell magic commands.

Code	Description
`%%time`	Measures the time it takes to execute the entire cell (the code block).
`%%writefile experiment_info.txt`	Writes the content of the cell into a new text file named `experiment_info.txt`.
`%%writefile -a experiment_info.py`	Appends the content of the cell to the existing Python file `experiment_info.py`.
`%%capture output`	Captures the standard output and standard error of the cell into the variable `output` for later use.
`output.stdout, output.stderr`	Retrieves the captured standard output and error from the `output` variable.

Exercises

Example: Measure how long it takes to sum numbers up to 1000 with a loop.

%%time
result = 0
for i in range(1000):
    result += i

CPU times: user 118 μs, sys: 1 μs, total: 119 μs
Wall time: 125 μs

Exercise: Measure how long it takes to sum numbers up to 10000000 with a loop.

Solution

%%time
result = 0
for i in range(10000000):
    result += i

CPU times: user 1.28 s, sys: 4 ms, total: 1.28 s
Wall time: 1.28 s

You can also use it for a single line of code, as long as the code is on the line below %%time.

Exercise: Measure how long it takes to sum numbers up to 10000000 without a loop.

Solution

%%time 
sum(range(10000000))

CPU times: user 228 ms, sys: 4 ms, total: 232 ms
Wall time: 230 ms

49999995000000
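
%%time reports both CPU time and wall time. A rough plain-Python equivalent, assuming the standard time module, is sketched below. The smaller range is chosen only to keep the example quick.

```python
import time

wall_start = time.perf_counter()   # wall-clock timer
cpu_start = time.process_time()    # CPU time used by this process (user + system)

result = sum(range(1_000_000))

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

print(f"CPU time:  {cpu_elapsed * 1e3:.2f} ms")
print(f"Wall time: {wall_elapsed * 1e3:.2f} ms")
print(result)
```

Wall time also includes any time the process spends waiting, so it can be larger than the CPU time.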

We might also want to write contents of a single cell into a file of its own. This can be useful when you write functions or have a list of variables that you want to store as a python script to access later on.

Example: Store experiment_name, num_mice, num_neuropixels in a file called experiment_info.txt

%%writefile experiment_info.txt
experiment_name = "Mice Visual Cortex"
num_mice = 25
num_neuropixels = 300

Writing experiment_info.txt

Exercise: Store experiment_name, num_mice, num_neuropixels in a file called experiment_info.py

Solution

%%writefile experiment_info.py
experiment_name = "Mice Visual Cortex"
num_mice = 25
num_neuropixels = 300

Writing experiment_info.py

Exercise: Add num_electrodes to experiment_info.py Hint: Use -a

Solution

%%writefile -a experiment_info.py
num_electrodes = 100

Appending to experiment_info.py
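
For comparison, the same write and append steps can be sketched in plain Python. The temporary directory is an assumption made so that the example does not overwrite any real file. %%writefile itself writes relative to the notebook's working directory.

```python
import tempfile
from pathlib import Path

# Hypothetical target inside a temporary directory.
target = Path(tempfile.mkdtemp()) / "experiment_info.py"

# Like `%%writefile experiment_info.py`: create or overwrite the file.
target.write_text(
    'experiment_name = "Mice Visual Cortex"\n'
    "num_mice = 25\n"
    "num_neuropixels = 300\n"
)

# Like `%%writefile -a experiment_info.py`: append to the existing file.
with target.open("a") as f:
    f.write("num_electrodes = 100\n")

lines = target.read_text().splitlines()
print("\n".join(lines))
```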

Sometimes the output can be too long and cluttering. We can deal with that by storing the output in a variable without displaying it on the screen.

Example: print("Hello World") but do not display the output

%%capture output
print("Hello World")

output.stdout, output.stderr

('Hello World\n', '')

Here, the printed text is captured in stdout, and anything written to standard error is captured in stderr.

Exercise: print("Hello") and print("World") on two separate lines, but do not display the output

Solution

%%capture output
print("Hello")
print("World")

output.stdout

'Hello\nWorld\n'

Exercise: 1+2 on one line and 5*3 on another. But no display

Solution

%%capture output
1+2
5*3

output.stdout, output.stderr

('', '')

The reason you’re seeing ('', '') for output.stdout and output.stderr is that neither 1 + 2 nor 5 * 3 writes anything to standard output or standard error. These expressions are evaluated, but unless the code explicitly writes to one of those streams, for example with print(), there is nothing to capture.
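
The capture behaviour described above can be reproduced in plain Python with contextlib. This is a sketch of the idea using the standard library, not the implementation behind %%capture. The message text and variable names are made up for the example.

```python
import contextlib
import io
import sys

stdout_buffer = io.StringIO()
stderr_buffer = io.StringIO()

# Everything written to stdout or stderr inside the block goes into the buffers.
with contextlib.redirect_stdout(stdout_buffer), contextlib.redirect_stderr(stderr_buffer):
    print("Hello World")
    print("warning", file=sys.stderr)
    1 + 2  # evaluated, but writes nothing to either stream

captured = (stdout_buffer.getvalue(), stderr_buffer.getvalue())
print(captured)
```

Only the two print() calls leave a trace in the buffers. The bare expression 1 + 2 contributes nothing.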

Section 3: Writing Functions Inside Jupyter Notebook

Writing functions in a Jupyter notebook provides an interactive and flexible environment for development and analysis. Notebooks allow for immediate feedback, enabling us to write, test, and modify functions incrementally by executing individual cells. This dynamic and iterative nature makes Jupyter notebooks an excellent tool for experimenting with and fine-tuning functions in a user-friendly interface.

In this section, let us practice writing functions.

Exercises

Example: Write a function called add_two_nums which adds num1 and num2 and prints sum on screen.

def add_two_nums(num1, num2):
    print(num1 + num2)
add_two_nums(3, 4)

7

Exercise: Write a function called add_three_nums which adds num1, num2 and num3 and prints sum on screen.

Solution

def add_three_nums(num1, num2, num3):
    print(num1+num2+num3)
add_three_nums(3, 4, 5)

Exercise: Write a function called subtract_two_nums which subtracts num1 and num2 and prints difference on screen.

Solution

def subtract_two_nums(num1, num2):
    print(num1-num2)
subtract_two_nums(3,4)

-1

Example: Write a function called add_two_nums which adds num1 and num2 and returns sum.

def add_two_nums(num1, num2):
    result = num1 + num2
    return result
result = add_two_nums(3, 4)
result

Exercise: Write a function called add_three_nums which adds num1, num2 and num3 and returns the sum.

Solution

def add_three_nums(num1, num2, num3):
    result = num1+num2+num3
    return result
result = add_three_nums(3, 4, 5)
result

Exercise: Write a function called subtract_two_nums which subtracts num1 and num2 and returns the difference.

Solution

def subtract_two_nums(num1, num2):
    result = num1-num2
    return result
result = subtract_two_nums(3,4)
result

-1

Sometimes, you might have to use functions from scripts in your notebooks. Unlike notebooks, scripts do not have markdown cells for explanations or reasoning. Instead, we can use docstrings to explain our functions. A docstring sits inside the function and gives a brief explanation of its purpose.

Example: Add a docstring to add_two_nums

def add_two_nums(num1, num2):
    '''
    adds num1 and num2
    '''
    result = num1 + num2
    return result
result = add_two_nums(3, 4)
result

Exercise: Add a docstring to subtract_two_nums

Solution

def subtract_two_nums(num1, num2):
    '''
    subtracts num1 and num2
    Equation: num1 - num2
    '''
    result = num1-num2
    return result
result = subtract_two_nums(3,4)
result

-1

Exercise: Add a docstring to add_three_nums

Solution

def add_three_nums(num1, num2, num3):
    '''
    Adds num1, num2, and num3
    '''
    result = num1+num2+num3
    return result
result = add_three_nums(3, 4, 5)
result
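
Once a function has a docstring, it can be read back programmatically. The short sketch below shows two standard ways of doing so: the __doc__ attribute and inspect.getdoc(). It reuses the add_two_nums example from above.

```python
import inspect

def add_two_nums(num1, num2):
    '''
    adds num1 and num2
    '''
    result = num1 + num2
    return result

# The raw docstring, including the surrounding whitespace.
raw_doc = add_two_nums.__doc__
print(repr(raw_doc))

# inspect.getdoc() removes the indentation and the leading/trailing blank lines.
clean_doc = inspect.getdoc(add_two_nums)
print(clean_doc)
```

The built-in help(add_two_nums) displays the same text in an interactive session.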

How would you describe these functions if you wrote them inside notebooks instead of scripts?

How would you make use of markdown cells to add explanations?

When we develop or use a function, we might want to time its execution. We can use %%time to time a function call.

Example: Time execution of add_two_nums(10, 100)

%%time
add_two_nums(10,100)

CPU times: user 5 μs, sys: 0 ns, total: 5 μs
Wall time: 9.3 μs

Exercise: Time execution of subtract_two_nums(10, 100)

Solution

%%time
subtract_two_nums(10, 100)

CPU times: user 6 μs, sys: 0 ns, total: 6 μs
Wall time: 8.58 μs

-90

Exercise: Time execution of add_three_nums(10, 100, 1000)

Solution

%%time
add_three_nums(10, 100, 1000)

CPU times: user 5 μs, sys: 0 ns, total: 5 μs
Wall time: 8.11 μs

Section 4: Accessing Contents of Script/Notebook with %run

%run is a magic command in Jupyter that lets us execute the code of a script or another notebook from the current notebook. This approach is particularly useful when we want to reuse code or break our work into smaller, more manageable parts without copying everything into the current notebook. By using %run, we can bring in all the variables, functions, and data from another notebook or script, making them immediately available in our current environment.

Additionally, we have the flexibility to pass extra information, known as arguments, to the notebook we’re running. This allows us to customize its behavior for different tasks or scenarios. Overall, %run helps us keep our work organized and maintainable, especially in complex projects where reusing code is key to efficiency.
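
As a plain-Python illustration of running a file with arguments, the sketch below uses the standard runpy module as a stand-in for %run. The script greet.py and its contents are hypothetical. In a notebook, the corresponding call would be %run greet.py Jane.

```python
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical script written to a temporary directory for this example.
script = Path(tempfile.mkdtemp()) / "greet.py"
script.write_text(
    "import sys\n"
    "name = sys.argv[1] if len(sys.argv) > 1 else 'John Doe'\n"
    "greeting = f'Hello {name}'\n"
)

# Temporarily replace sys.argv so that the script sees our argument.
original_argv = sys.argv
sys.argv = [str(script), "Jane"]
try:
    # run_path executes the file and returns its global variables as a dict.
    namespace = runpy.run_path(str(script))
finally:
    sys.argv = original_argv

print(namespace["greeting"])
```

The script reads its argument from sys.argv, so the same file can produce different results depending on how it is called.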

In this section, we will practice using %run.

Code	Description
`%run hello.py`	Executes the Python script located at `hello.py` in the current Jupyter notebook environment.

Exercises

Example: Run hello.py script from here

%run run_section/hello.py

Hello world

Exercise: Run hello_nb.ipynb from here

Solution

%run run_section/hello_nb.ipynb

Exercise: Run hello.txt from here. What difference do you notice?

Solution

%run run_section/hello.txt

The error occurs because %run expects the file being executed to contain valid Python code or a Jupyter notebook. In this case, hello.txt is a plain text file with the content “Hello World,” which is not valid Python syntax. Since %run is trying to execute the text as Python code, it encounters a SyntaxError.
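
The same failure can be reproduced without any file by asking Python to compile the text directly. This is a small sketch in which the string stands in for the content of hello.txt as described above.

```python
# Stand-in for the content of hello.txt.
source = "Hello World"

try:
    # compile() parses the text as Python code, which is also the first step of running a file.
    compile(source, "hello.txt", "exec")
    outcome = "compiled"
except SyntaxError as error:
    outcome = f"SyntaxError: {error.msg}"

print(outcome)
```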

Exercise: Change the contents of hello.txt to say print("Hello World") and run it. Does this work?

Solution

%run run_section/hello.txt

%run not only shows the standard output of the script or notebook it runs. It also gives us access to the variables defined there.

Example: Run hello.py and print name

%run run_section/hello.py
name

Hello world

'John Doe'

Exercise: Run hello.py and print age

Solution

%run run_section/hello.py
age

Hello world

Exercise: Run hello_nb.ipynb and print location

Solution

%run run_section/hello_nb.ipynb
location

'Earth'

Any variable in the current notebook is overwritten if the script or notebook you run defines a variable with the same name.

Example: Set name to “Jane Doe” and run hello.py. What is name now?

name = "Jane Doe"
print(name)
%run run_section/hello.py
name

Jane Doe
Hello world

'John Doe'

Exercise: Set age to 50 and run hello.py. What is the age now?

Solution

age = 50
print(age)
%run run_section/hello.py
age

50
Hello world

It is the same for notebooks.

Exercise: Set location to “Saturn” and run hello_nb.ipynb. What is the location now?

Solution

location = "Saturn"
print(location)
%run run_section/hello_nb.ipynb
location

Saturn

'Earth'
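
The overwriting behaviour above can be mimicked in plain Python by merging a script's globals into an existing namespace. This is a sketch of the effect, with runpy standing in for %run. The script hello_demo.py and the variable only_here are hypothetical.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for hello.py that only defines `name`.
script = Path(tempfile.mkdtemp()) / "hello_demo.py"
script.write_text('name = "John Doe"\n')

# The "notebook" namespace before running the script.
namespace = {"name": "Jane Doe", "only_here": 1}

# Run the script and merge its variables into the namespace, skipping dunder entries.
script_globals = runpy.run_path(str(script))
namespace.update({k: v for k, v in script_globals.items() if not k.startswith("__")})

print(namespace["name"])
print(namespace["only_here"])
```

Names that exist in both places take the script's value, while names that exist only in the original namespace are left untouched.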

%run also lets us access any functions defined in a script or notebook.

Example: Add 2 and 3 using add_two_numbers in hello.py

%run run_section/hello.py
add_two_numbers(2, 3)

Hello world

Exercise: Add 2, 3, 4 using add_three_numbers from hello_nb.ipynb

Solution

%run run_section/hello_nb.ipynb
add_three_numbers(2,3,4)

Exercise: Multiply 5 and 6 using multiply_two_numbers from hello_nb.ipynb

Solution

%run run_section/hello_nb.ipynb
multiply_two_numbers(5,6)

Section 5: Accessing Parts of Another Notebook

In this section, we will learn how to access parts of another notebook without executing the whole notebook. A Python library called import_ipynb can do this by essentially treating the notebook as a Python module. This allows us to reuse code from another notebook without executing all the cells. The import_ipynb module extracts only the relevant Python cells from the notebook, ignoring markdown cells.
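
How a notebook can be treated as a module can be sketched with the standard library alone. The example below is a simplified illustration, not import_ipynb's actual implementation. It reads a notebook's JSON, skips markdown cells, and executes the code cells inside a new module object. The notebook content and the names notebook_to_module and hello_nb_demo are hypothetical. Note that this sketch runs every code cell and does not handle magic commands.

```python
import json
import types

# A tiny hypothetical notebook in the nbformat 4 JSON layout (normally read from an .ipynb file).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes, not code"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["location = 'Earth'\n",
                    "def add_three_numbers(a, b, c):\n",
                    "    return a + b + c\n"]},
    ],
})

def notebook_to_module(text, module_name):
    """Execute the code cells of a notebook inside a fresh module object."""
    notebook = json.loads(text)
    module = types.ModuleType(module_name)
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown and raw cells are skipped
        source = cell["source"]
        if isinstance(source, list):  # source may be a string or a list of lines
            source = "".join(source)
        exec(source, module.__dict__)
    return module

nb = notebook_to_module(notebook_json, "hello_nb_demo")
print(nb.location)
print(nb.add_three_numbers(1, 2, 3))
```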

Code	Description
`sys.path.append('run_section/hello_nb.ipynb')`	Adds the path `'run_section/hello_nb.ipynb'` to Python’s list of module search paths, allowing you to import or run the Jupyter notebook as a module.

First, we have to add the location of the notebook we want to import to Python's module search path. Run the code cell below to add run_section/hello_nb.ipynb.

Exercises

import sys
sys.path.append('run_section/hello_nb.ipynb')

Example: Import hello_nb.ipynb and display name

import import_ipynb
import run_section.hello_nb as nb
nb.name

'John Doe'

Exercise: Import hello_nb.ipynb and display age

Solution

import import_ipynb
import run_section.hello_nb as nb
nb.age

Exercise: Import hello_nb.ipynb and display location

Solution

import import_ipynb
import run_section.hello_nb as nb
nb.location

'Earth'

We can also import only some variables.

Example: Import only name

import import_ipynb
from run_section.hello_nb import name
name

'John Doe'

Exercise: Import only age

Solution

import import_ipynb
from run_section.hello_nb import age
age

Exercise: Import only location

Solution

import import_ipynb
from run_section.hello_nb import location
location

'Earth'

Example: Add 1, 2, 3 by importing only add_three_numbers

import import_ipynb
from run_section.hello_nb import add_three_numbers
add_three_numbers(1,2,3)

Exercise: Multiply 8 and 9 by importing only multiply_two_numbers

Solution

import import_ipynb
from run_section.hello_nb import multiply_two_numbers
multiply_two_numbers(8,9)

We can also import python scripts!

Exercise: Add 1 and 2 by importing only add_two_numbers from hello.py

Solution

import import_ipynb
from run_section.hello import add_two_numbers
add_two_numbers(1,2)

Hello world