Developing Reusable Code in Jupyter Notebooks
Authors
In research projects, the analytical process often involves multiple, interconnected steps, such as data cleaning, model building, and visualization, where each step depends on the previous one. Managing these dependencies efficiently is crucial for maintaining organized and reproducible research. To streamline such workflows, developing reusable code in Jupyter notebooks is essential. By breaking down complex processes into reusable components—such as functions, configuration settings, or entire notebooks—researchers can ensure consistency, reduce redundancy, and focus on the analysis itself rather than rewriting code for each step. In this notebook, we will explore key techniques for developing reusable code, including the use of magic commands to manage notebook execution, running and sequencing notebooks from one another, and importing code from one notebook to another to treat them like modular scripts.
We start with magic commands for managing notebooks, then use the notebook’s interactive nature to build functions, then run a sequence of notebooks from another notebook, and finally import the contents of a notebook as if it were a script.
Setup
Download Data
import os
import owncloud
# Ensure the 'run_section' directory exists
if not os.path.exists('run_section'):
    print('Creating directory for data')
    os.mkdir('run_section')

# Download hello.py
if not os.path.exists('run_section/hello.py'):
    oc = owncloud.Client.from_public_link('https://uni-bonn.sciebo.de/s/4PZ3gTgRWYnyPfP')
    oc.get_file('/', 'run_section/hello.py')

# Download hello.txt
if not os.path.exists('run_section/hello.txt'):
    oc = owncloud.Client.from_public_link('https://uni-bonn.sciebo.de/s/3Bg1TVfGkDXUVYg')
    oc.get_file('/', 'run_section/hello.txt')

# Download hello_nb.ipynb
if not os.path.exists('run_section/hello_nb.ipynb'):
    oc = owncloud.Client.from_public_link('https://uni-bonn.sciebo.de/s/PUkIdpQSEnyzIsT')
    oc.get_file('/', 'run_section/hello_nb.ipynb')
Creating directory for data
Section 1: Line Magic Commands
Magic commands in Jupyter notebooks are special commands that provide a convenient way to perform certain tasks and control the behavior of the Jupyter environment. They start with a %. These commands are designed to facilitate tasks such as running scripts, timing execution, and integrating with other environments or languages.
When we develop code as researchers, we mainly care about how long an analysis takes to run and how to store and load results. In this section, let’s focus on the Jupyter magic commands that let us do just that.
| Code | Description |
|---|---|
| %timeit sum(range(100)) | Measures the execution time of the sum(range(100)) expression using multiple runs. |
| %timeit -n 1000 sum(range(100)) | Measures the execution time of sum(range(100)) with exactly 1000 loops per run. |
| %load magic_commands/text_config.txt | Loads the contents of text_config.txt into the current code cell. |
| project_name = 'Mice Visual Cortex Analysis' | Assigns the string 'Mice Visual Cortex Analysis' to the variable project_name. |
| %store project_name | Saves the value of project_name for later use in another session or notebook. |
| %store -r project_id | Restores the previously stored variable project_id into the current session. |
| project_id | Outputs the restored value of project_id. |
Exercises
Example: Measure how long it takes to sum numbers from 1 to 100
%timeit sum(range(100))
1.01 μs ± 11 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
You might be surprised that this took much longer than running the line once. That is because %timeit runs the line of code several times, with many loops in each run, to get a reliable estimate of the execution time. The output you see will look something like this:
651 ns ± 11.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Here’s what it means:
| Number/Value | Explanation |
|---|---|
| 651 ns | The mean (average) execution time of the code per loop. The unit here is nanoseconds (ns), meaning it took an average of 651 nanoseconds per loop. |
| ± 11.4 ns | The standard deviation of the execution time. It indicates the variability in the execution time across different runs. In this case, it varies by 11.4 nanoseconds. |
| 7 runs | The number of times %timeit ran the test. In this case, the test was executed 7 times to gather the statistics for the mean and standard deviation. |
| 1,000,000 loops each | The number of iterations (loops) that were executed per run. Here, the code was repeated 1,000,000 times in each of the 7 runs to get a more accurate timing. |
Exercise: Measure how long it takes to sum numbers from 1 to 1000
Solution
%timeit sum(range(1000))
17 μs ± 109 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Exercise: Measure how long it takes to sum numbers from 1 to 100000
Solution
%timeit sum(range(100000))
1.95 ms ± 10.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Did you notice that the number of loops dropped as the code took longer to execute?
This is because %timeit automatically adjusts the number of loops based on how quickly the code being tested runs, so that each run lasts long enough to be measured accurately.
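The same runs-and-loops scheme can be reproduced outside a notebook with Python's standard timeit module. The sketch below only approximates what %timeit reports and is not IPython's implementation: Timer.autorange() picks a loop count, Timer.repeat() performs the runs, and the variable names (loops, runs, per_loop) are chosen here for illustration.

```python
import statistics
import timeit

timer = timeit.Timer("sum(range(100))")

# Let timeit choose a loop count large enough for one run to take a measurable time
loops, _ = timer.autorange()

# Repeat the measurement 7 times, each run executing the statement `loops` times
runs = timer.repeat(repeat=7, number=loops)

# Convert the total time of each run into time per loop, then summarise across runs
per_loop = [total / loops for total in runs]
mean = statistics.mean(per_loop)
std = statistics.pstdev(per_loop)

print(f"{mean * 1e9:.0f} ns ± {std * 1e9:.1f} ns per loop "
      f"(mean ± std. dev. of {len(runs)} runs, {loops:,} loops each)")
```

Swapping in a slower statement such as sum(range(100000)) should make autorange() settle on a smaller loop count, which mirrors the behaviour seen in the exercises.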
Example: Measure how long it takes to sum numbers from 1 to 100 with 1000 loops per iteration
%timeit -n 1000 sum(range(100))
1.07 μs ± 210 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Exercise: Measure how long it takes to sum numbers from 1 to 100 with 10000 loops per iteration
Solution
%timeit -n 10000 sum(range(100))
999 ns ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Exercise: Measure how long it takes to sum numbers from 1 to 100 with 5 iterations.
Hint: Use -r to set the number of iterations
Solution
%timeit -r 5 sum(range(100))
993 ns ± 3.34 ns per loop (mean ± std. dev. of 5 runs, 1,000,000 loops each)
Exercise: Measure how long it takes to sum numbers from 1 to 100 with 5 iterations and 1000 loops per iteration
Solution
%timeit -r 5 -n 1000 sum(range(100))
1.19 μs ± 311 ns per loop (mean ± std. dev. of 5 runs, 1,000 loops each)
In the exercises below, let’s practice loading the contents of another file.
Example: Load contents of magic_commands/text_config file
# %load magic_commands/text_config.txt
num_exp=10 # number of experiments
scientist="John Doe" # name of the scientist
As soon as you execute the cell with %load, the following happens:
- The %load command itself is turned into a comment
- Below the comment, the contents of the file are loaded
- You can edit the loaded code and then execute the cell
Exercise: Load contents of magic_commands/python_config file
Solution
# %load magic_commands/python_config.py
%load also works with other notebooks, in which case it loads the whole notebook.
Exercise: Load contents of magic_commands/notebook_config.ipynb
Solution
# %load magic_commands/notebook_config.ipynb
Example: Store “Mice Visual Cortex Analysis” as project_name so any notebook can access it
project_name = 'Mice Visual Cortex Analysis'
%store project_name
Stored 'project_name' (str)
This variable is now stored on disk in ~/.ipython. It will be available to all Jupyter notebooks as long as they run in the same environment.
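Conceptually, %store serialises the variable to disk so that a later session can read it back. The sketch below imitates that idea with the standard pickle module and a temporary directory. It is an analogy under that assumption, not IPython's actual storage code, and the file name project_name.pkl is made up for the example.

```python
import pickle
import tempfile
from pathlib import Path

project_name = 'Mice Visual Cortex Analysis'

with tempfile.TemporaryDirectory() as store_dir:
    store_file = Path(store_dir) / 'project_name.pkl'  # hypothetical file name

    # "Store": write the value to disk so it outlives the current session
    with store_file.open('wb') as f:
        pickle.dump(project_name, f)

    # "Restore": a different notebook would read the value back from disk
    with store_file.open('rb') as f:
        restored = pickle.load(f)

print(restored)
```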
Exercise: Store 123456 as project_id so any notebook can access it
Solution
project_id = 123456
%store project_id
Stored 'project_id' (int)
Exercise: Store “Genius lab” as lab_name so any notebook can access it.
Solution
lab_name = "Genius lab"
%store lab_name
Stored 'lab_name' (str)
Example: In another notebook, retrieve project_id
%store -r project_id
project_id
123456
Exercise: In another notebook, retrieve lab_name
Solution
%store -r lab_name
lab_name
'Genius lab'
Exercise: In another notebook, retrieve project_id
Solution
%store -r project_id
project_id
123456
Section 2: Cell Magic Commands
Line magic commands act on a single line of code.
However, when we want to write the contents of a cell to a file or time a whole block of code, Jupyter provides cell magic commands, which start with %% and act on the whole cell.
Let’s look at some cell magic commands.
| Code | Description |
|---|---|
| %%time | Measures the time it takes to execute the entire cell (the code block). |
| %%writefile experiment_info.txt | Writes the content of the cell into a new text file named experiment_info.txt. |
| %%writefile -a experiment_info.py | Appends the content of the cell to the existing Python file experiment_info.py. |
| %%capture output | Captures the standard output and standard error of the cell into the variable output for later use. |
| output.stdout, output.stderr | Retrieves the captured standard output and error from the output variable. |
Exercises
Example: Measure how long it takes to sum numbers up to 1000 with a loop.
%%time
result = 0
for i in range(1000):
    result += i
CPU times: user 118 μs, sys: 1 μs, total: 119 μs
Wall time: 125 μs
Exercise: Measure how long it takes to sum numbers up to 10000000 with a loop.
Solution
%%time
result = 0
for i in range(10000000):
    result += i
CPU times: user 1.28 s, sys: 4 ms, total: 1.28 s
Wall time: 1.28 s
You can also use it for a single line of code, as long as the code is on the line below %%time.
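The two figures that %%time reports, CPU time and wall time, can be approximated in plain Python with time.process_time() and time.perf_counter(). This is a rough sketch of the distinction rather than a description of how IPython measures it. The time.sleep call is added only to make the two clocks diverge.

```python
import time

wall_start = time.perf_counter()   # wall-clock time, includes waiting
cpu_start = time.process_time()    # CPU time used by this process only

result = 0
for i in range(1000):
    result += i
time.sleep(0.1)                    # waiting costs wall time but almost no CPU time

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

print(f"CPU time: {cpu_elapsed * 1e3:.2f} ms")
print(f"Wall time: {wall_elapsed * 1e3:.2f} ms")
```

A large gap between the two numbers usually means the code spent its time waiting (for disk, network, or sleep) rather than computing.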
Exercise: Measure how long it takes to sum numbers up to 10000000 without a loop.
Solution
%%time
sum(range(10000000))
CPU times: user 228 ms, sys: 4 ms, total: 232 ms
Wall time: 230 ms
49999995000000
We might also want to write the contents of a single cell to a file of its own. This is useful when you have written functions or a list of variables that you want to store as a Python script and access later.
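To see why saving a cell as a .py file is useful, the sketch below writes a few variables to a file with ordinary Python file operations and then loads that file as a module. It stands in for %%writefile outside a notebook. The temporary directory and the module name experiment_info_demo are assumptions made for the example.

```python
import importlib.util
import tempfile
from pathlib import Path

# The text a notebook cell would contain below the %%writefile line
cell_source = 'experiment_name = "Mice Visual Cortex"\nnum_mice = 25\n'

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "experiment_info_demo.py"  # hypothetical file name
    path.write_text(cell_source)

    # Later, possibly from another notebook, load the saved file as a module
    spec = importlib.util.spec_from_file_location("experiment_info_demo", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

print(module.experiment_name, module.num_mice)
```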
Example: Store experiment_name, num_mice, num_neuropixels in a file called experiment_info.txt
%%writefile experiment_info.txt
experiment_name = "Mice Visual Cortex"
num_mice = 25
num_neuropixels = 300
Writing experiment_info.txt
Exercise: Store experiment_name, num_mice, num_neuropixels in a file called experiment_info.py
Solution
%%writefile experiment_info.py
experiment_name = "Mice Visual Cortex"
num_mice = 25
num_neuropixels = 300
Writing experiment_info.py
Exercise: Add num_electrodes to experiment_info.py
Hint: Use -a
Solution
%%writefile -a experiment_info.py
num_electrodes = 100
Appending to experiment_info.py
Sometimes the output of a cell is long and clutters the notebook. We can deal with that by storing the output in a variable without displaying it on the screen.
Example: print("Hello World") but do not display the output
%%capture output
print("Hello World")
output.stdout, output.stderr
('Hello World\n', '')
Here, anything printed is captured in stdout, and any error messages are captured in stderr.
Exercise: print("Hello") and print("World") on two separate lines but do not display the output
Solution
%%capture output
print("Hello")
print("World")
output.stdout
'Hello\nWorld\n'
Exercise: 1+2 on one line and 5*3 on another, with no output displayed
Solution
%%capture output
1+2
5*3
output.stdout, output.stderr
('', '')
You see ('', '') for output.stdout and output.stderr because neither 1 + 2 nor 5 * 3 produces any standard output or error. These expressions are evaluated, but unless you explicitly use print() or raise an exception, there is no output to capture.
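A plain-Python analogue of %%capture is contextlib.redirect_stdout, which sends printed text into a buffer instead of the screen. The sketch below uses it to show the same behaviour as the exercises above: print() calls end up in the buffer, while bare expressions leave it empty. It illustrates the idea only and is not how %%capture is implemented.

```python
import contextlib
import io

# Printed text is redirected into the buffer instead of being displayed
printed = io.StringIO()
with contextlib.redirect_stdout(printed):
    print("Hello")
    print("World")

# Bare expressions are evaluated but write nothing to standard output
silent = io.StringIO()
with contextlib.redirect_stdout(silent):
    1 + 2
    5 * 3

print(repr(printed.getvalue()))  # 'Hello\nWorld\n'
print(repr(silent.getvalue()))   # ''
```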
Section 3: Writing Functions Inside Jupyter Notebook
Writing functions in a Jupyter notebook provides an interactive and flexible environment for development and analysis. Notebooks give immediate feedback, so we can write, test, and modify functions incrementally by executing individual cells. This iterative workflow makes Jupyter notebooks well suited to experimenting with and fine-tuning functions.
In this section, let us practice writing functions.
Exercises
Example: Write a function called add_two_nums which adds num1 and num2 and prints the sum on screen.
def add_two_nums(num1, num2):
    print(num1 + num2)
add_two_nums(3, 4)
7
Exercise: Write a function called add_three_nums which adds num1, num2 and num3 and prints the sum on screen.
Solution
def add_three_nums(num1, num2, num3):
    print(num1+num2+num3)
add_three_nums(3, 4, 5)
12
Exercise: Write a function called subtract_two_nums which subtracts num2 from num1 and prints the difference on screen.
Solution
def subtract_two_nums(num1, num2):
    print(num1-num2)
subtract_two_nums(3,4)
-1
Example: Write a function called add_two_nums which adds num1 and num2 and returns the sum.
def add_two_nums(num1, num2):
    result = num1 + num2
    return result
result = add_two_nums(3, 4)
result
7
Exercise: Write a function called add_three_nums which adds num1, num2 and num3 and returns the sum.
Solution
def add_three_nums(num1, num2, num3):
    result = num1+num2+num3
    return result
result = add_three_nums(3, 4, 5)
result
12
Exercise: Write a function called subtract_two_nums which subtracts num2 from num1 and returns the difference.
Solution
def subtract_two_nums(num1, num2):
    result = num1-num2
    return result
result = subtract_two_nums(3,4)
result
-1
Sometimes you will need to use functions from scripts in your notebooks. Unlike notebooks, scripts do not have markdown cells for explanations. Instead, we can use docstrings to explain our functions. A docstring sits inside the function and briefly explains its purpose.
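A docstring is stored on the function object itself, so it stays available wherever the function is imported. The short sketch below uses a made-up function, multiply_two_nums, which is not part of this notebook's exercises, and reads its docstring back with inspect.getdoc.

```python
import inspect

def multiply_two_nums(num1, num2):
    '''
    Multiplies num1 and num2 and returns the product.
    '''
    return num1 * num2

# inspect.getdoc returns the docstring with indentation and blank edges cleaned up
doc = inspect.getdoc(multiply_two_nums)
print(doc)
```

In a notebook, the same text appears when you call help(multiply_two_nums).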
Example: Add a docstring to add_two_nums
def add_two_nums(num1, num2):
    '''
    adds num1 and num2
    '''
    result = num1 + num2
    return result
result = add_two_nums(3, 4)
result
7
Exercise: Add a docstring to subtract_two_nums
Solution
def subtract_two_nums(num1, num2):
    '''
    subtracts num2 from num1
    Equation: num1 - num2
    '''
    result = num1-num2
    return result
result = subtract_two_nums(3,4)
result
-1
Exercise: Add a docstring to add_three_nums
Solution
def add_three_nums(num1, num2, num3):
    '''
    Adds num1, num2, and num3
    '''
    result = num1+num2+num3
    return result
result = add_three_nums(3, 4, 5)
result
12
How would you describe these functions if you wrote them inside notebooks instead of scripts?
How would you make use of markdown cells to add explanations?
When we develop or use a function, we might want to time its execution. We can put %%time at the top of the cell that calls the function.
Example: Time execution of add_two_nums(10, 100)
%%time
add_two_nums(10,100)
CPU times: user 5 μs, sys: 0 ns, total: 5 μs
Wall time: 9.3 μs
110
Exercise: Time execution of subtract_two_nums(10, 100)
Solution
%%time
subtract_two_nums(10, 100)
CPU times: user 6 μs, sys: 0 ns, total: 6 μs
Wall time: 8.58 μs
-90
Exercise: Time execution of add_three_nums(10, 100, 1000)
Solution
%%time
add_three_nums(10, 100, 1000)
CPU times: user 5 μs, sys: 0 ns, total: 5 μs
Wall time: 8.11 μs
1110
Section 4: Accessing Contents of Script/Notebook with %run
%run is a Jupyter magic command that executes the code of a script or another notebook inside the current notebook.
This approach is particularly useful when we want to reuse code or break our work into smaller, more manageable parts without copying everything into the current notebook.
By using %run, we can bring in all the variables, functions, and data from another notebook or script, making them immediately available in our current environment.
Additionally, we have the flexibility to pass extra information, known as arguments, to the notebook we’re running.
This allows us to customize its behavior for different tasks or scenarios.
Overall, %run helps us keep our work organized and maintainable, especially in complex projects where reusing code is key to efficiency.
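Outside a notebook, the standard runpy module gives a feel for what %run does: it executes a file and hands back the names that file defined. In the sketch below, the script contents are a stand-in written for this example (loosely modelled on hello.py), and the session dictionary plays the role of the notebook's namespace. %run itself updates the notebook's variables directly.

```python
import runpy
import tempfile
from pathlib import Path

# Stand-in script written for this example
script_source = '''
name = "John Doe"

def add_two_numbers(a, b):
    return a + b

print("Hello world")
'''

# A variable that already exists in our "notebook" before the script runs
session = {"name": "Jane Doe"}

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "hello_demo.py"  # hypothetical file name
    script.write_text(script_source)
    # Execute the script and collect every name it defined
    defined = runpy.run_path(str(script))

# Copy the script's public names into the session, overwriting any clashes
session.update({k: v for k, v in defined.items() if not k.startswith("__")})

print(session["name"])
print(session["add_two_numbers"](2, 3))
```

Note that the existing value of name is replaced by the one from the script, which is worth keeping in mind when a notebook and the file it runs share variable names.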
In this section, we will practice using %run
| Code | Description |
|---|---|
| %run hello.py | Executes the Python script hello.py in the current Jupyter notebook environment. |
Exercises
Example: Run hello.py script from here
%run run_section/hello.py
Hello world
Exercise: Run hello_nb.ipynb from here
Solution
%run run_section/hello_nb.ipynb
Exercise: Run hello.txt from here.
What difference do you notice?
Solution
%run run_section/hello.txt
The error occurs because %run expects the file being executed to contain valid Python code or to be a Jupyter notebook.
In this case, hello.txt is a plain text file with the content “Hello World,” which is not valid Python syntax.
Since %run is trying to execute the text as Python code, it encounters a SyntaxError.
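The failure can be reproduced without %run by asking Python to compile the same text. This sketch assumes, as described above, that hello.txt contains just the words Hello World.

```python
text = "Hello World"  # assumed contents of hello.txt

try:
    # Compiling is the first step of running any Python source
    compile(text, "hello.txt", "exec")
    error_name = None
except SyntaxError as err:
    error_name = type(err).__name__

print(error_name)
```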
Exercise: Change the contents of hello.txt to print("Hello World") and run it. Does this work?
Solution
%run run_section/hello.txt
%run not only shows standard output; it also gives us access to the variables defined in the script or notebook.
Example: Run hello.py and print name
%run run_section/hello.py
name
Hello world
'John Doe'
Exercise: Run hello.py and print age
Solution
%run run_section/hello.py
age
Hello world
100
Exercise: Run hello_nb.ipynb and print location
Solution
%run run_section/hello_nb.ipynb
location
'Earth'
Any variable you have defined here is overwritten if the script or notebook you run defines a variable with the same name.
Example: Set name to “Jane Doe” and run hello.py. What is name now?
name = "Jane Doe"
print(name)
%run run_section/hello.py
name
Jane Doe
Hello world
'John Doe'
Exercise: Set age to 50 and run hello.py. What is the age now?
Solution
age = 50
print(age)
%run run_section/hello.py
age
50
Hello world
100
It is the same for notebooks.
Exercise: Set location to “Saturn” and run hello_nb.ipynb. What is the location now?
Solution
location = "Saturn"
print(location)
%run run_section/hello_nb.ipynb
location
Saturn
'Earth'
%run also lets us access any functions defined in a script or notebook.
Example: Add 2 and 3 using add_two_numbers in hello.py
%run run_section/hello.py
add_two_numbers(2, 3)
Hello world
5
Exercise: Add 2, 3, 4 using add_three_numbers from hello_nb.ipynb
Solution
%run run_section/hello_nb.ipynb
add_three_numbers(2,3,4)
9
Exercise: Multiply 5 and 6 using multiply_two_numbers from hello_nb.ipynb
Solution
%run run_section/hello_nb.ipynb
multiply_two_numbers(5,6)
30
Section 5: Accessing Parts of Another Notebook
In this section, we will learn how to access parts of another notebook, such as a single variable or function, by importing it.
A Python library called import_ipynb makes this possible by treating the notebook as a Python module.
This lets us reuse code from another notebook with ordinary import statements instead of copying it.
The import_ipynb module extracts only the Python code cells from the notebook and ignores the markdown cells.
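To make the idea concrete: a notebook file is JSON containing a list of cells, each marked as code or markdown. The sketch below builds a tiny notebook in memory, keeps only the code cells, and runs them inside a fresh module object. It is a simplified illustration under those assumptions, not import_ipynb's actual implementation, and the notebook contents and the module name hello_nb_demo are made up for the example.

```python
import json
import types

# A minimal notebook in the .ipynb JSON layout (contents invented for this sketch)
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Helper notebook"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["name = 'John Doe'\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["def add_three_numbers(a, b, c):\n", "    return a + b + c\n"]},
    ],
})

notebook = json.loads(notebook_text)
module = types.ModuleType("hello_nb_demo")  # hypothetical module name

for cell in notebook["cells"]:
    if cell["cell_type"] != "code":
        continue  # markdown cells carry no Python code
    # Run the cell's source inside the module's namespace
    exec("".join(cell["source"]), module.__dict__)

print(module.name)
print(module.add_three_numbers(1, 2, 3))
```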
| Code | Description |
|---|---|
| sys.path.append('run_section') | Adds the run_section directory to Python’s list of module search paths so that modules and notebooks inside it can be found on import. |
First, we add the directory that contains the notebook we want to import to the module search path. Entries in sys.path are expected to be directories rather than individual files, so we add run_section rather than hello_nb.ipynb itself. Run the cell below to do so.
Exercises
import sys
sys.path.append('run_section')
Example: Import hello_nb.ipynb and display name
import import_ipynb
import run_section.hello_nb as nb
nb.name
'John Doe'
Exercise: Import hello_nb.ipynb and display age
Solution
import import_ipynb
import run_section.hello_nb as nb
nb.age
100
Exercise: Import hello_nb.ipynb and display location
Solution
import import_ipynb
import run_section.hello_nb as nb
nb.location
'Earth'
We can also import only some variables.
Example: Import only name
import import_ipynb
from run_section.hello_nb import name
name
'John Doe'
Exercise: Import only age
Solution
import import_ipynb
from run_section.hello_nb import age
age
100
Exercise: Import only location
Solution
import import_ipynb
from run_section.hello_nb import location
location
'Earth'
Example: Add 1, 2, 3 by importing only add_three_numbers
import import_ipynb
from run_section.hello_nb import add_three_numbers
add_three_numbers(1,2,3)
6
Exercise: Multiply 8 and 9 by importing only multiply_two_numbers
Solution
import import_ipynb
from run_section.hello_nb import multiply_two_numbers
multiply_two_numbers(8,9)
72
We can also import Python scripts!
Exercise: Add 1 and 2 by importing only add_two_numbers from hello.py
Solution
import import_ipynb
from run_section.hello import add_two_numbers
add_two_numbers(1,2)
Hello world
3