Python Modules

Courses

Neuroscience Data Analysis Pipelines with Python, Git, and Snakemake

Authors

Dr. Mohammad Bashiri | Dr. Nicholas Del Grosso

Section 1: Using functions from Python modules

Defining every function directly in a notebook leads to clutter as a project grows, which makes the notebook harder to navigate and maintain. We will also often want to reuse some of these functions in other notebooks. Organizing functions into modules addresses both problems: the code becomes clearer, better structured, and reusable.

What are Python modules?

Python modules are Python files (i.e. files with a .py extension) that hold code we want to reuse in other places (notebooks, CLIs, etc.), such as functions. In a project, modules are often kept in a folder called src (short for "source"). Once our functions are stored in a module, we can import them wherever they are needed.

In this notebook, instead of defining the functions here, we are going to store them inside Python modules and import them inside the notebook.
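
As a minimal, self-contained sketch of this idea, the snippet below writes a tiny module into a temporary folder, adds that folder to sys.path, and imports a function from it. The module name demo_module and the function double are hypothetical and exist only for illustration; in the course project the module would be a hand-written file inside src.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module content; normally this is a .py file you write by hand.
module_source = "def double(x):\n    return 2 * x\n"

# Create a throwaway folder that plays the role of "src".
src_dir = Path(tempfile.mkdtemp())
(src_dir / "demo_module.py").write_text(module_source)

# Python only finds modules in folders listed in sys.path.
sys.path.append(str(src_dir))
importlib.invalidate_caches()

from demo_module import double

result = double(21)
print(result)
```

The import statement works exactly as it would for an installed package: once the folder is on the search path, the file name (without .py) becomes the module name.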

Exercises

Example: Create a Python module called stats.py in the folder src. Move the calculate_mean function into it. Import the function from the module and make sure it works by testing a couple of different cases.

def calculate_mean(numbers):
    return sum(numbers) / len(numbers)

def calculate_std(numbers):
    mean = calculate_mean(numbers)
    variance = sum([(x - mean) ** 2 for x in numbers]) / len(numbers)
    std = variance ** 0.5
    return std

def calculate_median(numbers):
    # sorted() returns a new list, so the caller's data is not reordered
    numbers = sorted(numbers)
    n = len(numbers)
    mid = n // 2
    if n % 2 == 0:
        median = (numbers[mid - 1] + numbers[mid]) / 2
    else:
        median = numbers[mid]
    return median

# First we need to make sure our python environment knows about the folder "src" (we only need to do this once)
import sys
sys.path.append('../src')

from stats import calculate_mean

data = [1, 4, 7]
mean = calculate_mean(data)

mean == 4

True

data = [3, 3, 5, 7, 7]
mean = calculate_mean(data)

mean == 5

True
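
The relative path '../src' is resolved against the current working directory, so the import only works when the notebook is run from its own folder. A slightly more defensive sketch, assuming the same layout (the notebook's folder and src share one parent folder), builds an absolute path and avoids adding it twice:

```python
import sys
from pathlib import Path

# Assumed layout: the notebook's folder and "src" share the same parent folder.
src_path = (Path.cwd().parent / "src").resolve()

# Add the folder only if it is not already on the search path.
if str(src_path) not in sys.path:
    sys.path.append(str(src_path))

print(src_path)
```

Printing the path is a quick way to confirm that it points at the folder you expect before attempting the import.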

Exercise: Add a function called calculate_std to the stats.py module such that the following code runs. Compare the result of this function with NumPy's std function. Do they yield the same value?

import numpy as np
from stats import calculate_std

data = [-8, -4, 0, 4, 8]
std = calculate_std(data)

std == np.std(data)

True
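
The two values agree because np.std uses the population formula by default (ddof=0, dividing by N), which is the same formula as in calculate_std. Passing ddof=1 gives the sample standard deviation instead, which divides by N - 1. Since both results are floats, np.isclose is a safer comparison than ==. The sketch below re-defines the function locally so that it runs on its own:

```python
import numpy as np

def calculate_std(numbers):
    # Population standard deviation: divide by N, not N - 1.
    mean = sum(numbers) / len(numbers)
    variance = sum((x - mean) ** 2 for x in numbers) / len(numbers)
    return variance ** 0.5

data = [-8, -4, 0, 4, 8]

population_std = np.std(data)       # default ddof=0
sample_std = np.std(data, ddof=1)   # divides by N - 1

print(np.isclose(calculate_std(data), population_std))  # True
print(np.isclose(calculate_std(data), sample_std))      # False
```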

Exercise: Add a function called calculate_median to the stats.py module such that the following code runs.

from stats import calculate_median

data = [1, 4, 2, 2, 3, 4, 5]

median = calculate_median(data)

median == 3

True
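
One detail worth checking when writing a median function is whether it leaves the caller's list untouched: list.sort() sorts in place, whereas sorted() returns a new list. The sketch below contrasts the two; the names median_in_place and median_copy are hypothetical and used only for this comparison.

```python
def median_in_place(numbers):
    # list.sort() reorders the caller's list.
    numbers.sort()
    n = len(numbers)
    mid = n // 2
    return numbers[mid] if n % 2 else (numbers[mid - 1] + numbers[mid]) / 2

def median_copy(numbers):
    # sorted() returns a new list and leaves the input untouched.
    ordered = sorted(numbers)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

data_a = [1, 4, 2, 2, 3, 4, 5]
data_b = [1, 4, 2, 2, 3, 4, 5]

print(median_in_place(data_a), data_a)  # 3 [1, 2, 2, 3, 4, 4, 5]
print(median_copy(data_b), data_b)      # 3 [1, 4, 2, 2, 3, 4, 5]
```

Both functions return the same median, but only the second keeps the original order of the data, which matters when later cells reuse the same list.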

Exercise: Create a new Python module called transformations.py. Add a function called normalize to the transformations.py module such that the following code runs.

import numpy as np

def normalize(data, min_val=0, max_val=1):
    data = np.array(data)
    # step 1: normalize between 0 and 1
    data_normed = data - data.min()
    data_normed = data_normed / data_normed.max()
    # step 2: rescale to the range [min_val, max_val]
    data_normed = data_normed * (max_val - min_val) + min_val
    return data_normed

from transformations import normalize

data_normalized = normalize(data)

min(data_normalized) == 0, max(data_normalized) == 1

(True, True)
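
The min_val and max_val arguments rescale the data to any target range, not only [0, 1]. The sketch below repeats the function so that it runs on its own and maps the data to [-1, 1]; the intermediate variable names are illustrative, and np.isclose would be the safer check on the results because they are floats.

```python
import numpy as np

def normalize(data, min_val=0, max_val=1):
    data = np.array(data, dtype=float)
    # Step 1: shift and scale to the range [0, 1].
    shifted = data - data.min()
    unit = shifted / shifted.max()
    # Step 2: stretch and shift to [min_val, max_val].
    return unit * (max_val - min_val) + min_val

data = [1, 4, 2, 2, 3, 4, 5]
scaled = normalize(data, min_val=-1, max_val=1)

print(scaled.min(), scaled.max())
```

Note that this sketch, like the version above, produces NaN values when all input values are identical, because the division in step 1 is then a division by zero.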

Exercise: Let’s use a function to plot and compare the original data with its standardized and normalized versions. Create a new Python module called visualizations.py. Add a function called plot_histogram to the visualizations.py module. Import the function and use it to plot data, data_standardized, and data_normalized, each in a different cell.

import matplotlib.pyplot as plt

def plot_histogram(data, nbins=20):
    # Draw a histogram of data with nbins bins on a new figure
    figure, ax = plt.subplots(ncols=1, nrows=1)
    ax.hist(data, bins=nbins)

import matplotlib.pyplot as plt
from visualizations import plot_histogram

plot_histogram(data)
plt.xlim(-5, 15)

(-5.0, 15.0)

plot_histogram(data_normalized)
plt.xlim(-5, 15)

(-5.0, 15.0)
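
plt.xlim acts on whichever axes matplotlib currently considers active. A variant that returns the axes makes it explicit which plot is being adjusted. The following is a sketch of that design choice, not part of the exercise solution; the name plot_histogram_ax is hypothetical.

```python
import matplotlib.pyplot as plt

def plot_histogram_ax(data, nbins=20):
    # Same plot as plot_histogram, but the axes object is returned to the caller.
    figure, ax = plt.subplots(ncols=1, nrows=1)
    ax.hist(data, bins=nbins)
    return ax

data = [1, 4, 2, 2, 3, 4, 5]
ax = plot_histogram_ax(data)

# Set the limits on this specific plot rather than on the "current" one.
ax.set_xlim(-5, 15)

print(ax.get_xlim())
```

Returning the axes also lets the caller add labels or titles without changing the function in visualizations.py.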