Python Modules
Authors
Section 1: Using functions from Python modules
Placing all your functions directly in a notebook can lead to clutter and complexity, particularly as your project expands, making it harder to navigate and maintain. Also, we probably want to use some of these functions in other notebooks too. Organizing functions into modules is a smarter approach, offering clarity, structure, and reusability.
What are Python modules?
Python modules are Python files (i.e. files with a .py extension) that hold code we want to reuse elsewhere, such as functions shared between notebooks, command-line tools, and so on. A common convention is to keep them in a folder called src (short for "source"). Once our functions live in a module, we can import them wherever they are needed.
In this notebook, instead of defining the functions here, we are going to store them inside Python modules and import them inside the notebook.
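As a minimal sketch of the workflow described above, the snippet below writes a tiny module to disk, tells Python where to find it, and imports a function from it. The module name greetings.py, the function greet, and the temporary folder are hypothetical and chosen only for illustration; in the exercises the module is created by hand in the src folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical folder and module name, used only for this illustration.
src_dir = Path(tempfile.mkdtemp()) / "src"
src_dir.mkdir()

# A module is just a .py file containing ordinary Python code.
(src_dir / "greetings.py").write_text(
    "def greet(name):\n"
    "    return f'Hello, {name}!'\n"
)

# Tell Python where to look for the module, then import from it.
sys.path.append(str(src_dir))
importlib.invalidate_caches()  # the file was created while Python was running
from greetings import greet

message = greet("notebook")
print(message)
```

The same three steps (create the file, extend sys.path, import) are what the exercises below do with stats.py.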
Exercises
Example: Create a Python module called stats.py in the folder src. Move the calculate_mean function into it. Import the function from the module and make sure it works by testing a couple of different cases.
# src/stats.py
def calculate_mean(numbers):
    return sum(numbers) / len(numbers)

def calculate_std(numbers):
    mean = calculate_mean(numbers)
    variance = sum([(x - mean) ** 2 for x in numbers]) / len(numbers)
    std = variance ** 0.5
    return std

def calculate_median(numbers):
    # Sort a copy so that the caller's list is left unchanged
    ordered = sorted(numbers)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    else:
        median = ordered[mid]
    return median

# First we need to make sure our Python environment knows about the folder "src" (we only need to do this once)
import sys
sys.path.append('../src')

from stats import calculate_mean

data = [1, 4, 7]
mean = calculate_mean(data)
mean == 4

True

data = [3, 3, 5, 7, 7]
mean = calculate_mean(data)
mean == 5

True

Exercise: Add a function called calculate_std to the stats.py module such that the following code runs. Compare the result of this function with NumPy’s std function. Do they yield the same value?
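One point to keep in mind for this comparison: np.std divides by n by default (ddof=0, the population standard deviation), whereas passing ddof=1 divides by n - 1 (the sample standard deviation). The sketch below repeats the population formula inline rather than importing it, and uses np.isclose because exact equality between floating-point results is not guaranteed in general. The variable names are illustrative only.

```python
import numpy as np

data = [-8, -4, 0, 4, 8]

population_std = np.std(data)       # default ddof=0: divide by n
sample_std = np.std(data, ddof=1)   # divide by n - 1

# Hand-written population formula (divides by n)
mean = sum(data) / len(data)
manual_std = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5

# Compare with a tolerance instead of ==
matches_population = bool(np.isclose(manual_std, population_std))
matches_sample = bool(np.isclose(manual_std, sample_std))
print(matches_population, matches_sample)
```

A function that divides by len(numbers) should therefore agree with np.std under its default settings, but not with the ddof=1 variant.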
import numpy as np
from stats import calculate_std

data = [-8, -4, 0, 4, 8]
std = calculate_std(data)
std == np.std(data)

True

Exercise: Add a function called calculate_median to the stats.py module such that the following code runs.
from stats import calculate_median

data = [1, 4, 2, 2, 3, 4, 5]
median = calculate_median(data)
median == 3

True

Exercise: Create a new Python module called transformations.py. Add a function called normalize to the transformations.py module such that the following code runs.
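For reference, the rescaling this exercise asks for can be written as a single formula. This is a sketch that assumes normalize is meant to perform min-max scaling of the data x into the range set by its min_val and max_val arguments:

```latex
x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \,(\mathrm{max\_val} - \mathrm{min\_val}) + \mathrm{min\_val}
```

With the defaults min_val = 0 and max_val = 1, the smallest value maps to 0 and the largest to 1. The formula is undefined when all values in x are equal, since the denominator is then zero.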
# transformations.py
import numpy as np

def normalize(data, min_val=0, max_val=1):
    data = np.array(data)
    # Step 1: normalize between 0 and 1
    data_shifted = data - data.min()
    data_normed = data_shifted / data_shifted.max()
    # Step 2: rescale to the range [min_val, max_val]
    data_normed = data_normed * (max_val - min_val) + min_val
    return data_normed

from transformations import normalize

data_normalized = normalize(data)
min(data_normalized) == 0, max(data_normalized) == 1

(True, True)

Exercise: Let’s use a function to plot and compare the original data with its standardized and normalized versions. Create a new Python module called visualizations.py. Add a function called plot_histogram to the visualizations.py module. Import the function and use it to plot data, data_standardized, and data_normalized, each in a different cell.
# visualizations.py
import matplotlib.pyplot as plt

def plot_histogram(data, nbins=20):
    # Draw a histogram of data with nbins bins on a new figure
    figure, ax = plt.subplots(ncols=1, nrows=1)
    ax.hist(data, bins=nbins)

# In the notebook
import matplotlib.pyplot as plt
from visualizations import plot_histogram

plot_histogram(data)
plt.xlim(-5, 15)

(-5.0, 15.0)

plot_histogram(data_normalized)
plt.xlim(-5, 15)

(-5.0, 15.0)
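A practical note on the exercises above, which repeatedly add functions to a module the notebook has already imported: Python caches imported modules, so a running session does not see new functions until the kernel is restarted or the module is reloaded. The sketch below simulates this with a throwaway module; the name demo_stats and the function calculate_range are hypothetical and not part of the exercises.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module, created only for this illustration.
src_dir = Path(tempfile.mkdtemp())
module_path = src_dir / "demo_stats.py"
module_path.write_text(
    "def calculate_mean(numbers):\n"
    "    return sum(numbers) / len(numbers)\n"
)
sys.path.append(str(src_dir))
importlib.invalidate_caches()

import demo_stats

has_range_before = hasattr(demo_stats, "calculate_range")

# Simulate editing the file: append a new function to the module.
with module_path.open("a") as f:
    f.write(
        "\n\ndef calculate_range(numbers):\n"
        "    return max(numbers) - min(numbers)\n"
    )

# The session still holds the old version until the module is reloaded.
demo_stats = importlib.reload(demo_stats)
has_range_after = hasattr(demo_stats, "calculate_range")
value_range = demo_stats.calculate_range([1, 4, 7])
print(has_range_before, has_range_after, value_range)
```

Restarting the kernel and re-running the import cells achieves the same result without importlib.reload.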