Linear Regression and Regularization
Author
import numpy as np
import matplotlib.pyplot as plt

Linear models are useful tools for describing the relationship between variables (e.g. the intensity of a stimulus and the magnitude of a neural response) and form the basis of many different methods of statistical analysis. In this notebook you are going to learn about two of them: ordinary least squares (OLS) and ridge regression. OLS finds the line that best fits the data, while ridge regression helps to avoid overfitting, leading to better predictions on new, unseen data. You’ll also learn how to use bootstrap resampling to identify which predictors are significantly associated with the outcome.
The notebook proceeds as follows:
- Estimate the linear relationship between simulated variables using OLS regression
- Use ridge regularization and test how this improves the prediction of new, unseen data
- Apply bootstrap resampling to test the significance of the model’s predictors
- Implement OLS and ridge regression using pure NumPy
Setup
Utility Functions
def _fit_regression(X, y, intercept=False, lam=0):
    X = X.copy()
    if intercept:
        X = np.hstack((np.ones((len(X), 1)), X))
    I = np.eye(X.shape[1])
    if intercept:
        I[0, 0] = 0  # do not regularize the intercept
    coef = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
    return coef
def _apply_regression(X, coefs):
    X = np.atleast_2d(X).T if X.ndim == 1 else X.copy()
    if len(coefs) == X.shape[1] + 1:  # coefs include an intercept
        X = np.hstack([np.ones((len(X), 1)), X])
    y_pred = X @ coefs
    return y_pred
def _plot_bootstrap_coefs(coefs, ci=95):
    mean = coefs.mean(axis=0)
    lower = np.percentile(coefs, (100 - ci) / 2, axis=0)
    upper = np.percentile(coefs, 100 - (100 - ci) / 2, axis=0)
    plt.errorbar(range(len(mean)), mean,
                 yerr=[mean - lower, upper - mean],
                 fmt='o', capsize=5)
    plt.axhline(0, color='gray', linestyle='--')
    plt.xlabel("Coefficient")
    plt.ylabel("Value")
    for i, (lo, hi) in enumerate(zip(lower, upper)):
        if lo > 0 or hi < 0:  # CI excludes zero: mark as significant
            plt.text(i, upper[i] + 0.02, '*', ha='center')
class utils:
    fit_regression = _fit_regression
    apply_regression = _apply_regression
    plot_bootstrap_coefs = _plot_bootstrap_coefs

Section 1: Ordinary Least Squares (OLS) Regression
Background
A linear relationship has the form $y = X\beta + \epsilon$, where $y$ is the dependent variable, $X$ is a matrix with one or multiple predictors, and $\epsilon$ is a noise term. $\beta$ is the slope, which describes how strongly $X$ and $y$ are associated. If there are multiple predictors in $X$, there is one $\beta$ for each of them.
The simplest way of estimating the values of $\beta$ is ordinary least squares (OLS) regression. OLS finds the coefficients $\hat{\beta}$ that minimize the sum of squared residuals between the observed data $y$ and the model predictions $\hat{y} = X\hat{\beta}$:

$$\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert^2$$

which has the closed-form solution $\hat{\beta} = (X^\top X)^{-1} X^\top y$.
Optionally, we can add an intercept $\beta_0$ to the model, $y = \beta_0 + X\beta + \epsilon$, to account for cases where the data is not centered on 0.
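As a quick illustration of this closed form, here is a sketch in plain NumPy with made-up data (independent of the utility functions used below):

```python
import numpy as np

# Simulate y = 2x + noise, then recover the slope from the normal
# equations: beta_hat = (X^T X)^{-1} X^T y
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 1))
y = 2 * X + 0.1 * rng.standard_normal((50, 1))

# np.linalg.solve is preferred over explicitly inverting X^T X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to the true slope of 2
```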
Exercises
In this section, you will use OLS to estimate the slope and intercept of a linear model from simulated data and visualize the estimated model. Rather than implementing OLS from scratch, the notebook provides utility functions for fitting and applying OLS regression. Here’s how to use them:
| Code | Description |
|---|---|
| `utils.fit_regression(X, y)` | Fit OLS regression and return the estimated coefficients |
| `utils.fit_regression(X, y, intercept=True)` | Fit OLS regression with an intercept term |
| `utils.apply_regression(X, coef)` | Compute predictions for a given `X` and coefficients |
| `plt.scatter(x, y)` | Create a scatter plot of `x` and `y` |
| `plt.plot(x, y)` | Create a line plot of `x` and `y` |
| `rng = np.random.default_rng(42)` | Create a random number generator with seed 42 |
| `rng.standard_normal(size=(10, 1))` | Sample 10 values from a standard normal distribution |
Example: The cell below samples 20 data points that follow the equation $y = 3x + \epsilon$. Use utils.fit_regression and print the estimated coefficient $\hat{\beta}$.
rng = np.random.default_rng(42)
X = rng.standard_normal(size=(20, 1))
y = 3 * X + rng.standard_normal(size=(20, 1))
plt.scatter(X, y);

coef = utils.fit_regression(X, y)
coef
array([[3.15525491]])

Exercise: The cell below samples 500 points from the same model. Use utils.fit_regression and print the estimated coefficient. Is this value close to the true value $\beta = 3$?
rng = np.random.default_rng(42)
X = rng.standard_normal(size=(500, 1))
y = 3 * X + rng.standard_normal(size=(500, 1))
plt.scatter(X, y);

Solution
coef = utils.fit_regression(X, y)
coef
array([[2.99134185]])

Exercise: The cell below samples 100 data points from a different model. Use utils.fit_regression to estimate the value of $\beta$.
rng = np.random.default_rng(42)
X = rng.standard_normal(size=(100, 1))
y = -0.75 * X + rng.standard_normal(size=(100, 1))
plt.scatter(X, y);

Solution
coef = utils.fit_regression(X, y)
coef
array([[-0.63043635]])

Example: Estimate the value of $\beta$ and use utils.apply_regression to compute the predicted values y_pred. Plot the original data points together with the prediction.
coef = utils.fit_regression(X, y)
y_pred = utils.apply_regression(X, coef)
plt.scatter(X, y)
plt.plot(X, y_pred);

Exercise: The cell below samples 100 data points from the model $y = 2 + 3x + \epsilon$. Estimate the value of $\beta$ and use utils.apply_regression to compute the predicted values y_pred. Plot the original data points together with the prediction. Does the line fit the data?
rng = np.random.default_rng(42)
X = rng.standard_normal(size=(100, 1))
y = 2 + 3 * X + rng.standard_normal(size=(100, 1))

Solution
coef = utils.fit_regression(X, y)
y_pred = utils.apply_regression(X, coef)
plt.scatter(X, y)
plt.plot(X, y_pred);

Exercise: Repeat the code from above but set intercept=True when calling utils.fit_regression() to also estimate the intercept $\beta_0$. Does the line fit the data now?
Solution
coef = utils.fit_regression(X, y, intercept=True)
y_pred = utils.apply_regression(X, coef)
plt.scatter(X, y)
plt.plot(X, y_pred);

Exercise: Print the estimated coefficients from the previous exercise. Do they match the true values $\beta_0 = 2$ and $\beta = 3$?
Solution
coef
array([[1.99535902],
       [3.11917464]])

Exercise: The cell below samples 100 data points from the model $y = 2 + 3x_1 - 1.5x_2 + 0.8x_3 + \epsilon$. Use utils.fit_regression() to fit a linear regression with intercept and print the estimated coefficients.
rng = np.random.default_rng(42)
X = rng.standard_normal(size=(100, 3))
y = 2 + 3*X[:,0] - 1.5*X[:,1] + 0.8*X[:,2] + rng.standard_normal(100)

Solution
coef = utils.fit_regression(X, y, intercept=True)
coef
array([ 2.08767734,  3.00559502, -1.43020719,  0.60250069])

Example: Predict the relationship between $y$ and $x_1$ (X[:,0]) using the estimates for $\beta_0$ and $\beta_1$ (coef[[0, 1]]) and plot the original data together with the estimated relationship.
y_pred = utils.apply_regression(X[:,0], coef[[0, 1]])
plt.scatter(X[:,0], y)
plt.plot(X[:,0], y_pred)

Exercise: Predict the relationship between $y$ and $x_2$ (X[:,1]) using the estimates for $\beta_0$ and $\beta_2$ (coef[[0, 2]]) and plot the original data together with the estimated relationship.
Solution
y_pred = utils.apply_regression(X[:,1], coef[[0, 2]])
plt.scatter(X[:,1], y)
plt.plot(X[:,1], y_pred)

Exercise: Predict the relationship between $y$ and $x_3$ (X[:,2]) using the estimates for $\beta_0$ and $\beta_3$ (coef[[0, 3]]) and plot the original data together with the estimated relationship.
Solution
y_pred = utils.apply_regression(X[:,2], coef[[0, 3]])
plt.scatter(X[:,2], y)
plt.plot(X[:,2], y_pred)

Section 2: Making Robust Predictions with Ridge Regularization
Background
OLS finds the coefficients that best fit the training data, but when the number of predictors is large relative to the number of observations, it can overfit — fitting the noise in the training data rather than the underlying signal. As a result, the model performs well on the training data but poorly on new, unseen data.
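This effect is easy to reproduce in a small sketch (the numbers here are illustrative, not the data used in the exercises below): with 10 predictors and only 15 training samples, OLS tracks the training data closely but misses on held-out samples.

```python
import numpy as np

# 30 samples, 10 predictors, but only the first predictor matters
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
y = 2 * X[:, 0] + rng.standard_normal(30)
X_train, X_test = X[:15], X[15:]
y_train, y_test = y[:15], y[15:]

# OLS via the normal equations, fit on the training half only
beta = np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)
rmse_train = np.sqrt(np.mean((y_train - X_train @ beta) ** 2))
rmse_test = np.sqrt(np.mean((y_test - X_test @ beta) ** 2))
print(rmse_train, rmse_test)  # the train RMSE is far below the test RMSE
```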
Ridge regression addresses this by adding a penalty on the size of the coefficients to the OLS objective:

$$\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2$$

This shrinks the coefficients toward zero, reducing overfitting. The strength of the regularization is controlled by $\lambda$: a larger $\lambda$ leads to more shrinkage. The solution is:

$$\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$$

When $\lambda$ is 0, ridge regression is equivalent to OLS.
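A minimal sketch of this closed form (with made-up data) shows the shrinkage directly: as $\lambda$ grows, the norm of the estimated coefficient vector decreases.

```python
import numpy as np

# Made-up data: 20 samples, 5 predictors, only two with nonzero weights
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.0]) + rng.standard_normal(20)

def ridge(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in [0, 1, 10, 100]:
    print(lam, np.round(np.linalg.norm(ridge(X, y, lam)), 3))
# the norm of the coefficient vector shrinks as lambda grows
```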
Exercises
In this section, you’ll use the lam parameter in utils.fit_regression to apply ridge regression with different values of $\lambda$. You’ll fit ridge regression to training data and evaluate it on test data to see how well the model generalizes to new, unseen data. To evaluate model accuracy, you are going to use the root mean squared error (RMSE) between the observed data and the model’s predictions. Here are the relevant code snippets:
| Code | Description |
|---|---|
| `utils.fit_regression(X, y, lam=1)` | Fit ridge regression with regularization strength $\lambda = 1$ |
| `np.sqrt(np.mean((y - y_pred)**2))` | Compute the root mean squared error (RMSE) between `y` and `y_pred` |
| `np.linspace(0, 10, 20)` | Create an array of 20 evenly spaced values between 0 and 10 |
| `np.argmin(arr)` | Return the index of the smallest value in `arr` |
The cell below generates 30 samples with 10 predictors, only two of which are actually associated with y (in reality we usually do not know which predictors are associated with y).
We then assign the first 15 samples to the train set and the rest to the test set. We’ll fit the regression on the train set and evaluate it on the test set to see how well the model generalizes to unseen data.
rng = np.random.default_rng(42)
X = rng.standard_normal(size=(30, 10))
y = 1 + 1.75*X[:,0] - 0.9*X[:,1] + rng.standard_normal(30)
X_train, X_test = X[:15], X[15:]
y_train, y_test = y[:15], y[15:]

Example: Fit OLS regression on the train set, predict y_train and compute the root mean squared error (RMSE) of the prediction: np.sqrt(np.mean((y_train-y_train_pred)**2)).
coef = utils.fit_regression(X_train, y_train, intercept=True)
y_train_pred = utils.apply_regression(X_train, coef)
np.sqrt(np.mean((y_train-y_train_pred)**2))
np.float64(0.34449221293568094)

Exercise: Use the coef from above to predict the test set and compute the RMSE. How does it compare to the RMSE for the train set?
Solution
y_test_pred = utils.apply_regression(X_test, coef)
np.sqrt(np.mean((y_test-y_test_pred)**2))
np.float64(2.339959720309393)

Exercise: Fit a regression to the train set again but set lam=5 to use ridge regression with $\lambda = 5$. Predict y_train and compute the RMSE. How does it compare to the previous RMSE for the train set?
Solution
coef = utils.fit_regression(X_train, y_train, intercept=True, lam=5)
y_train_pred = utils.apply_regression(X_train, coef)
np.sqrt(np.mean((y_train-y_train_pred)**2))
np.float64(0.7331380443164356)

Exercise: Use the coef from above to predict the test set and compute the RMSE. How does it compare to the previous RMSE for the test set?
Solution
y_test_pred = utils.apply_regression(X_test, coef)
np.sqrt(np.mean((y_test-y_test_pred)**2))
np.float64(1.5495128300227903)

Exercise: Fit a ridge regression on the train set with $\lambda = 15$ and use the estimated coefficients to predict y_train and y_test. How do the RMSEs for test and train set compare to the previous exercises?
Solution
coef = utils.fit_regression(X_train, y_train, intercept=True, lam=15)
y_train_pred = utils.apply_regression(X_train, coef)
y_test_pred = utils.apply_regression(X_test, coef)
np.sqrt(np.mean((y_train-y_train_pred)**2)), np.sqrt(np.mean((y_test-y_test_pred)**2))
(np.float64(1.0963781800130035), np.float64(1.4791331477136682))

Exercise: Fit a ridge regression on the train set with $\lambda = 1000$ and use the estimated coefficients to predict y_train and y_test. Is the RMSE larger for the train or test set?
Solution
coef = utils.fit_regression(X_train, y_train, intercept=True, lam=1000)
y_train_pred = utils.apply_regression(X_train, coef)
y_test_pred = utils.apply_regression(X_test, coef)
np.sqrt(np.mean((y_train-y_train_pred)**2)), np.sqrt(np.mean((y_test-y_test_pred)**2))
(np.float64(1.757910624020377), np.float64(1.733375772618923))

Exercise: Use np.linspace() to create an array with 20 lambdas between 0 and 100 and run the cell below to compute the RMSE on the train set for every $\lambda$ and plot the result.
lambdas = ... # np.linspace()

Solution
lambdas = np.linspace(0, 100, 20)

mse = []
for lam in lambdas:
    coef = utils.fit_regression(X_train, y_train, intercept=True, lam=lam)
    y_train_pred = utils.apply_regression(X_train, coef)
    mse.append(np.sqrt(np.mean((y_train-y_train_pred)**2)))
plt.plot(lambdas, mse)
plt.xlabel(r"$\lambda$")
plt.ylabel("RMSE (Train Set)")

Exercise: Use np.linspace() to create an array with 1000 lambdas between 0 and 200 and run the cell below to compute the RMSE on the test set for every $\lambda$ and plot the result.
lambdas = ... # np.linspace()

Solution
lambdas = np.linspace(0, 200, 1000)

mse = []
for lam in lambdas:
    coef = utils.fit_regression(X_train, y_train, intercept=True, lam=lam)
    y_test_pred = utils.apply_regression(X_test, coef)
    mse.append(np.sqrt(np.mean((y_test-y_test_pred)**2)))
plt.plot(lambdas, mse)
plt.xlabel(r"$\lambda$")
plt.ylabel("RMSE (Test Set)")

Exercise: Use np.argmin to find the index of the smallest RMSE for the test set and run the cell below to print the optimal value for $\lambda$.
min_idx = ... # np.argmin()

Solution

min_idx = np.argmin(mse)
print("Optimal value for lambda: ", lambdas[min_idx])
Optimal value for lambda:  13.013013013013014

Section 3: Quantifying Uncertainty with Bootstrapping
Background
In the previous section we used ridge regression and found the optimal value of $\lambda$ for making predictions on new, unseen data. However, a single regression fit gives point estimates of the coefficients, but no measure of how reliable those estimates are. Bootstrapping is a resampling technique that quantifies this uncertainty without requiring distributional assumptions.
The idea is simple: repeatedly resample the data with replacement, refit the model on each resample, and collect the resulting coefficients. The spread of these bootstrapped estimates reflects how much the coefficients would vary if the experiment were repeated — giving a confidence interval (CI) for each coefficient.
A coefficient whose CI does not include zero can be considered statistically significant at the corresponding level.
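The whole procedure fits in a few lines of NumPy. The sketch below uses made-up data with a true slope of 3 and computes a 95% percentile CI for an OLS slope (the exercises below use the notebook's utility functions instead):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1))
y = 3 * X[:, 0] + rng.standard_normal(100)

# Resample rows with replacement, refit OLS, and collect the slopes
slopes = []
for _ in range(2000):
    idx = rng.choice(len(X), len(X), replace=True)
    Xb, yb = X[idx], y[idx]
    slopes.append(np.linalg.solve(Xb.T @ Xb, Xb.T @ yb)[0])

# 95% percentile CI: the 2.5th and 97.5th percentiles of the estimates
lo, hi = np.percentile(slopes, [2.5, 97.5])
print(lo, hi)  # the CI excludes 0, so the slope is significant at the 5% level
```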
Exercises
In the following exercises you are going to apply bootstrap resampling to identify which coefficients are significantly associated with the dependent variable. You are also going to see how ridge and OLS regression differ with respect to the coefficient estimates. The notebook provides the resampling code as well as a utility function to visualize the estimates with confidence intervals:
| Code | Description |
|---|---|
| `idx = np.random.choice(n, n, replace=True)` | Draw `n` random integers between 0 and `n-1` with replacement |
| `np.stack(arrays)` | Stack a list of arrays into a single array |
| `utils.plot_bootstrap_coefs(coefs)` | Plot mean coefficients with 95% CI; an asterisk marks CIs excluding 0 |
| `utils.plot_bootstrap_coefs(coefs, ci=99)` | Plot with a 99% CI |
Exercise: The code below randomly samples from X and y and then applies ridge regression to the resulting bootstrap sample X_boot, y_boot. Rerun the cell a few times and observe how the values of the estimated coefficients change.
Note: The true trend in the data is given by the equation $y = 1 + 1.75x_1 - 0.9x_2 + \epsilon$. $x_3$ to $x_{10}$ are not associated with $y$ at all.
idx = np.random.choice(len(X), len(X), replace=True)
X_boot, y_boot = X[idx], y[idx]
coef = utils.fit_regression(X_boot, y_boot, intercept=True, lam=13)
coef
array([ 1.28666415,  0.98292911, -0.61409241, -0.18522603,  0.15980478,
-0.19004276, -0.09841726, -0.17051188, -0.16305329, -0.06011261,
        0.04875652])

Exercise: Set n_boot to 100 and run the cell below to compute the regression coefficients 100 times with bootstrap resampling, using the previously estimated best value of $\lambda = 13$. Print the bootstrapped coefficients. What is the shape of the array?
n_boot = ...

Solution
n_boot = 100

coefs = []
for i in range(n_boot):
    idx = np.random.choice(len(X), len(X), replace=True)
    X_boot, y_boot = X[idx], y[idx]
    coef = utils.fit_regression(X_boot, y_boot, intercept=True, lam=13)
    coefs.append(coef)
coefs = np.stack(coefs)
coefs
array([[ 1.23695717,  0.69134477, -0.55211956, ...,  0.05927028,
0.06008401, -0.151001 ],
[ 1.62440651, 0.91209257, -0.55471076, ..., -0.05183594,
-0.11540324, 0.05235936],
[ 1.4551361 , 0.67992198, -0.15373708, ..., -0.0040847 ,
-0.01714821, 0.04267496],
...,
[ 1.42698789, 1.03316772, -0.420802 , ..., -0.14716959,
-0.05798001, -0.00685537],
[ 1.2208383 , 0.79222136, -0.54683182, ..., -0.10275386,
0.00533218, 0.03058543],
[ 1.21449081, 0.91517625, -0.39263143, ..., 0.07640955,
        -0.11786716,  0.15861812]], shape=(100, 11))

Exercise: Rerun the code from above but set n_boot to 10_000. How did this change the shape of coefs?
Solution
n_boot = 10000
coefs = []
for i in range(n_boot):
    idx = np.random.choice(len(X), len(X), replace=True)
    X_boot, y_boot = X[idx], y[idx]
    coef = utils.fit_regression(X_boot, y_boot, intercept=True, lam=13)
    coefs.append(coef)
coefs = np.stack(coefs)
coefs
array([[ 1.66580184,  0.83907837, -0.38962806, ...,  0.03339816,
-0.088214 , 0.05630418],
[ 0.97983361, 0.57847098, -0.2078252 , ..., 0.25443684,
-0.09184317, 0.06679293],
[ 1.32553261, 1.06024077, -0.50136916, ..., -0.01212684,
0.03584278, -0.12186026],
...,
[ 1.6390338 , 0.93733506, -0.40086725, ..., -0.26435439,
-0.08500956, 0.04845428],
[ 1.49176372, 0.81536038, -0.27869283, ..., -0.13402602,
0.08238664, 0.200545 ],
[ 1.17760004, 1.03200618, -0.62195721, ..., -0.20897942,
         0.12867731, -0.17484593]], shape=(10000, 11))

Example: Plot the mean value for each coefficient across all bootstrap resamples with a 95% confidence interval (CI). The asterisk indicates that the CI for the given coefficient does not include 0.
utils.plot_bootstrap_coefs(coefs, ci=95)

Exercise: Plot the bootstrapped coefficients with a 99% and 99.9% CI. Which coefficients are significantly different from 0?
Solution
utils.plot_bootstrap_coefs(coefs, ci=99)
utils.plot_bootstrap_coefs(coefs, ci=99.9)

Exercise: Repeat the bootstrap estimation of the coefficients but set lam=0 to use OLS regression. Plot the estimated coefficients with a 99.9% CI. How does this compare to the coefficients estimated with ridge regression?
Solution
n_boot = 10000
coefs = []
for i in range(n_boot):
    idx = np.random.choice(len(X), len(X), replace=True)
    X_boot, y_boot = X[idx], y[idx]
    coef = utils.fit_regression(X_boot, y_boot, intercept=True, lam=0)
    coefs.append(coef)
coefs = np.stack(coefs)
utils.plot_bootstrap_coefs(coefs, ci=99.9)

Section 4: Implementing OLS and Ridge Regression in NumPy
Background
In the previous sections we used the provided utility functions to fit and apply OLS and ridge regression. This allowed us to focus on the interpretation of the results rather than the implementation of the regression itself. However, the functions are actually pretty straightforward and can be implemented in a few steps using NumPy. This section will show you how to do exactly that.
Exercises
In this section you are going to apply OLS and ridge regression to simulated data using only NumPy functions and the matrix multiplication operator @. Some of the steps will be provided for you and there will be hints in the exercise text on which functions to use.
| Code | Description |
|---|---|
| `np.ones((n, 1))` | Create an $n \times 1$ column of ones |
| `np.hstack((A, B))` | Horizontally stack arrays `A` and `B` |
| `np.eye(p)` | Create a $p \times p$ identity matrix |
| `np.linalg.solve(A, b)` | Solve the linear system $Ax = b$ for $x$ |
| `X.T @ X` | Compute $X^\top X$ |
| `X @ coef` | Compute predictions $\hat{y} = X\hat{\beta}$ |
The cell below generates 100 data points from the model $y = 1 + 2x + \epsilon$.
rng = np.random.default_rng(42)
X = rng.standard_normal(size=(100, 1))
y = 1 + 2 * X + rng.standard_normal(size=(100, 1))
plt.scatter(X, y)

Exercise: Solve the normal equations using np.linalg.solve(X.T @ X, X.T @ y) and print the returned regression coefficients.
Solution
coef = np.linalg.solve(X.T @ X, X.T @ y)
coef
array([[2.03574216]])

Exercise: Predict the values of $y$ via matrix multiplication of X and the estimated coefficients (X @ coef). Then plot the original data together with the prediction.
Solution
y_pred = X @ coef
plt.scatter(X, y)
plt.plot(X, y_pred);

Example: To fit an intercept, we prepend a column of np.ones() to X. The coefficient assigned to that column gets multiplied by 1 for every sample and thus acts as a constant offset.
X = np.hstack((np.ones((len(X), 1)), X))
X[:10]
array([[ 1.        ,  0.30471708],
[ 1. , -1.03998411],
[ 1. , 0.7504512 ],
[ 1. , 0.94056472],
[ 1. , -1.95103519],
[ 1. , -1.30217951],
[ 1. , 0.1278404 ],
[ 1. , -0.31624259],
[ 1. , -0.01680116],
       [ 1.        , -0.85304393]])

Exercise: Solve the normal equations using np.linalg.solve(X.T @ X, X.T @ y) and print the returned regression coefficients. Do the values match the true intercept $\beta_0 = 1$ and slope $\beta = 2$?
coef = np.linalg.solve(X.T @ X, X.T @ y)
coef
array([[0.99535902],
       [2.11917464]])

Exercise: Predict the values of $y$ via matrix multiplication of X and the estimated coefficients (X @ coef). Then plot the original data together with the prediction.
Note: Because we added a column of ones to X, use the second column when plotting (X[:,1]).
Solution
plt.scatter(X[:, 1], y)
plt.plot(X[:, 1], X @ coef);

Example: To use ridge regression, we create an identity matrix I with the size equal to the number of features in X. We multiply it by $\lambda$ to control the regularization strength. Because we do not want to regularize the intercept, we set the first entry of I to 0.
I = np.eye(X.shape[1])
I[0, 0] = 0
I
array([[0., 0.],
       [0., 1.]])

Exercise: Solve the normal equations using np.linalg.solve(X.T @ X + lam * I, X.T @ y) with lam=1. Compute predictions and plot the original data together with the regression line.
Solution
lam = 1
coef = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
plt.scatter(X[:, 1], y)
plt.plot(X[:, 1], X @ coef);

Exercise: Test different values of $\lambda$ and observe how the slope of the regression changes.
Solution
plt.scatter(X[:, 1], y, color="black")
for lam in [1, 10, 100, 1000]:
    coef = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
    plt.plot(X[:, 1], X @ coef, label=rf"$\lambda$={lam}")
plt.legend();