Relating Neural Firing to Stimuli

Authors
Dr. Ole Bialas | Dr. Nicholas Del Grosso

Setup

Import Libraries

Import the modules required for this session

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from scipy.stats import wilcoxon
%matplotlib inline

Download Data

Download the data required for this session

import requests

urls = [
    "https://uni-bonn.sciebo.de/s/G64EkHoQkeZeoLm",
    "https://uni-bonn.sciebo.de/s/mLMkb2TwbNx3Yg6",
]
fnames = ["flash_stimuli.parquet", "flash_spikes.parquet"]

for url, fname in zip(urls, fnames):
    response = requests.get(f"{url}/download")
    print("Downloading Data ...")
    with open(fname, "wb") as file:
        file.write(response.content)
    print("Done!")
Downloading Data ...
Done!
Downloading Data ...
Done!

Section 1: Referencing Spike Times to Stimulus Presentations

In many experiments, we wish to relate neural spiking to external events, like stimuli. This requires that we can relate the timing of the spikes to the timing of the events. If both spikes and stimuli are stored in data frames, we an use pandas’ merge_asof function to merge the time information from both data frames. In this section, we are going to explore how to use pandas to compute the timing of spikes relative to stimulus onset. Then, we’ll visualize the amount of spikes ocurring right before and after a stimulus in a peri-stimulus time histogram or PSTH.

Code Description
df = pd.read_parquet(mydata.parquet) Read the file "mydata.parquet" into a data frame
pd.merge_asof(left, right, left_on, right_on) Merge data frames left and right based on the columns left_on and right_on
df["newcol"] = values Add a new column named "newcol" to df and assign it the values
df.col1.min() Get the minimum value of "col1"
df[df.col1<1] Get all rows of df where the value of "col1" is smaller than 1
sns.histplot(df, x="col1", hue="col2") Plot a histogram of the values in "col1" and add a hue for the categories in "col2"
sns.histplot(df, x="col1", bins=n) Plot a histogram of the values in "col1" with n bins
plt.axvline(x, ymin, ymax) Add a vertical line at x that goes from ymin to ymax
plt.xlim(xmin, xmax) Limit the x-axis to the range between xmin and xmax

Exercise: Load the data stored in the files "flash_spikes.parquet" and "flash_stimuli.parquet" into data frames and assign them to variables called spikes and stimuli.

Solution
spikes = pd.read_parquet("flash_spikes.parquet")
stimuli = pd.read_parquet("flash_stimuli.parquet")

Example: Use pd.merge_asof() to merge the spikes and stimuli based on the "spike_time" and "start_time". In the resulting df, each spike is paired with the last stimulus that was presented before that spike.

df = pd.merge_asof(spikes, stimuli, left_on="spike_time", right_on="start_time")
df

unit_id brain_area spike_time start_time stop_time
0 951031334 LM 1276.297339 1276.297013 1276.547221
1 951031154 LM 1276.298206 1276.297013 1276.547221
2 951031243 LM 1276.298739 1276.297013 1276.547221
3 951021543 PM 1276.299007 1276.297013 1276.547221
4 951031253 LM 1276.301272 1276.297013 1276.547221
... ... ... ... ... ...
457701 951021607 PM 1572.790671 1572.544573 1572.794779
457702 951039252 RL 1572.792065 1572.544573 1572.794779
457703 951016491 AM 1572.792470 1572.544573 1572.794779
457704 951031263 LM 1572.793746 1572.544573 1572.794779
457705 951021523 PM 1572.793838 1572.544573 1572.794779

457706 rows × 5 columns

Exercise: Subtract the stimulus .start_time column from the .spike_time column to get the spike times relative to stimulus onset. Then, print the smallest spike time in df.

Solution
df.spike_time = df.spike_time - df.start_time
df.spike_time.min()
2.332144799765956e-05

Exercise: The smallest spike time should be very close to 0. However, we would like our analysis time window to include spikes that happen before the stimulus as well. Add a new column to stimuli called "analysis_window_start" that is equal to the stimulus "start_time" minus 0.5 seconds. Print the stimuli to make sure the new column was created.

Solution
stimuli["analysis_window_start"] = stimuli["start_time"] - 0.5
stimuli

start_time stop_time analysis_window_start
0 1276.297013 1276.547221 1275.797013
1 1280.300363 1280.550569 1279.800363
2 1286.305383 1286.555591 1285.805383
3 1290.308743 1290.558946 1289.808743
4 1294.312073 1294.562279 1293.812073
... ... ... ...
70 1546.522853 1546.773054 1546.022853
71 1558.532863 1558.783069 1558.032863
72 1560.534543 1560.784746 1560.034543
73 1570.542883 1570.793091 1570.042883
74 1572.544573 1572.794779 1572.044573

75 rows × 3 columns

Exercise: Use pd.merge_asof() with right_on="analysis_window_start" to match each spike with the analysis window that started before it.

Solution
df = pd.merge_asof(
    spikes, stimuli, left_on="spike_time", right_on="analysis_window_start"
)
df

unit_id brain_area spike_time start_time stop_time analysis_window_start
0 951031334 LM 1276.297339 1276.297013 1276.547221 1275.797013
1 951031154 LM 1276.298206 1276.297013 1276.547221 1275.797013
2 951031243 LM 1276.298739 1276.297013 1276.547221 1275.797013
3 951021543 PM 1276.299007 1276.297013 1276.547221 1275.797013
4 951031253 LM 1276.301272 1276.297013 1276.547221 1275.797013
... ... ... ... ... ... ...
457701 951021607 PM 1572.790671 1572.544573 1572.794779 1572.044573
457702 951039252 RL 1572.792065 1572.544573 1572.794779 1572.044573
457703 951016491 AM 1572.792470 1572.544573 1572.794779 1572.044573
457704 951031263 LM 1572.793746 1572.544573 1572.794779 1572.044573
457705 951021523 PM 1572.793838 1572.544573 1572.794779 1572.044573

457706 rows × 6 columns

Exercise: Subtract the stimulus .start_time column from the .spike_time column to get the spike times relative to stimulus onset. Then, print the smallest spike time in df this value should be very close to -0.5.

Solution
df.spike_time = df.spike_time - df.start_time
df.spike_time.min()
-0.4999979524416176

Exercise: Remove all the the rows in df where the "spike_time" relative to stimulus onset is greater than 1 second and print the maximum spike time.

Solution
df = df[df.spike_time <= 1]
df.spike_time.max()
0.9999981341300099

Exercise: Use sns.histplot to create a histogram of spike times relative to stimulus onset.

Solution
sns.histplot(df, x="spike_time")

Exercise: Use sns.histplot to create a histogram of spike times and use plt.axvline() to add vertical lines that mark the stimulus onset at 0 seconds and the offset at 0.25 seconds (BONUS): try using color="black" and linestyle="--" to make them look nice). Can you see the relationship between the spiking and stimuli?

Solution
sns.histplot(df, x="spike_time")
plt.axvline(x=0, ymin=0, ymax=1, color="black", linestyle='--')
plt.axvline(x=0.25, ymin=0, ymax=1, color="black", linestyle='--')

Exercise: Add a hue to the histogram that encodes the "brain_area".

Solution
sns.histplot(df, x="spike_time", hue="brain_area")

Exercise: Increate the number of bins in the histogram to improve the temporal resolution and limit the x-axis to the first 0.15 second after stimulus onset. Can you see which brain area responds first?

Solution
sns.histplot(df, x="spike_time", hue="brain_area", bins=300)
plt.xlim(0, 0.15)
(0.0, 0.15)

Section 2: Counting Spikes in Discrete Bins

Seaborn’s histplot function gives us a quick and convenient way for visualizing PSTHs. However, if we want to do further analyses on the PSTHs we need to compute and store them ourselves. In this section, we will explore how to divide our analysis into discrete bins and count the number of spikes within each bin. The bin size in an important parameter in many spiking analyses - if the bins are too small the resulting signal will be noisy but if they are too wide we lose temporal resolution.

Code Description
bins = np.arange(start, stop, stepsize) Create an array of evenly spaced bins between start and stop with the given stepsize
df.groupby("col1").col2.value_counts() Group the data by columns "col1" and get the .value_counts() of "col2" for each grouping
df.groupby(["col1", "col2"]).col3.value_counts() Group the data by columns "col1" and "col2" and get the .value_counts() of "col3" for each grouping
df.groupby("col1").col2.value_counts(bins=bins) Group the data by columns "col1" and get the .value_counts() of "col2" in the given bins
df.col1.array.mid Get the midpoints for a series of time intervals
df = df.reset_index() reset the index of df and store the old index in a new column
df.columns = ["new1", "new2"] Rename the columns of df to "new1" and "new2"
sns.lineplot(df, x="col1", y="col2", estimator="std") Plot the standard deviation of "col1" against "col2"
sns.lineplot(df, x="col1", y="col2", hue="col3") Plot the values in "col1" against "col2" and add a hue based on "col3"
sns.lineplot(df, x="col1", y="col2", units="col3", estimator=None) Plot the values in "col1" against "col2" separately for all units in "col3"

Exercise: Use np.arange to create an array of evenly spaced bins between 0 and 1 seconds that are 0.005 seconds wide. Print the number of bins in the array.

Solution
bins = np.arange(0, 1, 0.005)
bins.shape
(200,)

Example: Group the df by "brain_area" and get the .value_counts() of the "spike_time". Then, get the .mid of each interval and store it in the "spike_time" column. Finally, rename the columns to "bin_time" and "spike_count"

psth = df.groupby(["brain_area"]).spike_time.value_counts(bins=bins).reset_index()
psth.spike_time = psth.spike_time.array.mid
psth.columns = ["brain_area", "bin_time", "spike_count"]
psth

brain_area bin_time spike_count
0 AL 0.0625 346
1 AL 0.0575 306
2 AL 0.0675 243
3 AL 0.0875 236
4 AL 0.0475 229
... ... ... ...
1189 V1 0.3525 40
1190 V1 0.3675 40
1191 V1 0.3775 40
1192 V1 0.3625 39
1193 V1 0.3925 27

1194 rows × 3 columns

Exercise: Create an sns.lineplot to plot y="spike_count" across x="bin_time" (Hint: if this takes too long, you can set errorbar=None to speed up computation).

Solution
sns.lineplot(psth, x="bin_time", y="spike_count", errorbar=None)

Exercise: Use sns.lineplot with estimator="std" to plot the standard deviation of the "spike_count" across "bin_time".

Solution
sns.lineplot(psth, x="bin_time", y="spike_count", errorbar=None, estimator="std")

Exercise: Create new bins between -0.2 and 0.8 seconds with a width of 0.025 seconds. Them compute the psth as shown in @exm-bin and plot the "spike_count" across "bin_time".

Solution
bins = np.arange(-0.2, 0.8, 0.025)
psth = df.groupby(["brain_area"]).spike_time.value_counts(bins=bins).reset_index()
psth.spike_time = psth.spike_time.array.mid
psth.columns = ["brain_area", "bin_time", "spike_count"]
sns.lineplot(psth, x="bin_time", y="spike_count", errorbar=None)

Exercise: Reduce the bin width to a value that you think is appropriate, then re-compute the psth and plot it again.

Solution
bins = np.arange(-0.2, 0.8, 0.01)
psth = df.groupby(["brain_area"]).spike_time.value_counts(bins=bins).reset_index()
psth.spike_time = psth.spike_time.array.mid
psth.columns = ["brain_area", "bin_time", "spike_count"]
sns.lineplot(psth, x="bin_time", y="spike_count", errorbar=None)

Exercise: Use sns.lineplot to plot the PSTH for each brain_area separately by setting units="brain_area" and estimator=None.

Solution
sns.lineplot(psth, x="bin_time", y="spike_count", errorbar=None, estimator=None, units="brain_area")

Exercise: Re-calculate the psth as shown in @exm-bin but group the data by "unit_id" instead of "brain_area". Then create a sns.lineplot for every unit by setting estimator=None and units="unit_id" (Hint: adjust the names of the .columns).

Solution
psth = df.groupby(["unit_id"]).spike_time.value_counts(bins=bins).reset_index()
psth.spike_time = psth.spike_time.array.mid
psth.columns = ["unit_id", "bin_time", "spike_count"]
sns.lineplot(
    psth, x="bin_time", y="spike_count", errorbar=None, estimator=None, units="unit_id"
)

Exercise: Re-calculate the psth as shown in @exm-bin but group the data by "unit_id" AND "brain_area". Then create a sns.lineplot of the spike "count" across "spike_time" and add hue="brain_area".

Solution
psth = (
    df.groupby(["unit_id", "brain_area"])
    .spike_time.value_counts(bins=bins)
    .reset_index()
)
psth.spike_time = psth.spike_time.array.mid
psth.columns = ["unit_id", "brain_area", "bin_time", "spike_count"]
sns.lineplot(psth, x="bin_time", y="spike_count", errorbar=None, hue="brain_area")

Section 3: Comparing Firing Rates to a Baseline

In the previous plot, we saw that there is a large variability in the baseline firing rate between brain areas and units. This makes visualizations and statistical comparisons difficult. A common way of dealing with this is to apply a baseline correction. For this, we select a time interval and subtract the average spike count within this interval from the rest of the signal. The resulting PSTH represents the change in spiking relative to the baseline. In this section, we will learn how to select baseline intervals, compute the baseline spiking rate and correct the PSTHs.

Code Description
mask = df.col1 < 1 Create a boolean mask that is True where the column "col1" is smaller than 1
mask = (df.col1 < 1) & (df.col1 > 0) Create a boolean mask that is True where the column "col1" is smaller than 1 and larger than 0
df[mask] Get all rows of df where the mask is True
df["newcol"] = values Create a new column in df called "newcol" and assign it the values
del df["col1"] Delete "col1" from df
df1.merge(df2, on="col1") Merge the data frames df1 and df2 based on the colum "col1"
grouping = df.groupby(["col1"]).col2.mean() Group df by "col1" and get the mean value of "col2" for each grouping
sns.histplot(grouping) Plot the grouping as a histogram
sns.lineplot(psth, x="col1", y="col2", hue="col3") Plot "col2" against "col1" and add a hue for "col3"

Exercises

Example: Compute the baseline spike count for every "unit_id" in the time interval between tmin=-0.1 and tmax=0 seconds.

tmin = -0.1
tmax = 0
mask = (psth.bin_time >= tmin) & (psth.bin_time <= tmax)
baseline = psth[mask].groupby(["unit_id"]).spike_count.mean()
baseline
unit_id
951015628    0.2
951015643    0.0
951015704    0.0
951015717    1.6
951015731    0.0
            ... 
951039462    2.7
951039560    0.4
951039854    0.2
951039863    0.2
951039870    8.1
Name: spike_count, Length: 386, dtype: float64

Exercise: Create a sns.histplot to visualize the distribution of the baseline spike count.

Solution
sns.histplot(baseline)

Example: .merge() the baseline into the psth data frame based on the column "unit_id". Rename it to avoid naming conflicts.

baseline.name = "baseline"
psth = psth.merge(baseline, on="unit_id")
psth

unit_id brain_area bin_time spike_count baseline
0 951015628 AM 0.545 4 0.2
1 951015628 AM 0.035 2 0.2
2 951015628 AM 0.555 2 0.2
3 951015628 AM 0.565 2 0.2
4 951015628 AM 0.595 2 0.2
... ... ... ... ... ...
38209 951039870 RL 0.385 2 8.1
38210 951039870 RL 0.415 1 8.1
38211 951039870 RL 0.355 0 8.1
38212 951039870 RL 0.375 0 8.1
38213 951039870 RL 0.405 0 8.1

38214 rows × 5 columns

Exercise: Subtract the "baseline" from the "spike_count" and write the result to a new column called "baselined_spike_count".

Solution
psth["baselined_spike_count"] = psth.spike_count - psth.baseline

Exercise: Create an sns.lineplot of the "baselined_spike_count" across "time" (Hint: if this takes too long set errorbar=None to speed up computation).

Solution
sns.lineplot(
    psth, x="bin_time", y="baselined_spike_count", hue="brain_area", errorbar=None
)

Exercise: Re-compute the baseline as shown in @exm-baseline but use the time interval from tmin=0 to tmax=0.1. Then, visualize the baseline with a sns.histplot.

Solution
tmin = 0
tmax = 0.1
mask = (psth.bin_time >= tmin) & (psth.bin_time <= tmax)
baseline = psth[mask].groupby(["unit_id"]).spike_count.mean()
sns.histplot(baseline)

Exercise: Delete the existing "baseline" column runnign the code below, then .merge() the new baseline with psth as shown in @emx-merge. Subtract the .baseline from the .spike_count and plot the resulting "baselined_spike_count". Was this baseline interval a good choice?

del psth["baseline"]
Solution
baseline.name = "baseline"
psth = psth.merge(baseline, on="unit_id")
psth["baselined_spike_count"] = psth.spike_count - psth.baseline
sns.lineplot(
    psth, x="bin_time", y="baselined_spike_count", hue="brain_area", errorbar=None
)

Exercise: Delete the existing "baseline" column , then re-compute it using the time interval from tmin=-0.2 to tmax=0, .merge() it with the psth data frame, then compute and plot the `“baselined_spike_count”

Solution
del psth["baseline"]
tmin = -0.2
tmax = 0
mask = (psth.bin_time >= tmin) & (psth.bin_time <= tmax)
baseline = psth[mask].groupby(["unit_id"]).spike_count.mean()
baseline.name = "baseline"
psth = psth.merge(baseline, on="unit_id")
psth["baselined_spike_count"] = psth.spike_count - psth.baseline
sns.lineplot(
    psth, x="bin_time", y="baselined_spike_count", hue="brain_area", errorbar=None
)

Exercise: Plot the "baselined_spike_counts" cross "bin_time" with a hue for "brain_area" but plot every "unit_id" as individual units.

Solution
sns.lineplot(
    psth, x="bin_time", y="baselined_spike_count", hue="brain_area", errorbar=None, units="unit_id", estimator=None
)

Section 4: Identifying Responsive Units

The last plot clearly showed that not all units respond to the stimulus by spiking. Some do not respond at all while others decrease their firing rate below the baseline. This can be problematic if we want to average the PSTHs to estimate the response of larger populations of neurons. To avoid distorting our average, we have to identify which units are responsive. For this purpose, we’ll normalize the PSTHs by applying the formular

psthnormalized=psthbaselinebaseline+ϵ psth_{normalized} = \frac{psth-baseline}{baseline+\epsilon}

The normalized PSTH describes the firing of a neuron relative to the baseline. The regularization coefficient ϵ\epsilon is a small number that is added to the baseline to avoid errors for units with a baseline rate close to 0. Based on the normalized PSTH, we can define a threshold (e.g. 2 times the baseline rate) and categorize each unit above this threshold as responsive. In this section, you’ll explore normalizing the PSTHs while applying different values of ϵ\epsilon and selection thresholds.

Code Description
grouping = df.groupby("col1").col2.max() Group df by "col1" and get the .max() of "col2" for each grouping
mask = grouping > x Create a mask that is True where grouping is greater than x
grouping.reset_index(name="new") Reset the index of grouping and store the old index in a new column under the name="new"
df1.merge(df2, on=["col1", "col2"]) Merge df2 to df1 based on the columns "col1" and "col2"
sns.histplot(grouping) Plot a histogram of the values in grouping
sns.barplot(df, x="col1", hue="col2") Create a bar plot for the values in "col1" for each value in "col2"
plt.axvline(x, ymin, ymax) Add a vertical line at x that goes from ymin to ymax

Exercise: Compute the spike count relative to the baseline by subtracting the "baseline" from the "spike_count" and dividing the result by the "baseline".

Solution
psth["relative_spiking"] = (psth.spike_count - psth.baseline) / psth.baseline
psth

unit_id brain_area bin_time spike_count baselined_spike_count baseline relative_spiking
0 951015628 AM 0.545 4 3.9 0.1 39.000000
1 951015628 AM 0.035 2 1.9 0.1 19.000000
2 951015628 AM 0.555 2 1.9 0.1 19.000000
3 951015628 AM 0.565 2 1.9 0.1 19.000000
4 951015628 AM 0.595 2 1.9 0.1 19.000000
... ... ... ... ... ... ... ...
38209 951039870 RL 0.385 2 -6.2 8.2 -0.756098
38210 951039870 RL 0.415 1 -7.2 8.2 -0.878049
38211 951039870 RL 0.355 0 -8.2 8.2 -1.000000
38212 951039870 RL 0.375 0 -8.2 8.2 -1.000000
38213 951039870 RL 0.405 0 -8.2 8.2 -1.000000

38214 rows × 7 columns

Exercise: Create a sns.lineplot with estimator=None and units="unit_id" to plot the "relative_spiking" for every unit.

Solution
sns.lineplot(
    psth,
    x="bin_time",
    y="relative_spiking",
    hue="brain_area",
    units="unit_id",
    estimator=None,
)

Exercise: Re-compute the "relative_spiking" but add the regularization parameter epsilon to the baseline before dividing to avoid distorting the responses of units with a very low baseline. Then, plot the "relative_spiking" across "bin_time" for each "unit_id".

epsilon = 0.2
Solution
psth["relative_spiking"] = (psth.spike_count - psth.baseline) / (
    psth.baseline + epsilon
)
sns.lineplot(
    psth,
    x="bin_time",
    y="relative_spiking",
    hue="brain_area",
    units="unit_id",
    estimator=None,
)

Exercise: Increase the value of epsilion, then re-compute the the values of the "relative_spiking" and plot them to see the effect of the regularization parameter.

Solution
epsilon = 1
psth["relative_spiking"] = (psth.spike_count - psth.baseline) / (
    psth.baseline + epsilon
)
psth
sns.lineplot(
    psth,
    x="bin_time",
    y="relative_spiking",
    hue="brain_area",
    units="unit_id",
    estimator=None,
)

Exercise: Group the PSTHs by the "unit_id" and compute the .max() of the "relative_spiking" for each unit. Plot them in a histogram and draw a plt.axvline at the threshold defined below.

threshold = 1.5
Solution
sns.histplot(psth.groupby("unit_id").relative_spiking.max())
plt.axvline(x=threshold, ymin=0, ymax=1, color="red")

Example: Group the psth by "unit_id" and "brain_area" and compare the .max() of the "relative_spiking" to the threshold to determine if a unit "is_responsive". Then, reset the index of is_responsive and name the new column "is_responsive"

is_responsive = psth.groupby(["unit_id", "brain_area"]).relative_spiking.max() > threshold
is_responsive = is_responsive.reset_index(name="is_responsive")
is_responsive

unit_id brain_area is_responsive
0 951015628 AM True
1 951015643 AM True
2 951015704 AM False
3 951015717 AM True
4 951015731 AM False
... ... ... ...
381 951039462 RL False
382 951039560 RL True
383 951039854 RL False
384 951039863 RL True
385 951039870 RL True

386 rows × 3 columns

Exercise: Create a sns.barplot with x="is_responsive" and hue="brain_area" to visualize the fraction of units in each area that responds to the stimulus according to the threshold criterion.

Solution
sns.barplot(is_responsive, x="is_responsive", hue="brain_area")

Exercise: Try a different threshold value, and plot a sns.histplot of the unit’s maximum "relative_firing" together with the threshold to check how many units would be rejected by the threshold. Then re-compute the responsive units as shown in @exm-resp and plot the fraction of responsive units per area.

Solution
threshold = 2
sns.histplot(psth.groupby("unit_id").relative_spiking.max())
plt.axvline(x=threshold, ymin=0, ymax=1, color="red")
plt.figure()
is_responsive = (
    psth.groupby(["unit_id", "brain_area"]).relative_spiking.max() > threshold
)
is_responsive = is_responsive.reset_index(name="is_responsive")
sns.barplot(is_responsive, x="is_responsive", hue="brain_area")

Exercise: Merge the is_responsive data frame into the psth data frame with on=["unit_id", "brain_area"].

Solution
psth = psth.merge(is_responsive, on=["unit_id", "brain_area"])

Exercise: Select only the units that are responsive (psth[psth.is_responsive]) and plot their "baselined_spike_count" across "bin_time" for each brain area.

Solution
sns.lineplot(
    psth[psth.is_responsive],
    x="bin_time",
    y="baselined_spike_count",
    hue="brain_area",
    errorbar=None,
)

Exercise: Plot the PSTH of the responsive units for each individual "unit_id".

Solution
sns.lineplot(
    psth[psth.is_responsive],
    x="bin_time",
    y="baselined_spike_count",
    hue="brain_area",
    estimator=None,
    units="unit_id",
    errorbar=None,
)