Statistics with Pingouin

Authors
Dr. Nicholas Del Grosso | Dr. Sangeetha Nandakumar | Dr. Ole Bialas | Dr. Atle E. Rimehaug

Pingouin is a statistics package in Python that wraps pandas and scipy.stats, producing more complete statistical reports and presenting the results in a readable way.

Setup

Import Libraries

import numpy as np
import pandas as pd
import pingouin as pg
import seaborn as sns
import matplotlib.pyplot as plt
from pathlib import Path
import xarray as xr
import owncloud

Download the dataset and load it into a Pandas DataFrame

Path('data').mkdir(exist_ok=True, parents=True)

owncloud.Client.from_public_link('https://uni-bonn.sciebo.de/s/3Uf2gScrvuTPQhB').get_file('/', 'data/steinmetz_2017-01-08_Muller.nc')

Section 1: T-Tests

T-tests compare the means of samples drawn from normally-distributed populations, computing the probability of observing a difference at least as large as the one in the data if the population means were actually the same.

Code Statistical Test Description
pg.ttest(x, 0) One-Sample T-Test “Is the mean of my data unlikely to be zero?”
pg.ttest(x, y) Independent-Samples T-Test “Do the means of x and y indicate they are unlikely to be from the same distribution?”
pg.ttest(x, y, paired=True) Paired-Samples T-Test “Comparing each value directly to its partner, is the mean of the differences between x and y unlikely to be zero?”
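All three variants return a one-row DataFrame like the ones shown below. Under the hood, the one-sample statistic is just the sample mean divided by its standard error; here is a minimal pure-Python sketch (the helper name `one_sample_t` is ours, and the p-value step, which needs the t distribution’s CDF, is omitted):

```python
import math

def one_sample_t(x, popmean=0):
    """t statistic for testing whether the mean of x differs from popmean."""
    n = len(x)
    mean = sum(x) / n
    # Sample standard deviation (ddof=1), as used by pg.ttest.
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return (mean - popmean) / (sd / math.sqrt(n))

t = one_sample_t([1, 2, 3, 4, 5])  # mean 3, sd ~1.58, n=5 -> t ~4.24
```

The larger |t| is relative to the degrees of freedom (n - 1), the smaller the p-value pingouin reports.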

Let’s do some analysis on fake data to get a feel for these statistical tools.

Generate the Data: Run the code below to create the dataset df.

Exercises

randomizer = np.random.RandomState(17)  # Makes sure the pseudorandom number generators reproduce the same data for us all.
df = pd.DataFrame()
df['A'] = randomizer.normal(0, 1, size=20)
df['B'] = randomizer.normal(0.2, 1, size=20)
df['C'] = randomizer.normal(0.7, 1, size=20)
df['D'] = df.A * 0.3 + randomizer.normal(0, 0.3, size=20)
df.plot.box()

Analyze the Data with T-Tests in Pingouin

Example: A vs 0, One-Sample T-Test: Is the mean of the normally-distributed population that the dataset A is generated from unlikely to be zero?

pg.ttest(df['A'], 0)

T dof alternative p-val CI95% cohen-d BF10 power
T-test 0.378652 19 two-sided 0.709144 [-0.45, 0.65] 0.084669 0.248 0.064966

Exercise: B vs 1, One-Sample T-Test: Is the mean of the normally-distributed population that the dataset B is generated from unlikely to be one?

Solution
pg.ttest(df['B'], 1)

T dof alternative p-val CI95% cohen-d BF10 power
T-test -4.449603 19 two-sided 0.000275 [-0.26, 0.55] 0.994961 113.454 0.98794

A vs B, Independent Samples T-Test: Is the mean of the normally-distributed population that the dataset A is generated from unlikely to be the same as the mean of the normally-distributed population that the dataset B is generated from?

pg.ttest(df['A'], df['B'])

T dof alternative p-val CI95% cohen-d BF10 power
T-test -0.137189 38 two-sided 0.891606 [-0.7, 0.61] 0.043383 0.311 0.052052

Exercise: A vs C, Independent Samples T-Test: Is the mean of the normally-distributed population that the dataset A is generated from unlikely to be the same as the mean of the normally-distributed population that the dataset C is generated from?

Solution
pg.ttest(df['A'], df['C'])

T dof alternative p-val CI95% cohen-d BF10 power
T-test -1.978874 38 two-sided 0.055111 [-1.42, 0.02] 0.625775 1.409 0.487608

Exercise: A vs C, Paired Samples T-Test: Is the mean of the differences between each pair of samples generated from the two normally-distributed populations A and C unlikely to be 0?

Solution
pg.ttest(df['A'], df['C'], paired=True)

T dof alternative p-val CI95% cohen-d BF10 power
T-test -1.728432 19 two-sided 0.100128 [-1.55, 0.15] 0.625775 0.818 0.756449

Exercise: A vs D, Paired Samples T-Test: Is the mean of the differences between each pair of samples generated from the two normally-distributed populations A and D unlikely to be 0?

Solution
pg.ttest(df['A'], df['D'], paired=True)

T dof alternative p-val CI95% cohen-d BF10 power
T-test 0.082615 19 two-sided 0.935022 [-0.45, 0.49] 0.02152 0.233 0.050959

Section 2: T-Tests on Real-World Data: Pupil Position Analysis

In the experiments reported by Steinmetz et al. (2019) in Nature, mice performed a discrimination task where they moved the position of a stimulus using a steering wheel. During the experiment, a camera recorded the pupil position of the subject in the x and y directions.

Code Description
np.logical_and(condition1, condition2) Element-wise logical AND of two boolean arrays
sns.kdeplot(data, label='name') Create a kernel density estimate plot
plt.axvline(x, color, linestyle) Add a vertical line to the plot
plt.legend() Display the legend on the plot
pg.ttest(x, y, paired=True/False) Perform t-test (paired or independent)
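Note that np.logical_and(a, b) is equivalent to a & b for boolean arrays; with &, each comparison must be parenthesized because of operator precedence. A quick illustration with made-up contrast values:

```python
import numpy as np

contrast_left = np.array([50, 0, 100, 0])
contrast_right = np.array([0, 50, 25, 0])

mask = np.logical_and(contrast_left >= 50, contrast_right == 0)
same = (contrast_left >= 50) & (contrast_right == 0)  # identical result

# Only the first element satisfies both conditions.
print(mask)  # [ True False False False]
```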

Load and Filter the Data

Run the code below, which loads the file, extracts the variables, filters the data, and transforms it into a dataframe.

Exercises

dset = xr.load_dataset('data/steinmetz_2017-01-08_Muller.nc')
dd = dset[['pupil_x', 'pupil_y', 'contrast_left', 'contrast_right', 'response_type']]
dd = dd.where(
    (dset.active_trials==True) 
    & (dset.response_type != 0)
    & (dset.time >= 0.5)
    & (dset.time <= 1.3)
    , drop=True)
dd['contrast_diff'] = dd['contrast_right'] - dd['contrast_left']
trials = dd.median(dim='time')
rdf = trials.to_dataframe()
rdf.head()

pupil_x pupil_y contrast_left contrast_right response_type contrast_diff
trial
1 -1.019465 0.754972 50.0 0.0 1.0 -50.0
2 -0.184417 0.509199 0.0 50.0 -1.0 50.0
3 0.799240 -0.486496 100.0 25.0 1.0 -75.0
4 -0.294746 -0.218956 0.0 100.0 -1.0 100.0
5 -0.367058 -0.259306 50.0 50.0 1.0 0.0

Extract data from the dataframe rdf, plot its distribution(s), and use a t-test to answer the questions below:

Example: For the trials where contrast_left was greater than or equal to 50 and contrast_right was 0, was the average pupil_x position for the trial zero?

left = rdf[(rdf.contrast_left >= 50) & (rdf.contrast_right == 0)]
sns.kdeplot(left.pupil_x, label='left')
plt.axvline(0, color='black', linestyle='dotted')
plt.legend()
pg.ttest(left.pupil_x, 0)

T dof alternative p-val CI95% cohen-d BF10 power
T-test 3.447425 24 two-sided 0.002098 [0.22, 0.88] 0.689485 18.185 0.910939

Exercise: Were the mice looking at the stimulus that was presented? In trials where there was only one stimulus present (i.e. contrast_left or contrast_right was zero), and the stimulus had a contrast level of at least 50 (quite visible), was there a difference in the average pupil_x position?

Solution
mask_left = np.logical_and(rdf['contrast_left'] >= 50, rdf['contrast_right'] == 0)
df_left = rdf[mask_left]
sns.kdeplot(df_left['pupil_x'], label='left')

mask_right = np.logical_and(rdf['contrast_right'] >= 50, rdf['contrast_left'] == 0)
df_right = rdf[mask_right]
sns.kdeplot(df_right['pupil_x'], label='right')

plt.legend()

# These are two different sets of trials with unequal sizes, so an unpaired
# test is appropriate (pingouin falls back to one anyway when "paired" samples
# have unequal lengths, which is why the reported dof below is fractional).
pg.ttest(df_left['pupil_x'], df_right['pupil_x'], paired=False)

T dof alternative p-val CI95% cohen-d BF10 power
T-test 2.900766 41.883443 two-sided 0.005908 [0.19, 1.07] 0.855661 7.42 0.784252

Exercise: As a sanity check, was there a difference in vertical eye position between the left-stimulus trials and the right-stimulus trials? In trials where there was only one stimulus present (i.e. contrast_left or contrast_right was zero), and the stimulus had at least a contrast level of 50 (quite visible), was there a difference in the average pupil_y position?

Solution
mask_left = np.logical_and(rdf['contrast_left'] >= 50, rdf['contrast_right'] == 0)
df_left = rdf[mask_left]
sns.kdeplot(df_left['pupil_y'], label='left')

mask_right = np.logical_and(rdf['contrast_right'] >= 50, rdf['contrast_left'] == 0)
df_right = rdf[mask_right]
sns.kdeplot(df_right['pupil_y'], label='right')
plt.legend()

pg.ttest(df_left['pupil_y'], df_right['pupil_y'], paired=False)

T dof alternative p-val CI95% cohen-d BF10 power
T-test 0.81158 38.482526 two-sided 0.422024 [-0.2, 0.47] 0.247714 0.39 0.125031

Exercise: Did mice look in the direction they turned the wheel? Was there a difference in average pupil_x position between left-turning trials (i.e. response_type is -1) and right-turning trials (i.e. response_type is 1)?

Solution
mask_left = rdf['response_type'] == -1
df_left = rdf[mask_left]
sns.kdeplot(df_left['pupil_x'], label='left')

mask_right = rdf['response_type'] == 1
df_right = rdf[mask_right]
sns.kdeplot(df_right['pupil_x'], label='right')

plt.legend()

pg.ttest(df_left['pupil_x'], df_right['pupil_x'], paired=False)

T dof alternative p-val CI95% cohen-d BF10 power
T-test -2.472363 78.49085 two-sided 0.015585 [-0.59, -0.06] 0.50892 3.159 0.610308

Exercise: As a sanity check, did vertical eye position matter? Was there a difference in average pupil_y position between left-turning trials (i.e. response_type is -1) and right-turning trials (i.e. response_type is 1)?

Solution
mask_left = rdf['response_type'] == -1
df_left = rdf[mask_left]
sns.kdeplot(df_left['pupil_y'], label='left')
mask_right = rdf['response_type'] == 1
df_right = rdf[mask_right]
sns.kdeplot(df_right['pupil_y'], label='right')
plt.legend()
pg.ttest(df_left['pupil_y'], df_right['pupil_y'], paired=False)

T dof alternative p-val CI95% cohen-d BF10 power
T-test 1.218473 61.22414 two-sided 0.227719 [-0.1, 0.42] 0.275322 0.443 0.22797

Section 3: Working with Long DataFrames

If data is organized as a long dataframe, pingouin can simply do all the comparisons between the groups for you. Not only is this very convenient, it also makes richer analyses possible.
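What “long” means here: a wide dataframe has one column per condition, while a long dataframe stacks those columns into a single value column plus a variable label. A toy sketch with a hypothetical two-column frame:

```python
import pandas as pd

# Wide format: one column per condition.
wide = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Long format: one row per original cell, labeled by its source column.
long = wide.melt()
# long has columns ['variable', 'value']:
#   variable: A, A, B, B
#   value:    1, 2, 3, 4
```

Functions like pg.anova and pg.pairwise_tests can then be pointed at the value column (dv) and grouped by the variable column (between).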

Code Description
pg.anova(data=dfl, dv='value', between=['variable'], detailed=True) Analysis of Variance (ANOVA).
pg.pairwise_tests(data=dfl, dv='value', between='variable', padjust='fdr_bh', parametric=True/False) Pairwise T-tests.

Run the cell below to melt the DataFrame df into a long DataFrame dfl to be used in the following exercises.

Exercises

dfl = df.melt(ignore_index=False).reset_index()
dfl.sample(5)

index variable value
63 3 D -0.544443
36 16 B 0.767783
61 1 D -0.520330
2 2 A 0.623901
10 10 A 2.171257

Example: Before doing pairwise tests, it’s often helpful to run an ANOVA, which can quickly tell you whether any difference exists among a collection of groups before you hunt for the specific pairs that differ. Below, run an ANOVA comparing the dv ‘value’ between the levels of ‘variable’ to see if a significant difference is expected.

pg.anova(dfl, dv='value', between=['variable'], detailed=True)

Source SS DF MS F p-unc np2
0 variable 7.259916 3 2.419972 2.851046 0.042862 0.101157
1 Within 64.508906 76 0.848801 NaN NaN NaN
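The F statistic above is the ratio of the between-group and within-group mean squares (the MS column). As a sanity check on what pg.anova computes, here is a minimal sketch of the F computation itself (the helper name `one_way_f` is ours; pingouin additionally derives the p-value and effect size):

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA over a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean vs. within groups.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)      # DF = k - 1
    ms_within = ss_within / (n_total - k)  # DF = N - k
    return ms_between / ms_within

f = one_way_f([[1, 2, 3], [4, 5, 6]])  # -> 13.5
```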

Exercise: Do a t-test for every possible pairwise comparison of “variable” (e.g. “A” vs “B”, “A” vs “C”, etc.). (Tip: it works similarly to the pg.anova function.)

Solution
pg.pairwise_tests(dfl, dv='value', between=['variable'])

Contrast A B Paired Parametric T dof alternative p-unc BF10 hedges
0 variable A B False True -0.137189 38.0 two-sided 0.891606 0.311 -0.042521
1 variable A C False True -1.978874 38.0 two-sided 0.055111 1.409 -0.613342
2 variable A D False True 0.068052 38.0 two-sided 0.946101 0.309 0.021092
3 variable B C False True -2.139828 38.0 two-sided 0.038851 1.806 -0.663229
4 variable B D False True 0.302159 38.0 two-sided 0.764179 0.32 0.093653
5 variable C D False True 2.845162 38.0 two-sided 0.007113 6.461 0.881844

Exercise: Repeat the tests, but this time, adjust the p-value to compensate for multiple comparisons using the ‘fdr_bh’ method. Are the significant differences still significant? How do the corrected p-values compare to the uncorrected p-values?

Solution
pg.pairwise_tests(dfl, dv='value', between=['variable'], padjust='fdr_bh')

Contrast A B Paired Parametric T dof alternative p-unc p-corr p-adjust BF10 hedges
0 variable A B False True -0.137189 38.0 two-sided 0.891606 0.946101 fdr_bh 0.311 -0.042521
1 variable A C False True -1.978874 38.0 two-sided 0.055111 0.110222 fdr_bh 1.409 -0.613342
2 variable A D False True 0.068052 38.0 two-sided 0.946101 0.946101 fdr_bh 0.309 0.021092
3 variable B C False True -2.139828 38.0 two-sided 0.038851 0.110222 fdr_bh 1.806 -0.663229
4 variable B D False True 0.302159 38.0 two-sided 0.764179 0.946101 fdr_bh 0.32 0.093653
5 variable C D False True 2.845162 38.0 two-sided 0.007113 0.042681 fdr_bh 6.461 0.881844
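The p-corr column is derived from the p-unc column alone. Assuming the standard Benjamini-Hochberg step-up procedure (which is what ‘fdr_bh’ refers to), a pure-Python sketch of the adjustment looks like this (the helper name `fdr_bh` is ours):

```python
def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, scaling by n/rank and
    # enforcing that adjusted p-values stay monotone in rank.
    for rank in range(n - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, pvals[i] * n / (rank + 1))
        adjusted[i] = running_min
    return adjusted

fdr_bh([0.01, 0.02, 0.04])  # -> [0.03, 0.03, 0.04]
```

Small p-values get inflated the most, which is why only the strongest effects in the table survive the correction.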

Exercise: Do a pairwise test again on the long dataframe, comparing ‘value’ between each level of ‘variable’. This time, use a non-parametric comparison (one that does not assume each variable is normally distributed), and still adjust for multiple comparisons. Are your results the same?

Solution
pg.pairwise_tests(dfl, dv='value', between=['variable'], padjust='fdr_bh', parametric=False)

Contrast A B Paired Parametric U-val alternative p-unc p-corr p-adjust hedges
0 variable A B False False 192.0 two-sided 0.839232 0.967635 fdr_bh -0.042521
1 variable A C False False 136.0 two-sided 0.085855 0.171710 fdr_bh -0.613342
2 variable A D False False 202.0 two-sided 0.967635 0.967635 fdr_bh 0.021092
3 variable B C False False 132.0 two-sided 0.067868 0.171710 fdr_bh -0.663229
4 variable B D False False 232.0 two-sided 0.394171 0.591256 fdr_bh 0.093653
5 variable C D False False 273.0 two-sided 0.049864 0.171710 fdr_bh 0.881844

Use the rdf dataframe from the pupil measurement data. Is there a significant difference in average horizontal position when comparing all the contrast_diff levels in the study pairwise?

pg.pairwise_tests(rdf, dv='pupil_x', between=['contrast_diff'], padjust='fdr_bh', parametric=False)

Contrast A B Paired Parametric U-val alternative p-unc p-corr p-adjust hedges
0 contrast_diff -100.0 -75.0 False False 39.0 two-sided 0.841125 1.000000 fdr_bh 0.003482
1 contrast_diff -100.0 -50.0 False False 168.0 two-sided 0.054208 0.252391 fdr_bh 0.707904
2 contrast_diff -100.0 -25.0 False False 18.0 two-sided 0.600000 0.964286 fdr_bh 0.399767
3 contrast_diff -100.0 0.0 False False 147.0 two-sided 0.070109 0.252391 fdr_bh 0.717534
4 contrast_diff -100.0 25.0 False False 42.0 two-sided 0.157516 0.472549 fdr_bh 0.644120
5 contrast_diff -100.0 50.0 False False 162.0 two-sided 0.003526 0.042318 fdr_bh 1.179199
6 contrast_diff -100.0 75.0 False False 40.0 two-sided 0.232680 0.598319 fdr_bh 0.645830
7 contrast_diff -100.0 100.0 False False 112.0 two-sided 0.015099 0.135893 fdr_bh 0.998026
8 contrast_diff -75.0 -50.0 False False 81.0 two-sided 0.036257 0.218709 fdr_bh 0.904273
9 contrast_diff -75.0 -25.0 False False 8.0 two-sided 0.642857 0.964286 fdr_bh 0.759261
10 contrast_diff -75.0 0.0 False False 72.0 two-sided 0.036451 0.218709 fdr_bh 1.122901
11 contrast_diff -75.0 25.0 False False 21.0 two-sided 0.066667 0.252391 fdr_bh 1.397436
12 contrast_diff -75.0 50.0 False False 76.0 two-sided 0.003302 0.042318 fdr_bh 1.577027
13 contrast_diff -75.0 75.0 False False 19.0 two-sided 0.171429 0.474725 fdr_bh 1.106513
14 contrast_diff -75.0 100.0 False False 56.0 two-sided 0.002997 0.042318 fdr_bh 1.534350
15 contrast_diff -50.0 -25.0 False False 15.0 two-sided 0.842105 1.000000 fdr_bh -0.266070
16 contrast_diff -50.0 0.0 False False 124.0 two-sided 0.909802 1.000000 fdr_bh -0.060808
17 contrast_diff -50.0 25.0 False False 34.0 two-sided 1.000000 1.000000 fdr_bh -0.005404
18 contrast_diff -50.0 50.0 False False 163.0 two-sided 0.084222 0.275635 fdr_bh 0.568939
19 contrast_diff -50.0 75.0 False False 36.0 two-sided 0.897243 1.000000 fdr_bh 0.023733
20 contrast_diff -50.0 100.0 False False 101.0 two-sided 0.436422 0.785560 fdr_bh 0.368721
21 contrast_diff -25.0 0.0 False False 18.0 two-sided 0.720588 1.000000 fdr_bh 0.275528
22 contrast_diff -25.0 25.0 False False 5.0 two-sided 0.800000 1.000000 fdr_bh 0.258185
23 contrast_diff -25.0 50.0 False False 22.0 two-sided 0.266667 0.640000 fdr_bh 0.814132
24 contrast_diff -25.0 75.0 False False 4.0 two-sided 1.000000 1.000000 fdr_bh 0.228942
25 contrast_diff -25.0 100.0 False False 14.0 two-sided 0.484848 0.831169 fdr_bh 0.653006
26 contrast_diff 0.0 25.0 False False 31.0 two-sided 0.961300 1.000000 fdr_bh 0.065340
27 contrast_diff 0.0 50.0 False False 149.0 two-sided 0.057631 0.252391 fdr_bh 0.713574
28 contrast_diff 0.0 75.0 False False 33.0 two-sided 0.809598 1.000000 fdr_bh 0.096288
29 contrast_diff 0.0 100.0 False False 93.0 two-sided 0.331685 0.663370 fdr_bh 0.510201
30 contrast_diff 25.0 50.0 False False 38.0 two-sided 0.327451 0.663370 fdr_bh 0.596998
31 contrast_diff 25.0 75.0 False False 8.0 two-sided 1.000000 1.000000 fdr_bh 0.030278
32 contrast_diff 25.0 100.0 False False 24.0 two-sided 0.635365 0.964286 fdr_bh 0.416863
33 contrast_diff 50.0 75.0 False False 18.0 two-sided 0.327451 0.663370 fdr_bh -0.535031
34 contrast_diff 50.0 100.0 False False 55.0 two-sided 0.395863 0.750056 fdr_bh -0.222583
35 contrast_diff 75.0 100.0 False False 21.0 two-sided 0.945055 1.000000 fdr_bh 0.348996

Section 4: A Gallery of Plots: 2D Histograms with sns.jointplot()

Let’s try out a few different plots for comparing the distributions of two continuous variables.

Code Description
sns.jointplot(df, x='colA', y='colB') Makes a joint histogram
sns.jointplot(df, x='colA', y='colB', kind='kde') Makes a joint plot with kernel density estimate
sns.jointplot(df, x='colA', y='colB', kind='hex') Makes a joint plot with hexagonal bins
sns.jointplot(df, x='colA', y='colB', hue='colC') Makes a joint plot colored by a categorical variable

Exercises

Example: Make a joint plot comparing the pupil_x and pupil_y variables.

sns.jointplot(rdf, x="pupil_x", y="pupil_y");

Exercise: Make a joint plot comparing the pupil_x and pupil_y variables, using kind='kde'.

Solution
sns.jointplot(rdf, x='pupil_x', y='pupil_y', kind='kde')

Exercise: Make a joint plot comparing the pupil_x and pupil_y variables, using kind='kde' and setting hue to 'response_type'.

Solution
sns.jointplot(rdf, x='pupil_x', y='pupil_y', kind='kde', hue='response_type')

Exercise: Make a joint plot comparing the pupil_x and pupil_y variables, using kind='hex'.

Solution
sns.jointplot(rdf, x='pupil_x', y='pupil_y', kind='hex')

Section 5: Further Reading

Nice article on Pingouin here: https://towardsdatascience.com/the-new-kid-on-the-statistics-in-python-block-pingouin-6b353a1db57c

Nice summary of the different effect size metrics and when to pick which: https://www.socscistatistics.com/effectsize/default3.aspx

Section 6: Extra: Correlations and Simpson’s Paradox

Code Description
pg.corr(A, B) The correlation between variables A and B
pg.pairwise_corr(flowers) Pairwise correlations
df.plot.scatter(x='A', y='B') Make a scatter plot
sns.lmplot(data=df, x='Col1', y='Col2') Make a scatter plot with a regression line
sns.lmplot(data=df, x='Col1', y='Col2', hue='Col3') Make a scatter plot with a regression line, colored by a categorical variable

Run the cell below to download the iris dataset for this section:

Exercises

from bokeh.sampledata import iris
flowers = iris.flowers
flowers.head()

sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa

Example: Is there a correlation between flowers’ sepal widths and their petal lengths?

pg.corr(flowers['sepal_width'], flowers['petal_length'])

n r CI95% p-val BF10 power
pearson 150 -0.42844 [-0.55, -0.29] 4.513314e-08 2.762e+05 0.999847
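The r column is the Pearson correlation coefficient; the rest of the table (CI95%, p-val, BF10, power) is derived from r and n. A minimal sketch of the r computation itself (the helper name `pearson_r` is ours):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r([1, 2, 3], [2, 4, 7])  # close to +1: y rises steadily with x
```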

Exercise: Use seaborn’s lmplot() function to plot a regression line showing the sepal widths and petal lengths. Does the direction of this line match the results? Looking closely at the data’s distribution, what is missing from this story?

Solution
sns.lmplot(data=flowers, x='sepal_width', y='petal_length')

Exercise: This time, factor the species variable into the lmplot() visualization. What kind of relationship do you see between sepal_width and petal_length now?

Solution
sns.lmplot(data=flowers, x='sepal_width', y='petal_length', hue='species')

Demo

Example: What’s happening here is that a third variable, "species", influences both the petal and the sepal shape; this effect is called “Simpson’s Paradox”. Unfortunately, at this time (as far as I know) pingouin doesn’t have a built-in way to split a correlation into multiple groups; one way to do it, though, is a df.groupby() analysis, concatenating the results together.

analyses = []
for name, group in flowers.groupby('species'):
    analysis = pg.corr(group['sepal_width'], group['petal_length'])
    analysis['name'] = name
    analyses.append(analysis)
results = pd.concat(analyses)
results

n r CI95% p-val BF10 power name
pearson 50 0.177700 [-0.11, 0.43] 0.216979 0.37 0.236771 setosa
pearson 50 0.560522 [0.33, 0.73] 0.000023 1061.04 0.992245 versicolor
pearson 50 0.401045 [0.14, 0.61] 0.003898 10.094 0.836041 virginica

To summarize, accounting for the right variables is a big part of science; we should always be aware that the patterns we see in our data can be turned completely upside down when new aspects of a problem are understood and factored in!
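Simpson’s Paradox is easy to reproduce with synthetic data: give every group a positive within-group trend, then shift each successive group up in x and down in y so the pooled trend flips sign. A self-contained sketch (the construction is made up for illustration; with real data you would eyeball this with sns.lmplot and hue):

```python
import numpy as np

# Three groups: within each, y rises with x (slope +1),
# but each successive group sits higher in x and lower in y.
groups = []
for g in range(3):
    x = np.array([0.0, 1.0, 2.0]) + 3 * g
    y = np.array([0.0, 1.0, 2.0]) - 4 * g
    groups.append((x, y))

within = [np.corrcoef(x, y)[0, 1] for x, y in groups]  # each is +1
pooled_x = np.concatenate([x for x, _ in groups])
pooled_y = np.concatenate([y for _, y in groups])
pooled = np.corrcoef(pooled_x, pooled_y)[0, 1]         # negative overall

print(within, pooled)
```

The pooled correlation is strongly negative even though every group, taken on its own, is perfectly positively correlated — the same reversal the iris data shows.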