Intro to Python and Numpy

From Adding Numbers to Analyzing Real Data in Python

Authors

Dr. Nicholas Del Grosso | Dr. Sangeetha Nandakumar | Dr. Ole Bialas | Dr. Atle E. Rimehaug

In this notebook, you will learn how to represent different kinds of data in Python. You will get a first look at creating arrays in Numpy and also analyze some real neuroscience data. Finally, you are going to explore the differences in performance between Numpy and built-in Python functions.

Setup

Import Libraries

import owncloud
from pathlib import Path

Download Data

Path('data').mkdir(exist_ok=True, parents=True)

owncloud.Client.from_public_link('https://uni-bonn.sciebo.de/s/3bRwjQ3p7S3f7Wi').get_file('/', 'spikes.npy')

True

Section 1: Storing Data in Variables

In the first section, you are going to learn how to represent different kinds of data and store them in variables. You will encounter four basic data types: integers, floating-point numbers, Boolean values and text strings. You are also going to use lists, which are ordered collections of data. Data can be assigned to a variable using the = operator, which takes the value on the right and assigns it to the variable on the left. In this sense, a variable is simply a named container that we can use to store and access data. The data type of a variable can be determined with the type() function. We can also convert values from one type to another: for example, the int() function will try to convert a value to an integer. Finally, Python provides operators for arithmetic operations such as addition +, subtraction -, multiplication * and division /. Let’s test how this works!

Code	Description
`x = 3.14`	Assign the floating-point number `3.14` to the variable `x`
`x = True`	Assign the boolean value `True` to the variable `x`
`x = "hello"`	Assign the string `"hello"` to the variable `x`
`x = [1,2,3]`	Assign the list of integers `[1,2,3]` to the variable `x`
`type(x)`	Get the data type of variable `x`
`int(x)`	Convert the variable `x` to an integer, if possible
`+`, `-`, `*`, `/`	Add, subtract, multiply, divide values
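
The entries in the table can be tried out together in a short sketch like the one below. The variable names (count, ratio, flag, greeting, numbers and so on) are made up for illustration and are not part of the course materials.

```python
# Each assignment stores a value of a different basic type in a variable.
count = 3            # integer
ratio = 3.14         # floating-point number
flag = True          # Boolean
greeting = "hello"   # text string
numbers = [1, 2, 3]  # list of integers

# type() reports the data type of the value a variable currently holds.
print(type(count), type(ratio), type(flag), type(greeting), type(numbers))

# int() converts where possible: floats are truncated toward zero,
# Booleans become 0 or 1, and strings containing digits are parsed.
truncated = int(ratio)    # 3
from_bool = int(flag)     # 1
from_text = int("42")     # 42

# Arithmetic operators; note that / always returns a float.
total = count + 2         # 5
difference = count - 0.5  # 2.5
product = count * 2       # 6
quotient = count / 2      # 1.5
print(total, difference, product, quotient)
```

Note that int() does not round: int(3.99) is also 3.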

Exercises

Example: Assign the integer value 1 to a variable called one and print its type().

one = 1
type(one)

int

Exercise: Subtract 0.5 from the variable one.

Solution

one - 0.5

0.5

Exercise: Assign the floating-point value 0.001 to a variable called small and print its type.

Solution

small = 0.001
type(small)

float

Exercise: Assign the Boolean value False to a variable called this_is_false and convert it to an integer.

Solution

this_is_false = False
int(this_is_false)

Exercise: Assign the Boolean value True to a variable called this_is_true and convert it to an integer.

Solution

this_is_true = True
int(this_is_true)

Exercise: Assign the string value "goodbye" to a variable called goodbye and print its type.

Solution

goodbye = 'goodbye'
type(goodbye)

str

Exercise: Add the string "hello" to the variable goodbye.

Solution

goodbye = goodbye + 'hello'
goodbye

'goodbyehello'

Exercise: Create a list with the numbers 1 through 6, assign it to a variable called dice and print its type.

Solution

dice = [1,2,3,4,5,6]
type(dice)

list

Exercise: Multiply the list dice by 2. What happens?

Solution

dice*2

[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]

Exercise: Try to add 1 to the list. What error message do you observe?

Solution
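
One possible way to look at the error without interrupting the notebook is to catch it, as in the sketch below. The list dice is redefined here so that the block runs on its own, and the try/except wrapper is an addition for illustration. Running dice + 1 directly in a cell shows the same TypeError as a traceback.

```python
dice = [1, 2, 3, 4, 5, 6]

try:
    dice + 1  # for lists, + means concatenation, so both sides must be lists
except TypeError as error:
    message = str(error)
    print(message)
```

To append a single value with +, the value has to be wrapped in a list first, for example dice + [1].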

Section 2: Analyzing Neural Spiking Data with Numpy

Numpy offers many useful functions for data analysis - let’s test them on real neuroscience data! In this section, you will load and analyze the spiking of a neuron in the primary visual cortex of a mouse. The spikes are represented as a sorted list of time points where spikes were observed. For example, [0.05, 0.24, 1.5] indicates that a spike was observed 50, 240 and 1500 milliseconds after the start of the recording. Using the functions below, we can answer some interesting questions about the firing behavior of a given neuron.

Code	Description
`import numpy as np`	Import the module `numpy` under the alias `np`
`x = np.load("data.npy")`	Load the file `"data.npy"` into an array and assign it to the variable `x`
`np.size(x)` or `x.size`	Get the total number of elements stored in the array `x`
`np.min(x)` or `x.min()`	Get the minimum value of the array `x`
`np.max(x)` or `x.max()`	Get the maximum value of the array `x`
`np.sum(x)` or `x.sum()`	Compute the sum of all values in the array `x`
`np.mean(x)` or `x.mean()`	Compute the mean of all values in the array `x`
`np.std(x)` or `x.std()`	Compute the standard deviation of all values in the array `x`
`np.diff(x)`	Compute the difference between consecutive elements in the array `x`
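
Before working with the real recording, the functions in the table can be tried on the three example spike times mentioned above. This is only a toy sketch: the name example_spikes and the other variable names are made up for illustration, and the values are not taken from the data file.

```python
import numpy as np

# Toy data: the three example spike times (in seconds) from the text above.
example_spikes = np.array([0.05, 0.24, 1.5])

n_spikes = example_spikes.size        # number of spikes in the array
last_spike = example_spikes.max()     # time of the last spike
intervals = np.diff(example_spikes)   # time between consecutive spikes

print(n_spikes, last_spike)
print(intervals)          # roughly 0.19 s and 1.26 s
print(intervals.mean())   # average time between spikes
```

Note that np.diff returns one element fewer than its input, because three spikes have only two gaps between them.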

Exercise: Import the Numpy module under the alias np.

Solution

import numpy as np

Exercise: Load the file "spikes.npy" into a Numpy array.

Solution

spikes = np.load('spikes.npy')

Exercise: What is the total number of spikes in this recording?

Solution

np.size(spikes)

Exercise: What is the duration of the recording (assuming the recording stopped after the last spike was recorded)?

Solution

spikes.max()

298.4843451836275

Exercise: Compute the neuron’s average firing rate (the total number of spikes divided by the duration of the recording).

Solution

np.size(spikes)/spikes.max()

2.415537067970653

Exercise: Compute the inter-spike intervals (i.e. the time differences between consecutive spikes).

Solution

np.diff(spikes)

array([0.05456682, 0.15250043, 0.26966743, 0.03310009, 0.27270077, ...,
       0.05906683, 0.27290077, 0.2008339 , 0.30426752, 0.17040048])

Exercise: What is the average inter-spike interval for this neuron?

Solution

isi = np.diff(spikes)
isi.mean()

0.4144865856420776

Exercise: What is the standard deviation of inter-spike intervals for this neuron?

Solution

np.diff(spikes).std()

0.47663480650055273

Exercise: What is the shortest time between two spikes?

Solution

np.diff(spikes).min()

0.0005666682648097776

Section 3: Creating Arrays in Numpy

Numpy also offers many functions for generating arrays. The simplest way to create an array is to convert a list but there are other functions for specific purposes like generating arrays of random numbers or numbers within a certain range. Like variables, Numpy arrays can have different data types. The type of an array is stored in the .dtype attribute. In this section, you will create and explore different kinds of arrays.

Code	Description
`x = np.array([2,5,3])`	Create an array from the list `[2,5,3]` and assign it to the variable `x`
`x = np.random.randn(100)`	Create an array with 100 normally-distributed random numbers and assign it to the variable `x`
`x = np.arange(2,7)`	Create an array with all integers between 2 and (not including) 7 and assign it to the variable `x`
`x = np.arange(2,7,0.3)`	Create an array with evenly spaced values between 2 and 7 with a step size of 0.3 and assign it to the variable `x`
`x = np.linspace(2,3,10)`	Create an array with 10 evenly spaced values between 2 and 3 and assign it to the variable `x`
`x.dtype`	Get the data type of the numpy array `x`
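
The sketch below shows how some of these functions behave at the boundaries of the range and how the data type of an array follows from its contents. The variable names are hypothetical, and the exact integer type that is printed may differ between platforms.

```python
import numpy as np

from_list = np.array([2, 5, 3])   # integer array built from a list
integers = np.arange(2, 7)        # 2, 3, 4, 5, 6 (the stop value 7 is excluded)
spaced = np.linspace(2, 3, 10)    # 10 values, both endpoints 2 and 3 included

# A single float in the list makes the whole array floating-point.
mixed = np.array([2, 5, 3.0])

print(from_list.dtype, mixed.dtype)
print(integers)
print(spaced[0], spaced[-1], spaced.size)
```

The main difference to keep in mind is that np.arange excludes its stop value, whereas np.linspace includes it by default.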

Exercises

Example: Create an array from the list [1, 2, 3], assign it to the variable a and display its type.

a = np.array([1,2,3])
type(a)

numpy.ndarray

Exercise: Multiply the array a by 2 and add 1 to it.

Solution

a * 2 + 1

array([3, 5, 7])

Exercise: Create an array from the list [0.1, 0.2, 0.3], assign it to the variable b and display its type.

Solution

b = np.array([0.1,0.2,0.3])
type(b)

numpy.ndarray

Exercise: Create an array from the list [1, True, "a"], assign it to the variable c and display its type.

Solution

c = np.array([1, True, "a"])
type(c[0])

numpy.str_

Exercise: Try to add 1 to the variable c. What error message do you observe?

Solution
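
A minimal sketch of one way to inspect the error is shown below. The array c is redefined so that the block runs on its own, and the try/except wrapper is an addition for illustration. The exact wording of the message depends on the installed Numpy version, so it is printed rather than quoted here.

```python
import numpy as np

c = np.array([1, True, "a"])  # mixed input: every element is stored as a string
print(c.dtype)                # a Unicode string type

try:
    c + 1  # Numpy has no addition defined between a string array and an integer
except TypeError as error:
    error_name = type(error).__name__
    print(error_name, error)
```

Because an array holds a single data type, Numpy converted 1 and True to strings when c was created, which is why the addition fails for every element and not just for "a".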

Exercise: Make an array containing the integers from 1 to 15.

Solution

array_of_numbers = np.arange(1,16,1)
array_of_numbers

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

Exercise: Create an array that contains all even numbers up to and including 100.

Solution

np.arange(0,100+2,2)

array([  0,   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,
        26,  28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,
        52,  54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,
        78,  80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100])

Exercise: Make an array of only 6 evenly-spaced numbers between 1 and 10.

Solution

np.linspace(1,10,6)

array([ 1. ,  2.8,  4.6,  6.4,  8.2, 10. ])

Exercise: Create an array of 10 normally-distributed random numbers and compute its mean and standard deviation.

Solution

x = np.random.randn(10)
x, x.mean(), x.std()

(array([ 0.27299568, -0.64684942, -1.30570342, -1.95991131, -2.08556602,
        -0.37904938, -0.45618923, -0.62556584, -0.64754801,  0.39586097]),
 -0.7437525972608371,
 0.7858819849146609)

Exercise: Now, create arrays with 100 and 1000 normally-distributed random numbers and compute their means and standard deviations.

Solution

x = np.random.randn(100)
x.mean(), x.std()

(-0.06324362027843711, 0.9665345203945966)

x = np.random.randn(1000)
x.mean(), x.std()

(-0.017241949802111974, 0.9806111510322726)

Section 4: Quantifying Numpy’s Performance

One of the key advantages of Numpy is that it is a lot faster than basic Python. How much faster? Let’s find out! The code below creates an array of ten thousand random numbers as well as a list with exactly the same data. We can use these to test how Numpy compares to basic Python with respect to performance.

Exercises

my_array = np.random.randn(10000)
my_list = list(my_array)

sum(my_list)

4.249007775753211

np.sum(my_array)

4.249007775753224

To time our code, we are going to use the %%timeit cell magic. Adding %%timeit at the top of a cell makes it so that running that cell displays the time it took to run the code. The code is executed many times in a loop and the duration is averaged over all loops. By default, the number of loops is chosen automatically so that each run lasts long enough to be measured reliably (in the outputs below, 1,000 or 100,000 loops). This procedure is repeated seven times so that we get one average duration for each run. The reported numbers are the mean duration across the seven runs and its standard deviation.
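
%%timeit only works inside IPython or Jupyter. Outside a notebook, a similar measurement can be sketched with the timeit module from Python's standard library, as below. The stand-in list data and the choice of 100 loops per run are assumptions made for illustration; this is not how the timings in this notebook were produced.

```python
import statistics
import timeit

# Stand-in data: a plain list of 10,000 numbers
data = list(range(10_000))

# Run sum(data) `loops` times per run and repeat this for 7 runs.
# Each entry of run_times is the total duration of one run in seconds.
loops = 100
run_times = timeit.repeat(lambda: sum(data), repeat=7, number=loops)

# Convert to time per loop, then summarize across the 7 runs
per_loop = [t / loops for t in run_times]
mean = statistics.mean(per_loop)
spread = statistics.stdev(per_loop)
print(f"{mean * 1e6:.1f} µs ± {spread * 1e6:.1f} µs per loop "
      f"({len(per_loop)} runs, {loops} loops each)")
```

The printed numbers depend on the machine and will differ from run to run.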

Example: Estimate the time for computing the sum of my_list using Python’s built-in sum() function with %%timeit.

%%timeit
sum(my_list)

782 μs ± 9.73 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Exercise: Use %%timeit to estimate how long it takes to compute np.sum() of my_array.

Solution

%%timeit

np.sum(my_array)

9.4 μs ± 126 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Exercise: Use %%timeit to estimate how long it takes for Python’s built-in max() function to find the maximum of my_list.

Solution

%%timeit

max(my_list)

297 μs ± 3.42 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Exercise: Use %%timeit to estimate how long it takes for the np.max() function to find the maximum of my_array.

Solution

%%timeit

np.max(my_array)

9.69 μs ± 259 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Exercise: The code below estimates the time it takes to multiply every element of my_list by 2. Use %%timeit to test how long it takes to multiply my_array by 2 (Hint: use the * operator).

Solution

%%timeit
[item*2 for item in my_list]

1.18 ms ± 27.4 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%%timeit

my_array*2

7.07 μs ± 55.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Exercise: Which is faster: multiplying an array by 2 or adding the array to itself?

Solution

%%timeit

my_array+my_array

12 μs ± 422 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)