Organizing Data into Dictionaries

Author
Dr. Nicholas Del Grosso

In this section, we look at the practical use of Python data structures such as dictionaries, lists, and strings, with a focus on extracting and managing metadata. As neuroscience grad students, you are likely working with experimental data that needs to be organized and managed in a structured way. Metadata, or data about data, comes in various formats, and filenames are often a rich source of it. This section guides you through extracting and manipulating this information to make your data analysis more structured and less error-prone.

Section 1: Key-Value Mappings: Dictionaries

As neuroscience researchers, we often need to associate specific values with unique identifiers, whether experimental conditions, subject details, or measurement parameters. This section introduces dictionaries in Python, a versatile data structure for storing and retrieving data through key-value pairs:

{"name": "Emma",   "Date": "2022-07-23"}
#--Key----Value-   -Key-------Value----

You will start with basic dictionary operations: creating a dictionary, adding elements to it, and accessing them. This hands-on practice familiarizes you with dictionary syntax. The exercises are designed to reflect realistic use cases, such as storing and accessing metadata from experimental recordings.

Code                       Description
data = {}                  Makes an empty Dict
data = {'a': 3, 'b': 5}    Makes a Dict with two items: 'a' and 'b'
data['a']                  Accesses the value associated with key 'a'
data['c'] = 7              Adds a new key-value pair 'c': 7 to the Dict
list(data.keys())          Retrieves a list of all keys in the Dict
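
The operations in the table can be tried together in one short session. The following is a minimal sketch; the dict name recording and its keys and values are hypothetical, chosen only for illustration.

```python
# Hypothetical metadata for one recording (names and values are illustrative only).
recording = {}                          # start with an empty dict
recording['subject'] = 'M01'            # add a key-value pair
recording['sampling_rate_hz'] = 30000   # add a second pair

rate = recording['sampling_rate_hz']    # look up a value by its key
keys = list(recording.keys())           # keys are returned in insertion order

print(rate)
print(keys)
```

Assigning to a key that already exists overwrites its value instead of adding a second entry, so every key appears at most once.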

Exercises

The image dict describes how researcher Tom’s recording is formatted:

image = {'height': 1920, 'width': 1080, 'format': 'RGB', 'order': 'F'}
image
{'height': 1920, 'width': 1080, 'format': 'RGB', 'order': 'F'}

Example: Write the code to print out the width of the image, by accessing the "width" key:

image['width']
1080

Exercise: What is the height of the image?

Solution
image['height']
1920

Exercise: How are the pixel data in the image formatted?

Solution
image['format']
'RGB'

Exercise: What does the error message say if you use the same syntax to find out which key has the value 1080? What does this tell you about what key-value maps like dictionaries are designed for?

Solution
image[1080]
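
This lookup raises a KeyError, because square brackets search the keys and never the values: a dictionary is built for going from key to value, not the other way around. If a reverse lookup is really needed, one possible approach (a sketch, not the only way) is to scan all the items. The name matching_keys is introduced here only for illustration.

```python
image = {'height': 1920, 'width': 1080, 'format': 'RGB', 'order': 'F'}

# Square brackets look up keys, so asking for a value fails with KeyError.
try:
    image[1080]
except KeyError as err:
    print('KeyError:', err)

# Reverse lookup: collect every key whose value equals 1080.
matching_keys = [key for key, value in image.items() if value == 1080]
print(matching_keys)
```

The result is a list because several keys may share the same value, whereas each key maps to exactly one value.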

Exercise: Make a dictionary: Reorganize the code below: tell Python that the three variables below all belong together by putting them into a dictionary called session.

Solution
subject = "Josie"
date = "2023-07-23"
group = "control"

session = {'subject': subject, 'date': date, 'group': group}
session
{'subject': 'Josie', 'date': '2023-07-23', 'group': 'control'}

Exercise: Check that the dictionary is constructed properly by getting the subject from it. It should show “Josie”

Solution
session['subject']
'Josie'
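
Example: Two dicts can be combined with the | operator. Where both dicts share a key, the value from the right-hand dict is kept:

default_session = {'subject': 'Ken', 'experimenter': 'Barbie', 'time': '09:00', 'notes': 'Nothing new.'}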
today_vars = {'subject':  'Allan', 'notes': 'Did a good job.'}
session1 = default_session | today_vars
session1
{'subject': 'Allan',
 'experimenter': 'Barbie',
 'time': '09:00',
 'notes': 'Did a good job.'}
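
Note that the | operator for dicts requires Python 3.9 or later. The sketch below repeats the merge and compares it with the {**a, **b} unpacking form, which gives the same result on older Python 3 versions. It also checks that neither input dict is modified. The names merged and merged_unpack are illustrative.

```python
default_session = {'subject': 'Ken', 'experimenter': 'Barbie', 'time': '09:00', 'notes': 'Nothing new.'}
today_vars = {'subject': 'Allan', 'notes': 'Did a good job.'}

merged = default_session | today_vars               # Python 3.9+
merged_unpack = {**default_session, **today_vars}   # equivalent on older Python 3

print(merged == merged_unpack)
print(default_session['subject'])   # the inputs are left unchanged
```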

Section 2: Analyzing Data Stored in Dicts

The challenge with analyzing dict data is that dicts are not “sequences”, and neither are the dict_keys and dict_values objects returned by .keys() and .values(). Before passing the values to a statistics function, we should first turn them into a list using the list() function. For example:

>>> import numpy as np

>>> data = {'x': 1, 'y': 2}

>>> data.values()
dict_values([1, 2])

>>> list(data.values())
[1, 2]

>>> np.mean(list(data.values()))
1.5
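
To see what "not a sequence" means in practice, the sketch below checks the object returned by .values() against the standard library's collections.abc.Sequence and shows that indexing it fails, while the list copy behaves as expected. The variable names are illustrative.

```python
from collections.abc import Sequence

data = {'x': 1, 'y': 2}
values_view = data.values()

print(isinstance(values_view, Sequence))         # the view is not a sequence
print(isinstance(list(values_view), Sequence))   # the list copy is

# Indexing the view directly is not supported.
try:
    values_view[0]
except TypeError as err:
    print('TypeError:', err)

# After converting to a list, indexing works.
first_value = list(values_view)[0]
print(first_value)
```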

Useful functions for the exercises below:

Function     Example                             Description
len()        len(the_dict)                       The total number of items
np.mean()    np.mean(list(the_dict.values()))    The mean of the dict’s values
np.min()     np.min(list(the_dict.values()))     The minimum of the dict’s values

Let’s get some practice querying dicts and calculating statistics on them using NumPy.

Exercises

import numpy as np

Exercise: Using the following dict, calculate what was the average hours of sleep that our friends got last night:

hours_of_sleep = {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8}

Solution
np.mean(list(hours_of_sleep.values()))
7.0

Exercise: How many people in total took part in our sleep study, according to the following dataset?

hours_of_sleep = {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8}

Solution
len(hours_of_sleep)
5
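
The function table earlier in this section also lists np.min(). As a small sketch, the minimum can be computed in the same way as the mean. To find out whose value it is, Python's built-in min() accepts a key function; passing the dict's own .get method compares the names by their hours. The names shortest and shortest_sleeper are illustrative.

```python
import numpy as np

hours_of_sleep = {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8}

# Smallest value in the dict.
shortest = np.min(list(hours_of_sleep.values()))

# Key that belongs to the smallest value.
shortest_sleeper = min(hours_of_sleep, key=hours_of_sleep.get)

print(shortest, shortest_sleeper)
```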

Exercise: What was the average amount of sleep on day 2 in the following dataset?

hours_of_sleep = {
    'Day1': [5, 7, 3, 3, 4, 6, 8, 9],
    'Day2': [5, 7, 8, 5, 6, 7, 8, 4],
}

Solution
np.mean(hours_of_sleep['Day2'])
6.25
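
When each value is a list, the same calculation can be repeated for every key with a dict comprehension. This is a sketch; daily_means is a name introduced here for illustration.

```python
import numpy as np

hours_of_sleep = {
    'Day1': [5, 7, 3, 3, 4, 6, 8, 9],
    'Day2': [5, 7, 8, 5, 6, 7, 8, 4],
}

# Build a new dict that maps each day to the mean of its list.
daily_means = {day: float(np.mean(hours)) for day, hours in hours_of_sleep.items()}
print(daily_means)
```

The values here are already lists, so no list() conversion is needed before calling np.mean().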

Use the following dataset to answer the questions below.

Tip: you can index multiple times (e.g. data['Monday']['Morning'] or data['Monday'].keys())

hours_of_sleep = {
    'Day1': {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8},
    'Day2': {'Billy': 10, 'Kimberly': 7, 'Trini': 7, 'Jason': 4},
    'Day3': {'Trini': 8, 'Zack': 6, 'Jason': 9, 'Billy': 9},
}

Example: How many hours of sleep did Trini get on Day 2?

hours_of_sleep['Day2']['Trini']
7

Exercise: How many hours of sleep did Billy get on Day 1?

Solution
hours_of_sleep['Day1']['Billy']
7

Exercise: How much sleep did Zack get on Day 3?

Solution
hours_of_sleep['Day3']['Zack']
6

Exercise: How many people were in the study on Day 1?

Solution
len(hours_of_sleep['Day1'])
5

Exercise: How many people were still in the study on Day 3?

Solution
len(hours_of_sleep['Day3'])
4

Exercise: Was the average amount of sleep higher on Day 1 or Day 3?

Solution
day1 = np.mean(list(hours_of_sleep['Day1'].values()))
day3 = np.mean(list(hours_of_sleep['Day3'].values()))
day1, day3
(7.0, 8.0)
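
Because not every participant appears on every day, per-person statistics on the nested dict have to cope with missing entries. One possible approach, sketched below, first collects each person's values across days and then averages whatever is available. The names sleep_by_person and mean_by_person are hypothetical.

```python
import numpy as np

hours_of_sleep = {
    'Day1': {'Jason': 5, 'Kimberly': 9, 'Billy': 7, 'Trini': 6, 'Zack': 8},
    'Day2': {'Billy': 10, 'Kimberly': 7, 'Trini': 7, 'Jason': 4},
    'Day3': {'Trini': 8, 'Zack': 6, 'Jason': 9, 'Billy': 9},
}

# Regroup the data: person -> list of hours on the days they took part.
sleep_by_person = {}
for day, sleepers in hours_of_sleep.items():
    for person, hours in sleepers.items():
        sleep_by_person.setdefault(person, []).append(hours)

# Average each person's available days.
mean_by_person = {person: float(np.mean(hours)) for person, hours in sleep_by_person.items()}

print(sleep_by_person['Zack'])
print(mean_by_person['Zack'])
```

Each average here is taken over a different number of days per person, which is worth keeping in mind when comparing participants.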