Data Formats
Once you have a data model in memory, you need to store it on disk in a way that is efficient, self-describing, and shareable. This unit covers two file formats widely used for data sharing in modern neuroscience: HDF5 and NWB. The homework session shows how to use h5py to store, read, and compress structured data. The first in-class session teaches you how to read NWB files with both h5py and pynwb, and the second focuses on building an NWB file from scratch and writing it to disk.
Sessions
Structured Scientific Data with HDF5: Design, Access, and Compression
A practical introduction to HDF5 using h5py: storing structured data and metadata, reading data efficiently, and trading CPU time for disk space with compression.
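A minimal sketch of the three ideas in this session, assuming h5py and NumPy are installed. The dataset path `recording/voltage`, the attribute names, the chunk shape, and the synthetic random-walk signal are illustrative choices, not part of the course material. The same array is written once uncompressed and once with gzip, and a slice is then read back from the compressed file.

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative data: 10 channels x 100,000 samples of a slowly varying int16 signal
rng = np.random.default_rng(0)
data = np.cumsum(rng.integers(-1, 2, size=(10, 100_000)), axis=1).astype(np.int16)

tmpdir = tempfile.mkdtemp()
raw_path = os.path.join(tmpdir, "raw.h5")
gz_path = os.path.join(tmpdir, "compressed.h5")

# Uncompressed dataset, with metadata stored as HDF5 attributes
with h5py.File(raw_path, "w") as f:
    dset = f.create_dataset("recording/voltage", data=data)
    dset.attrs["unit"] = "uV"
    dset.attrs["sampling_rate_hz"] = 30000.0

# Same data, gzip-compressed; compression operates per chunk (1 channel x 10,000 samples here)
with h5py.File(gz_path, "w") as f:
    dset = f.create_dataset(
        "recording/voltage",
        data=data,
        chunks=(1, 10_000),
        compression="gzip",
        compression_opts=4,  # gzip level 0-9: higher costs more CPU for smaller files
    )
    dset.attrs["unit"] = "uV"
    dset.attrs["sampling_rate_hz"] = 30000.0

# Slicing reads and decompresses only the chunks that overlap the slice
with h5py.File(gz_path, "r") as f:
    dset = f["recording/voltage"]
    first_second = dset[0, :30000]
    rate = dset.attrs["sampling_rate_hz"]

raw_size = os.path.getsize(raw_path)
gz_size = os.path.getsize(gz_path)
print(f"uncompressed: {raw_size} bytes, gzip: {gz_size} bytes")
```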
Working with NWB Files using h5py and pynwb
How to use h5py and pynwb to read NWB files.
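As a rough sketch of the h5py side of reading, the hypothetical helper `list_datasets` below walks every dataset in an HDF5 file (NWB files are HDF5 files) and reports its internal path, shape, and dtype. So that the snippet runs on its own, a small HDF5 file that only mimics the `acquisition/<name>/data` layout pynwb uses for a `TimeSeries` stands in for a real NWB file; with real data you would pass the path to your own `.nwb` file instead.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Hypothetical helper: return {internal_path: (shape, dtype)} for every dataset."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


# Stand-in file mimicking (not reproducing) the layout of an NWB acquisition TimeSeries
path = os.path.join(tempfile.mkdtemp(), "mock.nwb")
with h5py.File(path, "w") as f:
    f.create_dataset("acquisition/signal/data", data=np.arange(100.0))
    f.create_dataset("acquisition/signal/timestamps", data=np.arange(100) / 1000.0)

contents = list_datasets(path)
for name, (shape, dtype) in contents.items():
    print(name, shape, dtype)
```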
Creating NWB Files with pynwb
How to use pynwb to create NWB files.
Extra 1: Understanding and Controlling Memory Usage in NumPy
In this session, we explore how NumPy uses memory: we calculate how much space our data actually takes, examine how arrays are created, and investigate when memory is copied, reused, or temporarily expanded.
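A short sketch of the kinds of checks this session covers, using only NumPy. The array sizes are arbitrary; the comments give the byte counts implied by the dtypes.

```python
import numpy as np

a = np.zeros((1000, 1000), dtype=np.float64)
print(a.nbytes)  # 8000000: 1000 * 1000 elements * 8 bytes each

b = a.astype(np.float32)  # astype allocates a new array by default
print(b.nbytes)  # 4000000: same shape, 4 bytes per element

view = a[:, :500]        # basic slicing returns a view onto a's buffer
fancy = a[:, [0, 1, 2]]  # fancy (integer-array) indexing always copies
print(np.shares_memory(a, view), np.shares_memory(a, fancy))  # True False

# `a * 2` allocates a new full-size (8 MB) result before `+ 1` is applied
c = a * 2 + 1
# In-place operations write into a's existing buffer instead
np.multiply(a, 2, out=a)
a += 1
```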
Extra 2: Data Representation and Disk I/O: Performance Beyond RAM
In this session, we measure what happens when we write arrays to disk, compare text and binary formats, and explore how data types determine both memory usage and file size.
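A minimal sketch comparing a text format with NumPy's binary `.npy` format, assuming random float64 data as a stand-in for real recordings. `np.savetxt` writes each value with its default `%.18e` format, so every number costs roughly 25 characters, while `.npy` stores the raw bytes plus a small header; casting to float32 halves the binary size again at the cost of precision.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(100_000)  # float64: 8 bytes per value in memory

tmpdir = tempfile.mkdtemp()
txt_path = os.path.join(tmpdir, "data.txt")
npy64_path = os.path.join(tmpdir, "data64.npy")
npy32_path = os.path.join(tmpdir, "data32.npy")

np.savetxt(txt_path, data)                    # text: one "%.18e" line per value
np.save(npy64_path, data)                     # binary: 8 bytes/value + header
np.save(npy32_path, data.astype(np.float32))  # binary: 4 bytes/value + header

sizes = {p: os.path.getsize(p) for p in (txt_path, npy64_path, npy32_path)}
for p, n in sizes.items():
    print(os.path.basename(p), n, "bytes")

# Both float64 files round-trip exactly; the text file is just much larger
from_text = np.loadtxt(txt_path)
from_npy = np.load(npy64_path)
```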