File and Data Management

Explore database management with SQL, DuckDB, HDF5, and JSON to seamlessly integrate and analyze complex neuroscience datasets.

Author

Dr. Nicholas Del Grosso

Neuroscience is evolving rapidly, with experimental data becoming increasingly complex. How can you seamlessly integrate vast and diverse datasets for insightful analysis and easy sharing? And how would your process improve if, instead of having to write long scripts, you could analyze data with just a few lines of code?

In this course, discover the power of database management systems, a game-changer in neuroscience research. We will dive into the world of SQL and learn about the DuckDB SQL engine, which makes it easy to apply industry-standard data organization methods to research data as a relational database – no server management needed! You’ll also gain hands-on experience with HDF5 and JSON for key-value data storage, and learn how to combine these techniques into hybrid database systems that balance convenience and performance.

By the end, you’ll be adept at writing Python scripts to create and extract data from databases, query large databases in SQL, store complex data in HDF5, manage your work with Git, and publish your projects on GitHub.

Prerequisites: This workshop is ideal for neuroscience researchers at any level (Master's, PhD candidate, postdoc, PI) with some background in data analysis using MATLAB, Python, or R.

Credits

Dr. Nicholas Del Grosso

Installation

To run the course materials on your own machine:

Install VSCode as your editor
Install pixi (or, alternatively, conda) to create virtual Python environments (see the lessons on environment and package management)
Download the materials for a lesson using the "Download Materials" button
Extract the zip file and open the notebook in VSCode
In VSCode, open a new terminal and install the environment with pixi:

pixi install

or, if you use conda instead:

conda env create -f environment.yml
conda activate file_and_data_management

Course Contents

Organizing Structured Data

Organizing Data into Dictionaries

Organizing data into dictionaries for key-value mapping
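As a rough illustration of key-value mapping, the sketch below stores session metadata in a Python dictionary. The field names and values are hypothetical and are not taken from the course materials.

```python
# Hypothetical session metadata; the keys and values are illustrative only.
session = {
    "subject": "mouse01",
    "date": "2024-03-15",
    "n_trials": 120,
}

# Look up a value by its key.
print(session["subject"])

# Add a new key-value pair to the existing dictionary.
session["experimenter"] = "A. Researcher"

# Use .get() to supply a default when a key may be missing.
brain_region = session.get("brain_region", "unknown")
print(brain_region)
```

Using .get() with a default avoids a KeyError when some sessions lack a field, which is common when metadata was collected inconsistently.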

Extracting Metadata from strings

Extracting meaningful information from filenames
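One common way to pull metadata out of a filename is a regular expression with named groups. The sketch below assumes a hypothetical naming scheme of subject, date, and task separated by underscores; the pattern and the helper name parse_filename are illustrative, not the course's own.

```python
import re

# Assumed (hypothetical) naming scheme: <subject>_<YYYY-MM-DD>_<task>.<ext>
FILENAME_PATTERN = re.compile(
    r"(?P<subject>[A-Za-z0-9]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<task>[A-Za-z0-9]+)\."
    r"(?P<ext>\w+)"
)


def parse_filename(filename: str) -> dict:
    """Return the named parts of a filename as a dictionary."""
    match = FILENAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"Filename does not follow the expected scheme: {filename}")
    return match.groupdict()


metadata = parse_filename("mouse01_2024-03-15_reaching.csv")
print(metadata)
```

Raising an error on filenames that do not match makes naming mistakes visible early instead of silently producing incomplete metadata.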

Navigating and Searching through Local and Remote Filesystems

Navigating the filesystem

Managing and navigating files and directories
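A minimal sketch of searching a local directory tree with Python's pathlib follows. It builds a small throwaway folder structure first so that it can run anywhere; the folder and file names are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Build a small, hypothetical project layout to search through.
    for session_name in ("session1", "session2"):
        session_dir = root / "raw" / session_name
        session_dir.mkdir(parents=True)
        (session_dir / "trials.csv").write_text("trial,rt\n1,0.42\n")
    (root / "raw" / "session1" / "notes.txt").write_text("example notes\n")

    # Recursively find every CSV file below the root folder.
    csv_files = sorted(
        path.relative_to(root).as_posix() for path in root.rglob("*.csv")
    )

print(csv_files)
```

Here rglob walks all subfolders, while glob would only search the folder it is called on.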

Sciebo/NextCloud/Owncloud Folders as a Remote Filesystem

Managing project workspaces with Pixi

Analyzing JSON, CSV, and Parquet Data Using SQL in DuckDB

Writing and Reading Metadata Serialized with JSON

Organizing complex metadata with JSON
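To illustrate the idea, the sketch below writes nested metadata to a JSON file with Python's built-in json module and reads it back. The metadata fields and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical nested metadata for one recording session.
metadata = {
    "subject": {"id": "mouse01", "age_weeks": 12},
    "channels": ["ch1", "ch2", "ch3"],
    "sampling_rate_hz": 30000,
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "session_metadata.json"

    # Serialize the dictionary to human-readable JSON text.
    path.write_text(json.dumps(metadata, indent=2))

    # Read the file back into a Python dictionary.
    loaded = json.loads(path.read_text())

print(loaded["subject"]["id"])
```

JSON preserves nested dictionaries and lists, so hierarchical metadata survives the round trip without being flattened into columns.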

SQL Queries over Data Files With DuckDB

Managing and manipulating data in a database

Aggregating Data Using Statistical Functions in SQL

Simple aggregations on databases
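A simple aggregation might look like the sketch below, which counts units and averages a firing rate per brain region. Note that it uses Python's built-in sqlite3 module as a stand-in so that it runs without extra installs; it is not DuckDB, although the SELECT statement uses only standard SQL and should work in DuckDB as well. The table name, column names, and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database used here as a stand-in for DuckDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recordings (region TEXT, firing_rate_hz REAL)")
conn.executemany(
    "INSERT INTO recordings VALUES (?, ?)",
    [("CA1", 10.0), ("CA1", 20.0), ("V1", 5.0)],  # made-up example values
)

# Group rows by region, then count and average within each group.
rows = conn.execute(
    """
    SELECT region,
           COUNT(*) AS n_units,
           AVG(firing_rate_hz) AS mean_rate_hz
    FROM recordings
    GROUP BY region
    ORDER BY region
    """
).fetchall()
conn.close()

print(rows)
```

GROUP BY decides which rows are pooled together, and each aggregate function (COUNT, AVG) then returns one value per group.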