Navigating the filesystem
This section covers the essential skills of navigating and managing files and directories, a fundamental part of handling experimental data in neuroscience research. We will explore commands and techniques for organizing and accessing experimental data so that it fits smoothly into your analysis workflow.
Setup
Download Data
from pathlib import Path
paths = [
"data/exp1/joey_2021-05-01_001/spikes.npy",
"data/exp1/joey_2021-05-02_001/spikes.npy",
"data/exp1/joey_2021-05-02_001/lfps.h5",
"data/exp1/phoebe_2021-05-02_001/spikes.npy",
"data/exp1/phoebe_2021-05-03_001/spikes.npy",
"data/exp1/phoebe_2021-05-03_001/lfps.h5",
"data/exp1/phoebe_2021-05-04_001/spikes.npy",
]
for path in paths:
    path = Path(path)
    path.parent.mkdir(exist_ok=True, parents=True)
    path.touch()
Import Libraries
from pathlib import Path
import fsspec
from fsspec.implementations.github import GithubFileSystem
import pandas as pd
Section 1: Using the pathlib library
The pathlib module in Python provides an object-oriented approach to file system paths. This section familiarizes you with the library so that you can handle file paths and directories more flexibly and intuitively. We’ll cover basic operations such as listing directories and globbing for pattern matching.
| Command | Description |
|---|---|
| `from pathlib import Path` | Import the `Path` class. |
| `Path.cwd()` | Gets the current working directory. |
| `Path('.').resolve()` | Also gets the current working directory. |
| `path = Path('./data')` | Make a Path object located in the data folder of the working directory. |
| `list(path.iterdir())` | List all the files and folders in the specified path. |
| `new_path = path.joinpath("raw")` | Append the “/raw” folder to the current path. |
| `new_path = path / "raw"` | Also append the “/raw” folder to the current path. |
| `list(path.glob('*.h5'))` | Search for files that end in “.h5” in the current path. |
| `list(path.glob('data*'))` | Search for files that start with “data” in the current path. |
| `list(path.glob('**/data*'))` | Search for files that start with “data” in the current path and in any of its subfolders. |
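The commands in the table can be tried out safely on a throwaway directory. The following is a minimal sketch that uses `Path.glob` for the pattern searches; the folder and file names (`data/raw/a.h5` and so on) are made up for illustration and are not part of the course data.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Build a tiny, hypothetical tree: data/raw/a.h5, data/raw/b.npy, data/notes.txt
    raw = root / "data" / "raw"
    raw.mkdir(parents=True)
    (raw / "a.h5").touch()
    (raw / "b.npy").touch()
    (root / "data" / "notes.txt").touch()

    path = root / "data"

    # joinpath() and the "/" operator build the same path.
    paths_equal = path.joinpath("raw") == path / "raw"

    # iterdir() lists only the direct children of a folder.
    children = sorted(p.name for p in path.iterdir())

    # "*.h5" looks only in `path` itself; "**/*.h5" also searches subfolders.
    h5_here = sorted(p.name for p in path.glob("*.h5"))
    h5_anywhere = sorted(p.name for p in path.glob("**/*.h5"))

print(paths_equal)   # True
print(children)      # ['notes.txt', 'raw']
print(h5_here)       # []
print(h5_anywhere)   # ['a.h5']
```

Note that `iterdir()` and `glob()` return generators, which is why the results are wrapped in `list()` or `sorted()` before being displayed.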
Exercise: What is the current working directory?
Solution
Path.cwd()
PosixPath('/home/olebi/projects/new-learning-platform/notebooks/file_and_data_management/02_filesystem_navigation/01_pathlib')
Path('.').resolve()
PosixPath('/home/olebi/projects/new-learning-platform/notebooks/file_and_data_management/02_filesystem_navigation/01_pathlib')
Path().resolve()
PosixPath('/home/olebi/projects/new-learning-platform/notebooks/file_and_data_management/02_filesystem_navigation/01_pathlib')
Exercise: What files and folders are inside the current working directory?
Solution
list(Path().iterdir())
[PosixPath('data'), PosixPath('index.ipynb')]
Exercise: What files and folders are inside the “data” directory?
Solution
list(Path("data").iterdir())
[PosixPath('data/seaborn-images'), PosixPath('data/exp1')]
Exercise: What files and folders are inside the “exp1” directory, inside the “data” directory?
Solution
list(Path("data/exp1").iterdir())
[PosixPath('data/exp1/joey_2021-05-01_001'),
 PosixPath('data/exp1/joey_2021-05-02_001'),
 PosixPath('data/exp1/phoebe_2021-05-02_001'),
 PosixPath('data/exp1/phoebe_2021-05-03_001'),
 PosixPath('data/exp1/phoebe_2021-05-04_001')]
list(Path().joinpath("data").joinpath("exp1").iterdir())
[PosixPath('data/exp1/joey_2021-05-01_001'),
 PosixPath('data/exp1/joey_2021-05-02_001'),
 PosixPath('data/exp1/phoebe_2021-05-02_001'),
 PosixPath('data/exp1/phoebe_2021-05-03_001'),
 PosixPath('data/exp1/phoebe_2021-05-04_001')]
Exercise: What folders in exp1 start with the subject “phoebe”?
Hint: use Path().glob().
Solution
list(Path("data/exp1").glob("phoebe*"))
[PosixPath('data/exp1/phoebe_2021-05-02_001'),
 PosixPath('data/exp1/phoebe_2021-05-03_001'),
 PosixPath('data/exp1/phoebe_2021-05-04_001')]
Exercise: What folders in exp1 start with the subject “joey”?
Solution
list(Path("data/exp1").glob("joey*"))
[PosixPath('data/exp1/joey_2021-05-01_001'),
 PosixPath('data/exp1/joey_2021-05-02_001')]
Exercise: What folders in exp1 were recorded on the 2nd of May?
Hint: glob on the date part of the filename.
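Because each session folder appears to follow a subject_date_session naming pattern, the date can also be parsed into a real date object rather than matched as text. The following is a minimal sketch, assuming that every folder name follows that convention; the helper name `parse_session` is hypothetical and not part of pathlib.

```python
from datetime import date
from pathlib import Path


def parse_session(folder):
    """Split a 'subject_YYYY-MM-DD_NNN' folder name into its three parts."""
    subject, day, session = Path(folder).name.split("_")
    return subject, date.fromisoformat(day), int(session)


# A few folder names copied from the exp1 directory used in this section.
folders = [
    "data/exp1/joey_2021-05-01_001",
    "data/exp1/joey_2021-05-02_001",
    "data/exp1/phoebe_2021-05-02_001",
]

# Keep only the sessions recorded on 2 May 2021.
on_may_2 = [f for f in folders if parse_session(f)[1] == date(2021, 5, 2)]
print(on_may_2)
# ['data/exp1/joey_2021-05-02_001', 'data/exp1/phoebe_2021-05-02_001']
```

Parsing is more work than globbing, but it allows comparisons that a text pattern cannot express, such as selecting every session recorded before a given date.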
Solution
list(Path("data/exp1").glob("*2021-05-02*"))
[PosixPath('data/exp1/joey_2021-05-02_001'),
 PosixPath('data/exp1/phoebe_2021-05-02_001')]
Exercise: What files have the “.h5” file extension (include all files in any subfolders of exp1)?
Solution
list(Path("data/exp1").glob("**/*.h5"))
[PosixPath('data/exp1/joey_2021-05-02_001/lfps.h5'),
 PosixPath('data/exp1/phoebe_2021-05-03_001/lfps.h5')]
Exercise: What files have the “.npy” file extension (include all files in any subfolders of exp1)?
Solution
list(Path("data/exp1").glob("**/*.npy"))
[PosixPath('data/exp1/joey_2021-05-01_001/spikes.npy'),
 PosixPath('data/exp1/joey_2021-05-02_001/spikes.npy'),
 PosixPath('data/exp1/phoebe_2021-05-02_001/spikes.npy'),
 PosixPath('data/exp1/phoebe_2021-05-03_001/spikes.npy'),
 PosixPath('data/exp1/phoebe_2021-05-04_001/spikes.npy')]
Exercise: Which of Phoebe’s files contain LFP data?
Solution
list(Path("data/exp1").glob("phoebe*/**/lfps*"))
[PosixPath('data/exp1/phoebe_2021-05-03_001/lfps.h5')]
Section 2: Accessing Remote File Systems using fsspec
In modern neuroscience research, accessing and manipulating data stored in remote file systems is increasingly common. This section introduces fsspec, a library for interacting with various file systems, including remote and cloud-based storage. We’ll explore how to list, search, and manage files on different remote systems, an invaluable skill in a data-intensive field like neuroscience.
| Code | Description |
|---|---|
| `fs.ls(path)` | Lists the files and directories at the given path of the filesystem. |
| `fs.glob('*.h5')` | Searches for files matching a specified pattern (in this case, all files ending with ‘.h5’ in the top-level directory; use `'**/*.h5'` to include subdirectories). |
| `fs.makedirs(path)` | Creates a new directory at the specified path, including any necessary intermediate directories. |
| `fs.rmdir(path)` | Removes a directory, which must be empty. |
| `fs.rm(path)` | Removes (deletes) a file or directory; pass `recursive=True` to delete a directory together with its contents. |
| `fs.read_text(path)` | Reads the contents of a file and returns it as a string. |
| `fs.read_bytes(path)` | Reads the contents of a file and returns it as bytes. |
| `fs.download(remote_path, local_path)` | Downloads a file from the remote filesystem to the local filesystem. |
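Several of these methods can be tried without a network connection. The sketch below uses fsspec's in-memory filesystem as a stand-in for a remote one; that choice, along with the folder name `/cheatsheet_demo` and the file contents, is an assumption made for illustration only.

```python
import fsspec

# Assumption: the "memory" filesystem stands in for a remote filesystem,
# so that this sketch runs offline. The method calls are the same.
fs = fsspec.filesystem("memory")
base = "/cheatsheet_demo"  # hypothetical folder name

# Create nested folders and put two small files into them.
fs.makedirs(f"{base}/exp1/session1", exist_ok=True)
fs.write_text(f"{base}/exp1/session1/notes.txt", "trial,outcome\n1,hit\n")
fs.pipe(f"{base}/exp1/session1/lfps.h5", b"\x00\x01")

# List one folder, then search all subfolders for .h5 files.
listing = fs.ls(f"{base}/exp1", detail=False)
h5_files = fs.glob(f"{base}/**/*.h5")

# Read the same kind of content back as text and as raw bytes.
text = fs.read_text(f"{base}/exp1/session1/notes.txt")
raw = fs.read_bytes(f"{base}/exp1/session1/lfps.h5")

# Clean up: recursive=True removes the folder and everything inside it.
fs.rm(base, recursive=True)

print(listing)
print(h5_files)
print(text)
```

Because every fsspec filesystem shares this interface, the same calls should carry over to other backends such as the GitHub filesystem used in the rest of this section, although read-only backends will not support the write operations.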
GitHub Repos as a Remote Filesystem
GitHub, a platform widely used for code sharing and collaboration, can also serve as a remote filesystem for data storage and retrieval. This section guides you through using GitHub repositories for accessing and managing data files, leveraging the GithubFileSystem class in fsspec.
from fsspec.implementations.github import GithubFileSystem
fs = GithubFileSystem(org="ibehave-ibots", repo="iBOTS-Tools")
Exercises
Example: List all the files in the root directory of https://github.com/mwaskom/seaborn-data
fs = GithubFileSystem(org="mwaskom", repo="seaborn-data")
fs.ls("/")
['README.md',
'anagrams.csv',
'anscombe.csv',
'attention.csv',
'brain_networks.csv',
'car_crashes.csv',
'dataset_names.txt',
'diamonds.csv',
'dots.csv',
'dowjones.csv',
'exercise.csv',
'flights.csv',
'fmri.csv',
'geyser.csv',
'glue.csv',
'healthexp.csv',
'iris.csv',
'mpg.csv',
'penguins.csv',
'planets.csv',
'png',
'process',
'raw',
'seaice.csv',
'taxis.csv',
'tips.csv',
'titanic.csv']
Exercise: List all the files whose filenames start with the letter “p” (i.e. “glob” the files).
Solution
fs.glob("p*")
['penguins.csv', 'planets.csv', 'png', 'process']
Exercise: List all the files whose filenames end in the “.csv” extension.
Solution
fs.glob("*.csv")
['anagrams.csv',
'anscombe.csv',
'attention.csv',
'brain_networks.csv',
'car_crashes.csv',
'diamonds.csv',
'dots.csv',
'dowjones.csv',
'exercise.csv',
'flights.csv',
'fmri.csv',
'geyser.csv',
'glue.csv',
'healthexp.csv',
'iris.csv',
'mpg.csv',
'penguins.csv',
'planets.csv',
'seaice.csv',
'taxis.csv',
'tips.csv',
'titanic.csv']
Exercise: List all the PNG image files in the “png” folder.
Solution
fs.ls("png")
['png/img1.png',
'png/img2.png',
'png/img3.png',
'png/img4.png',
'png/img5.png',
'png/img6.png']
Exercise: Download all the PNG image files in the “png” folder.
Solution
fs.download("png/*", "data/seaborn-images")  # note: a glob pattern is needed here (download apparently has to be given files)
Exercise: Read and print the text contents of the “anscombe.csv” file. What data is inside this file?
Solution
print(fs.read_text("/anscombe.csv").replace(',', '\t'))
dataset x y
I 10.0 8.04
I 8.0 6.95
I 13.0 7.58
I 9.0 8.81
I 11.0 8.33
I 14.0 9.96
I 6.0 7.24
I 4.0 4.26
I 12.0 10.84
I 7.0 4.82
I 5.0 5.68
II 10.0 9.14
II 8.0 8.14
II 13.0 8.74
II 9.0 8.77
II 11.0 9.26
II 14.0 8.1
II 6.0 6.13
II 4.0 3.1
II 12.0 9.13
II 7.0 7.26
II 5.0 4.74
III 10.0 7.46
III 8.0 6.77
III 13.0 12.74
III 9.0 7.11
III 11.0 7.81
III 14.0 8.84
III 6.0 6.08
III 4.0 5.39
III 12.0 8.15
III 7.0 6.42
III 5.0 5.73
IV 8.0 6.58
IV 8.0 5.76
IV 8.0 7.71
IV 8.0 8.84
IV 8.0 8.47
IV 8.0 7.04
IV 8.0 5.25
IV 19.0 12.5
IV 8.0 5.56
IV 8.0 7.91
IV 8.0 6.89
DeepLabCut: Answer the following questions about the DeepLabCut GitHub Repo: https://github.com/DeepLabCut/DeepLabCut
Exercise: What files are in the root directory of the DeepLabCut repo?
Solution
fs = GithubFileSystem(org="DeepLabCut", repo="DeepLabCut")
fs.ls("/")
['.circleci',
'.codespellrc',
'.github',
'.gitignore',
'.pre-commit-config.yaml',
'AUTHORS',
'CODE_OF_CONDUCT.md',
'CONTRIBUTING.md',
'LICENSE',
'NOTICE.yml',
'README.md',
'_config.yml',
'_toc.yml',
'conda-environments',
'deeplabcut',
'dlc.py',
'docker',
'docs',
'examples',
'pyproject.toml',
'reinstall.sh',
'requirements.txt',
'setup.py',
'tests',
'testscript_cli.py',
'tools']
Exercise: How many files or folders are in the “openfield-Pranav-2018-10-30” folder, which is in the “examples” folder? (Tip: the len() function can be helpful here.)
Solution
fs.glob("examples/open*/*")
['examples/openfield-Pranav-2018-10-30/config.yaml',
 'examples/openfield-Pranav-2018-10-30/labeled-data',
 'examples/openfield-Pranav-2018-10-30/videos']
Exercise: How many files are there, if you include every single file or folder in all the subfolders of the openfield example?
Solution
len(fs.glob("examples/open*/**"))
124
Exercise: Download all the “labeled-data” files in the openfield example. (Hint: use fs.download() with recursive=True.)
Solution
fs.download("examples/open*/labeled-data", "deeplabcut/pranav/labeled-data", recursive=True)
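After a recursive download it is worth checking what actually arrived on disk. The following is a minimal sketch that can be run offline: an in-memory fsspec filesystem stands in for the GitHub repository, and the folder and file names are made up for illustration. The exact local folder layout produced by a recursive download may vary between fsspec versions, so the check below only collects file names.

```python
import tempfile
from pathlib import Path

import fsspec

# Assumption: an in-memory filesystem stands in for the remote repository.
fs = fsspec.filesystem("memory")
root = "/download_demo/labeled-data"  # hypothetical remote folder
fs.makedirs(f"{root}/video1", exist_ok=True)
fs.pipe(f"{root}/video1/img001.png", b"fake image bytes")
fs.pipe(f"{root}/video1/labels.csv", b"x,y\n1,2\n")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "labeled-data"

    # recursive=True copies the folder and everything beneath it.
    fs.download(root, str(target), recursive=True)

    # rglob("*") walks every subfolder; keep only regular files.
    downloaded = sorted(p.name for p in target.rglob("*") if p.is_file())

# Clean up the in-memory store.
fs.rm("/download_demo", recursive=True)

print(downloaded)
```

This combines both halves of the section: fsspec moves the files, and pathlib confirms that they are where the analysis code expects them.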