Chunk-Based Analysis to Reduce Memory Usage, using XArray, HDF5, and Dask
Scientific datasets are increasingly too large to load into memory all at once. High-resolution imaging, large simulation outputs, and long experimental recordings can easily exceed the RAM available on a typical workstation. If analysis requires loading the entire dataset before computation begins, many otherwise simple operations become impossible.
This notebook introduces a different approach: chunk-based analysis. Instead of loading an entire dataset into memory, data can be processed in smaller pieces. Modern scientific Python tools, particularly XArray, NetCDF/HDF5, and Dask, make this possible while keeping the code readable and close to the mathematical operations scientists want to perform. The goal is to understand how these tools cooperate to reduce memory pressure and enable scalable analysis pipelines.
The notebook proceeds in stages. First, you will learn the basic structure of XArray objects, which add labeled dimensions and metadata to NumPy-style arrays. Next, you will see how these arrays can be saved to NetCDF/HDF5 files, enabling efficient on-disk storage with compression and encoding options. Finally, you will explore how lazy loading and chunked computation allow large analyses to run without loading the entire dataset into memory, and how Dask coordinates the computation behind the scenes.
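To make the chunk-based idea concrete before any of the tools appear, here is a minimal sketch in plain NumPy. It is only an illustration of the principle, not how Dask schedules work internally: the helper name `chunked_mean` is hypothetical, and a small in-memory array stands in for a dataset that would normally live on disk.

```python
import numpy as np


def chunked_mean(data, chunk_size):
    """Mean of all values, touching only `chunk_size` rows at a time."""
    total = 0.0
    count = 0
    for start in range(0, data.shape[0], chunk_size):
        # Only this slice needs to be held in memory at once.
        chunk = np.asarray(data[start:start + chunk_size])
        total += chunk.sum()
        count += chunk.size
    return total / count


# Stand-in for a large on-disk dataset (assumption: small enough to verify here).
data = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)
result = chunked_mean(data, chunk_size=100)
print(result, data.mean())
```

The running sum and count are the only state carried between chunks, so peak memory depends on the chunk size rather than on the size of the full dataset.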
Setup
Import Packages
import dask.distributed
import netCDF4
import numpy as np
import xarray as xr
Utility Functions
from contextlib import contextmanager
import os
import sys
import matplotlib.pyplot as plt
from memory_profiler import memory_usage
import pandas as pd
def _generate_calcium_data_file(fname="calcium_imaging_session.nc", nx=256, ny=256, nt=4000, n_cells=12, baseline=0.15, noise_sd=0.05) -> None:
    import numpy as np
    import xarray as xr

    time = np.linspace(2, 20, nt)
    rng = np.random.default_rng(42)

    # coordinate grids
    x = np.arange(nx)
    y = np.arange(ny)
    X, Y = np.meshgrid(x, y, indexing="ij")

    # ----------------------------
    # Background signal
    # ----------------------------
    # Start with baseline + pixel noise
    data = baseline + rng.normal(0, noise_sd, size=(nx, ny, nt))

    # Add slow global drift over time
    slow_drift = (
        0.03 * np.sin(2 * np.pi * time / 9.0)
        + 0.02 * np.sin(2 * np.pi * time / 3.7 + 1.2)
    )
    data += slow_drift[None, None, :]

    # Add a weak spatial background gradient
    spatial_bg = 0.03 * ((X / nx) + (Y / ny))
    data += spatial_bg[:, :, None]

    # ----------------------------
    # Shared neuropil-like signal
    # ----------------------------
    shared_events = rng.poisson(0.003, size=nt).astype(float)
    kernel_len = 120
    tau = 18
    kernel = np.exp(-np.arange(kernel_len) / tau)
    shared_trace = np.convolve(shared_events, kernel, mode="full")[:nt]
    shared_trace /= shared_trace.max() + 1e-12
    shared_trace *= 0.05
    data += shared_trace[None, None, :]

    # Add synthetic cells
    cell_masks = []
    cell_traces = []
    for i in range(n_cells):
        # random cell center, avoiding borders
        cx = rng.integers(20, nx - 20)
        cy = rng.integers(20, ny - 20)

        # random elliptical Gaussian footprint
        sx = rng.uniform(3, 8)
        sy = rng.uniform(3, 8)
        amp = rng.uniform(0.2, 0.8)
        footprint = np.exp(-(((X - cx) ** 2) / (2 * sx**2) + ((Y - cy) ** 2) / (2 * sy**2)))
        footprint *= amp

        # sparse spike/event train
        event_rate = rng.uniform(0.002, 0.01)
        spikes = rng.poisson(event_rate, size=nt).astype(float)

        # calcium decay kernel
        tau_decay = rng.uniform(8, 30)
        kernel_len = 200
        decay_kernel = np.exp(-np.arange(kernel_len) / tau_decay)
        trace = np.convolve(spikes, decay_kernel, mode="full")[:nt]

        # normalize and scale
        if trace.max() > 0:
            trace = trace / trace.max()
        trace *= rng.uniform(0.3, 1.2)

        # add a tiny bit of within-cell temporal noise
        trace += rng.normal(0, 0.01, size=nt)
        trace = np.clip(trace, 0, None)

        # add cell contribution to movie
        data += footprint[:, :, None] * trace[None, None, :]
        cell_masks.append(footprint)
        cell_traces.append(trace)

    # Clamp to nonnegative values
    data = np.clip(data, 0, None)

    # Build xarray object
    movie = xr.DataArray(
        data,
        name="image",
        dims=["x", "y", "time"],
        coords={
            "x": x,
            "y": y,
            "time": time,
        },
        attrs={
            "description": "Synthetic calcium imaging session with noisy background, drift, shared activity, and localized active cells"
        },
    )
    movie.to_netcdf(fname)
def _format_duration(seconds: float, precision: int = 1) -> str:
    """
    Takes a duration in seconds and returns a more human-readable string (e.g. "1.5 ms").
    Looking to do this in a real project? Some alternatives:
    - `humanize`: https://humanize.readthedocs.io/en/latest/
    """
    if seconds < 0:
        raise ValueError("Duration must be non-negative")
    # Try the largest unit first and fall through to smaller ones.
    units = [("s", 1), ("ms", 1e-3), ("µs", 1e-6)]
    for unit, scale in units:
        if seconds >= scale:
            value = seconds / scale
            return f"{value:.{precision}f} {unit}"
    # Anything below one microsecond is reported in nanoseconds.
    return f"{seconds / 1e-9:.{precision}f} ns"
@contextmanager
def _trace_lines_of(fun):
    "A (very) basic line tracer. Collects (lineno, timestamp) for each executed line inside a function."
    import time
    try:
        target_code = fun.__code__
    except AttributeError:
        # Not a plain Python function (e.g. a builtin): nothing to trace.
        yield []
        return
    target_frame = None
    records = []

    def tracer(frame, event, arg):
        nonlocal target_frame
        if event == "call" and frame.f_code is target_code:
            # Entering the target function: start tracing its frame.
            target_frame = frame
            return tracer
        elif frame is target_frame:
            if event == "line":
                records.append((frame.f_lineno, time.perf_counter()))
            elif event == "return":
                target_frame = None
            return tracer

    old_trace = sys.gettrace()
    sys.settrace(tracer)
    try:
        yield records
    finally:
        # Always restore whatever tracer was active before.
        sys.settrace(old_trace)
def _sample_memory(fun, interval=.00005):
    # Collect memory traces and line number timings
    with _trace_lines_of(fun) as line_trace:
        memory_trace = memory_usage(fun, interval=interval, timestamps=True)
    # Make comparable DataFrames out of the two datasets
    line_trace_df = pd.DataFrame(line_trace, columns=['Line', 'Time'])
    if len(line_trace_df) > 0:
        line_trace_df['Time'] -= line_trace_df['Time'].iloc[0]
    memory_trace = memory_trace[1:]
    memory_trace_df = pd.DataFrame(memory_trace, columns=['Memory', 'Time'])
    memory_trace_df['Time'] -= memory_trace_df['Time'].iloc[0]
    memory_trace_df['Memory'] -= memory_trace_df['Memory'].iloc[0]
    return line_trace_df, memory_trace_df
def _plot_memory(data: pd.DataFrame, x='Time', y='Memory', ax=None):
    "Makes a line plot."
    peak_memory_mb = round(data[y].max(), 1)
    total_time = _format_duration(data[x].max())
    ax = ax if ax is not None else plt.gca()
    ax.plot(data[x], data[y])
    ax.fill_between(data[x], data[y], 0, alpha=0.3)
    ax.set(xlabel='Time (s)', ylabel='Memory (MB)', title=f"Total Time: {total_time} -- Peak Memory: {peak_memory_mb} MB")
    ax.margins(y=0)
    ylim_max = data[y].max() * 1.05 if data[y].max() > 1 else 1
    ax.set_ylim(0, ylim_max)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    return ax
def _plot_line_numbers(data: pd.DataFrame, x='Time', text='Line', linestyle='--', color='gray', alpha=0.3, fontsize=6, ax=None):
    "Makes vertical lines with text above them."
    ax = ax if ax is not None else plt.gca()
    ymin, ymax = ax.get_ylim()
    y_text = ymax - 0.04 * (ymax - ymin)
    for _, row in data.iterrows():
        ax.axvline(row[x], linestyle=linestyle, alpha=alpha, color=color)
        ax.text(row[x], y_text, str(int(row[text])), rotation=90, ha="right", va="bottom", fontsize=fontsize)
    return ax
def _analyze_memory(*funs, interval=.00005, linestyle='--', color='gray', alpha=0.3, fontsize=6):
    "Convenient wrapper function: records memory traces of the provided functions and makes the plot."
    if len(funs) == 1:
        fig, axes = plt.subplots()
        axes = [axes]
    else:
        fig, axes = plt.subplots(nrows=len(funs), sharex=True)
    for ax, fun in zip(axes, funs):
        lines, memory = _sample_memory(fun, interval=interval)
        _plot_memory(memory, ax=ax)
        _plot_line_numbers(lines, linestyle=linestyle, color=color, alpha=alpha, fontsize=fontsize, ax=ax)
    # Share one y-range across all panels so peak memory is comparable.
    y_max = max(ax.get_ylim()[1] for ax in axes)
    for ax in axes:
        ax.set_ylim(0, y_max)
    plt.tight_layout()
def _format_bytes(num_bytes: float, precision: int = 2) -> str:
    """
    Takes a size in bytes and returns a more human-readable string (e.g. "1.50 MB").
    Looking to do this in a real project? Some alternatives:
    - `humanfriendly`: https://pypi.org/project/humanfriendly/#getting-started
    """
    if num_bytes < 0:
        raise ValueError("bytes must be non-negative")
    units = [("KB", 1000), ("MB", 1_000_000), ("GB", 1_000_000_000), ("TB", 1_000_000_000_000)]
    # Try the largest unit first and fall through to smaller ones.
    for unit, scale in reversed(units):
        if num_bytes >= scale:
            value = num_bytes / scale
            return f"{value:.{precision}f} {unit}"
    # Anything below 1 KB is reported in plain bytes.
    return f"{num_bytes} B"
def _file_size(path: str) -> int:
    return os.path.getsize(path)
def _print_file_size(path: str, label='') -> None:
    text = _format_bytes(_file_size(path))
    if label:
        text = label + ': ' + text
    print(text)
class utils:
    analyze_memory = _analyze_memory
    generate_calcium_data_file = _generate_calcium_data_file
    print_file_size = _print_file_size
Section 1: Intro to Working with XArray
NumPy arrays are powerful but minimal: they store numerical data without any built-in information about what the axes represent. In scientific datasets, however, each axis usually corresponds to meaningful quantities such as space, time, wavelength, or experimental condition. When working with multidimensional data, it is often useful to label these dimensions explicitly.
XArray extends NumPy by adding labels and metadata to multidimensional arrays. Each array can have named dimensions, coordinate values, and descriptive metadata attached to it. This makes many operations clearer and safer, since calculations can reference dimensions by name rather than by numeric index. In addition, XArray integrates naturally with file formats such as NetCDF and with distributed computing tools like Dask.
Exercises
The exercises in this section introduce the core concepts of XArray: treating a DataArray like a NumPy array, labeling dimensions, attaching coordinates, and adding metadata to describe the data.
xr.DataArray() as a NumPy-Like Array
At its core, an XArray DataArray wraps a NumPy array. Most numerical operations that work on NumPy arrays also work on DataArray objects, including slicing, aggregation, and arithmetic. This means that existing NumPy-style code often requires very little modification to work with labeled arrays.
Example: Create a three-dimensional DataArray from a NumPy array:
da = xr.DataArray(
data=np.random.random(size=(10, 20, 30))
)
da
<xarray.DataArray (dim_0: 10, dim_1: 20, dim_2: 30)> Size: 48kB
array([[[0.04114747, 0.75038331, 0.13638907, ..., 0.38833257,
0.28897767, 0.44763658],
[0.05093347, 0.84213056, 0.73766516, ..., 0.01407856,
0.76522943, 0.87498363],
[0.84178601, 0.30799076, 0.2984225 , ..., 0.37395342,
0.16885954, 0.44853357],
...,
[0.85808347, 0.16014808, 0.04753766, ..., 0.8107473 ,
0.93155405, 0.27736656],
[0.05476814, 0.56464578, 0.40849327, ..., 0.41397046,
0.40700989, 0.872461 ],
[0.33342521, 0.57842884, 0.20717272, ..., 0.56226436,
0.26117397, 0.78730901]],
[[0.41578296, 0.97891988, 0.94396026, ..., 0.51148956,
0.32645908, 0.49573464],
[0.0894115 , 0.40364212, 0.90407579, ..., 0.7082974 ,
0.59801165, 0.55842448],
[0.32918622, 0.2603817 , 0.54499274, ..., 0.43422843,
0.56081601, 0.7011575 ],
...
[0.67786071, 0.45865382, 0.27930756, ..., 0.84763965,
0.71848224, 0.12861828],
[0.3438912 , 0.66496328, 0.962331 , ..., 0.73381751,
0.38001691, 0.64951477],
[0.51649295, 0.17889422, 0.9333006 , ..., 0.4241998 ,
0.6520835 , 0.53247383]],
[[0.35375472, 0.06168421, 0.79528549, ..., 0.73293053,
0.54893905, 0.99901047],
[0.45519009, 0.25676456, 0.70768052, ..., 0.64053184,
0.83816718, 0.24689528],
[0.07249795, 0.92133683, 0.38432126, ..., 0.21450932,
0.98230673, 0.76157872],
...,
[0.68634487, 0.97082859, 0.64525841, ..., 0.39506538,
0.2014205 , 0.07881245],
[0.94521576, 0.14213673, 0.01033459, ..., 0.9120943 ,
0.68981696, 0.56170583],
[0.29622479, 0.7068755 , 0.24623064, ..., 0.30720207,
0.64914055, 0.66220053]]])
Dimensions without coordinates: dim_0, dim_1, dim_2
Exercise: Select the first 5 rows of da, using the slicing syntax da[:5, :, :]
Solution
da[:5, :, :]
<xarray.DataArray (dim_0: 5, dim_1: 20, dim_2: 30)> Size: 24kB
array([[[0.04114747, 0.75038331, 0.13638907, ..., 0.38833257,
0.28897767, 0.44763658],
[0.05093347, 0.84213056, 0.73766516, ..., 0.01407856,
0.76522943, 0.87498363],
[0.84178601, 0.30799076, 0.2984225 , ..., 0.37395342,
0.16885954, 0.44853357],
...,
[0.85808347, 0.16014808, 0.04753766, ..., 0.8107473 ,
0.93155405, 0.27736656],
[0.05476814, 0.56464578, 0.40849327, ..., 0.41397046,
0.40700989, 0.872461 ],
[0.33342521, 0.57842884, 0.20717272, ..., 0.56226436,
0.26117397, 0.78730901]],
[[0.41578296, 0.97891988, 0.94396026, ..., 0.51148956,
0.32645908, 0.49573464],
[0.0894115 , 0.40364212, 0.90407579, ..., 0.7082974 ,
0.59801165, 0.55842448],
[0.32918622, 0.2603817 , 0.54499274, ..., 0.43422843,
0.56081601, 0.7011575 ],
...
[0.69525271, 0.64018338, 0.65966751, ..., 0.34683598,
0.00548341, 0.97329194],
[0.80950062, 0.64463964, 0.27341711, ..., 0.68010487,
0.58527527, 0.17991659],
[0.72386847, 0.04767042, 0.4784509 , ..., 0.25828601,
0.23017327, 0.83358268]],
[[0.27685799, 0.68762621, 0.03612248, ..., 0.68830425,
0.43521138, 0.48475464],
[0.13150606, 0.64611598, 0.67140634, ..., 0.72932653,
0.55523022, 0.58939696],
[0.99564735, 0.67435163, 0.4850405 , ..., 0.00560686,
0.48457899, 0.52615165],
...,
[0.03060629, 0.05426987, 0.81598002, ..., 0.50067055,
0.71105902, 0.8936419 ],
[0.05694798, 0.54300376, 0.98750746, ..., 0.11035621,
0.29567352, 0.82197459],
[0.84906294, 0.14679959, 0.2824754 , ..., 0.85234413,
0.89728047, 0.23293677]]])
Dimensions without coordinates: dim_0, dim_1, dim_2
Exercise: Compute the mean, using either DataArray.mean() or np.mean()
Solution
da.mean()
<xarray.DataArray ()> Size: 8B
array(0.49582556)
np.mean(da)
<xarray.DataArray ()> Size: 8B
array(0.49582556)
Exercise: Compute the mean over the third axis, using da.mean(axis=2)
Solution
da.mean(axis=2)
<xarray.DataArray (dim_0: 10, dim_1: 20)> Size: 2kB
array([[0.36543086, 0.51930887, 0.50379079, 0.52659501, 0.52329708,
0.45206561, 0.44555284, 0.47871553, 0.51089349, 0.48598282,
0.52021286, 0.37824522, 0.59671092, 0.615205 , 0.52748188,
0.52641723, 0.67116061, 0.42539296, 0.49287799, 0.47272362],
[0.55609468, 0.4810388 , 0.5259652 , 0.59836652, 0.54134858,
0.48422569, 0.46709276, 0.52790002, 0.45980352, 0.48049169,
0.43466852, 0.55785632, 0.57304273, 0.55303505, 0.517658 ,
0.45762595, 0.47264817, 0.51178683, 0.48442642, 0.59459949],
[0.44657057, 0.56822554, 0.49829036, 0.54690908, 0.53317257,
0.52955974, 0.44061672, 0.47899669, 0.50972067, 0.51375136,
0.5035676 , 0.47461658, 0.57139615, 0.38634051, 0.49474923,
0.53198677, 0.50013701, 0.42897054, 0.50942051, 0.47081368],
[0.46629911, 0.41400256, 0.49320932, 0.49706875, 0.50986913,
0.47141487, 0.44684023, 0.54404585, 0.45654007, 0.52816059,
0.51185483, 0.41566652, 0.49721622, 0.52734675, 0.39633733,
0.46176637, 0.55182298, 0.44828229, 0.53683044, 0.49020325],
[0.53004363, 0.49924226, 0.46172736, 0.48735078, 0.45591984,
0.50744314, 0.51750811, 0.55884772, 0.5100851 , 0.43773424,
0.47074307, 0.55014578, 0.45866962, 0.46788254, 0.50701368,
0.56608109, 0.41439794, 0.45354272, 0.5338212 , 0.54764117],
[0.54622386, 0.50704904, 0.52853719, 0.48871144, 0.40561274,
0.39808459, 0.52884049, 0.56943325, 0.54578676, 0.51050244,
0.45564268, 0.51644497, 0.50357821, 0.51080586, 0.44878332,
0.43235936, 0.43539606, 0.51422296, 0.61260323, 0.47746059],
[0.57749112, 0.55907827, 0.48373934, 0.51663829, 0.48428476,
0.58290814, 0.50846196, 0.51487716, 0.48403421, 0.60375554,
0.50852149, 0.47286028, 0.44520898, 0.52491347, 0.53538122,
0.51516768, 0.50169217, 0.46943158, 0.37626364, 0.44376613],
[0.50950841, 0.47831953, 0.59670342, 0.3995389 , 0.52428618,
0.50096481, 0.53531568, 0.46965357, 0.50080524, 0.37695733,
0.45476231, 0.4805418 , 0.42229679, 0.47802709, 0.45059819,
0.37335441, 0.55677103, 0.46432355, 0.55098146, 0.55196238],
[0.51830114, 0.51199971, 0.47828723, 0.41539535, 0.47819357,
0.49026753, 0.47132953, 0.48441641, 0.51107812, 0.49218128,
0.41817107, 0.54548401, 0.45529535, 0.50994582, 0.38968902,
0.47973765, 0.42666402, 0.48926506, 0.60631024, 0.55265594],
[0.53724441, 0.55331959, 0.58556178, 0.38731547, 0.47589156,
0.53687486, 0.45980396, 0.43617056, 0.54166356, 0.49557229,
0.4859638 , 0.48989926, 0.44355795, 0.52572281, 0.4783072 ,
0.5768246 , 0.43944115, 0.4844711 , 0.49882065, 0.43162461]])
Dimensions without coordinates: dim_0, dim_1
Labeling the Data and the Dimensions: name= and dims=
One of XArray’s most useful features is the ability to name dimensions explicitly. Instead of referring to axes by position (such as “axis 0” or “axis 2”), operations can refer to dimensions using meaningful labels like “x”, “y”, or “time”.
Once dimensions are named, many operations become easier to read and harder to misuse. For example, computing the mean across time can be expressed as mean(dim="time"), which clearly communicates the intent of the calculation.
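The difference between positional and named reductions can be sketched as follows. This is a small illustration under assumed names (the `movie` array and its shape are made up for the example): both calls give the same result at first, but only the named version keeps meaning "time" after the axes are reordered.

```python
import numpy as np
import xarray as xr

# Hypothetical toy array: 2 x 3 pixels, 4 time samples.
movie = xr.DataArray(
    np.arange(24, dtype=float).reshape(2, 3, 4),
    dims=["x", "y", "time"],
)

by_axis = movie.mean(axis=2)        # positional: relies on time being the last axis
by_name = movie.mean(dim="time")    # named: states the intent directly

# Reorder the axes so that time comes first.
transposed = movie.transpose("time", "x", "y")
still_time = transposed.mean(dim="time")   # still averages over time
now_y = transposed.mean(axis=2)            # axis 2 is now "y", not time

print(by_name.dims, still_time.dims, now_y.dims)
```

After the transpose, `axis=2` silently averages over a different dimension, while `dim="time"` continues to do what its name says.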
Exercise: Create a new 3-dimensional DataArray named da using xr.DataArray(), this time also setting name="image" and dims=['x', 'y', 'time']
Solution
da = xr.DataArray(
data=np.random.random(size=(10, 20, 30)),
name='image',
dims=['x', 'y', 'time']
)
da
<xarray.DataArray 'image' (x: 10, y: 20, time: 30)> Size: 48kB
array([[[0.67710939, 0.30888335, 0.45708531, ..., 0.19024812,
0.57919888, 0.4943471 ],
[0.6595412 , 0.05768168, 0.41122793, ..., 0.42941126,
0.02140642, 0.13401437],
[0.47115825, 0.03549168, 0.47427858, ..., 0.11232079,
0.15764563, 0.62205852],
...,
[0.28629424, 0.28075838, 0.90142574, ..., 0.09790786,
0.66381097, 0.76691221],
[0.56623202, 0.6275929 , 0.5333757 , ..., 0.73606387,
0.67496555, 0.01273267],
[0.6557586 , 0.71014066, 0.5542154 , ..., 0.67508891,
0.21545972, 0.04647963]],
[[0.46266484, 0.27270618, 0.37350087, ..., 0.7071857 ,
0.89672212, 0.25277119],
[0.73110233, 0.39435732, 0.07033897, ..., 0.90960717,
0.91212683, 0.85858381],
[0.54258139, 0.11662828, 0.70256949, ..., 0.85428115,
0.03788034, 0.63837203],
...
[0.36590867, 0.27510572, 0.46112712, ..., 0.33816704,
0.07879941, 0.38753586],
[0.29461655, 0.75531933, 0.07085249, ..., 0.52368795,
0.6893739 , 0.49329903],
[0.24075614, 0.3107854 , 0.24904419, ..., 0.00986273,
0.73754247, 0.11510364]],
[[0.65561021, 0.98253874, 0.70322808, ..., 0.43339667,
0.21292771, 0.95580234],
[0.71432444, 0.32570543, 0.89027762, ..., 0.16513184,
0.45325214, 0.84195001],
[0.89453391, 0.08796845, 0.97497481, ..., 0.1163913 ,
0.67460762, 0.55036986],
...,
[0.79907291, 0.12435965, 0.4183625 , ..., 0.75998778,
0.54159547, 0.64966451],
[0.57047831, 0.65203531, 0.30976315, ..., 0.27428619,
0.9356512 , 0.63459768],
[0.87974006, 0.32322905, 0.91136299, ..., 0.35024879,
0.94698808, 0.50808183]]])
Dimensions without coordinates: x, y, time
Exercise: Select the fifth time sample (index 4, since indexing starts at 0) using da.sel(time=4)
Solution
da.sel(time=4)
<xarray.DataArray 'image' (x: 10, y: 20)> Size: 2kB
array([[0.81690343, 0.7209225 , 0.37353862, 0.09224265, 0.03095172,
0.1034145 , 0.63437993, 0.17861791, 0.2149326 , 0.11016639,
0.11142093, 0.2261374 , 0.35131947, 0.95335097, 0.64888917,
0.02271543, 0.26942174, 0.26820578, 0.49959026, 0.11380439],
[0.35311137, 0.4148705 , 0.41419894, 0.60659794, 0.09539062,
0.71539388, 0.95375252, 0.07784231, 0.84844934, 0.11753521,
0.78966488, 0.75023638, 0.28176082, 0.79697694, 0.12724592,
0.03527607, 0.99368476, 0.88356642, 0.76887476, 0.68415071],
[0.42463613, 0.14438795, 0.40494972, 0.67949229, 0.53039794,
0.81529368, 0.06234376, 0.99362028, 0.79408995, 0.38304623,
0.09001437, 0.98109669, 0.24317204, 0.63199111, 0.12674025,
0.50320134, 0.15266813, 0.53977947, 0.71372985, 0.24760575],
[0.57205422, 0.95364895, 0.34793683, 0.23919657, 0.12988433,
0.63331466, 0.52231607, 0.33278371, 0.62182073, 0.4274041 ,
0.42886689, 0.34358668, 0.4653964 , 0.3645093 , 0.19066942,
0.23865388, 0.42829744, 0.11948408, 0.03752981, 0.95159085],
[0.41837782, 0.36533384, 0.25933047, 0.08754735, 0.49177214,
0.58898798, 0.53283701, 0.43497849, 0.98585099, 0.62895794,
0.52711377, 0.47866854, 0.67471869, 0.9185391 , 0.75206332,
0.55107556, 0.73957786, 0.94516618, 0.42911865, 0.48761669],
[0.04747765, 0.24085401, 0.16609059, 0.53680168, 0.79971536,
0.95924525, 0.38889558, 0.62757553, 0.69035109, 0.88295853,
0.67377996, 0.84171832, 0.82412218, 0.08195431, 0.79791916,
0.08518784, 0.43338793, 0.52470145, 0.62609036, 0.8949754 ],
[0.56147571, 0.37038801, 0.75341007, 0.39167142, 0.31499963,
0.65011844, 0.72642505, 0.51515493, 0.30194453, 0.59054549,
0.2370653 , 0.20692905, 0.13203398, 0.71813268, 0.69640391,
0.04455144, 0.62728676, 0.82446057, 0.74412542, 0.37767237],
[0.10371527, 0.94428154, 0.38452835, 0.64577214, 0.09205662,
0.06029028, 0.72665883, 0.83662477, 0.26651286, 0.83489503,
0.0988879 , 0.40095675, 0.32867554, 0.99772062, 0.6113875 ,
0.93377601, 0.69657599, 0.6114565 , 0.07554179, 0.00437493],
[0.30154039, 0.23765152, 0.68592597, 0.40720714, 0.41215613,
0.57588725, 0.53873454, 0.09855938, 0.86329038, 0.06073814,
0.18346947, 0.89362387, 0.66958095, 0.82418249, 0.76896116,
0.63618435, 0.48571894, 0.96678574, 0.88264642, 0.73664438],
[0.71095648, 0.63434897, 0.83616863, 0.99057802, 0.5831305 ,
0.39098512, 0.79564113, 0.69978499, 0.7932254 , 0.11968389,
0.18836944, 0.50066022, 0.67891818, 0.45608767, 0.36538924,
0.42506311, 0.33150542, 0.19117472, 0.10269446, 0.37903476]])
Dimensions without coordinates: x, y
Exercise: Select the first through fifth rows by name using da.sel(x=slice(0, 5))
Solution
da.sel(x=slice(0, 5))
<xarray.DataArray 'image' (x: 5, y: 20, time: 30)> Size: 24kB
array([[[0.67710939, 0.30888335, 0.45708531, ..., 0.19024812,
0.57919888, 0.4943471 ],
[0.6595412 , 0.05768168, 0.41122793, ..., 0.42941126,
0.02140642, 0.13401437],
[0.47115825, 0.03549168, 0.47427858, ..., 0.11232079,
0.15764563, 0.62205852],
...,
[0.28629424, 0.28075838, 0.90142574, ..., 0.09790786,
0.66381097, 0.76691221],
[0.56623202, 0.6275929 , 0.5333757 , ..., 0.73606387,
0.67496555, 0.01273267],
[0.6557586 , 0.71014066, 0.5542154 , ..., 0.67508891,
0.21545972, 0.04647963]],
[[0.46266484, 0.27270618, 0.37350087, ..., 0.7071857 ,
0.89672212, 0.25277119],
[0.73110233, 0.39435732, 0.07033897, ..., 0.90960717,
0.91212683, 0.85858381],
[0.54258139, 0.11662828, 0.70256949, ..., 0.85428115,
0.03788034, 0.63837203],
...
[0.41128742, 0.76827003, 0.13690284, ..., 0.35153952,
0.21000871, 0.89327571],
[0.73643851, 0.28246673, 0.55631256, ..., 0.20599446,
0.26434225, 0.99029251],
[0.37146648, 0.43607167, 0.64789792, ..., 0.40202998,
0.57516726, 0.17500694]],
[[0.31257181, 0.61820674, 0.52070954, ..., 0.96581995,
0.23534712, 0.41627359],
[0.87671843, 0.20151553, 0.9917814 , ..., 0.44273185,
0.20544541, 0.37368288],
[0.26560331, 0.22749109, 0.94030219, ..., 0.24526497,
0.78634043, 0.07908533],
...,
[0.82781059, 0.29590796, 0.95242948, ..., 0.67263742,
0.38092654, 0.89350527],
[0.57307289, 0.73484916, 0.36874063, ..., 0.02862867,
0.92761903, 0.49103933],
[0.4093979 , 0.14409232, 0.98184898, ..., 0.38638863,
0.67179747, 0.79713569]]])
Dimensions without coordinates: x, y, time
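A note on the exercise above: because the x dimension has no coordinate values yet, sel has no labels to look up and falls back to positional indexing, so slice(0, 5) returns five rows just as isel would. Once a dimension carries coordinates, label-based slices include both endpoints. The sketch below illustrates the difference; the da_labeled array and its integer x coordinate are assumptions added for illustration and are not part of the notebook.

```python
import numpy as np
import xarray as xr

# Same shape as the tutorial array, with named but unlabeled dimensions.
da = xr.DataArray(
    np.random.random(size=(10, 20, 30)),
    name="image",
    dims=["x", "y", "time"],
)

# Without coordinates, sel() and isel() both index by position (end-exclusive).
by_sel = da.sel(x=slice(0, 5))
by_isel = da.isel(x=slice(0, 5))
print(by_sel.sizes["x"], by_isel.sizes["x"])

# Hypothetical variant: attach integer labels 0..9 to the x dimension.
da_labeled = da.assign_coords(x=np.arange(10))

# With coordinates, sel() slices by label and includes both endpoints.
by_label = da_labeled.sel(x=slice(0, 5))
print(by_label.sizes["x"])
```

When the intent is purely positional, isel states that intent more explicitly than relying on the fallback.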
Exercise: Compute the mean image over time by name, using da.mean(dim='time'):
Solution
da.mean(dim='time')
<xarray.DataArray 'image' (x: 10, y: 20)> Size: 2kB
array([[0.47843092, 0.46969608, 0.40644396, 0.50741432, 0.50431421,
0.48563248, 0.53645447, 0.45694158, 0.47792393, 0.49790484,
0.47847525, 0.45882576, 0.50619661, 0.50320613, 0.50071791,
0.41972101, 0.45554273, 0.48766826, 0.5085251 , 0.50531774],
[0.49078497, 0.52379173, 0.48343256, 0.52503334, 0.45745951,
0.54254514, 0.61334868, 0.40266028, 0.47112354, 0.46579182,
0.54729357, 0.46197744, 0.47021517, 0.46994462, 0.47692702,
0.51305033, 0.56584108, 0.42563489, 0.51674477, 0.58459544],
[0.44949127, 0.60548923, 0.48180034, 0.45403246, 0.59818813,
0.56062785, 0.45900812, 0.57637942, 0.49849704, 0.49103374,
0.54039484, 0.42094949, 0.48788003, 0.47691933, 0.42459847,
0.58556951, 0.53766951, 0.54855299, 0.48729667, 0.46217485],
[0.37252296, 0.63062867, 0.504824 , 0.52336255, 0.55739432,
0.49561938, 0.48572253, 0.48198813, 0.45012286, 0.59467673,
0.55435092, 0.38695287, 0.53432442, 0.47676637, 0.50862611,
0.41253874, 0.46182589, 0.48332374, 0.47428668, 0.54903787],
[0.42500643, 0.54558215, 0.47353567, 0.48548281, 0.50231371,
0.51240519, 0.46619329, 0.47241953, 0.45909955, 0.55562628,
0.58132156, 0.45478005, 0.53310118, 0.54784863, 0.47495015,
0.4817479 , 0.54321249, 0.58288687, 0.50915776, 0.47254236],
[0.47258876, 0.48908605, 0.49056801, 0.55310662, 0.65789466,
0.56912281, 0.59585348, 0.5372052 , 0.53167562, 0.45888481,
0.39769294, 0.46414352, 0.49553688, 0.58052805, 0.39814518,
0.52548734, 0.54697939, 0.50181692, 0.4965746 , 0.49904745],
[0.50088195, 0.54168772, 0.57419232, 0.41886657, 0.38611848,
0.47733109, 0.38949948, 0.49882494, 0.56745354, 0.48883225,
0.45925509, 0.46427744, 0.46913585, 0.48387636, 0.4858951 ,
0.49172214, 0.4274939 , 0.5149629 , 0.49961429, 0.5545395 ],
[0.61660792, 0.49845409, 0.52420791, 0.62849252, 0.4095893 ,
0.4342628 , 0.48062861, 0.60759781, 0.42381866, 0.47382528,
0.3666136 , 0.60449872, 0.5064963 , 0.47201591, 0.5073689 ,
0.46681923, 0.59064752, 0.56942485, 0.50850607, 0.51502411],
[0.43756182, 0.52026911, 0.51650468, 0.4421631 , 0.57237277,
0.4562016 , 0.50262569, 0.52268015, 0.50294336, 0.43493523,
0.57542953, 0.49229615, 0.48417751, 0.51001346, 0.52059171,
0.56349546, 0.4723652 , 0.47318401, 0.49233718, 0.48746748],
[0.58153839, 0.57155364, 0.47683888, 0.61983997, 0.44372875,
0.47244084, 0.44871863, 0.45402516, 0.50615253, 0.56414166,
0.55305249, 0.43315097, 0.50376687, 0.44821274, 0.6061095 ,
0.42601648, 0.55315635, 0.44721002, 0.55584759, 0.5059227 ]])
Dimensions without coordinates: x, y
Exercise: The time points are stored in the numpy array t below. Use mask = t > 40; da.sel(time=mask) to select only the data corresponding to time points greater than 40:
t = np.linspace(0, 100, 30)
Solution
mask = t > 40
da.sel(time=mask)
<xarray.DataArray 'image' (x: 10, y: 20, time: 18)> Size: 29kB
array([[[0.3801502 , 0.04820498, 0.89874884, ..., 0.19024812,
0.57919888, 0.4943471 ],
[0.62321878, 0.10624042, 0.68024044, ..., 0.42941126,
0.02140642, 0.13401437],
[0.05602627, 0.02598415, 0.32258628, ..., 0.11232079,
0.15764563, 0.62205852],
...,
[0.73502167, 0.7191653 , 0.52243755, ..., 0.09790786,
0.66381097, 0.76691221],
[0.4117305 , 0.09230545, 0.02878121, ..., 0.73606387,
0.67496555, 0.01273267],
[0.60857306, 0.96885987, 0.1138432 , ..., 0.67508891,
0.21545972, 0.04647963]],
[[0.09413247, 0.00475213, 0.51825391, ..., 0.7071857 ,
0.89672212, 0.25277119],
[0.40472999, 0.01018857, 0.11182434, ..., 0.90960717,
0.91212683, 0.85858381],
[0.24924832, 0.14484967, 0.62068823, ..., 0.85428115,
0.03788034, 0.63837203],
...
[0.47594781, 0.40575236, 0.59922647, ..., 0.33816704,
0.07879941, 0.38753586],
[0.58649034, 0.83478326, 0.98797737, ..., 0.52368795,
0.6893739 , 0.49329903],
[0.54401046, 0.55875543, 0.65777507, ..., 0.00986273,
0.73754247, 0.11510364]],
[[0.12587492, 0.33950829, 0.85387088, ..., 0.43339667,
0.21292771, 0.95580234],
[0.41198532, 0.84913138, 0.17903529, ..., 0.16513184,
0.45325214, 0.84195001],
[0.92331714, 0.61395529, 0.19310102, ..., 0.1163913 ,
0.67460762, 0.55036986],
...,
[0.92368158, 0.66873584, 0.20778289, ..., 0.75998778,
0.54159547, 0.64966451],
[0.93139689, 0.3658875 , 0.44009 , ..., 0.27428619,
0.9356512 , 0.63459768],
[0.99162561, 0.70124652, 0.29062643, ..., 0.35024879,
0.94698808, 0.50808183]]])
Dimensions without coordinates: x, y, time
Labeling each Axis using Coordinates and Attributes
Beyond naming dimensions, XArray allows each axis to have coordinate values that describe the physical meaning of each index. For example, a time axis might correspond to timestamps, or a spatial axis might correspond to pixel positions.
Coordinates make it possible to select data based on meaningful values rather than raw indices. For example, selecting frames after a particular time point can be done using coordinate values instead of computing index positions manually.
Example: Run the code below to make a new da using xr.DataArray(), this time additionally mapping the time axis to the time points themselves using coords=:
da = xr.DataArray(
    data=np.random.random(size=(10, 20, 30)),
    name='image',
    dims=['x', 'y', 'time'],
    coords={
        'time': np.linspace(0, 100, 30),
    },
)
da
<xarray.DataArray 'image' (x: 10, y: 20, time: 30)> Size: 48kB
array([[[0.46188057, 0.56766766, 0.35679791, ..., 0.05633519,
0.6566564 , 0.28389431],
[0.58015032, 0.64239196, 0.59039491, ..., 0.49590085,
0.02281968, 0.11329427],
[0.5039821 , 0.60760961, 0.83575356, ..., 0.37388575,
0.41638332, 0.27047329],
...,
[0.66524982, 0.38093567, 0.52518851, ..., 0.09357031,
0.12206716, 0.54444632],
[0.84532887, 0.97704918, 0.91157941, ..., 0.37410448,
0.14586067, 0.74378509],
[0.32688785, 0.61309651, 0.95989322, ..., 0.83371582,
0.64154624, 0.77752146]],
[[0.09202725, 0.90281392, 0.49982754, ..., 0.73266458,
0.25561141, 0.48023462],
[0.02234091, 0.98852295, 0.62247615, ..., 0.63447814,
0.94441917, 0.09651057],
[0.01004742, 0.66161957, 0.50444871, ..., 0.02655767,
0.97403606, 0.16546788],
...
[0.85371778, 0.43043134, 0.76959364, ..., 0.71519278,
0.67391388, 0.76497901],
[0.22501216, 0.52742085, 0.1762034 , ..., 0.80517868,
0.93740406, 0.40259905],
[0.0798965 , 0.12092623, 0.00821162, ..., 0.9359866 ,
0.07810915, 0.77949279]],
[[0.34816838, 0.71699975, 0.24623201, ..., 0.97537935,
0.07112402, 0.79217533],
[0.21306198, 0.2743723 , 0.63966886, ..., 0.86183832,
0.36204873, 0.92305822],
[0.15225632, 0.27889629, 0.62959152, ..., 0.82866566,
0.23851943, 0.98939333],
...,
[0.80892984, 0.67120648, 0.79454465, ..., 0.80222849,
0.22851522, 0.57351216],
[0.67077438, 0.14226303, 0.90061353, ..., 0.1229384 ,
0.90847576, 0.03312194],
[0.48467367, 0.77017557, 0.23214696, ..., 0.46349595,
0.64632291, 0.27783926]]])
Coordinates:
* time (time) float64 240B 0.0 3.448 6.897 10.34 ... 93.1 96.55 100.0
Dimensions without coordinates: x, y
Exercise: Use da.sel(time = slice(40, None)) to select only the data corresponding to time points greater than 40, without first creating a mask:
Solution
da.sel(time=slice(40, None))
<xarray.DataArray 'image' (x: 10, y: 20, time: 18)> Size: 29kB
array([[[0.83686132, 0.42845474, 0.44557469, ..., 0.05633519,
0.6566564 , 0.28389431],
[0.57298279, 0.17370296, 0.69313108, ..., 0.49590085,
0.02281968, 0.11329427],
[0.69213285, 0.21450153, 0.99096409, ..., 0.37388575,
0.41638332, 0.27047329],
...,
[0.07942635, 0.58554461, 0.01508784, ..., 0.09357031,
0.12206716, 0.54444632],
[0.17035337, 0.79151936, 0.58735636, ..., 0.37410448,
0.14586067, 0.74378509],
[0.4267523 , 0.78713569, 0.92095774, ..., 0.83371582,
0.64154624, 0.77752146]],
[[0.04377784, 0.11835502, 0.2540089 , ..., 0.73266458,
0.25561141, 0.48023462],
[0.84141178, 0.65626098, 0.20320923, ..., 0.63447814,
0.94441917, 0.09651057],
[0.31242241, 0.40528523, 0.19022224, ..., 0.02655767,
0.97403606, 0.16546788],
...
[0.33485026, 0.17928519, 0.06393263, ..., 0.71519278,
0.67391388, 0.76497901],
[0.57569317, 0.8408589 , 0.28079959, ..., 0.80517868,
0.93740406, 0.40259905],
[0.48696678, 0.27770608, 0.90097863, ..., 0.9359866 ,
0.07810915, 0.77949279]],
[[0.57542 , 0.86748088, 0.39047068, ..., 0.97537935,
0.07112402, 0.79217533],
[0.48733402, 0.14941463, 0.75924524, ..., 0.86183832,
0.36204873, 0.92305822],
[0.13135676, 0.8255285 , 0.28076253, ..., 0.82866566,
0.23851943, 0.98939333],
...,
[0.48887299, 0.48989821, 0.35578766, ..., 0.80222849,
0.22851522, 0.57351216],
[0.63862842, 0.18332819, 0.56574068, ..., 0.1229384 ,
0.90847576, 0.03312194],
[0.02250793, 0.41551062, 0.52261051, ..., 0.46349595,
0.64632291, 0.27783926]]])
Coordinates:
* time (time) float64 144B 41.38 44.83 48.28 51.72 ... 93.1 96.55 100.0
Dimensions without coordinates: x, y
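Coordinate-based selection is not limited to open-ended slices. As a minimal sketch, assuming the same 30 evenly spaced time points as above, the code below selects a bounded time window and then the single frame closest to a requested time. The values 40, 60, and 45 are arbitrary choices for illustration.

```python
import numpy as np
import xarray as xr

time = np.linspace(0, 100, 30)
da = xr.DataArray(
    np.random.random(size=(10, 20, 30)),
    name="image",
    dims=["x", "y", "time"],
    coords={"time": time},
)

# Bounded window: keeps every frame whose time label lies between 40 and 60.
window = da.sel(time=slice(40, 60))

# Nearest-neighbour lookup: pick the single frame closest to t = 45.
frame = da.sel(time=45, method="nearest")

print(window.time.values)
print(float(frame.time))
```

Selecting a single label drops the time dimension, so frame is a 2D image with dimensions x and y, while window keeps all three dimensions.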
Adding Descriptions to the data
XArray objects carry a dictionary called attrs for descriptive metadata: units, long names for plotting, processing history, and free-text descriptions of each part of the data. Some keys are recognized by other tooling (e.g. units, description, long_name), but for the most part any key-value pair can be stored as metadata.
Example: Run the code below to create a new da DataArray using xr.DataArray(), this time with extra attributes describing the main variables.
time = xr.DataArray(
    data=np.linspace(0, 100, 30),
    name='time',
    dims=['time'],
    attrs={
        'units': 's',
        'description': 'time samples for each image frame',
    },
)
da = xr.DataArray(
    data=np.random.random(size=(10, 20, 30)),
    name='image',
    dims=['x', 'y', 'time'],
    coords={
        'time': time,
    },
    attrs={
        'units': 'brightness',
        'description': 'A generated random image stack',
        'long_name': 'calcium image pixel brightness',
    },
)
da
<xarray.DataArray 'image' (x: 10, y: 20, time: 30)> Size: 48kB
array([[[4.35496650e-01, 1.39860289e-01, 5.76075894e-01, ...,
1.35828490e-01, 3.23799397e-01, 1.07731768e-01],
[4.15392354e-01, 6.89231091e-01, 1.62120190e-01, ...,
9.73108830e-01, 1.85465688e-01, 9.39971562e-01],
[6.75804059e-01, 6.30469961e-01, 6.44439396e-01, ...,
2.10046370e-01, 7.32814223e-01, 8.73014448e-01],
...,
[8.47559592e-01, 8.73858054e-01, 8.35539489e-01, ...,
1.89592622e-01, 9.04254151e-01, 2.20712074e-01],
[9.35658273e-01, 4.81834729e-01, 1.27097967e-02, ...,
5.67628947e-01, 6.67647998e-01, 7.29321847e-01],
[6.13921318e-01, 1.64942405e-01, 2.86663115e-01, ...,
8.63679855e-01, 4.42827205e-01, 7.77789488e-01]],
[[1.51234474e-01, 6.54155400e-01, 5.48704160e-01, ...,
6.14159540e-01, 2.34606080e-01, 3.36681628e-01],
[9.92149874e-01, 9.63872664e-01, 4.92213151e-01, ...,
3.93095095e-01, 4.89412507e-01, 6.08654658e-01],
[5.37480845e-01, 5.00420566e-01, 1.92164140e-01, ...,
9.06850090e-01, 3.10698928e-01, 3.07288262e-01],
...
[1.99030780e-01, 2.33889088e-01, 4.25963944e-01, ...,
7.57721772e-02, 3.09179984e-02, 1.52252551e-02],
[2.75607620e-01, 8.97809607e-01, 1.09684499e-04, ...,
1.16773014e-01, 6.32046940e-01, 1.89419867e-01],
[3.49003337e-01, 2.21797962e-01, 7.39140922e-01, ...,
7.78807244e-01, 7.95221138e-02, 2.98407146e-01]],
[[1.41137599e-01, 1.20473007e-01, 5.20769885e-01, ...,
4.56489725e-01, 9.25368638e-01, 9.80714344e-01],
[9.73212545e-01, 1.00600066e-01, 5.95941059e-01, ...,
5.77546236e-01, 5.35467949e-01, 4.31949006e-01],
[3.76814095e-01, 9.85196306e-01, 1.18982638e-01, ...,
5.14847642e-01, 7.32485135e-01, 4.54910505e-01],
...,
[9.22120925e-01, 6.09786725e-01, 6.11135546e-01, ...,
7.14720948e-01, 3.05674544e-01, 1.08655577e-01],
[9.31272102e-01, 7.09868764e-02, 9.22907586e-01, ...,
8.87035004e-01, 2.62764053e-01, 6.04462399e-01],
[1.10832118e-01, 4.95192941e-01, 9.50893051e-01, ...,
4.16903438e-01, 5.13032394e-01, 6.69712544e-01]]])
Coordinates:
* time (time) float64 240B 0.0 3.448 6.897 10.34 ... 93.1 96.55 100.0
Dimensions without coordinates: x, y
Attributes:
units: brightness
description: A generated random image stack
long_name: calcium image pixel brightness
Exercise: View the attributes of the da DataArray with da.attrs:
Solution
da.attrs
{'units': 'brightness',
'description': 'A generated random image stack',
'long_name': 'calcium image pixel brightness'}
Exercise: View the attributes of the time coordinate on the da DataArray with da.time.attrs:
Solution
da.time.attrs
{'units': 's', 'description': 'time samples for each image frame'}
Exercise: Plot the mean pixel brightness over time and check that some attributes are used automatically in the plot, with da.mean(dim=['x', 'y']).plot():
Solution
da.mean(dim=['x', 'y']).plot();
There are many more features that XArray provides to make an analysis convenient, but this should be enough to get us started.
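A small follow-up on attributes: attrs is an ordinary dictionary, so entries can be added or changed after an array is created. Whether a reduction such as mean carries the attributes through to its result can depend on the xarray version and settings, so the sketch below passes keep_attrs=True to make the behavior explicit. The 'history' entry is an illustrative assumption, not something the notebook defines.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.random(size=(10, 20, 30)),
    name="image",
    dims=["x", "y", "time"],
    coords={"time": np.linspace(0, 100, 30)},
    attrs={"units": "brightness", "long_name": "calcium image pixel brightness"},
)

# attrs is a plain dict, so metadata can be edited after creation.
da.attrs["history"] = "generated with np.random.random"  # illustrative value

# Ask the reduction to carry the metadata through explicitly.
trace = da.mean(dim=["x", "y"], keep_attrs=True)
print(trace.attrs)
```

Keeping units and long_name on the reduced array means later plots and saved files still describe what the values represent.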
Section 2: Creating HDF5-based NetCDF4 Files with XArray
Once data is organized in an XArray structure, it can easily be saved to disk using scientific file formats such as NetCDF4, which is built on top of the HDF5 storage system. These formats are widely used in scientific computing because they support structured metadata, multidimensional datasets, and efficient storage of large arrays.
Saving data in these formats allows large datasets to be stored and accessed efficiently without requiring them to be fully loaded into memory. It also makes the data portable and accessible to tools outside of Python.
Exercises
Exercise: Use da.to_netcdf() with the engine='netcdf4' option to create an HDF5-compatible file called example.nc.
time = xr.DataArray(
    data=np.linspace(0, 100, 30),
    name='time',
    dims=['time'],
    attrs={
        'units': 's',
        'description': 'time samples for each image frame',
    },
)
da = xr.DataArray(
    data=np.random.random(size=(10, 20, 30)),
    name='image',
    dims=['x', 'y', 'time'],
    coords={
        'time': time,
    },
    attrs={
        'units': 'brightness',
        'description': 'A generated random image stack',
        'long_name': 'calcium image pixel brightness',
    },
)
da
<xarray.DataArray 'image' (x: 10, y: 20, time: 30)> Size: 48kB
array([[[0.29989757, 0.8463877 , 0.40663498, ..., 0.24629935,
0.95116593, 0.70166196],
[0.77588588, 0.65273534, 0.55998213, ..., 0.66929204,
0.77767039, 0.50514103],
[0.61162945, 0.12880813, 0.41674473, ..., 0.97652357,
0.00874667, 0.10117324],
...,
[0.27236194, 0.80752062, 0.32870814, ..., 0.33127256,
0.27448837, 0.04049907],
[0.75803561, 0.3023437 , 0.40953296, ..., 0.41166149,
0.05782473, 0.60460466],
[0.54209502, 0.77177583, 0.81081577, ..., 0.32133847,
0.86516611, 0.92231743]],
[[0.52935632, 0.71359921, 0.95500389, ..., 0.49523286,
0.34773767, 0.60304061],
[0.50776562, 0.085269 , 0.38566092, ..., 0.81686683,
0.78306769, 0.67995772],
[0.8333419 , 0.44137973, 0.33703999, ..., 0.46596195,
0.34835274, 0.87634407],
...
[0.1856812 , 0.03144001, 0.95300717, ..., 0.06460216,
0.06975456, 0.59354467],
[0.81160282, 0.56820701, 0.14425859, ..., 0.19822809,
0.22702473, 0.1643339 ],
[0.4079898 , 0.48072856, 0.6481851 , ..., 0.59914815,
0.49579475, 0.40973475]],
[[0.36633622, 0.15680477, 0.59611613, ..., 0.53226308,
0.95268626, 0.97610412],
[0.03067763, 0.93109174, 0.20496723, ..., 0.55886482,
0.95605829, 0.55176005],
[0.12329406, 0.4358826 , 0.46839423, ..., 0.69402656,
0.06989202, 0.84391449],
...,
[0.6833302 , 0.21588141, 0.66163522, ..., 0.73072985,
0.72357252, 0.15604505],
[0.20859619, 0.53943158, 0.67767281, ..., 0.83549851,
0.02629358, 0.46670397],
[0.92274174, 0.07095888, 0.63037707, ..., 0.53206717,
0.12700903, 0.25080094]]])
Coordinates:
* time (time) float64 240B 0.0 3.448 6.897 10.34 ... 93.1 96.55 100.0
Dimensions without coordinates: x, y
Attributes:
units: brightness
description: A generated random image stack
long_name: calcium image pixel brightness- x: 10
- y: 20
- time: 30
- 0.2999 0.8464 0.4066 0.9765 0.7405 ... 0.03308 0.5321 0.127 0.2508
array([[[0.29989757, 0.8463877 , 0.40663498, ..., 0.24629935, 0.95116593, 0.70166196], [0.77588588, 0.65273534, 0.55998213, ..., 0.66929204, 0.77767039, 0.50514103], [0.61162945, 0.12880813, 0.41674473, ..., 0.97652357, 0.00874667, 0.10117324], ..., [0.27236194, 0.80752062, 0.32870814, ..., 0.33127256, 0.27448837, 0.04049907], [0.75803561, 0.3023437 , 0.40953296, ..., 0.41166149, 0.05782473, 0.60460466], [0.54209502, 0.77177583, 0.81081577, ..., 0.32133847, 0.86516611, 0.92231743]], [[0.52935632, 0.71359921, 0.95500389, ..., 0.49523286, 0.34773767, 0.60304061], [0.50776562, 0.085269 , 0.38566092, ..., 0.81686683, 0.78306769, 0.67995772], [0.8333419 , 0.44137973, 0.33703999, ..., 0.46596195, 0.34835274, 0.87634407], ... [0.1856812 , 0.03144001, 0.95300717, ..., 0.06460216, 0.06975456, 0.59354467], [0.81160282, 0.56820701, 0.14425859, ..., 0.19822809, 0.22702473, 0.1643339 ], [0.4079898 , 0.48072856, 0.6481851 , ..., 0.59914815, 0.49579475, 0.40973475]], [[0.36633622, 0.15680477, 0.59611613, ..., 0.53226308, 0.95268626, 0.97610412], [0.03067763, 0.93109174, 0.20496723, ..., 0.55886482, 0.95605829, 0.55176005], [0.12329406, 0.4358826 , 0.46839423, ..., 0.69402656, 0.06989202, 0.84391449], ..., [0.6833302 , 0.21588141, 0.66163522, ..., 0.73072985, 0.72357252, 0.15604505], [0.20859619, 0.53943158, 0.67767281, ..., 0.83549851, 0.02629358, 0.46670397], [0.92274174, 0.07095888, 0.63037707, ..., 0.53206717, 0.12700903, 0.25080094]]]) - time(time)float640.0 3.448 6.897 ... 96.55 100.0
- units :
- s
- description :
- time samples for each image frame
array([ 0. , 3.448276, 6.896552, 10.344828, 13.793103, 17.241379, 20.689655, 24.137931, 27.586207, 31.034483, 34.482759, 37.931034, 41.37931 , 44.827586, 48.275862, 51.724138, 55.172414, 58.62069 , 62.068966, 65.517241, 68.965517, 72.413793, 75.862069, 79.310345, 82.758621, 86.206897, 89.655172, 93.103448, 96.551724, 100. ])
- timePandasIndex
PandasIndex(Index([ 0.0, 3.4482758620689653, 6.896551724137931, 10.344827586206897, 13.793103448275861, 17.241379310344826, 20.689655172413794, 24.137931034482758, 27.586206896551722, 31.034482758620687, 34.48275862068965, 37.93103448275862, 41.37931034482759, 44.82758620689655, 48.275862068965516, 51.72413793103448, 55.172413793103445, 58.62068965517241, 62.068965517241374, 65.51724137931033, 68.9655172413793, 72.41379310344827, 75.86206896551724, 79.3103448275862, 82.75862068965517, 86.20689655172413, 89.6551724137931, 93.10344827586206, 96.55172413793103, 100.0], dtype='float64', name='time'))
- units: brightness
- description: A generated random image stack
- long_name: calcium image pixel brightness
Solution
da.to_netcdf('example.nc', engine='netcdf4')

Exercise: Open the example.nc file in the HDF5 Viewer at https://myhdf5.hdfgroup.org/ to verify it is a valid HDF5 file, and use it to do the following tasks:
- View the time variable as a line plot.
- View the time values themselves in a matrix.
- View the image data as a heatmap.
- Find the “description” attribute for the image variable (hint: check the Inspect tab).
Changing the encoding for a Variable to Save Space using Compression: zlib and complevel
Large scientific datasets often contain patterns that can be compressed effectively. NetCDF and HDF5 support built-in compression options, which can significantly reduce file size without altering the meaning of the data.
Compression settings such as zlib and complevel allow the file format to store the data more efficiently on disk: zlib switches compression on, and complevel sets how hard the compressor works (higher levels give smaller files but take longer to write). In some cases, particularly for structured or low-entropy data, the resulting file can be much smaller while remaining fully compatible with standard tools. Other compression libraries are also supported, but here we’ll focus on the big picture: when and where compression helps (the full list of options is in the linked documentation, for the interested).
The exercises in this section explore how compression affects file size for different types of data.
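Before writing any files, it can help to see the same effect in isolation. The sketch below is an illustration only: it uses Python's built-in zlib module directly on raw bytes rather than going through NetCDF, and the buffer names (zeros, ramp, noise) and the helper compressed_fraction are made up for this example. The random buffer is drawn from os.urandom, which is higher-entropy than the random floats used in the exercises, so the exact ratios will differ from the file sizes shown below.

```python
import os
import struct
import zlib

N = 100_000  # number of float64 values (kept small so the sketch runs quickly)


def compressed_fraction(raw: bytes, level: int = 4) -> float:
    """Return compressed size divided by original size for zlib at `level`."""
    return len(zlib.compress(raw, level)) / len(raw)


# Three byte buffers, each the size of N float64 values (8 bytes per value).
zeros = bytes(8 * N)                                          # constant data
ramp = struct.pack(f"<{N}d", *(float(i) for i in range(N)))   # regular ramp
noise = os.urandom(8 * N)                                     # random bytes (hypothetical stand-in for noisy data)

for label, raw in [("zeros", zeros), ("ramp", ramp), ("noise", noise)]:
    print(f"{label:>5}: {compressed_fraction(raw):.3f} of original size")
```

The ordering is the point rather than the exact numbers: the more regular the bytes, the smaller the compressed result, and data that looks random barely shrinks at all.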
Example: Save the da DataArray below with two different encodings: one without zlib compression, and one with zlib compression. How big of a file size reduction is there?
da = xr.DataArray(np.random.random(1_000_000), name='data')
da.to_netcdf('data1.nc', engine='netcdf4')
utils.print_file_size('data1.nc', 'No Compression ')
da.to_netcdf('data2.nc', engine='netcdf4', encoding={'data': {'zlib': True, 'complevel': 4}})
utils.print_file_size('data2.nc', 'Yes Compression')

No Compression : 8.01 MB
Yes Compression: 6.75 MB

Exercise: Save the da DataArray below with two different encodings: one without zlib compression, and one with zlib compression. How big of a file size reduction is there?
da = xr.DataArray(np.linspace(0, 10, 1_000_000), name='data')
Solution
da.to_netcdf('data1.nc', engine='netcdf4')
utils.print_file_size('data1.nc', 'No Compression ')
da.to_netcdf('data2.nc', engine='netcdf4', encoding={'data': {'zlib': True, 'complevel': 4}})
utils.print_file_size('data2.nc', 'Yes Compression')

No Compression : 8.01 MB
Yes Compression: 321.15 KB

Exercise: Save the da DataArray below with two different encodings: one without zlib compression, and one with zlib compression. How big of a file size reduction is there?
da = xr.DataArray(np.arange(1_000_000), name='data')
da.to_netcdf('data1.nc', engine='netcdf4')
utils.print_file_size('data1.nc', 'No Compression ')
da.to_netcdf('data2.nc', engine='netcdf4', encoding={'data': {'zlib': True, 'complevel': 4}})
utils.print_file_size('data2.nc', 'Yes Compression')

No Compression : 8.01 MB
Yes Compression: 27.65 KB

Exercise: Save the da DataArray below with two different encodings: one without zlib compression, and one with zlib compression. How big of a file size reduction is there?
da = xr.DataArray(np.zeros(1_000_000), name='data')
da.to_netcdf('data1.nc', engine='netcdf4')
utils.print_file_size('data1.nc', 'No Compression ')
da.to_netcdf('data2.nc', engine='netcdf4', encoding={'data': {'zlib': True, 'complevel': 4}})
utils.print_file_size('data2.nc', 'Yes Compression')

No Compression : 8.01 MB
Yes Compression: 16.03 KB

Section 3: Analysis Requires Memory: Monitoring Memory Usage for Chained Pipelines
Even when data is stored efficiently on disk, analysis pipelines can still consume large amounts of memory. This is particularly true when multiple operations are chained together, since intermediate results may temporarily allocate large arrays.
In this section, we monitor the memory usage of different analysis pipelines while performing simple computations on an imaging dataset. By observing how memory usage changes over time, it becomes easier to see how different data-loading strategies affect resource consumption.
These experiments illustrate an important principle: the way data is loaded and processed can matter just as much as the analysis itself.
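To make that principle concrete before working with the imaging file, here is a minimal sketch that uses only the standard library. It is not how utils.analyze_memory works: it swaps in tracemalloc (which tracks Python-level allocations) for the memory profiler, and the toy dataset and the function names mean_eager, mean_streaming, and peak_bytes are hypothetical. Both functions compute the same mean, but one materializes every sample first while the other keeps only a running total.

```python
import tracemalloc

N = 200_000  # number of samples in the toy "dataset"


def samples():
    """Yield the toy dataset one value at a time (stand-in for reading from disk)."""
    for i in range(N):
        yield i * 0.5


def mean_eager() -> float:
    """Materialize every sample in a list before averaging."""
    data = list(samples())
    return sum(data) / len(data)


def mean_streaming() -> float:
    """Keep only a running total and a count in memory."""
    total, count = 0.0, 0
    for value in samples():
        total += value
        count += 1
    return total / count


def peak_bytes(func) -> int:
    """Run func and return the peak Python-level allocation, in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


eager_peak = peak_bytes(mean_eager)
streaming_peak = peak_bytes(mean_streaming)
print(f"eager:     {eager_peak / 1e6:.2f} MB")
print(f"streaming: {streaming_peak / 1e6:.2f} MB")
```

The result is identical in both cases; only the peak memory differs. The exercises below make the same comparison on a real file, where the choice is between loading everything and loading only what the computation needs.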
Exercises
utils.generate_calcium_data_file('calcium_data.nc')

Example: Compute Two Different Mean Projections: one over all pixels, and one over a selection of the frame (a “Region of Interest”)
- Update the functions and run the cell
def mean_all_data():
    return (
        xr.load_dataarray('calcium_data.nc')
        .mean(dim='time')
    )

def mean_roi_data():
    return (
        xr.load_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
    )
- Check that the functions work: plot each of the generated images.
mean_all_data().plot()
plt.figure()
mean_roi_data().plot()
- How much memory does each of the two functions use? Plot a comparison between the two functions.
utils.analyze_memory(
    mean_all_data,
    mean_roi_data,
)

Exercise: Modify the second of the mean-projection functions below to use xr.open_dataarray(), which opens the file but doesn’t load the data until it is requested.
- Update the functions and run the cell
def mean_roi_data():
    return (
        xr.load_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
    )

def mean_roi_data2():
    return (
        xr.load_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
    )
Solution
def mean_roi_data():
    return (
        xr.load_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
    )

def mean_roi_data2():
    return (
        xr.open_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
    )
- Check that the functions still work as before: plot each of the generated images and compare them; despite having different code, they should show the same result.
Solution
mean_roi_data().plot()
plt.figure()
mean_roi_data2().plot()
- How much memory does each of the two functions use? Plot a comparison between the two functions. Is there a significant difference between the two?
Solution
utils.analyze_memory(
    mean_roi_data,
    mean_roi_data2,
)

Exercise: Modify the third of the mean-projection functions below to use xr.open_dataarray(chunks='auto'), which opens the file but doesn’t load the data until the full computation is requested (i.e., by adding .compute() to the end of the pipeline).
- Update the functions and run the cell
def mean_roi_data():
    return (
        xr.load_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
    )

def mean_roi_data2():
    return (
        xr.open_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
    )

def mean_roi_data3():
    return (
        xr.open_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
        .compute()
    )
Solution
def mean_roi_data():
    return (
        xr.load_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
    )

def mean_roi_data2():
    return (
        xr.open_dataarray('calcium_data.nc')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
    )

def mean_roi_data3():
    return (
        xr.open_dataarray('calcium_data.nc', chunks='auto')
        .sel(x=slice(0, 100), y=slice(0, 100))
        .mean(dim='time')
        .compute()
    )
- Check that the functions all still work as before: plot each of the generated images and compare them; despite having different code, they should show the same result.
Solution
mean_roi_data().plot()
plt.figure()
mean_roi_data2().plot()
plt.figure()
mean_roi_data3().plot()
- How much memory does each of the three functions use? Plot a comparison between the three functions. Is there a significant difference between the three?
Solution
utils.analyze_memory(
    mean_roi_data,
    mean_roi_data2,
    mean_roi_data3,
)

Section 4: Monitoring Dask’s Workflow
When Dask executes chunked computations, it constructs a task graph describing how different pieces of the computation depend on each other. A distributed Dask client provides a dashboard that visualizes this process in real time, showing how tasks are scheduled, executed, and completed across workers.
The dashboard allows users to observe how work is distributed, how memory usage changes over time, and how intermediate results flow through the computation graph. This visibility is especially valuable when analyzing large datasets or debugging performance issues.
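The idea of a task graph can be sketched without Dask at all. The example below is a simplified, hypothetical stand-in and not Dask's actual implementation: it splits a list into chunks, runs one independent task per chunk on a thread pool, and then runs a single combining task that depends on all of them. The helper names make_chunks, partial_stats, and combine are invented for this illustration; a real Dask graph for a mean may add intermediate combining layers.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def make_chunks(values, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_stats(chunk):
    """Per-chunk task: reduce one chunk to a (sum, count) pair."""
    return sum(chunk), len(chunk)


def combine(stats):
    """Final task: merge the per-chunk pairs into a single mean."""
    total = sum(s for s, _ in stats)
    count = sum(n for _, n in stats)
    return total / count


values = [float(i % 17) for i in range(10_000)]
chunks = make_chunks(values, chunk_size=1_000)

# Each chunk is an independent task, so the pool's workers can pick them up in any order.
with ThreadPoolExecutor(max_workers=4) as pool:
    stats = list(pool.map(partial_stats, chunks))

chunked_mean = combine(stats)
print(len(chunks), "chunk tasks + 1 combine task ->", chunked_mean)
```

Each per-chunk task only ever holds one chunk, and its (sum, count) result is tiny compared with the chunk itself. This is the pattern to look for in the dashboard: many small independent tasks feeding a few combining tasks.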
Exercise: Uncomment and run the following code to shift computations out of the process, into a “Distributed Dask” client (note: this will make it so that the utils.analyze_memory() function can no longer access the dask-processed data; monitoring will have to be done from the dask workers). Open the resulting web page and browse through the sections.
#import dask.distributed
#client = dask.distributed.Client()
#client

Exercise: Run the following code over and over, while simultaneously viewing the client monitoring dashboard, looking at different sections of the dashboard, and answer the following questions:
- How many workers are processing the data?
- Do the workers just break up the work evenly from the beginning, or is there more complex cooperation happening between them?
- How many different tasks was the workflow broken down into?
- Is memory released between tasks?
- Is there a simple relationship between each task, or is there a more complex compute graph being run?
(
    xr.open_dataarray('calcium_data.nc', chunks='auto')
    .sel(x=slice(0, 100), y=slice(0, 100))
    .mean(dim='time')
    .rolling(x=7, center=True).mean()
    .dropna('x')
    .rolling(y=7, center=True).mean()
    .dropna('y')
    .compute()
    .plot()
)