Creating Xarray DataArrays
Authors
Setup
Import Libraries
import xarray as xr
import numpy as np
Thus far, we have worked with data organized in separate numpy arrays, but when working with real data, keeping track of many arrays stored in separate variables and how they relate to one another can quickly become complicated. Xarray helps you store multiple arrays and information about their relationships together in one DataArray, making it easier for the researcher to work with and understand the data.
In this notebook, we will look at how to combine multiple arrays in an xarray DataArray, how to access the data stored in a DataArray, and how to save it to a file.
Section 1: Labeling the Indices of a DataArray’s Dimensions and Accessing the Data
| Code | Description |
|---|---|
| `da = xr.DataArray(data=x, coords={'time': y}, name='sensor')` | Make a DataArray from the equal-length arrays x and y, describing x as sensor data and y as the time point of each measurement. |
| `da = xr.DataArray(data=x, coords={'time': y, 'channel': z}, name='sensor')` | Make a 2D DataArray from x, y, and z, where z holds the channel names of the sensor data. |
| `da['time']` | Get all time points at which data was recorded. |
| `da['channel']` | Get the names of all channels. |
| `da.loc[1:1.5]` | Get the sensor data from time points 1 to 1.5 seconds. |
| `da.loc[1:1.5, :]` | Get the sensor data from time points 1 to 1.5 seconds, for all channels. |
| `da.loc[1:1.5, ['CHAN-2', 'CHAN-4']]` | Get the sensor data from time points 1 to 1.5 seconds for the channels labeled ‘CHAN-2’ and ‘CHAN-4’. |
| `da.sel(channel='CHAN-3')` | Get the sensor data across the whole time period from the channel labeled ‘CHAN-3’. |
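The calls in the table can be tried on a small made-up recording. The following is a minimal sketch rather than part of the original notebook: the arrays x, y, and z are hypothetical stand-ins, and dims is passed explicitly as an extra precaution so that the axis order does not depend on the order of the coords dictionary.

```python
import numpy as np
import xarray as xr

# Hypothetical sensor recording: 8 time points x 4 channels.
y = np.arange(0, 2, 0.25)                      # time points in seconds: 0.0 ... 1.75
z = ['CHAN-1', 'CHAN-2', 'CHAN-3', 'CHAN-4']   # channel names
x = np.arange(len(y) * len(z), dtype=float).reshape(len(y), len(z))

da = xr.DataArray(
    data=x,
    dims=['time', 'channel'],            # axis 0 is time, axis 1 is channel
    coords={'time': y, 'channel': z},
    name='sensor',
)

times = da['time']                                   # all time points
window = da.loc[1:1.5]                               # labels 1.0, 1.25, 1.5 (end included)
two_channels = da.loc[1:1.5, ['CHAN-2', 'CHAN-4']]   # same window, two channels
chan3 = da.sel(channel='CHAN-3')                     # one channel, all time points

print(window.shape)        # (3, 4)
print(two_channels.shape)  # (3, 2)
print(chan3.shape)         # (8,)
```

Note that the label-based slice 1:1.5 includes the end point 1.5, unlike a positional slice.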
Exercises
Example: A DataArray can be made by simply passing a regular numpy array with the data to the xarray DataArray constructor.
data = np.random.random(size=10)
data
array([0.58489453, 0.30931659, 0.35022568, 0.50525519, 0.83252742,
       0.78679708, 0.66900383, 0.65463909, 0.55992887, 0.37307137])
data_xr = xr.DataArray(data)
data_xr
<xarray.DataArray (dim_0: 10)> Size: 80B
array([0.58489453, 0.30931659, 0.35022568, 0.50525519, 0.83252742,
       0.78679708, 0.66900383, 0.65463909, 0.55992887, 0.37307137])
Dimensions without coordinates: dim_0
Example: When we display the resulting DataArray, we see that there is more information that can be added. That’s the strength and benefit of DataArrays; but we’re not taking advantage of it in the example above. In the following example, we include time information - the month - for which a given data point is recorded. In this hypothetical scenario, it’s the sale of hiking boots in a sportswear store over the course of a year.
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
hiking_boots_sold = np.random.randint(low=2, high=50, size=len(months))
hiking_boots_sold
array([39,  6,  6, 40,  6, 33, 38, 22, 32, 48, 18,  3])
data_boots = xr.DataArray(
    data=hiking_boots_sold,
    coords={'month': months},
    name='sale_hiking_boots',
)
data_boots
<xarray.DataArray 'sale_hiking_boots' (month: 12)> Size: 96B
array([39,  6,  6, 40,  6, 33, 38, 22, 32, 48, 18,  3])
Coordinates:
  * month    (month) <U9 432B 'January' 'February' … 'November' 'December'
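The pieces bundled in the printed summary can also be read one at a time. The sketch below is only an illustration with made-up quarterly figures (the notebook itself uses random monthly numbers); it relies on the DataArray attributes name, dims, sizes, and values, and on the dims argument of the constructor.

```python
import numpy as np
import xarray as xr

# Made-up figures for illustration only.
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
boots_sold = np.array([51, 79, 92, 69])

sales = xr.DataArray(data=boots_sold, coords={'quarter': quarters}, name='sale_hiking_boots')

print(sales.name)              # label of the whole array
print(sales.dims)              # tuple of dimension names
print(sales.sizes['quarter'])  # length of the 'quarter' dimension
print(type(sales.values))      # the underlying numpy array

# Without coords, a dimension can still be named via `dims`
# instead of receiving the default name 'dim_0'.
unnamed = xr.DataArray(boots_sold)
named = xr.DataArray(boots_sold, dims=['quarter'])
print(unnamed.dims, named.dims)
```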
Exercise: Make a DataArray out of the following variables containing numpy arrays.
Solution
days = np.linspace(1, 365, 365, dtype=int)
hours_of_sunlight = np.random.uniform(low=0, high=16, size=len(days))
data_sun = xr.DataArray(
    data=hours_of_sunlight,
    coords={'day': days},
    name='hours_of_sunlight_over_year'
)
data_sun
<xarray.DataArray 'hours_of_sunlight_over_year' (day: 365)> Size: 3kB
array([8.23006743e+00, 7.21694392e+00, 3.74655378e+00, 8.92461200e+00,
       9.38304235e+00, 8.92751010e+00, 3.77495059e+00, 1.31636567e+01,
       2.45902953e+00, 8.36089037e+00, 7.09654038e+00, 1.48757403e+01,
       1.16164646e+01, 9.80515275e+00, 7.08769855e+00, 8.52507807e+00,
       3.11079725e+00, 7.91321694e+00, 1.11614207e+01, 1.48038049e+01,
       8.91518155e+00, 5.33508301e-01, 2.47056017e+00, 6.91468774e+00,
       1.17036281e+01, 8.77655657e+00, 7.17807369e+00, 1.36831349e+00,
       4.61231869e-01, 4.64060336e+00, 6.83088895e+00, 6.06679933e+00,
       2.45094377e+00, 7.01031672e-01, 1.11239269e+01, 1.43514494e+01,
       3.29001670e-01, 1.02294294e+01, 6.81727355e+00, 9.18526246e+00,
       7.79739073e+00, 1.41748651e+01, 1.09333584e+01, 5.92642428e+00,
       7.79074386e+00, 4.53065382e-01, 1.12618476e+01, 6.21238801e+00,
       1.12555484e+01, 1.20050358e+01, 6.02066390e+00, 1.53692216e+01,
       1.59851780e+01, 1.76377694e+00, 1.58066919e+01, 7.57641734e+00,
       3.90439098e+00, 9.46307778e+00, 1.02639021e+00, 4.35392115e+00,
       3.46629355e+00, 8.00917271e+00, 1.56924200e+01, 3.35086470e+00,
       5.81554633e+00, 7.80686597e+00, 8.82128041e+00, 1.51389988e+01,
       8.24378132e+00, 5.40782027e+00, 5.95643967e+00, 1.33394170e+00,
       1.47696565e+01, 1.47813671e+00, 1.48453840e+01, 6.26934476e+00,
       5.41620327e+00, 6.11286275e+00, 6.49687445e+00, 1.08877683e+01,
       …
       9.84325244e+00, 1.01528923e+00, 3.15687929e+00, 9.77263943e+00,
       9.96574333e+00, 9.42822240e+00, 6.55878955e-01, 1.17890842e+00,
       6.50210208e+00, 1.30461117e+01, 1.27016934e+01, 1.74379897e+00,
       5.08618410e+00, 1.11982102e+01, 1.18590474e+01, 2.71737149e+00,
       1.46177633e+01, 1.31257279e+01, 1.16233467e+01, 6.16799363e+00,
       4.28178084e+00, 7.83740481e+00, 1.29375046e+01, 6.06850805e+00,
       1.32126785e+01, 6.39787765e+00, 1.56068444e+01, 1.51391095e+01,
       1.51110788e+01, 5.43832742e+00, 8.26582415e+00, 1.87781332e+00,
       6.74993425e+00, 9.21053554e+00, 1.35123060e+01, 1.53834993e+01,
       1.02323070e+01, 1.28699584e+01, 1.17486530e+01, 4.87753529e+00,
       6.55916363e+00, 4.16106549e+00, 1.24088568e+01, 2.53296482e+00,
       2.81155073e+00, 9.35439130e+00, 9.83815547e+00, 1.20805543e+01,
       9.91077458e+00, 1.17941470e+01, 3.73621124e+00, 6.84984768e-01,
       7.11782867e+00, 1.49531769e+01, 1.32211583e+01, 3.99443547e+00,
       2.04934036e+00, 1.38590038e+01, 1.17655673e+01, 7.11167642e+00,
       4.77290576e+00, 3.36814979e+00, 5.25550318e+00, 7.29273883e+00,
       1.57836152e+01, 1.29942957e+01, 1.26501592e+01, 1.38997670e+01,
       9.35196810e+00, 1.58813368e+01, 3.86108730e+00, 1.54960821e+01,
       1.16770869e+01, 1.38766505e+01, 1.25933240e+01, 2.27569293e+00,
       8.90981941e+00])
Coordinates:
  * day      (day) int64 3kB 1 2 3 4 5 6 7 8 … 358 359 360 361 362 363 364 365
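The solution builds the day labels with np.linspace. As a side note, and only as a sketch of an alternative, np.arange yields the same integers with a shorter call; keep in mind that its stop value is exclusive, so 366 is needed to include day 365.

```python
import numpy as np

days_linspace = np.linspace(1, 365, 365, dtype=int)  # as in the solution
days_arange = np.arange(1, 366)                      # stop value is exclusive

# Both arrays hold the integers 1 through 365.
print(np.array_equal(days_linspace, days_arange))
```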
Exercise: Get the array containing the days throughout the year.
Solution
data_sun['day']
<xarray.DataArray 'day' (day: 365)> Size: 3kB
array([  1,   2,   3, …, 363, 364, 365])
Coordinates:
  * day      (day) int64 3kB 1 2 3 4 5 6 7 8 … 358 359 360 361 362 363 364 365
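Indexing with a coordinate name returns another DataArray, not a bare numpy array. The sketch below, using a made-up week of values, shows two equivalent ways to reach the coordinate and how to get the plain numpy array of labels through the values attribute.

```python
import numpy as np
import xarray as xr

days = np.arange(1, 8)                                   # one hypothetical week
hours = np.array([8.0, 8.5, 9.0, 7.5, 6.0, 9.5, 10.0])   # made-up values
week = xr.DataArray(data=hours, coords={'day': days}, name='hours_of_sunlight')

day_coord = week['day']           # a DataArray holding the coordinate labels
same_coord = week.coords['day']   # equivalent, more explicit spelling
day_values = week['day'].values   # plain numpy array of the labels

print(type(day_coord).__name__)
print(day_values)
```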
Exercise: Get the data on hours of sunlight for day number 3 through 11 using the loc indexer.
Solution
data_sun.loc[3:11]
<xarray.DataArray 'hours_of_sunlight_over_year' (day: 9)> Size: 72B
array([ 3.74655378,  8.924612  ,  9.38304235,  8.9275101 ,  3.77495059,
       13.16365669,  2.45902953,  8.36089037,  7.09654038])
Coordinates:
  * day      (day) int64 72B 3 4 5 6 7 8 9 10 11
Exercise: Get the data on hours of sunlight for day number 3 through 11 using regular indexing for arrays. Do you notice a difference in which indices you use to access the data?
Solution
data_sun[2:11]
<xarray.DataArray 'hours_of_sunlight_over_year' (day: 9)> Size: 72B
array([ 3.74655378,  8.924612  ,  9.38304235,  8.9275101 ,  3.77495059,
       13.16365669,  2.45902953,  8.36089037,  7.09654038])
Coordinates:
  * day      (day) int64 72B 3 4 5 6 7 8 9 10 11
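The two previous exercises return the same values because loc works on labels (both ends of the slice included), while square brackets work on zero-based positions (end excluded). The sketch below makes this visible with stand-in data in which every value equals its day label; sel and isel are the keyword-based counterparts of the two styles.

```python
import numpy as np
import xarray as xr

days = np.arange(1, 366)
# Stand-in data: value == day label (the notebook uses random values).
sun = xr.DataArray(data=days.astype(float), coords={'day': days}, name='demo')

by_label = sun.loc[3:11]                        # labels 3..11, both ends included
by_position = sun[2:11]                         # positions 2..10, end excluded
by_label_sel = sun.sel(day=slice(3, 11))        # like .loc, with the dimension named
by_position_isel = sun.isel(day=slice(2, 11))   # like [], with the dimension named

print(by_label['day'].values)
print(by_label.equals(by_position))
```

Label-based selection keeps working if the array is later reordered or subsetted, which is one reason it is often preferred over positions.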
Exercise: In the hiking boots DataArray from the example, get the number of hiking boots sold in October using the loc indexer.
Solution
data_boots.loc['October']
<xarray.DataArray 'sale_hiking_boots' ()> Size: 8B
array(48)
Coordinates:
    month    <U9 36B 'October'
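Selecting a single label gives a zero-dimensional DataArray that still carries its coordinate. As a small sketch (using only the August to October figures from the output above), sel expresses the same selection with the dimension named, and item converts the result to a plain Python number.

```python
import numpy as np
import xarray as xr

months = ['August', 'September', 'October']   # shortened for the sketch
boots = xr.DataArray(np.array([22, 32, 48]), coords={'month': months}, name='sale_hiking_boots')

october = boots.sel(month='October')   # same selection as boots.loc['October']
print(october.ndim)     # 0: a scalar DataArray
print(october.item())   # plain Python int
```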
Exercise: Creating DataArrays with multidimensional data. Let’s say that the company selling hiking boots has stores in multiple cities - Cologne, Berlin, and Munich - and that you want to store data on sales in all three cities throughout the year. In this case, you’re storing multidimensional data; data across time and space, similar to neuroscience data.
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
cities = ['Cologne', 'Berlin', 'Munich']
hiking_boots_sold = np.random.randint(low=2, high=50, size=(len(months), len(cities)))
hiking_boots_sold
array([[11, 38, 45],
       [37, 25, 13],
       [ 4, 21, 11],
       [45, 19, 20],
       [ 9, 29, 28],
       [ 8,  5, 49],
       [10, 22, 23],
       [28, 25, 12],
       [43,  5, 28],
       [13, 18,  9],
       [ 6, 30, 33],
       [20, 20, 16]])
data_boots_cities = xr.DataArray(
    data=hiking_boots_sold,
    coords={'month': months, 'city': cities},
    name='hiking_boots_sold_different_cities'
)
data_boots_cities
<xarray.DataArray 'hiking_boots_sold_different_cities' (month: 12, city: 3)> Size: 288B
array([[11, 38, 45],
       [37, 25, 13],
       [ 4, 21, 11],
       [45, 19, 20],
       [ 9, 29, 28],
       [ 8,  5, 49],
       [10, 22, 23],
       [28, 25, 12],
       [43,  5, 28],
       [13, 18,  9],
       [ 6, 30, 33],
       [20, 20, 16]])
Coordinates:
  * month    (month) <U9 432B 'January' 'February' … 'November' 'December'
  * city     (city) <U7 84B 'Cologne' 'Berlin' 'Munich'
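With two dimensions, the shape of the data has to line up with the coordinates: rows with months, columns with cities. The following sketch is an illustration with stand-in numbers, not the notebook's own code; it names the axes explicitly through dims and shows that xarray rejects an array whose layout is the other way around.

```python
import numpy as np
import xarray as xr

months = ['January', 'February', 'March', 'April']   # shortened for the sketch
cities = ['Cologne', 'Berlin', 'Munich']
# Stand-in data with shape (4 months, 3 cities).
sold = np.arange(len(months) * len(cities)).reshape(len(months), len(cities))

boots = xr.DataArray(
    data=sold,
    dims=['month', 'city'],                  # axis 0 is month, axis 1 is city
    coords={'month': months, 'city': cities},
    name='hiking_boots_sold_different_cities',
)
print(boots.dims, boots.shape)

# A transposed array has shape (3, 4), so the coordinate lengths no longer match.
try:
    xr.DataArray(data=sold.T, dims=['month', 'city'],
                 coords={'month': months, 'city': cities})
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print('Rejected:', err)
```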
Exercise: Make a 2-D array with data on sunlight throughout the year in Germany, France, and Italy using the variables in the cell below.
Solution
days = np.linspace(1, 365, 365, dtype=int)
countries = ['Germany', 'France', 'Italy']
hours_of_sunlight = np.random.uniform(low=0, high=16, size=(len(days), len(countries)))
data_sun_country = xr.DataArray(
    data=hours_of_sunlight,
    coords={'day': days, 'country': countries},
    name='hours_of_sunlight_countries'
)
data_sun_country
<xarray.DataArray 'hours_of_sunlight_countries' (day: 365, country: 3)> Size: 9kB
array([[10.62911822,  0.13318533,  5.50667079],
       [10.20166023, 15.54403471,  6.6744995 ],
       [15.91543282,  6.52257226, 13.96066533],
       …,
       [ 2.5423545 ,  9.90954904, 12.38464112],
       [ 3.10759408,  8.16055736, 10.55890302],
       [13.99674588,  2.43413591,  2.84435315]])
Coordinates:
  * day      (day) int64 3kB 1 2 3 4 5 6 7 8 … 358 359 360 361 362 363 364 365
  * country  (country) <U7 84B 'Germany' 'France' 'Italy'
Exercise: Get the data on hours of sunlight from day 3 to 11 for Italy using the loc indexer.
Solution
data_sun_country.loc[3:11, 'Italy']
<xarray.DataArray 'hours_of_sunlight_countries' (day: 9)> Size: 72B
array([13.96066533,  1.17941739, 11.09906937, 14.0559088 , 11.17591572,
       10.79901602, 12.55835353,  1.38706644,  9.55820968])
Coordinates:
  * day      (day) int64 72B 3 4 5 6 7 8 9 10 11
    country  <U7 28B 'Italy'
Exercise: Get the data on hours of sunlight from day 3 to 11 for both Germany and France together.
Solution
data_sun_country.loc[3:11, ['Germany', 'France']]
<xarray.DataArray 'hours_of_sunlight_countries' (day: 9, country: 2)> Size: 144B
array([[15.91543282,  6.52257226],
       [ 3.05188703, 12.36757178],
       [ 5.86002647, 12.3537221 ],
       [ 9.81929209,  9.54686706],
       [11.81800274, 13.86272208],
       [ 0.78291768, 11.79843803],
       [ 8.8559426 ,  5.45258128],
       [14.32486253,  4.11157752],
       [ 7.35287893,  8.44928813]])
Coordinates:
  * day      (day) int64 72B 3 4 5 6 7 8 9 10 11
  * country  (country) <U7 56B 'Germany' 'France'
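With loc, the position of each index in the brackets has to follow the order of the dimensions. As a sketch of an alternative, sel names each dimension, so the arguments can be given in any order. The data here are stand-ins in which every row simply holds its day number.

```python
import numpy as np
import xarray as xr

days = np.arange(1, 366)
countries = ['Germany', 'France', 'Italy']
# Stand-in data: each row holds the day number, so results are easy to check by eye.
hours = np.repeat(days[:, None], len(countries), axis=1).astype(float)

sun = xr.DataArray(hours, dims=['day', 'country'],
                   coords={'day': days, 'country': countries})

italy = sun.sel(day=slice(3, 11), country='Italy')
two = sun.sel(country=['Germany', 'France'], day=slice(3, 11))  # argument order is free

print(italy.shape)
print(two.dims, two.shape)
```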
Section 2: Saving DataArray to File.
After the DataArray is constructed, you want to save it to a file so that you can load it and continue to work on it later or share it with others.
| Code | Description |
|---|---|
| `da.to_netcdf('data/filename.nc')` | Write the DataArray variable named “da” to a file with a filename of your choosing in the data directory. |
| `data = xr.load_dataarray('data/filename.nc')` | Load the DataArray and put it in a variable. |
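The two calls in the table can be tried together on a tiny array. This is a minimal sketch with made-up values; it writes into a temporary directory rather than the data directory, and it assumes that a netCDF backend for xarray (for example netCDF4, h5netcdf, or scipy) is installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

da = xr.DataArray(
    data=np.array([1.5, 2.5, 3.5]),                    # made-up values
    coords={'city': ['Cologne', 'Berlin', 'Munich']},
    name='demo',
)

# A temporary directory keeps the sketch from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'filename.nc'
    da.to_netcdf(path)               # write the DataArray to disk
    data = xr.load_dataarray(path)   # read it fully into memory and close the file

print(data.name)
print(data.values)
```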
Run the cell below to create the data directory if it doesn’t already exist.
Exercises
from pathlib import Path
data_dir = Path('data')
data_dir.mkdir(exist_ok=True, parents=True)
Exercise: Write the DataArray with hiking boot sales in different cities to a file to save it.
Solution
# Write to file
data_boots_cities.to_netcdf('data/hiking_boots_sold.nc')
Exercise: Load the DataArray with hiking boot sales that you saved into a variable. Display the variable to check that the data was stored correctly.
Solution
data = xr.load_dataarray('data/hiking_boots_sold.nc')
data
<xarray.DataArray 'hiking_boots_sold_different_cities' (month: 12, city: 3)> Size: 288B
array([[11, 38, 45],
       [37, 25, 13],
       [ 4, 21, 11],
       [45, 19, 20],
       [ 9, 29, 28],
       [ 8,  5, 49],
       [10, 22, 23],
       [28, 25, 12],
       [43,  5, 28],
       [13, 18,  9],
       [ 6, 30, 33],
       [20, 20, 16]])
Coordinates:
  * month    (month) <U9 432B 'January' 'February' … 'November' 'December'
  * city     (city) <U7 84B 'Cologne' 'Berlin' 'Munich'
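Instead of comparing the printed output by eye, the loaded array can be compared with the original programmatically. The sketch below uses a small made-up array and a temporary file; it assumes a netCDF backend is installed and relies on the DataArray method equals, which compares dimensions, coordinates, and values.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

original = xr.DataArray(
    data=np.array([[1.0, 2.0], [3.0, 4.0]]),   # made-up values
    dims=['month', 'city'],
    coords={'month': ['January', 'February'], 'city': ['Cologne', 'Berlin']},
    name='demo_sales',
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'demo_sales.nc'
    original.to_netcdf(path)
    loaded = xr.load_dataarray(path)

# True when dimensions, coordinates, and values all match.
round_trip_ok = loaded.equals(original)
print(round_trip_ok)
```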