Creating Xarray DataArrays

Authors
Dr. Nicholas Del Grosso | Dr. Sangeetha Nandakumar | Dr. Ole Bialas | Dr. Atle E. Rimehaug

Setup

Import Libraries

import xarray as xr
import numpy as np

Thus far, we have worked with data organized in separate numpy arrays, but when working with real data, keeping track of many arrays stored in separate variables and how they relate to one another can quickly become complicated. Xarray helps you store multiple arrays and information about their relationships together in one DataArray, making it easier for the researcher to work with and understand the data.

In this notebook, we will look at how to combine multiple arrays into an xarray DataArray, how to access the data stored in a DataArray, and how to save it to a file.

Section 1: Labeling the Indices of a DataArray’s Dimensions and Accessing the Data

Code Description
da = xr.DataArray(data=x, coords={'time': y}, name='sensor') Make a DataArray from the equal-length arrays x and y, describing x as the sensor data and y as the time point of each measurement.
da = xr.DataArray(data=x, coords={'time': y, 'channel': z}, name='sensor') Make a 2D DataArray from x, y, and z, where x is a 2D array (one row per time point, one column per channel) and z holds the channel names.
da['time'] Get all time points at which data was recorded
da['channel'] Get the names of all channels
da.loc[1:1.5] Get the sensor data from time points 1 to 1.5 s (label-based slices include both endpoints).
da.loc[1:1.5, :] Get the sensor data from time points 1 to 1.5 s, for all channels.
da.loc[1:1.5, ['CHAN-2', 'CHAN-4']] Get the sensor data from time points 1 to 1.5 s, for the channels labeled ‘CHAN-2’ and ‘CHAN-4’.
da.sel(channel='CHAN-3') Get the sensor data across the whole time period from the channel labeled ‘CHAN-3’
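The table above uses placeholder arrays x, y and z. The following minimal sketch runs the same operations on small made-up arrays (the names times, channels and values, and all numbers, are illustrative assumptions, not real recordings). It also passes dims explicitly to make the dimension order obvious; with a dict for coords, xarray can instead infer the dimension names from the dictionary keys, as in the table.

```python
import numpy as np
import xarray as xr

# Made-up recording: 5 time points (in seconds) x 4 channels
times = np.array([0.5, 1.0, 1.25, 1.5, 2.0])
channels = ['CHAN-1', 'CHAN-2', 'CHAN-3', 'CHAN-4']
values = np.arange(20, dtype=float).reshape(len(times), len(channels))

da = xr.DataArray(
    data=values,
    dims=('time', 'channel'),            # explicit order: rows = time, columns = channel
    coords={'time': times, 'channel': channels},
    name='sensor',
)

print(da['channel'].values)               # the channel names

# .loc slices by label and includes BOTH endpoints, so times 1.0, 1.25 and 1.5 are kept
window = da.loc[1:1.5, ['CHAN-2', 'CHAN-4']]
print(window.shape)                       # (3, 2)

# .sel picks labels by dimension name; the time dimension is kept in full
chan3 = da.sel(channel='CHAN-3')
print(chan3.shape)                        # (5,)
```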

Exercises

Example: A DataArray can be made by simply passing a regular numpy array with the data to the xarray DataArray constructor.

data = np.random.random(size = 10)
data
array([0.58489453, 0.30931659, 0.35022568, 0.50525519, 0.83252742,
       0.78679708, 0.66900383, 0.65463909, 0.55992887, 0.37307137])
data_xr = xr.DataArray(data)
data_xr
<xarray.DataArray (dim_0: 10)> Size: 80B
array([0.58489453, 0.30931659, 0.35022568, 0.50525519, 0.83252742,
0.78679708, 0.66900383, 0.65463909, 0.55992887, 0.37307137])
Dimensions without coordinates: dim_0

      Example: When we display the resulting DataArray, we see that its only dimension got the generic name dim_0 and has no coordinates, so nothing tells us what the values mean. Attaching such labels is the strength of DataArrays, but the example above does not take advantage of it. In the following example, we add time information (the month in which each data point was recorded). In this hypothetical scenario, the data are the monthly sales of hiking boots in a sportswear store over one year.

      months = ['Januar', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
      hiking_boots_sold = np.random.randint(low=2, high = 50, size = len(months))
      hiking_boots_sold
      array([39,  6,  6, 40,  6, 33, 38, 22, 32, 48, 18,  3])
      data_boots = xr.DataArray(
          data=hiking_boots_sold, 
          coords={'month': months}, 
          name='sale_hiking_boots',
      )
      data_boots
      <xarray.DataArray 'sale_hiking_boots' (month: 12)> Size: 96B
      array([39,  6,  6, 40,  6, 33, 38, 22, 32, 48, 18,  3])
      Coordinates:
        * month    (month) <U9 432B 'Januar' 'February' … 'November' 'December'
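Labels can also be attached to an existing, unlabeled DataArray such as data_xr from the first example. The sketch below uses a shortened, made-up sales series (an assumption for illustration); it renames the automatically generated dimension dim_0 and then assigns coordinates with assign_coords. This is one possible approach; building the DataArray with coords from the start, as above, is usually simpler.

```python
import numpy as np
import xarray as xr

# Made-up, shortened example data (three months only)
months = ['January', 'February', 'March']
sales = np.array([12, 7, 30])

unlabeled = xr.DataArray(sales)         # dimension is called 'dim_0', no coordinates
labeled = (
    unlabeled
    .rename({'dim_0': 'month'})         # a dict renames dimensions/coordinates
    .assign_coords(month=months)        # one label per element along 'month'
    .rename('sale_hiking_boots')        # a plain string renames the array itself
)

print(unlabeled.dims)                   # ('dim_0',)
print(labeled.dims)                     # ('month',)
print(labeled.loc['March'].item())      # 30
```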

      Exercise: Make a DataArray out of the following variables containing numpy arrays.

      Solution
      days = np.linspace(1,365,365,dtype=int)
      hours_of_sunlight = np.random.uniform(low=0, high = 16, size=len(days))
      data_sun = xr.DataArray(
          data = hours_of_sunlight,
          coords = {'day': days},
          name = 'hours_of_sunlight_over_year'
      )
      data_sun
      <xarray.DataArray 'hours_of_sunlight_over_year' (day: 365)> Size: 3kB
      array([8.23006743e+00, 7.21694392e+00, 3.74655378e+00, 8.92461200e+00,
      9.38304235e+00, 8.92751010e+00, 3.77495059e+00, 1.31636567e+01,
      2.45902953e+00, 8.36089037e+00, 7.09654038e+00, 1.48757403e+01,
      1.16164646e+01, 9.80515275e+00, 7.08769855e+00, 8.52507807e+00,
      3.11079725e+00, 7.91321694e+00, 1.11614207e+01, 1.48038049e+01,
      8.91518155e+00, 5.33508301e-01, 2.47056017e+00, 6.91468774e+00,
      1.17036281e+01, 8.77655657e+00, 7.17807369e+00, 1.36831349e+00,
      4.61231869e-01, 4.64060336e+00, 6.83088895e+00, 6.06679933e+00,
      2.45094377e+00, 7.01031672e-01, 1.11239269e+01, 1.43514494e+01,
      3.29001670e-01, 1.02294294e+01, 6.81727355e+00, 9.18526246e+00,
      7.79739073e+00, 1.41748651e+01, 1.09333584e+01, 5.92642428e+00,
      7.79074386e+00, 4.53065382e-01, 1.12618476e+01, 6.21238801e+00,
      1.12555484e+01, 1.20050358e+01, 6.02066390e+00, 1.53692216e+01,
      1.59851780e+01, 1.76377694e+00, 1.58066919e+01, 7.57641734e+00,
      3.90439098e+00, 9.46307778e+00, 1.02639021e+00, 4.35392115e+00,
      3.46629355e+00, 8.00917271e+00, 1.56924200e+01, 3.35086470e+00,
      5.81554633e+00, 7.80686597e+00, 8.82128041e+00, 1.51389988e+01,
      8.24378132e+00, 5.40782027e+00, 5.95643967e+00, 1.33394170e+00,
      1.47696565e+01, 1.47813671e+00, 1.48453840e+01, 6.26934476e+00,
      5.41620327e+00, 6.11286275e+00, 6.49687445e+00, 1.08877683e+01,
      …
      9.84325244e+00, 1.01528923e+00, 3.15687929e+00, 9.77263943e+00,
      9.96574333e+00, 9.42822240e+00, 6.55878955e-01, 1.17890842e+00,
      6.50210208e+00, 1.30461117e+01, 1.27016934e+01, 1.74379897e+00,
      5.08618410e+00, 1.11982102e+01, 1.18590474e+01, 2.71737149e+00,
      1.46177633e+01, 1.31257279e+01, 1.16233467e+01, 6.16799363e+00,
      4.28178084e+00, 7.83740481e+00, 1.29375046e+01, 6.06850805e+00,
      1.32126785e+01, 6.39787765e+00, 1.56068444e+01, 1.51391095e+01,
      1.51110788e+01, 5.43832742e+00, 8.26582415e+00, 1.87781332e+00,
      6.74993425e+00, 9.21053554e+00, 1.35123060e+01, 1.53834993e+01,
      1.02323070e+01, 1.28699584e+01, 1.17486530e+01, 4.87753529e+00,
      6.55916363e+00, 4.16106549e+00, 1.24088568e+01, 2.53296482e+00,
      2.81155073e+00, 9.35439130e+00, 9.83815547e+00, 1.20805543e+01,
      9.91077458e+00, 1.17941470e+01, 3.73621124e+00, 6.84984768e-01,
      7.11782867e+00, 1.49531769e+01, 1.32211583e+01, 3.99443547e+00,
      2.04934036e+00, 1.38590038e+01, 1.17655673e+01, 7.11167642e+00,
      4.77290576e+00, 3.36814979e+00, 5.25550318e+00, 7.29273883e+00,
      1.57836152e+01, 1.29942957e+01, 1.26501592e+01, 1.38997670e+01,
      9.35196810e+00, 1.58813368e+01, 3.86108730e+00, 1.54960821e+01,
      1.16770869e+01, 1.38766505e+01, 1.25933240e+01, 2.27569293e+00,
      8.90981941e+00])
      Coordinates:
        * day      (day) int64 3kB 1 2 3 4 5 6 7 8 … 358 359 360 361 362 363 364 365

      Exercise: Get the array containing the days throughout the year.

      Solution
      data_sun['day']
      <xarray.DataArray 'day' (day: 365)> Size: 3kB
      array([  1,   2,   3, …, 363, 364, 365])
      Coordinates:
        * day      (day) int64 3kB 1 2 3 4 5 6 7 8 … 358 359 360 361 362 363 364 365
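Indexing with a coordinate name, as in data_sun['day'], returns another DataArray rather than a plain NumPy array. When a bare NumPy array is needed (for example, to pass to a function that does not understand xarray objects), the .values attribute gives the underlying array. A minimal sketch on a small made-up DataArray (one hypothetical week of sunlight values):

```python
import numpy as np
import xarray as xr

days = np.arange(1, 8)                       # hypothetical: day numbers 1..7
sunlight = xr.DataArray(
    data=np.linspace(8.0, 9.2, len(days)),   # made-up values
    coords={'day': days},
    name='hours_of_sunlight',
)

day_coord = sunlight['day']                  # an xarray.DataArray
day_array = sunlight['day'].values           # a plain numpy.ndarray

print(type(day_coord).__name__)              # DataArray
print(type(day_array).__name__)              # ndarray
```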

      Exercise: Get the data on hours of sunlight for days 3 through 11 using .loc.

      Solution
      data_sun.loc[3:11]
      <xarray.DataArray 'hours_of_sunlight_over_year' (day: 9)> Size: 72B
      array([ 3.74655378,  8.924612  ,  9.38304235,  8.9275101 ,  3.77495059,
      13.16365669,  2.45902953,  8.36089037,  7.09654038])
      Coordinates:
        * day      (day) int64 72B 3 4 5 6 7 8 9 10 11

      Exercise: Get the data on hours of sunlight for days 3 through 11 using regular (positional) array indexing. Do you notice a difference in the indices you use to access the data?

      Solution
      data_sun[2:11]
      <xarray.DataArray 'hours_of_sunlight_over_year' (day: 9)> Size: 72B
      array([ 3.74655378,  8.924612  ,  9.38304235,  8.9275101 ,  3.77495059,
      13.16365669,  2.45902953,  8.36089037,  7.09654038])
      Coordinates:
        * day      (day) int64 72B 3 4 5 6 7 8 9 10 11
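The two solutions above select the same nine days but use different numbers: .loc works with coordinate labels and includes the end label, while square-bracket indexing works with zero-based positions and excludes the end position, just like NumPy. The .sel and .isel methods make the same label/position distinction, but address the dimension by name. A small sketch on a made-up DataArray whose day labels start at 1:

```python
import numpy as np
import xarray as xr

days = np.arange(1, 21)                          # day labels 1..20 (made up)
da = xr.DataArray(np.arange(20) * 10, coords={'day': days}, name='values')

by_label = da.loc[3:11]                          # labels 3..11, end label included
by_position = da[2:11]                           # positions 2..10, end position excluded
by_sel = da.sel(day=slice(3, 11))                # label-based, by dimension name
by_isel = da.isel(day=slice(2, 11))              # position-based, by dimension name

print(by_label['day'].values)                    # [ 3  4  5  6  7  8  9 10 11]
print(by_label.equals(by_position))              # True
```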

      Exercise: In the hiking boots DataArray from the example, get the number of hiking boots sold in October using .loc.

      Solution
      data_boots.loc['October']
      <xarray.DataArray 'sale_hiking_boots' ()> Size: 8B
      array(48)
      Coordinates:
      month    <U9 36B 'October'
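Selecting a single label returns a zero-dimensional DataArray that still records which month it came from (the scalar month coordinate in the output above). To get a plain Python number, for instance for arithmetic or string formatting, call .item(). A sketch with made-up numbers for three months:

```python
import xarray as xr

# Made-up sales for three months
boots = xr.DataArray(
    [39, 6, 48],
    coords={'month': ['August', 'September', 'October']},
    name='sale_hiking_boots',
)

october = boots.loc['October']     # 0-d DataArray; 'month' is kept as a scalar coordinate
print(october.ndim)                # 0
print(october['month'].item())     # October
print(october.item())              # 48 (a plain Python int)
```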

        Example: Creating DataArrays with multidimensional data. Suppose the company selling hiking boots has stores in several cities (Cologne, Berlin, and Munich) and you want to store its sales in all three cities throughout the year. The data are now multidimensional, spanning both time and space, much like many neuroscience datasets.

        months = ['Januar', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
        cities = ['Cologne', 'Berlin', 'Munich']
        hiking_boots_sold = np.random.randint(low=2, high = 50, size = (len(months), len(cities)))
        hiking_boots_sold
        array([[11, 38, 45],
               [37, 25, 13],
               [ 4, 21, 11],
               [45, 19, 20],
               [ 9, 29, 28],
               [ 8,  5, 49],
               [10, 22, 23],
               [28, 25, 12],
               [43,  5, 28],
               [13, 18,  9],
               [ 6, 30, 33],
               [20, 20, 16]])
        data_boots_cities = xr.DataArray(
            data=hiking_boots_sold,
            coords={'month': months, 'city': cities},
            name='hiking_boots_sold_different_cities'
        )
        data_boots_cities
        <xarray.DataArray 'hiking_boots_sold_different_cities' (month: 12, city: 3)> Size: 288B
        array([[11, 38, 45],
        [37, 25, 13],
        [ 4, 21, 11],
        [45, 19, 20],
        [ 9, 29, 28],
        [ 8,  5, 49],
        [10, 22, 23],
        [28, 25, 12],
        [43,  5, 28],
        [13, 18,  9],
        [ 6, 30, 33],
        [20, 20, 16]])
        Coordinates:
          * month    (month) <U9 432B 'Januar' 'February' … 'November' 'December'
          * city     (city) <U7 84B 'Cologne' 'Berlin' 'Munich'
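Because each dimension has a name, .sel can select along dimensions by name and in any order, whereas .loc expects its indexers in the order of the dimensions (here month first, then city). The sketch below reuses the first three rows of the table above, with a shortened month list, purely for illustration:

```python
import numpy as np
import xarray as xr

months = ['January', 'February', 'March']      # shortened for the example
cities = ['Cologne', 'Berlin', 'Munich']
sales = np.array([[11, 38, 45],
                  [37, 25, 13],
                  [ 4, 21, 11]])

boots = xr.DataArray(sales, coords={'month': months, 'city': cities},
                     name='hiking_boots_sold_different_cities')

berlin = boots.sel(city='Berlin')                # all months, one city
a = boots.loc['February', 'Munich']              # order must be (month, city)
b = boots.sel(city='Munich', month='February')   # keyword order does not matter

print(berlin.values)                             # [38 25 21]
print(a.item(), b.item())                        # 13 13
```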

        Exercise: Make a 2-D DataArray with data on hours of sunlight throughout the year in Germany, France, and Italy using the variables in the cell below.

        Solution
        days = np.linspace(1,365,365,dtype=int)
        countries = ['Germany', 'France', 'Italy']
        hours_of_sunlight = np.random.uniform(low=0, high = 16, size=(len(days), len(countries)))
        data_sun_country = xr.DataArray(
            data = hours_of_sunlight,
            coords={'day': days, 'country': countries},
            name='hours_of_sunlight_countries'
        )
        data_sun_country
        <xarray.DataArray 'hours_of_sunlight_countries' (day: 365, country: 3)> Size: 9kB
        array([[10.62911822,  0.13318533,  5.50667079],
        [10.20166023, 15.54403471,  6.6744995 ],
        [15.91543282,  6.52257226, 13.96066533],
        …,
        [ 2.5423545 ,  9.90954904, 12.38464112],
        [ 3.10759408,  8.16055736, 10.55890302],
        [13.99674588,  2.43413591,  2.84435315]])
        Coordinates:
          * day      (day) int64 3kB 1 2 3 4 5 6 7 8 … 358 359 360 361 362 363 364 365
          * country  (country) <U7 84B 'Germany' 'France' 'Italy'

        Exercise: Get the data on hours of sunlight from day 3 to 11 for Italy using .loc.

        Solution
        data_sun_country.loc[3:11, 'Italy']
        <xarray.DataArray 'hours_of_sunlight_countries' (day: 9)> Size: 72B
        array([13.96066533,  1.17941739, 11.09906937, 14.0559088 , 11.17591572,
        10.79901602, 12.55835353,  1.38706644,  9.55820968])
        Coordinates:
          * day      (day) int64 72B 3 4 5 6 7 8 9 10 11
            country  <U7 28B 'Italy'
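Selecting with a single label ('Italy') drops the country dimension: the result is 1-D, and country survives only as a scalar coordinate. Wrapping the label in a list (['Italy']) keeps country as a dimension of length 1, which can be convenient when the result should keep the same dimensions as the original. A sketch on made-up placeholder data:

```python
import numpy as np
import xarray as xr

days = np.arange(1, 6)                               # made-up: five days
countries = ['Germany', 'France', 'Italy']
sun = xr.DataArray(
    np.arange(15, dtype=float).reshape(5, 3),        # placeholder values
    coords={'day': days, 'country': countries},
    name='hours_of_sunlight_countries',
)

scalar_sel = sun.loc[2:4, 'Italy']        # single label -> 'country' dimension dropped
list_sel = sun.loc[2:4, ['Italy']]        # list of labels -> 'country' kept, length 1

print(scalar_sel.dims)                    # ('day',)
print(list_sel.dims)                      # ('day', 'country')
print(list_sel.shape)                     # (3, 1)
```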

        Exercise: Get the data on hours of sunlight from day 3 to 11 for both Germany and France together.

        Solution
        data_sun_country.loc[3:11, ['Germany', 'France']]
        <xarray.DataArray 'hours_of_sunlight_countries' (day: 9, country: 2)> Size: 144B
        array([[15.91543282,  6.52257226],
        [ 3.05188703, 12.36757178],
        [ 5.86002647, 12.3537221 ],
        [ 9.81929209,  9.54686706],
        [11.81800274, 13.86272208],
        [ 0.78291768, 11.79843803],
        [ 8.8559426 ,  5.45258128],
        [14.32486253,  4.11157752],
        [ 7.35287893,  8.44928813]])
        Coordinates:
          * day      (day) int64 72B 3 4 5 6 7 8 9 10 11
          * country  (country) <U7 56B 'Germany' 'France'

        Section 2: Saving a DataArray to a File

        After the DataArray is constructed, you want to save it to a file so that you can load it and continue to work on it later or share it with others.

        Code Description
        da.to_netcdf('data/filename.nc') Write the DataArray stored in the variable da to a netCDF file in the data directory (choose any filename you like).
        data = xr.load_dataarray('data/filename.nc') Load the DataArray from the file into memory and assign it to a variable.
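A minimal round-trip sketch: it writes a small made-up DataArray to a netCDF file and loads it back. It assumes that a netCDF backend xarray can use (for example the netCDF4 or scipy package) is installed; without one, to_netcdf raises an error. The temporary directory is used only so the example cleans up after itself; in the exercises the file goes into the data directory instead.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

original = xr.DataArray(
    np.array([[1.5, 2.0], [0.5, 3.25]]),          # made-up values
    coords={'time': [0.0, 0.5], 'channel': ['CHAN-1', 'CHAN-2']},
    name='sensor',
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'sensor.nc'
    original.to_netcdf(path)                      # write to a netCDF file
    restored = xr.load_dataarray(path)            # read fully into memory, then close the file

print(restored.name)                               # sensor
print(restored.equals(original))                   # True: same values, dims and coordinates
```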

        Run the cell below to create the data directory if it doesn’t already exist.

        from pathlib import Path
        
        data_dir = Path('data')
        
        # Create ./data (and any missing parent folders); do nothing if it already exists
        data_dir.mkdir(exist_ok=True, parents=True)

        Exercises

        Exercise: Write the DataArray with hiking boot sales in different cities (data_boots_cities) to a file.

        Solution
        # Write to file
        data_boots_cities.to_netcdf('data/hiking_boots_sold.nc')

        Exercise: Load the DataArray you saved into a variable. Display the variable to check that the data were stored correctly.

        Solution
        data = xr.load_dataarray('data/hiking_boots_sold.nc')
        data
        <xarray.DataArray 'hiking_boots_sold_different_cities' (month: 12, city: 3)> Size: 288B
        array([[11, 38, 45],
        [37, 25, 13],
        [ 4, 21, 11],
        [45, 19, 20],
        [ 9, 29, 28],
        [ 8,  5, 49],
        [10, 22, 23],
        [28, 25, 12],
        [43,  5, 28],
        [13, 18,  9],
        [ 6, 30, 33],
        [20, 20, 16]])
        Coordinates:
          * month    (month) <U9 432B 'Januar' 'February' … 'November' 'December'
          * city     (city) <U7 84B 'Cologne' 'Berlin' 'Munich'