
issues: 169588316


id: 169588316
node_id: MDExOlB1bGxSZXF1ZXN0ODAyMjk0OTM=
number: 947
title: Multi-index levels as coordinates
user: 4160723
state: closed
locked: 0
assignee:
milestone:
comments: 17
created_at: 2016-08-05T11:34:49Z
updated_at: 2016-09-14T03:35:04Z
closed_at: 2016-09-14T03:34:51Z
author_association: MEMBER
active_lock_reason:
draft: 0
pull_request: pydata/xarray/pulls/947

Implements 2, 4 and 5 in #719.

Demo:

```
In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: index = pd.MultiIndex.from_product((list('ab'), range(2)),
   ...:                                    names=('level_1', 'level_2'))

In [5]: da = xr.DataArray(np.random.rand(4, 4), coords={'x': index},
   ...:                   dims=('x', 'y'), name='test')

In [6]: da
Out[6]:
<xarray.DataArray 'test' (x: 4, y: 4)>
array([[ 0.15036153,  0.68974802,  0.40082234,  0.94451318],
       [ 0.26732938,  0.49598123,  0.8679231 ,  0.6149102 ],
       [ 0.3313594 ,  0.93857424,  0.73023367,  0.44069622],
       [ 0.81304837,  0.81244159,  0.37274953,  0.86405196]])
Coordinates:
  * level_1  (x) object 'a' 'a' 'b' 'b'
  * level_2  (x) int64 0 1 0 1
  * y        (y) int64 0 1 2 3

In [7]: da['level_1']
Out[7]:
<xarray.DataArray 'level_1' (x: 4)>
array(['a', 'a', 'b', 'b'], dtype=object)
Coordinates:
  * level_1  (x) object 'a' 'a' 'b' 'b'
  * level_2  (x) int64 0 1 0 1

In [8]: da.sel(x='a', level_2=1)
Out[8]:
<xarray.DataArray 'test' (y: 4)>
array([ 0.26732938,  0.49598123,  0.8679231 ,  0.6149102 ])
Coordinates:
    x        object ('a', 1)
  * y        (y) int64 0 1 2 3

In [9]: da.sel(level_2=1)
Out[9]:
<xarray.DataArray 'test' (level_1: 2, y: 4)>
array([[ 0.26732938,  0.49598123,  0.8679231 ,  0.6149102 ],
       [ 0.81304837,  0.81244159,  0.37274953,  0.86405196]])
Coordinates:
  * level_1  (level_1) object 'a' 'b'
  * y        (y) int64 0 1 2 3
```

Some notes about the implementation:

- I slightly modified `Coordinate` so that it allows setting different values for the names of the coordinate and its dimension. There is no breaking change.
- I also added a `Coordinate.get_level_coords` method to get independent, single-index coordinate objects from a `MultiIndex` coordinate.
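
To make the second note concrete, here is a minimal pandas-only sketch of what extracting independent level coordinates from a `MultiIndex` amounts to. The helper name `level_coords` is hypothetical, and the real `Coordinate.get_level_coords` returns coordinate objects rather than plain `pandas.Index` values; this only illustrates one `get_level_values` call per level, assuming every level is named.

```python
import pandas as pd


def level_coords(index):
    """Map each level name of a MultiIndex to a flat pandas.Index of its values.

    Hypothetical helper for illustration; not the method added in this PR.
    Assumes every level has a name.
    """
    if not isinstance(index, pd.MultiIndex):
        raise TypeError("expected a pandas.MultiIndex")
    # get_level_values builds one full-length Index per level
    return {name: index.get_level_values(i)
            for i, name in enumerate(index.names)}


index = pd.MultiIndex.from_product((list('ab'), range(2)),
                                   names=('level_1', 'level_2'))
coords = level_coords(index)
print(coords['level_1'].tolist())  # ['a', 'a', 'b', 'b']
print(coords['level_2'].tolist())  # [0, 1, 0, 1]
```

Each returned index has the same length as the `MultiIndex` itself, so the work grows with both the number of levels and the number of rows.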

Remaining issues:

- `Coordinate.get_level_coords` calls `pandas.MultiIndex.get_level_values` for each level and is itself called on every indexing operation and for every repr. This can be very costly! It would be nice to return some kind of lazy index object instead of computing the actual level values.
- The repr replaces a `MultiIndex` coordinate with its level coordinates. That can be confusing in some cases (see below). Maybe we can use a different marker than `*` for level coordinates.

```
In [6]: [name for name in da.coords]
Out[6]: ['x', 'y']

In [7]: da.coords.keys()
Out[7]:
KeysView(Coordinates:
  * level_1  (x) object 'a' 'a' 'b' 'b'
  * level_2  (x) int64 0 1 0 1
  * y        (y) int64 0 1 2 3)
```

- `DataArray.level_1` doesn't return another `DataArray` object:

```
In [10]: da.level_1
Out[10]:
<xarray.Coordinate 'level_1' (x: 4)>
array(['a', 'a', 'b', 'b'], dtype=object)
```

- Maybe we need to test the uniqueness of level names at `DataArray` or `Dataset` creation.
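
For the last point, a uniqueness check might look roughly like the sketch below. It is pure Python and not part of this PR: the function name, its arguments, and the exact rule (a level name may not repeat across multi-indexes or collide with a dimension name) are assumptions made for illustration.

```python
def check_level_names(dims, multiindex_levels):
    """Raise ValueError if a multi-index level name is ambiguous.

    dims: iterable of dimension names, e.g. ('x', 'y').
    multiindex_levels: mapping from a multi-index coordinate name to the
        tuple of its level names, e.g. {'x': ('level_1', 'level_2')}.
    Both arguments are hypothetical inputs chosen for this sketch.
    """
    dims = set(dims)
    seen = {}  # level name -> coordinate that first declared it
    for coord_name, level_names in multiindex_levels.items():
        for level in level_names:
            if level in seen:
                raise ValueError(
                    "level name %r is used by both %r and %r"
                    % (level, seen[level], coord_name))
            if level in dims:
                raise ValueError(
                    "level name %r of %r conflicts with a dimension name"
                    % (level, coord_name))
            seen[level] = coord_name


# Valid: all level names are distinct and differ from the dimension names.
check_level_names(('x', 'y'), {'x': ('level_1', 'level_2')})

# Invalid: 'level_1' appears in two different multi-indexes.
try:
    check_level_names(('x', 'y', 'z'),
                      {'x': ('level_1', 'level_2'),
                       'z': ('level_1', 'other')})
except ValueError as err:
    print(err)
```

Whether a clash with a dimension name should be an error is a design choice; it is treated as one here because `da.sel(level_2=1)` above turns the remaining level into a dimension.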

Of course, this still needs proper tests and docs...
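
On the cost of `Coordinate.get_level_coords` mentioned in the remaining issues: one possible shape for a lazy level object is sketched below. `LazyLevel` is a hypothetical class, not something this PR adds. It only defers the `get_level_values` call until the values are first read, then caches the result.

```python
import pandas as pd


class LazyLevel:
    """Defer MultiIndex.get_level_values until the values are needed.

    Hypothetical class for illustration only.
    """

    def __init__(self, index, level):
        self._index = index    # the parent pandas.MultiIndex
        self._level = level    # level name
        self._values = None    # filled on first access to .values

    def __len__(self):
        # The length is known without materializing the level values.
        return len(self._index)

    @property
    def computed(self):
        return self._values is not None

    @property
    def values(self):
        if self._values is None:
            self._values = self._index.get_level_values(self._level)
        return self._values


index = pd.MultiIndex.from_product((list('ab'), range(2)),
                                   names=('level_1', 'level_2'))
lazy = LazyLevel(index, 'level_2')
print(len(lazy), lazy.computed)  # 4 False
print(lazy.values.tolist())      # [0, 1, 0, 1]
print(lazy.computed)             # True
```

With something along these lines, operations that only need the name or length of a level would not pay for `get_level_values` at all.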

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/947/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: pull
