issues: 231308952

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
231308952 MDExOlB1bGxSZXF1ZXN0MTIyNDE4MjA3 1426 scalar_level in MultiIndex 6815844 closed 0     10 2017-05-25T11:03:05Z 2019-01-14T21:20:28Z 2019-01-14T21:20:27Z MEMBER   0 pydata/xarray/pulls/1426
  • [x] Closes #1408
  • [x] Tests added / passed
  • [x] Passes git diff upstream/master | flake8 --diff
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

[Edit for more clarity] I started a new branch to fix #1408 (I closed the older one, #1412).

Because the changes are relatively large, I summarize this PR here.

Summary

This PR introduces two kinds of levels in a MultiIndex: index-level and scalar-level. An index-level is an ordinary MultiIndex level (as in the current implementation), while a scalar-level represents a dropped level (new in this PR).

Changes in behavior

  1. Indexing a scalar at a particular level converts that level to a scalar-level instead of dropping it (changed from #767).
  2. Indexing a single element from a MultiIndex now yields a MultiIndex-scalar rather than a scalar tuple.
  3. Indexing along an index-level is enabled if the MultiIndex has only a single index-level.

Examples of the output are shown below. Any suggestions for these behaviors are welcome.

```python
In [1]: import numpy as np
   ...: import xarray as xr
   ...:
   ...: ds1 = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3], 'y': 'a'})
   ...: ds2 = xr.Dataset({'foo': (('x',), [4, 5, 6])}, {'x': [1, 2, 3], 'y': 'b'})
   ...: # example data
   ...: ds = xr.concat([ds1, ds2], dim='y').stack(yx=['y', 'x'])
   ...: ds
Out[1]:
<xarray.Dataset>
Dimensions:  (yx: 6)
Coordinates:
  * yx       (yx) MultiIndex
  - y        (yx) object 'a' 'a' 'a' 'b' 'b' 'b'  # <--- this is index-level
  - x        (yx) int64 1 2 3 1 2 3               # <--- this is also index-level
Data variables:
    foo      (yx) int64 1 2 3 4 5 6

In [2]: # 1. indexing a scalar converts index-level x to scalar-level.
   ...: ds.sel(x=1)
Out[2]:
<xarray.Dataset>
Dimensions:  (yx: 2)
Coordinates:
  * yx       (yx) MultiIndex
  - y        (yx) object 'a' 'b'  # <--- this is index-level
  - x        int64 1              # <--- this is scalar-level
Data variables:
    foo      (yx) int64 1 4

In [3]: # 2. indexing a single element from MultiIndex makes a MultiIndex-scalar
   ...: ds.isel(yx=0)
Out[3]:
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    yx       MultiIndex  # <--- this is MultiIndex-scalar
  - y        <U1 'a'
  - x        int64 1
Data variables:
    foo      int64 1

In [6]: # 3. Enables selecting along an index-level if only one index-level exists in MultiIndex
   ...: ds.sel(x=1).isel(y=[0,1])
Out[6]:
<xarray.Dataset>
Dimensions:  (yx: 2)
Coordinates:
  * yx       (yx) MultiIndex
  - y        (yx) object 'a' 'b'
  - x        int64 1
Data variables:
    foo      (yx) int64 1 4
```
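For comparison, the level dropping that behavior 1 replaces can be seen in pandas itself, without any of the code from this PR. The sketch below rebuilds the same `yx` index with plain pandas (the variable names are only illustrative) and selects `x == 1` with `MultiIndex.get_loc_level`, which drops the selected level by default.

```python
import pandas as pd

# The same index as `yx` in the example data, built directly in pandas.
midx = pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]], names=['y', 'x'])

# Select x == 1. With the default drop_level=True, pandas returns an indexer
# plus a new index from which the selected level has been removed.
indexer, remaining = midx.get_loc_level(1, level='x')

selected = midx[indexer]   # the matching rows, still with both levels
print(list(selected))
print(list(remaining))     # only the 'y' values remain; 'x' is no longer recorded
```

Once `x` has been dropped like this, its value is not part of the index any more, which is the information the scalar-level proposed here is meant to keep.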

Changes in the public APIs

Some changes were necessary to the public APIs, though I tried to minimize them.

  • level_names and get_level_values methods were moved from IndexVariable to Variable, because IndexVariable cannot handle 0-d arrays, which behavior change 2 requires.

  • scalar_level_names and all_level_names properties were added to Variable

  • reset_levels method was added to Variable class to control scalar-level and index-level.

Implementation summary

The main change in the implementation is the addition of our own wrapper around pd.MultiIndex, PandasMultiIndexAdapter. It handles most MultiIndex-related operations, such as indexing, concatenation, and conversion between scalar-level and index-level.
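The conversion between scalar-level and index-level can be pictured with a small pure-Python model. This is only a sketch: `LevelSketch`, `sel` and `expand` are hypothetical names invented for illustration and are not the PandasMultiIndexAdapter from this PR, and the model assumes at least one index-level remains after selection.

```python
from dataclasses import dataclass, field


@dataclass
class LevelSketch:
    """Hypothetical toy model of index-levels and scalar-levels."""
    index_levels: dict                                 # level name -> list of per-row values
    scalar_levels: dict = field(default_factory=dict)  # level name -> single value

    def sel(self, name, value):
        """Keep rows where level `name` equals `value`; `name` becomes a scalar-level."""
        keep = [i for i, v in enumerate(self.index_levels[name]) if v == value]
        if not keep:
            raise KeyError(value)
        remaining = {k: [vals[i] for i in keep]
                     for k, vals in self.index_levels.items() if k != name}
        return LevelSketch(remaining, {**self.scalar_levels, name: value})

    def expand(self, name):
        """Turn scalar-level `name` back into an index-level by repeating its value."""
        n_rows = len(next(iter(self.index_levels.values())))
        scalars = {k: v for k, v in self.scalar_levels.items() if k != name}
        levels = {**self.index_levels, name: [self.scalar_levels[name]] * n_rows}
        return LevelSketch(levels, scalars)


yx = LevelSketch({'y': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': [1, 2, 3, 1, 2, 3]})
selected = yx.sel('x', 1)        # 'x' is kept as a scalar-level, not discarded
restored = selected.expand('x')  # 'x' is an index-level again
print(selected)
print(restored)
```

Because the selected value is stored rather than thrown away, the full set of levels can be rebuilt later, which mirrors what `ds.sel(x=1)` followed by `expand_dims('x')` does in the session output.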

What we can do now

The main merit of this proposal is that it lets us handle a MultiIndex in a way more consistent with a normal Variable. Now we can

  • recover the MultiIndex with a dropped level.

```python
In [5]: ds.sel(x=1).expand_dims('x')
Out[5]:
<xarray.Dataset>
Dimensions:  (yx: 2)
Coordinates:
  * yx       (yx) MultiIndex
  - y        (yx) object 'a' 'b'
  - x        (yx) int64 1 1
Data variables:
    foo      (yx) int64 1 4
```

  • construct a MultiIndex by concatenating MultiIndex-scalars.

```python
In [8]: xr.concat([ds.isel(yx=i) for i in range(len(ds['yx']))], dim='yx')
Out[8]:
<xarray.Dataset>
Dimensions:  (yx: 6)
Coordinates:
  * yx       (yx) MultiIndex
  - y        (yx) object 'a' 'a' 'a' 'b' 'b' 'b'
  - x        (yx) int64 1 2 3 1 2 3
Data variables:
    foo      (yx) int64 1 2 3 4 5 6
```
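The reason a scalar tuple is awkward to concatenate can be shown at the pandas level. In the sketch below (plain pandas, not code from this PR), taking single elements from a MultiIndex gives bare tuples that carry no level names, so rebuilding the index requires supplying the names again by hand.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]], names=['y', 'x'])

# Each element of a MultiIndex is a plain tuple; the level names are not attached.
elements = [midx[i] for i in range(len(midx))]
print(elements[0])

# Rebuilding the index works, but only because the names are passed in again.
rebuilt = pd.MultiIndex.from_tuples(elements, names=['y', 'x'])
print(rebuilt.equals(midx))
```

A MultiIndex-scalar as proposed here keeps the level names with each element, so the concatenation shown in `Out[8]` does not need that extra information.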

What we cannot do now

With the current implementation, we can do ds.sel(y='a').rolling(x=2), but with this PR we cannot, because x is not yet an ordinary coordinate but a MultiIndex with a single index-level. I think it would be better if such a MultiIndex with a single index-level could be handled in much the same way as an ordinary coordinate.

Similarly, we cannot do ds.sel(y='a').mean(dim='x'). The same applies to ds.sel(y='a').to_netcdf('file') (#719).

What remains to be decided

  • How to repr these new levels (the current formatting is shown in Out[2] and Out[3] above).
  • Are terms such as index-level, scalar-level, and MultiIndex-scalar clear enough?
  • How many operations should we support for a MultiIndex with a single index-level? Should we support ds.sel(y='a').rolling(x=2) and ds.sel(y='a').mean(dim='x')?

TODOs

  • [ ] Support indexing with DataArray, ds.sel(x=ds.x[0])
  • [ ] Support stack, unstack, set_index, reset_index methods with scalar-level MultiIndex.
  • [ ] Add full documentation
  • [ ] Clean up the code related to MultiIndex
  • [ ] Fix issues (#1428, #1430, #1431) related to MultiIndex
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1426/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 pull
