issues: 231308952

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
231308952	MDExOlB1bGxSZXF1ZXN0MTIyNDE4MjA3	1426	scalar_level in MultiIndex	6815844	closed	0			10	2017-05-25T11:03:05Z	2019-01-14T21:20:28Z	2019-01-14T21:20:27Z	MEMBER		0	pydata/xarray/pulls/1426	[x] Closes #1408 [x] Tests added / passed [x] Passes `git diff upstream/master \| flake8 --diff` [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API [Edit for more clarity] I restarted a new branch to fix #1408 (I closed the older one #1412). Because the changes I made are relatively large, I summarize this PR here. Summary In this PR, I added two kinds of levels in MultiIndex, `index-level` and `scalar-level`. `index-level` is an ordinary level in MultiIndex (as in the current implementation), while `scalar-level` indicates a dropped level (which is newly added in this PR). Changes in behaviors. Indexing a scalar at a particular level changes that level to `scalar-level` instead of dropping that level (changed from #767). When indexing a scalar from a MultiIndex, the selected value now becomes a `MultiIndex-scalar` rather than a scalar tuple. Enabled indexing along an `index-level` if the MultiIndex has only a single `index-level`. Examples of the output are shown below. Any suggestions for these behaviors are welcome. ```python In [1]: import numpy as np ...: import xarray as xr ...: ...: ds1 = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3], 'y': 'a'}) ...: ds2 = xr.Dataset({'foo': (('x',), [4, 5, 6])}, {'x': [1, 2, 3], 'y': 'b'}) ...: # example data ...: ds = xr.concat([ds1, ds2], dim='y').stack(yx=['y', 'x']) ...: ds Out[1]: <xarray.Dataset> Dimensions: (yx: 6) Coordinates: * yx (yx) MultiIndex - y (yx) object 'a' 'a' 'a' 'b' 'b' 'b' # <--- this is index-level - x (yx) int64 1 2 3 1 2 3 # <--- this is also index-level Data variables: foo (yx) int64 1 2 3 4 5 6 In [2]: # 1. indexing a scalar converts `index-level` x to `scalar-level`. 
...: ds.sel(x=1) Out[2]: <xarray.Dataset> Dimensions: (yx: 2) Coordinates: * yx (yx) MultiIndex - y (yx) object 'a' 'b' # <--- this is index-level - x int64 1 # <--- this is scalar-level Data variables: foo (yx) int64 1 4 In [3]: # 2. indexing a single element from MultiIndex makes a `MultiIndex-scalar` ...: ds.isel(yx=0) Out[3]: <xarray.Dataset> Dimensions: () Coordinates: yx MultiIndex # <--- this is MultiIndex-scalar - y <U1 'a' - x int64 1 Data variables: foo int64 1 In [6]: # 3. Enables selecting along an `index-level` if only one `index-level` exists in MultiIndex ...: ds.sel(x=1).isel(y=[0,1]) Out[6]: <xarray.Dataset> Dimensions: (yx: 2) Coordinates: * yx (yx) MultiIndex - y (yx) object 'a' 'b' - x int64 1 Data variables: foo (yx) int64 1 4 ``` Changes in the public APIs Some changes were necessary to the public APIs, though I tried to minimize them. `level_names`, `get_level_values` methods were moved from `IndexVariable` to `Variable`. This is because `IndexVariable` cannot handle 0-d arrays, which I want to support in 2. `scalar_level_names` and `all_level_names` properties were added to `Variable` `reset_levels` method was added to `Variable` class to control `scalar-level` and `index-level`. Implementation summary The main change in the implementation is the addition of our own wrapper of `pd.MultiIndex`, `PandasMultiIndexAdapter`. This handles most `MultiIndex`-related operations, such as indexing, concatenation, and conversion between `scalar-level` and `index-level`. What we can do now The main merit of this proposal is that it enables us to handle `MultiIndex` in a way more consistent with the normal `Variable`. Now we can recover the MultiIndex with dropped level. ```python In [5]: ds.sel(x=1).expand_dims('x') Out[5]: <xarray.Dataset> Dimensions: (yx: 2) Coordinates: yx (yx) MultiIndex y (yx) object 'a' 'b' x (yx) int64 1 1 Data variables: foo (yx) int64 1 4 ``` construct a MultiIndex by concatenating MultiIndex-scalars. 
```python In [8]: xr.concat([ds.isel(yx=i) for i in range(len(ds['yx']))], dim='yx') Out[8]: <xarray.Dataset> Dimensions: (yx: 6) Coordinates: yx (yx) MultiIndex y (yx) object 'a' 'a' 'a' 'b' 'b' 'b' x (yx) int64 1 2 3 1 2 3 Data variables: foo (yx) int64 1 2 3 4 5 6 ``` What we cannot do now With the current implementation, we can do `ds.sel(y='a').rolling(x=2)` but with this PR we cannot, because `x` is not yet an ordinary coordinate, but a MultiIndex with a single `index-level`. I think it is better if we can handle such a MultiIndex with a single `index-level` in a way very similar to an ordinary coordinate. Similarly, we cannot do `ds.sel(y='a').mean(dim='x')` either. Also, `ds.sel(y='a').to_netcdf('file')` (#719) What are to be decided How to `repr` these new levels (Current formatting is shown in Out[2] and Out[3] above.) Are terms such as `index-level`, `scalar-level`, `MultiIndex-scalar` clear enough? How many operations should we support for a single `index-level` MultiIndex? Do we support `ds.sel(y='a').rolling(x=2)` and `ds.sel(y='a').mean(dim='x')`? TODOs [ ] Support indexing with DataArray, `ds.sel(x=ds.x[0])` [ ] Support `stack`, `unstack`, `set_index`, `reset_index` methods with `scalar-level` MultiIndex. [ ] Add a full document [ ] Clean up the code related to MultiIndex [ ] Fix issues (#1428, #1430, #1431) related to MultiIndex	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1426/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }			13221727	pull
