html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/1426#issuecomment-305058643,https://api.github.com/repos/pydata/xarray/issues/1426,305058643,MDEyOklzc3VlQ29tbWVudDMwNTA1ODY0Mw==,1217238,2017-05-31T01:46:35Z,2017-05-31T01:46:35Z,MEMBER,"> If my understanding is correct, does it mean that we will support
> ds.sel(x='a'), ds.isel(x=[0, 1]) and ds.mean(dim='x') with your example data?
> Will it raise an Error if Coordinate is more than 1 dimensional?
> How about ds.sel(x='a', y=[1, 2])?
I was only thinking about `.sel()` (as it currently works with `MultiIndex`). I'm not sure about the others yet.
@benbovy although a `CoordinateGroup` is definitely better than `MultiIndex-scalar`, it still feels like a very similar notion. It could make for a nice internal clean-up, but from a user perspective I think it's about as confusing as a MultiIndex: it's just as many terms to keep track of.
Right now, our user-facing API in xarray exposes three related concepts:
- `Coordinate`
- `Index`
- `MultiIndex`
Eliminating any of these concepts would be an improvement.
To this end, I have two (vague) proposals:
1. Eliminate `MultiIndex`. We only have an idea of ""indexed"" coordinates, marked by `*` in the `repr`, which don't necessarily correspond to dimensions. Indexed coordinates, which are immutable, can have any number of dimensions, and you can have any number of ""indexed"" coordinates per dimension. Indexing, concatenating and expanding dimensions should not change their nature.
2. Eliminate both `MultiIndex` *and* explicit indexes. Indexes required for efficient operations are created on the fly when necessary. This might be too magical.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,231308952
https://github.com/pydata/xarray/pull/1426#issuecomment-304778433,https://api.github.com/repos/pydata/xarray/issues/1426,304778433,MDEyOklzc3VlQ29tbWVudDMwNDc3ODQzMw==,1217238,2017-05-30T05:29:11Z,2017-05-30T05:29:11Z,MEMBER,"Sorry for the delay getting back to you here -- I'm still thinking through the implications of this change.
This does make the handling of `MultiIndex` type data much more consistent, but calling scalars `MultiIndex-scalar` seems quite confusing to me. I think of the data-type here as closer to NumPy's [structured types](https://docs.scipy.org/doc/numpy/user/basics.rec.html), except without the implied storage format for the data.
However, taking a step back, I wonder if this is the right approach. In many ways, structured dtypes are similar to xarray's existing data structures, so supporting them fully means a lot of duplicated functionality. MultiIndexes (especially with scalars) should work similarly to separate variables, but they are implemented very differently under the hood (all the data lives in one variable).
(See https://github.com/pandas-dev/pandas/issues/3443 for related discussion about pandas and
why it doesn't support structured dtypes.)
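As a minimal sketch of the storage difference described above (plain Python lists stand in for arrays; `y`, `x` and `yx` are illustrative names that mirror the stacked example, not xarray or pandas objects), a MultiIndex-style layout keeps every level inside one variable of tuples, while separate coordinates keep one array per level. The two layouts hold the same information and convert back and forth without loss:
```python
# Illustrative only: plain lists, no xarray or pandas involved.
y = ['a', 'a', 'a', 'b', 'b', 'b']
x = [1, 2, 3, 1, 2, 3]
# MultiIndex-style: all levels live inside one variable of tuples.
yx = list(zip(y, x))
# Separate-variable style: unzip back into one 1-D array per level.
y_back, x_back = (list(level) for level in zip(*yx))
# Both layouts carry exactly the same labels.
assert y_back == y and x_back == x
print(yx[0])  # ('a', 1)
```
The tuple layout is what makes `ds['yx'][0]` return a tuple today; the per-level layout is what the coordinate-only design keeps.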
It occurs to me that if we had full support for indexing on coordinate levels, we might not need a notion of a ""MultiIndex"" in the public API at all. To make this more concrete, what if this was the `repr()` for the result of `ds.stack(yx=['y', 'x'])` in your first example?
```
Dimensions: (yx: 6)
Coordinates:
y (yx) object 'a' 'a' 'a' 'b' 'b' 'b'
x (yx) int64 1 2 3 1 2 3
Data variables:
foo (yx) int64 1 2 3 4 5 6
```
If we supported `MultiIndex`-like indexing for `x` and `y`, this could be nearly equivalent to a MultiIndex with much less code duplication. The important practical difference is that here there are no labels along the `yx` dimension, so `ds['yx'][0]` would not return a tuple. Also, we would need to figure out some way to explicitly signal what should become part of a MultiIndex when we convert to a pandas DataFrame.
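To make level-based selection without a MultiIndex more concrete, here is a minimal sketch in plain Python. The helpers `sel_positions` and `sel` are hypothetical (they are not xarray API); coordinates sharing the `yx` dimension are stored as parallel lists, and selecting on any subset of them behaves like MultiIndex level selection:
```python
# Hypothetical sketch: MultiIndex-like selection on plain level coordinates.
# Coordinates sharing the 'yx' dimension are stored as parallel lists.
coords = {
    'y': ['a', 'a', 'a', 'b', 'b', 'b'],
    'x': [1, 2, 3, 1, 2, 3],
}
foo = [1, 2, 3, 4, 5, 6]
def sel_positions(coords, **labels):
    # Positions along 'yx' where every requested coordinate matches its label.
    size = len(next(iter(coords.values())))
    return [
        i for i in range(size)
        if all(coords[name][i] == value for name, value in labels.items())
    ]
def sel(coords, data, **labels):
    # Subset every coordinate and the data variable at the matching positions.
    pos = sel_positions(coords, **labels)
    new_coords = {name: [values[i] for i in pos] for name, values in coords.items()}
    return new_coords, [data[i] for i in pos]
print(sel(coords, foo, y='a'))       # ({'y': ['a', 'a', 'a'], 'x': [1, 2, 3]}, [1, 2, 3])
print(sel(coords, foo, y='b', x=2))  # ({'y': ['b'], 'x': [2]}, [5])
# Position 0 yields one scalar per coordinate rather than a tuple.
print({name: values[0] for name, values in coords.items()})  # {'y': 'a', 'x': 1}
```
The linear scan only stands in for an index; a real implementation would presumably build a hash-based lookup per indexed coordinate instead.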
Pandas has `MultiIndex` because it needed a way to group multiple arrays together into a single index array. In xarray, this is less necessary, because we have multiple coordinates to represent levels, and xarray itself no longer needs a MultiIndex notion because we no longer require coordinate labels for every dimension (as of v0.9).
CC @benbovy ","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,231308952