issue_comments: 304778433

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/pull/1426#issuecomment-304778433	https://api.github.com/repos/pydata/xarray/issues/1426	304778433	MDEyOklzc3VlQ29tbWVudDMwNDc3ODQzMw==	1217238	2017-05-30T05:29:11Z	2017-05-30T05:29:11Z	MEMBER	Sorry for the delay getting back to you here -- I'm still thinking through the implications of this change. This does make the handling of `MultiIndex`-type data much more consistent, but calling scalars `MultiIndex-scalar` seems quite confusing to me. I think of the data type here as closer to NumPy's structured dtypes, except without the implied storage format for the data. However, taking a step back, I wonder if this is the right approach. In many ways, structured dtypes are similar to xarray's existing data structures, so supporting them fully would mean a lot of duplicated functionality. MultiIndexes (especially with scalars) should work similarly to separate variables, but they are implemented very differently under the hood (all the data lives in one variable). (See https://github.com/pandas-dev/pandas/issues/3443 for related discussion about pandas and why it doesn't support structured dtypes.) It occurs to me that if we had full support for indexing on coordinate levels, we might not need a notion of a "MultiIndex" in the public API at all. To make this more concrete, what if this were the `repr()` for the result of `ds.stack(yx=['y', 'x'])` in your first example? `<xarray.Dataset> Dimensions: (yx: 6) Coordinates: y (yx) object 'a' 'a' 'a' 'b' 'b' 'b' x (yx) int64 1 2 3 1 2 3 Data variables: foo (yx) int64 1 2 3 4 5 6` If we supported `MultiIndex`-like indexing for `x` and `y`, this could be nearly equivalent to a MultiIndex with much less code duplication. The important practical difference is that here there are no labels along the `yx` dimension, so `ds['yx'][0]` would not return a tuple. Also, we would need to figure out some way to explicitly signal which coordinates should become levels of a MultiIndex when we convert to a pandas DataFrame. 
Pandas has `MultiIndex` because it needed a way to group multiple arrays together into a single index array. In xarray, this is less necessary, because we have multiple coordinates to represent levels, and xarray itself no longer needs a MultiIndex notion because, as of v0.9, we no longer require coordinate labels for every dimension. CC @benbovy	{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		231308952
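The proposal in the comment above, where level coordinates such as `y` and `x` sit side by side along one dimension and MultiIndex-like behavior comes from indexing on those levels, can be sketched in plain Python. This is a toy model under stated assumptions, not xarray's or pandas' API: `StackedDataset`, `sel`, `to_records`, and the `index_levels` parameter are hypothetical names chosen for illustration, and plain lists stand in for arrays. The data mirrors the `repr()` in the comment.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class StackedDataset:
    """Hypothetical toy stand-in for a Dataset with one stacked dimension.

    Each level ("y", "x") is an ordinary coordinate array along `dim`;
    there are no tuple labels for `dim` itself.
    """

    dim: str
    coords: Dict[str, List[Any]]     # level name -> values along `dim`
    data_vars: Dict[str, List[Any]]  # variable name -> values along `dim`

    @property
    def size(self) -> int:
        # Length of `dim`; all arrays are assumed (not checked) to share it.
        arrays = list(self.coords.values()) + list(self.data_vars.values())
        return len(arrays[0]) if arrays else 0

    def sel(self, **levels: Any) -> "StackedDataset":
        # MultiIndex-like selection: keep positions along `dim` where every
        # requested level coordinate equals the requested value.
        keep = [
            i for i in range(self.size)
            if all(self.coords[name][i] == value for name, value in levels.items())
        ]

        def take(values: List[Any]) -> List[Any]:
            return [values[i] for i in keep]

        return StackedDataset(
            dim=self.dim,
            coords={name: take(v) for name, v in self.coords.items()},
            data_vars={name: take(v) for name, v in self.data_vars.items()},
        )

    def to_records(
        self, index_levels: Sequence[str]
    ) -> Dict[Tuple[Any, ...], Dict[str, Any]]:
        # The caller explicitly names which coordinates form the tuple index,
        # standing in for "signal what should become part of a MultiIndex"
        # when converting to pandas. Duplicate keys overwrite earlier rows
        # here, whereas a real pandas MultiIndex allows duplicates.
        records: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
        for i in range(self.size):
            key = tuple(self.coords[level][i] for level in index_levels)
            records[key] = {name: v[i] for name, v in self.data_vars.items()}
        return records


ds = StackedDataset(
    dim="yx",
    coords={"y": ["a", "a", "a", "b", "b", "b"], "x": [1, 2, 3, 1, 2, 3]},
    data_vars={"foo": [1, 2, 3, 4, 5, 6]},
)
print(ds.sel(y="b").data_vars["foo"])  # [4, 5, 6]
print(ds.to_records(["y", "x"])[("a", 2)])  # {'foo': 2}
```

In this sketch, positional access such as `ds.coords["x"][0]` returns a plain value rather than a tuple, which mirrors the comment's point that `ds['yx'][0]` would not return a tuple. Tuple keys only appear when the caller explicitly chooses the levels during conversion.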