issue_comments: 442797084

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1603#issuecomment-442797084	https://api.github.com/repos/pydata/xarray/issues/1603	442797084	MDEyOklzc3VlQ29tbWVudDQ0Mjc5NzA4NA==	4160723	2018-11-29T11:15:17Z	2018-11-29T11:15:17Z	MEMBER	"we will definitely have to make some intentional deviations from the behavior of pandas." Looking at the reported issues related to multi-indexes in xarray, I have the same feeling. Simply reusing `pandas.MultiIndex` in xarray, where slightly different semantics are generally expected, has proven to be painful. It seems easier to have our own baked solution and deal with differences during xarray <-> pandas conversion if needed. If we re-design indexes so that we allow 3rd-party indexes, maybe we could support both and let the user choose the one (xarray or pandas baked) that best suits their needs? Regarding MultiIndex as part of the data schema vs. an implementation detail: if we support extending indexes (and given the different kinds of multi-coordinate indexes that already exist: MultiIndex, KDTree, etc.), then I think that it should be transparent to the user. However, I don't really see why a multi-coordinate index should have its own variable (with tuples of values). I don't want to speak for others, but IMHO `ds.sel(multi=list_of_pairs)` is rather an edge case and I'm not sure we really need to support it. Using `ds.sel(x=..., y=...)` with DataArray objects is certainly more code to write, but this form of indexing is very powerful and it might not be a bad idea to encourage it. If a variable for each multi-coordinate index is "just" for data schema consistency, then why not show all those indexes in a separate section of the repr? 
For example: `Coordinates: * level_1 (x) object 'a' 'a' 'b' 'b' * level_2 (x) int64 1 2 1 2 Multi-indexes: pandas.MultiIndex [level_1, level_2]` It is equally transparent, not more verbose, and it is clear that multi-indexes are not part of the coordinates (in fact there is no need for "virtual" coordinates either, nor to name the index). I don't think single indexes should be shown here, as it would result in duplicated, uninformative lines. More generally, here is how I would see indexes handled in xarray (I might be missing important aspects, though): Default behavior: all 1-dimensional coordinates each have their own, single index (`pandas.Index`), unless explicitly stated otherwise. An explicit API is used for setting new, possibly multi-coordinate indexes. Note the absence of a keyword argument below to specify the variables. This is actually more consistent with the pandas API, but it would be a breaking change and I don't know what a smooth transition could look like. `set_index(['x', 'y'], kind='multiindex') # xarray built-in index` `set_index(['x', 'y'], kind='kdtree') # xarray built-in index` `set_index('x', kind=ASingleIndexWrapperClass) # 3rd-party index` If a coordinate is removed from the Dataset, or if its index is reset or changed: If the coordinate had a single index, no problem. If the coordinate was part of a multi-coordinate index: a new index is built from all remaining coordinates that were also part of the original index, if that is supported. Otherwise, the original index is removed and the default behavior (single `pandas.Index`) is reset for all those remaining coordinates.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		262642978
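
The removal rule proposed at the end of the comment can be sketched with a toy model. This is only an illustration under stated assumptions, not xarray's API: the registry layout (a dict mapping tuples of coordinate names to an index kind), the `remove_coordinate` helper, and the `can_rebuild` predicate are all hypothetical, and "supported" is assumed here to mean that at least two coordinates remain.

```python
# Toy model of the proposed rule. Every name here is hypothetical.
DEFAULT_KIND = "pandas.Index"


def remove_coordinate(indexes, name, can_rebuild=None):
    """Return a new index registry after dropping coordinate `name`.

    `indexes` maps a tuple of coordinate names to an index kind (a string).
    `can_rebuild(kind, coords)` says whether `kind` can be rebuilt from the
    remaining coordinates; by default this requires at least two of them.
    """
    if can_rebuild is None:
        can_rebuild = lambda kind, coords: len(coords) >= 2

    result = {}
    for coords, kind in indexes.items():
        if name not in coords:
            result[coords] = kind  # index not affected by the removal
            continue
        if len(coords) == 1:
            continue  # single index: it goes away with its coordinate
        remaining = tuple(c for c in coords if c != name)
        if can_rebuild(kind, remaining):
            result[remaining] = kind  # rebuild from the remaining coordinates
        else:
            for c in remaining:  # fall back to one default index each
                result[(c,)] = DEFAULT_KIND
    return result


registry = {("x", "y", "z"): "multiindex", ("t",): DEFAULT_KIND}
after_z = remove_coordinate(registry, "z")
after_y = remove_coordinate(after_z, "y")
```

Dropping `z` keeps a multi-coordinate index over `x` and `y`; dropping `y` afterwards leaves too few coordinates to rebuild it, so `x` falls back to the default single index.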