issue_comments: 442710536

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1603#issuecomment-442710536	https://api.github.com/repos/pydata/xarray/issues/1603	442710536	MDEyOklzc3VlQ29tbWVudDQ0MjcxMDUzNg==	1217238	2018-11-29T05:23:33Z	2018-11-29T05:25:48Z	MEMBER	> There's no need to support indexing like `ds.sel(multi=list_of_pairs)`. Indexing like `ds.sel(x=..., y=...)` solves the same use case and looks nicer.

This needs an important caveat: it's only true that you can use `ds.sel(x=..., y=...)` to emulate `ds.sel(multi=list_of_pairs)` if you do explicit vectorized indexing like in @max-sixty's example above (https://github.com/pydata/xarray/issues/1603#issuecomment-442636798). It would be nice to preserve a way to select a list of particular points that didn't require constructing explicit DataArray objects as the indexers. (But maybe this is a somewhat niche use-case and it isn't worth the trouble.)

Let me make a tentative proposal: we should model a MultiIndex in xarray as exactly equivalent to a sparse multi-dimensional array, except with missing elements modeled implicitly (by omission) instead of explicitly (with NaN). If we do this, I think MultiIndex semantics could be defined to be identical to those of separable Index objects.

One challenge is that we will definitely have to make some intentional deviations from the behavior of pandas, at least when dealing with array indexing of a MultiIndex level. Pandas has some strange behaviors with array indexing of a MultiIndex level, and I'm honestly not sure if they are bugs or features:
- It ignores missing labels (https://github.com/pandas-dev/pandas/issues/15452)
- It drops duplicate labels (https://github.com/pandas-dev/pandas/issues/19414)

Fortunately, the MultiIndex data model is not that complicated, and it is quite straightforward to remap indexing results from sub-Index levels onto integer codes.
I suspect we will find it easier to rewrite some of these routines than to change pandas, both because pandas may not agree with different semantics and because the pandas indexing code is an unholy mess.
For example, we can reproduce the above issues:
```python
import pandas as pd
index = pd.MultiIndex.from_arrays([['a', 'b', 'c']])
print(index.get_locs((['a', 'a'],)))  # [0]
print(index.get_locs((['a', 'd'],)))  # [0]
```
We actually want something more like:
```python
def get_locs(index, key):
    return index.get_indexer(pd.MultiIndex.from_product(key))
print(get_locs(index, (['a', 'a'],)))  # [0, 0]
print(get_locs(index, (['a', 'd'],)))  # [0, -1]
```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		262642978
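
The sparse-array model proposed in the comment above can be illustrated with pandas alone. This is only a minimal sketch, not xarray's implementation: the sample data and the names `series`, `dense`, `key`, `locs` and `values` are assumptions made for the example. It compares the dense view (missing points stored explicitly as NaN) with the MultiIndex view (missing points omitted, reported as `-1` by `get_indexer`).

```python
import numpy as np
import pandas as pd

# Three stored points; the combination ('b', 2) is simply absent.
index = pd.MultiIndex.from_arrays([['a', 'a', 'b'], [1, 2, 1]], names=['x', 'y'])
series = pd.Series([10.0, 20.0, 30.0], index=index)

# Dense view: the same data as a 2-D table with an explicit NaN at ('b', 2).
dense = series.unstack('y')

# Sparse view: look up every (x, y) combination; -1 marks an absent point.
key = (['a', 'b'], [1, 2])
locs = index.get_indexer(pd.MultiIndex.from_product(key))

# Gather values by position, filling absent points with NaN, so the result
# has the same shape and contents as the dense view.
values = np.where(locs >= 0, series.to_numpy()[locs], np.nan).reshape(2, 2)
```

Under this reading, selecting from the MultiIndex and selecting from the dense array give the same answer, which is the equivalence the proposal relies on.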