issue_comments: 442710536


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1603#issuecomment-442710536 https://api.github.com/repos/pydata/xarray/issues/1603 442710536 MDEyOklzc3VlQ29tbWVudDQ0MjcxMDUzNg== 1217238 2018-11-29T05:23:33Z 2018-11-29T05:25:48Z MEMBER

> There's no need to support indexing like ds.sel(multi=list_of_pairs). Indexing like ds.sel(x=..., y=...) solves the same use case and looks nicer.

This needs an important caveat: it's only true that you can use ds.sel(x=..., y=...) to emulate ds.sel(multi=list_of_pairs) if you do explicit vectorized indexing, as in @max-sixty's example above (https://github.com/pydata/xarray/issues/1603#issuecomment-442636798). It would be nice to preserve a way to select a list of particular points that doesn't require constructing explicit DataArray objects as the indexers. (But maybe this is a somewhat niche use-case and it isn't worth the trouble.)
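To make the caveat concrete, here is a minimal sketch of the difference between plain list indexers and explicit vectorized indexing. The array `da`, the `list_of_pairs` values, and the `points` dimension name are made up for illustration; they are not taken from the linked example.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 array with labeled x and y coordinates.
da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": ["a", "b"], "y": [10, 20, 30]},
)

list_of_pairs = [("a", 10), ("b", 30)]
xs, ys = zip(*list_of_pairs)

# Plain lists index orthogonally: this selects the full 2 x 2 outer product.
outer = da.sel(x=list(xs), y=list(ys))

# DataArray indexers sharing a dimension select point by point instead.
points = da.sel(
    x=xr.DataArray(list(xs), dims="points"),
    y=xr.DataArray(list(ys), dims="points"),
)

print(outer.shape)    # (2, 2)
print(points.values)  # [0 5]
```

Only the second form picks out exactly the listed pairs, which is why the explicit DataArray indexers are needed today.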

Let me make a tentative proposal: we should model a MultiIndex in xarray as exactly equivalent to a sparse multi-dimensional array, except with missing elements modeled implicitly (by omission) instead of explicitly (with NaN). If we do this, I think MultiIndex semantics could be defined to be identical to those of separable Index objects.
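As a rough illustration of this data model (a sketch with made-up labels, using pandas rather than xarray), a MultiIndex-ed Series stores only the elements that exist, while the equivalent dense array has to mark the missing one with NaN:

```python
import pandas as pd

# Three of the four possible (x, y) combinations are present;
# ('b', 1) is missing implicitly, by omission.
sparse = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.MultiIndex.from_tuples(
        [("a", 0), ("a", 1), ("b", 0)], names=["x", "y"]
    ),
)

# Unstacking gives the dense 2 x 2 view, where the same gap is explicit.
dense = sparse.unstack("y")

print(len(sparse))                # 3
print(dense.shape)                # (2, 2)
print(pd.isna(dense.loc["b", 1])) # True
```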

One challenge is that we will definitely have to make some intentional deviations from the behavior of pandas, at least when dealing with array indexing of a MultiIndex level. Pandas has some strange behaviors with array indexing of a MultiIndex level, and I'm honestly not sure if they are bugs or features:

- It ignores missing labels (https://github.com/pandas-dev/pandas/issues/15452)
- It drops duplicate labels (https://github.com/pandas-dev/pandas/issues/19414)

Fortunately, the MultiIndex data model is not that complicated, and it is quite straightforward to remap indexing results from sub-Index levels onto integer codes. I suspect we will find it easier to rewrite some of these routines than to change pandas, both because pandas may not agree with different semantics and because the pandas indexing code is an unholy mess.
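A minimal sketch of what that remapping could look like, assuming a recent pandas where the integer codes are exposed as `MultiIndex.codes` (older releases called this attribute `labels`). The helper `level_positions` is hypothetical, not an existing pandas or xarray function:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["a", "b", "c", "a"], [0, 1, 0, 1]], names=["x", "y"]
)

def level_positions(index, level, labels):
    """For each requested label, return the positions in `index` where
    the given level matches. Hypothetical helper, for illustration only."""
    # Look the labels up in the (unique) level Index; missing labels map to -1.
    wanted = index.levels[level].get_indexer(labels)
    codes = np.asarray(index.codes[level])
    empty = np.array([], dtype=np.intp)
    # Translate each level code back into row positions of the MultiIndex,
    # keeping duplicates and keeping an empty result for missing labels.
    return [np.flatnonzero(codes == c) if c != -1 else empty for c in wanted]

result = level_positions(index, 0, ["a", "d", "a"])
print([r.tolist() for r in result])  # [[0, 3], [], [0, 3]]
```

Unlike the pandas behaviors linked above, this keeps one result per requested label, so duplicate and missing labels stay visible to the caller.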

For example, we can reproduce the above issues:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays([['a', 'b', 'c']])
print(index.get_locs((['a', 'a'],)))  # [0]
print(index.get_locs((['a', 'd'],)))  # [0]
```

We actually want something more like:

```python
def get_locs(index, key):
    return index.get_indexer(pd.MultiIndex.from_product(key))

print(get_locs(index, (['a', 'a'],)))  # [0, 0]
print(get_locs(index, (['a', 'd'],)))  # [0, -1]
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  262642978