issue_comments: 442710536
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/1603#issuecomment-442710536 | https://api.github.com/repos/pydata/xarray/issues/1603 | 442710536 | MDEyOklzc3VlQ29tbWVudDQ0MjcxMDUzNg== | 1217238 | 2018-11-29T05:23:33Z | 2018-11-29T05:25:48Z | MEMBER |
This needs an important caveat: it's only true that you use

Let me make a tentative proposal: we should model a MultiIndex in xarray as exactly equivalent to a sparse multi-dimensional array, except with missing elements modeled implicitly (by omission) instead of explicitly (with NaN). If we do this, I think MultiIndex semantics could be defined to be identical to those of separable Index objects.

One challenge is that we will definitely have to make some intentional deviations from the behavior of pandas, at least when dealing with array indexing of a MultiIndex level. Pandas has some strange behaviors with array indexing of a MultiIndex level, and I'm honestly not sure if they are bugs or features:

- It ignores missing labels (https://github.com/pandas-dev/pandas/issues/15452)
- It drops duplicate labels (https://github.com/pandas-dev/pandas/issues/19414)

Fortunately, the MultiIndex data model is not that complicated, and it is quite straightforward to remap indexing results from sub-Index levels onto integer codes. I suspect we will find it easier to rewrite some of these routines than to change pandas, both because pandas may not agree with different semantics and because the pandas indexing code is an unholy mess.

For example, we can reproduce the above issues:
```python
print(get_locs(index, (['a', 'a'],)))  # [0, 0]
print(get_locs(index, (['a', 'd'],)))  # [0, -1]
``` |
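To make the remapping idea concrete, here is a minimal sketch of looking up labels through the per-level integer codes. It is not the `get_locs` from the comment above: the helper name `lookup_positions`, the example `index`, and the pointwise "one list of labels per level" key convention are all assumptions made for illustration. It also assumes the MultiIndex is unique and contains no missing values.

```python
import numpy as np
import pandas as pd


def lookup_positions(index, key):
    """Hypothetical helper: pointwise lookup of full label tuples.

    `key` holds one equal-length list of labels per level. Returns integer
    positions into `index`, with -1 where the label combination is absent.
    Assumes `index` is a unique MultiIndex without missing values.
    """
    if len(key) != index.nlevels:
        raise ValueError("key must supply one list of labels per level")

    shape = tuple(len(level) for level in index.levels)

    # Identify every existing row by a single flat integer built from its codes.
    row_codes = tuple(np.asarray(c, dtype=np.intp) for c in index.codes)
    row_flat = np.ravel_multi_index(row_codes, shape)

    # Map requested labels onto each level's integer codes (-1 = not in level).
    key_codes = [
        level.get_indexer(labels) for level, labels in zip(index.levels, key)
    ]
    missing = np.any([codes == -1 for codes in key_codes], axis=0)

    # Replace -1 by 0 so the flat code can be computed, then mask afterwards.
    safe_codes = tuple(np.where(codes == -1, 0, codes) for codes in key_codes)
    key_flat = np.ravel_multi_index(safe_codes, shape)

    locs = pd.Index(row_flat).get_indexer(key_flat)
    return np.where(missing, -1, locs)


# Illustrative index (an assumption, not the one used in the comment).
index = pd.MultiIndex.from_arrays([["a", "b", "c"], [0, 1, 2]])

dup = lookup_positions(index, (["a", "a"], [0, 0]))      # duplicates are kept
absent = lookup_positions(index, (["a", "d"], [0, 0]))   # 'd' is not a label
omitted = lookup_positions(index, (["c", "a"], [2, 1]))  # ('a', 1) is omitted
print(dup.tolist(), absent.tolist(), omitted.tolist())
```

Under these assumptions, duplicate labels yield repeated positions and unknown labels yield -1 rather than being silently dropped. The last call shows the "sparse array" reading: `('a', 1)` uses valid labels on both levels, but the combination is simply not stored, so it also maps to -1.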
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
262642978 |