html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/4186#issuecomment-652032780,https://api.github.com/repos/pydata/xarray/issues/4186,652032780,MDEyOklzc3VlQ29tbWVudDY1MjAzMjc4MA==,1217238,2020-06-30T20:44:00Z,2020-06-30T20:44:00Z,MEMBER,"> > My concern was when another person works on this and didn't get the context that `idx` might be different from `dataframe.index` and new bugs could potentially be introduced > > Let me see if I can rewrite the helper functions to avoid passing around a `DataFrame` This was a good suggestion. Done in https://github.com/pydata/xarray/pull/4184/commits/96b544b5a59894359a35680151af71c0226f0505","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560 https://github.com/pydata/xarray/issues/4186#issuecomment-652018527,https://api.github.com/repos/pydata/xarray/issues/4186,652018527,MDEyOklzc3VlQ29tbWVudDY1MjAxODUyNw==,1217238,2020-06-30T20:13:44Z,2020-06-30T20:13:44Z,MEMBER,"> My concern was when another person works on this and didn't get the context that `idx` might be different from `dataframe.index` and new bugs could potentially be introduced Let me see if I can rewrite the helper functions to avoid passing around a `DataFrame`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560 https://github.com/pydata/xarray/issues/4186#issuecomment-651905098,https://api.github.com/repos/pydata/xarray/issues/4186,651905098,MDEyOklzc3VlQ29tbWVudDY1MTkwNTA5OA==,1217238,2020-06-30T16:29:10Z,2020-06-30T16:44:02Z,MEMBER,"@Li9htmare I'm not sure I follow your example. #4184 does remove the use of `DataFrame.set_index()`, but it also removes any subsequent use of `dataframe.index` -- it always uses the separately processed index. 
Is there something specific that you are worried about going wrong with your latest example? For what it's worth, here's what `to_xarray()` does with the current version of #4184: ``` In [4]: df.to_xarray() Out[4]: Dimensions: (lev1: 2, lev2: 1) Coordinates: * lev1 (lev1) object 'b' 'a' * lev2 (lev2) object 'foo' Data variables: C1 (lev1, lev2) int64 0 2 C2 (lev1, lev2) int64 1 3 In [5]: df.to_xarray().indexes Out[5]: lev1: CategoricalIndex(['b', 'a'], categories=['b', 'a'], ordered=True, name='lev1', dtype='category') lev2: Index(['foo'], dtype='object', name='lev2') ``` I *think* this is doing the right thing already?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560 https://github.com/pydata/xarray/issues/4186#issuecomment-651467248,https://api.github.com/repos/pydata/xarray/issues/4186,651467248,MDEyOklzc3VlQ29tbWVudDY1MTQ2NzI0OA==,1217238,2020-06-30T01:41:36Z,2020-06-30T01:41:36Z,MEMBER,"The sorting seems to be a separate matter, caused by `dataframe.set_index()` inside our `remove_unused_levels_categories` function. I think we can remove that, which will fix the sorting issue when removing unused levels. 
Then the result will be the desired: ``` df.to_xarray() Dimensions: (lev1: 2, lev2: 1) Coordinates: * lev1 (lev1) object 'b' 'a' * lev2 (lev2) object 'foo' Data variables: C1 (lev1, lev2) int64 0 2 C2 (lev1, lev2) int64 1 3 ```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560 https://github.com/pydata/xarray/issues/4186#issuecomment-651458105,https://api.github.com/repos/pydata/xarray/issues/4186,651458105,MDEyOklzc3VlQ29tbWVudDY1MTQ1ODEwNQ==,1217238,2020-06-30T01:14:45Z,2020-06-30T01:14:45Z,MEMBER,"Actually, I realize now that this is basically the same issue as https://github.com/pydata/xarray/issues/2619 If I remove the use of `remove_unused_levels_categories` from `from_dataframe`, then I get the same behavior that we considered a bug in that issue: ``` In [5]: ds.isel(xy=ds['x'] < 4).to_pandas().to_xarray() Out[5]: array([[ 0., 1., 2., 3., 4.], [ 5., 6., 7., 8., 9.], [10., 11., 12., 13., 14.], [15., 16., 17., 18., 19.], [nan, nan, nan, nan, nan], [nan, nan, nan, nan, nan], [nan, nan, nan, nan, nan], [nan, nan, nan, nan, nan]]) Coordinates: * x (x) int64 0 1 2 3 4 5 6 7 * y (y) int64 0 1 2 3 4 ``` So maybe it is more consistent to keep calling `remove_unused_levels()`, which somewhat surprisingly sorts MultiIndex levels.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560 https://github.com/pydata/xarray/issues/4186#issuecomment-651454795,https://api.github.com/repos/pydata/xarray/issues/4186,651454795,MDEyOklzc3VlQ29tbWVudDY1MTQ1NDc5NQ==,6815844,2020-06-30T01:06:34Z,2020-06-30T01:06:34Z,MEMBER,I agree that it's better not to sort.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560 
https://github.com/pydata/xarray/issues/4186#issuecomment-651453863,https://api.github.com/repos/pydata/xarray/issues/4186,651453863,MDEyOklzc3VlQ29tbWVudDY1MTQ1Mzg2Mw==,1217238,2020-06-30T01:03:40Z,2020-06-30T01:03:40Z,MEMBER,"I verified that #4184 fixes the tests added for #3953 even after removing the call to `remove_unused_levels_categories()`. The main question is what behavior we want to have: Should `from_dataframe` preserve index levels exactly, or should it sort them first? I think it's better not to sort (but of course it's better to sort than to get the wrong order).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560 https://github.com/pydata/xarray/issues/4186#issuecomment-651438776,https://api.github.com/repos/pydata/xarray/issues/4186,651438776,MDEyOklzc3VlQ29tbWVudDY1MTQzODc3Ng==,6815844,2020-06-30T00:21:43Z,2020-06-30T00:21:43Z,MEMBER,"I think #3953 fixes the case where the multiindex has unused levels. I had no better idea than #3953, but if it works without #3953, it would be better ;)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560 https://github.com/pydata/xarray/issues/4186#issuecomment-651428394,https://api.github.com/repos/pydata/xarray/issues/4186,651428394,MDEyOklzc3VlQ29tbWVudDY1MTQyODM5NA==,1217238,2020-06-29T23:51:49Z,2020-06-29T23:51:49Z,MEMBER,"Thanks for clarifying! This raises an interesting question for #4184: do we want to keep @fujiisoup's fix from #3953 or not? 
If we remove @fujiisoup's fix, then the output we see is: ``` df.to_xarray() Dimensions: (lev1: 2, lev2: 1) Coordinates: * lev1 (lev1) object 'b' 'a' * lev2 (lev2) object 'foo' Data variables: C1 (lev1, lev2) int64 0 2 C2 (lev1, lev2) int64 1 3 ``` This is also *correct* -- coordinates match up with values -- but the order of the result is different from what is currently on master.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560 https://github.com/pydata/xarray/issues/4186#issuecomment-651402838,https://api.github.com/repos/pydata/xarray/issues/4186,651402838,MDEyOklzc3VlQ29tbWVudDY1MTQwMjgzOA==,1217238,2020-06-29T22:28:00Z,2020-06-29T22:28:00Z,MEMBER,"Hi @pzhlobi @Li9htmare -- thanks for raising this issue. Could you kindly clarify for me exactly what behavior you think xarray *should* have? The results are indeed reordered currently, but as far as I can tell the pairing between coordinates and values remains consistent. When I test this myself, I see the same behavior (documented in the first post) either with or without my changes from #4184.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560