html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4186#issuecomment-652064154,https://api.github.com/repos/pydata/xarray/issues/4186,652064154,MDEyOklzc3VlQ29tbWVudDY1MjA2NDE1NA==,15720911,2020-06-30T21:48:33Z,2020-06-30T21:48:33Z,NONE,The intention of the variables used in constructing the Dataset looks a lot clearer now. Many thanks Stephan!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-652032780,https://api.github.com/repos/pydata/xarray/issues/4186,652032780,MDEyOklzc3VlQ29tbWVudDY1MjAzMjc4MA==,1217238,2020-06-30T20:44:00Z,2020-06-30T20:44:00Z,MEMBER,"> > My concern was when another person works on this and didn't get the context that `idx` might be different from `dataframe.index` and new bugs could potentially be introduced > > Let me see if I can rewrite the helper functions to avoid passing around a `DataFrame` This was a good suggestion.
Done in https://github.com/pydata/xarray/pull/4184/commits/96b544b5a59894359a35680151af71c0226f0505","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-652018527,https://api.github.com/repos/pydata/xarray/issues/4186,652018527,MDEyOklzc3VlQ29tbWVudDY1MjAxODUyNw==,1217238,2020-06-30T20:13:44Z,2020-06-30T20:13:44Z,MEMBER,"> My concern was when another person works on this and didn't get the context that `idx` might be different from `dataframe.index` and new bugs could potentially be introduced Let me see if I can rewrite the helper functions to avoid passing around a `DataFrame`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651984472,https://api.github.com/repos/pydata/xarray/issues/4186,651984472,MDEyOklzc3VlQ29tbWVudDY1MTk4NDQ3Mg==,15720911,2020-06-30T19:02:28Z,2020-06-30T19:02:28Z,NONE,"Sorry @shoyer, I didn't notice you had pushed new commits to #4184 and thought you meant to just remove the `DataFrame.set_index`. Your latest commits indeed give the correct result. My concern was when another person works on this and didn't get the context that `idx` might be different from `dataframe.index` and new bugs could potentially be introduced.
Though considering the limited scope in which we maintain both `idx` and `dataframe`, I guess it should be fine.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651905098,https://api.github.com/repos/pydata/xarray/issues/4186,651905098,MDEyOklzc3VlQ29tbWVudDY1MTkwNTA5OA==,1217238,2020-06-30T16:29:10Z,2020-06-30T16:44:02Z,MEMBER,"@Li9htmare I'm not sure I follow your example. #4184 does remove the use of `DataFrame.set_index()`, but it also removes any subsequent use of `dataframe.index` -- it always uses the separately processed index. Is there something specific that you are worried about going wrong with your latest example? For what it's worth, here's what `to_xarray()` does with the current version of #4184: ``` In [4]: df.to_xarray() Out[4]: Dimensions: (lev1: 2, lev2: 1) Coordinates: * lev1 (lev1) object 'b' 'a' * lev2 (lev2) object 'foo' Data variables: C1 (lev1, lev2) int64 0 2 C2 (lev1, lev2) int64 1 3 In [5]: df.to_xarray().indexes Out[5]: lev1: CategoricalIndex(['b', 'a'], categories=['b', 'a'], ordered=True, name='lev1', dtype='category') lev2: Index(['foo'], dtype='object', name='lev2') ``` I *think* this is doing the right thing already?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651674763,https://api.github.com/repos/pydata/xarray/issues/4186,651674763,MDEyOklzc3VlQ29tbWVudDY1MTY3NDc2Mw==,15720911,2020-06-30T09:24:13Z,2020-06-30T09:24:13Z,NONE,"Hi @shoyer, without `dataframe.set_index()`, `dataframe.index` can potentially differ from the `idx` returned by `remove_unused_levels_categories`, which will lead to other problems.
One example is the following `df`: ``` df = pd.DataFrame( { 'lev1': pd.Series( ['b', 'a'], dtype=pd.CategoricalDtype(['c', 'b', 'a'], ordered=True) ), 'lev2': 'foo', 'C1': [0, 2], 'C2': [1, 3], } ).set_index(['lev1', 'lev2']) ``` I agree it would be better if we could maintain the order from `df` to `xr.Dataset`, but I think we should never work with a copy of `idx` that differs from `dataframe.index`, as this will lead to hard-to-debug problems caused by ""surprising"" `pandas` behavior.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651467248,https://api.github.com/repos/pydata/xarray/issues/4186,651467248,MDEyOklzc3VlQ29tbWVudDY1MTQ2NzI0OA==,1217238,2020-06-30T01:41:36Z,2020-06-30T01:41:36Z,MEMBER,"The sorting seems to be a separate matter, caused by `dataframe.set_index()` inside our `remove_unused_levels_categories` function. I think we can remove that, which will fix the sorting issue when removing unused levels.
Then the result will be as desired: ``` df.to_xarray() Dimensions: (lev1: 2, lev2: 1) Coordinates: * lev1 (lev1) object 'b' 'a' * lev2 (lev2) object 'foo' Data variables: C1 (lev1, lev2) int64 0 2 C2 (lev1, lev2) int64 1 3 ```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651458105,https://api.github.com/repos/pydata/xarray/issues/4186,651458105,MDEyOklzc3VlQ29tbWVudDY1MTQ1ODEwNQ==,1217238,2020-06-30T01:14:45Z,2020-06-30T01:14:45Z,MEMBER,"Actually, I realize now that this is basically the same issue as https://github.com/pydata/xarray/issues/2619 If I remove the use of `remove_unused_levels_categories` from `from_dataframe`, then I get the same behavior that we considered a bug in that issue: ``` In [5]: ds.isel(xy=ds['x'] < 4).to_pandas().to_xarray() Out[5]: array([[ 0., 1., 2., 3., 4.], [ 5., 6., 7., 8., 9.], [10., 11., 12., 13., 14.], [15., 16., 17., 18., 19.], [nan, nan, nan, nan, nan], [nan, nan, nan, nan, nan], [nan, nan, nan, nan, nan], [nan, nan, nan, nan, nan]]) Coordinates: * x (x) int64 0 1 2 3 4 5 6 7 * y (y) int64 0 1 2 3 4 ``` So maybe it is more consistent to keep calling `remove_unused_levels()`, which somewhat surprisingly sorts MultiIndex levels.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651454795,https://api.github.com/repos/pydata/xarray/issues/4186,651454795,MDEyOklzc3VlQ29tbWVudDY1MTQ1NDc5NQ==,6815844,2020-06-30T01:06:34Z,2020-06-30T01:06:34Z,MEMBER,I agree that it's better not to sort.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651453863,https://api.github.com/repos/pydata/xarray/issues/4186,651453863,MDEyOklzc3VlQ29tbWVudDY1MTQ1Mzg2Mw==,1217238,2020-06-30T01:03:40Z,2020-06-30T01:03:40Z,MEMBER,"I verified that #4184 fixes the tests added for #3953 even after removing the call to `remove_unused_levels_categories()`. The main question is what behavior we want to have: Should `from_dataframe` preserve index levels exactly, or should it sort them first? I think it's better not to sort (but of course it's better to sort than to get the wrong order).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651438776,https://api.github.com/repos/pydata/xarray/issues/4186,651438776,MDEyOklzc3VlQ29tbWVudDY1MTQzODc3Ng==,6815844,2020-06-30T00:21:43Z,2020-06-30T00:21:43Z,MEMBER,"I think #3953 fixes the case where the multiindex has unused levels. I had no better idea than #3953, but if it works without #3953, it would be better ;)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651428394,https://api.github.com/repos/pydata/xarray/issues/4186,651428394,MDEyOklzc3VlQ29tbWVudDY1MTQyODM5NA==,1217238,2020-06-29T23:51:49Z,2020-06-29T23:51:49Z,MEMBER,"Thanks for clarifying! This raises an interesting question for #4184: do we want to keep @fujiisoup's fix from #3953 or not?
If we remove @fujiisoup's fix, then the output we see is: ``` df.to_xarray() Dimensions: (lev1: 2, lev2: 1) Coordinates: * lev1 (lev1) object 'b' 'a' * lev2 (lev2) object 'foo' Data variables: C1 (lev1, lev2) int64 0 2 C2 (lev1, lev2) int64 1 3 ``` This is also *correct* -- coordinates match up with values -- but the order of the result is different from what is currently on master.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651424721,https://api.github.com/repos/pydata/xarray/issues/4186,651424721,MDEyOklzc3VlQ29tbWVudDY1MTQyNDcyMQ==,15720911,2020-06-29T23:40:41Z,2020-06-29T23:41:45Z,NONE,"Hi @shoyer, sorry I got you confused; I should have run your code in the first place. Your code removes the problematic `dataframe.reindex` in `Dataset._set_numpy_data_from_dataframe`, but there is indeed another place causing the problem, which is actually already fixed (but not released yet) by https://github.com/pydata/xarray/pull/3953/files#diff-921db548d18a549f6381818ed08298c9L4607-L4608 Using pzhlobi's example `df` with xarray 0.15.1 (incorrect result): ``` df.to_xarray() Dimensions: (lev1: 2, lev2: 1) Coordinates: * lev1 (lev1) object 'b' 'a' * lev2 (lev2) object 'foo' Data variables: C1 (lev1, lev2) int64 2 0 C2 (lev1, lev2) int64 3 1 ``` Using the same `df` with both #3953 and #4184 (correct result): ``` df.to_xarray() Dimensions: (lev1: 2, lev2: 1) Coordinates: * lev1 (lev1) object 'a' 'b' * lev2 (lev2) object 'foo' Data variables: C1 (lev1, lev2) int64 2 0 C2 (lev1, lev2) int64 3 1 ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651402838,https://api.github.com/repos/pydata/xarray/issues/4186,651402838,MDEyOklzc3VlQ29tbWVudDY1MTQwMjgzOA==,1217238,2020-06-29T22:28:00Z,2020-06-29T22:28:00Z,MEMBER,"Hi @pzhlobi @Li9htmare -- thanks for raising this issue. Could you kindly clarify for me exactly what you think xarray *should* do? The results are indeed reordered currently, but as far as I can tell the pairing between coordinates and values remains consistent. When I test this myself, I see the same behavior (documented in the first post) either with or without my changes from #4184.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-650738680,https://api.github.com/repos/pydata/xarray/issues/4186,650738680,MDEyOklzc3VlQ29tbWVudDY1MDczODY4MA==,15720911,2020-06-28T11:37:20Z,2020-06-28T11:37:20Z,NONE,"It seems the problem here is that in `Dataset.from_dataframe` the `dims` and `coords` are created from `df.index.levels`, which is unsorted: https://github.com/pydata/xarray/blob/732750a06aef2025b206ba6ff765f5acc53bfa25/xarray/core/dataset.py#L4642-L4643 Then in `Dataset._set_numpy_data_from_dataframe`, the `pd.MultiIndex.from_product` and `dataframe.reindex` unintentionally sort the `dataframe` by index: https://github.com/pydata/xarray/blob/732750a06aef2025b206ba6ff765f5acc53bfa25/xarray/core/dataset.py#L4588-L4589 Besides the perf improvement it provides, #4184 also seems to have the nice side effect of fixing this issue.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560