html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4186#issuecomment-652064154,https://api.github.com/repos/pydata/xarray/issues/4186,652064154,MDEyOklzc3VlQ29tbWVudDY1MjA2NDE1NA==,15720911,2020-06-30T21:48:33Z,2020-06-30T21:48:33Z,NONE,The intention of the variables used in constructing the Dataset looks a lot clearer now. Many thanks Stephan!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-652032780,https://api.github.com/repos/pydata/xarray/issues/4186,652032780,MDEyOklzc3VlQ29tbWVudDY1MjAzMjc4MA==,1217238,2020-06-30T20:44:00Z,2020-06-30T20:44:00Z,MEMBER,"> > My concern was when another person works on this and didn't get the context that `idx` might be different from `dataframe.index` and new bugs could potentially be introduced
> 
> Let me see if I can rewrite the helper functions to avoid passing around a `DataFrame`

This was a good suggestion. Done in https://github.com/pydata/xarray/pull/4184/commits/96b544b5a59894359a35680151af71c0226f0505","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-652018527,https://api.github.com/repos/pydata/xarray/issues/4186,652018527,MDEyOklzc3VlQ29tbWVudDY1MjAxODUyNw==,1217238,2020-06-30T20:13:44Z,2020-06-30T20:13:44Z,MEMBER,"> My concern was when another person works on this and didn't get the context that `idx` might be different from `dataframe.index` and new bugs could potentially be introduced

Let me see if I can rewrite the helper functions to avoid passing around a `DataFrame`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651984472,https://api.github.com/repos/pydata/xarray/issues/4186,651984472,MDEyOklzc3VlQ29tbWVudDY1MTk4NDQ3Mg==,15720911,2020-06-30T19:02:28Z,2020-06-30T19:02:28Z,NONE,"Sorry @shoyer, I didn't notice you had pushed new commits to #4184 and thought you meant to just remove the `DataFrame.set_index`. Your latest commits indeed give the correct result. My concern was when another person works on this and didn't get the context that `idx` might be different from `dataframe.index` and new bugs could potentially be introduced. Though, considering the limited scope in which we maintain both `idx` and `dataframe`, I guess it should be fine.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651905098,https://api.github.com/repos/pydata/xarray/issues/4186,651905098,MDEyOklzc3VlQ29tbWVudDY1MTkwNTA5OA==,1217238,2020-06-30T16:29:10Z,2020-06-30T16:44:02Z,MEMBER,"@Li9htmare I'm not sure I follow your example. #4184 does remove the use of `DataFrame.set_index()`, but it also removes any subsequent use of `dataframe.index` -- it always uses the separately processed index.

Is there something specific that you are worried about going wrong with your latest example? For what it's worth, here's what `to_xarray()` does with the current version of #4184:
```
In [4]: df.to_xarray()
Out[4]:
<xarray.Dataset>
Dimensions:  (lev1: 2, lev2: 1)
Coordinates:
  * lev1     (lev1) object 'b' 'a'
  * lev2     (lev2) object 'foo'
Data variables:
    C1       (lev1, lev2) int64 0 2
    C2       (lev1, lev2) int64 1 3

In [5]: df.to_xarray().indexes
Out[5]:
lev1: CategoricalIndex(['b', 'a'], categories=['b', 'a'], ordered=True, name='lev1', dtype='category')
lev2: Index(['foo'], dtype='object', name='lev2')
```

I *think* this is doing the right thing already?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651674763,https://api.github.com/repos/pydata/xarray/issues/4186,651674763,MDEyOklzc3VlQ29tbWVudDY1MTY3NDc2Mw==,15720911,2020-06-30T09:24:13Z,2020-06-30T09:24:13Z,NONE,"Hi @shoyer, without `dataframe.set_index()`, `dataframe.index` can potentially differ from the `idx` returned by `remove_unused_levels_categories`, which will lead to other problems. One example is the following `df`:
```
df = pd.DataFrame(
    {
        'lev1': pd.Series(
            ['b', 'a'], dtype=pd.CategoricalDtype(['c', 'b', 'a'], ordered=True)
        ),
        'lev2': 'foo',
        'C1': [0, 2],
        'C2': [1, 3],
    }
).set_index(['lev1', 'lev2'])
```
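
To make the mismatch concrete, here is a minimal sketch using the same frame. It calls pandas' own `remove_unused_levels()` rather than xarray's helper, and the name `pruned` is hypothetical: it only stands in for the separately processed `idx`.

```python
import pandas as pd

# Same frame as above: 'c' is a declared category that never occurs in the data.
df = pd.DataFrame(
    {
        'lev1': pd.Series(
            ['b', 'a'], dtype=pd.CategoricalDtype(['c', 'b', 'a'], ordered=True)
        ),
        'lev2': 'foo',
        'C1': [0, 2],
        'C2': [1, 3],
    }
).set_index(['lev1', 'lev2'])

# Levels as stored on the frame's own index (the unused 'c' is still there).
raw_levels = list(df.index.levels[0])

# Levels after dropping entries that no row refers to.
# `pruned` is a hypothetical stand-in for the separately processed `idx`.
pruned = df.index.remove_unused_levels()
pruned_levels = list(pruned.levels[0])

print(raw_levels, pruned_levels)
```

The two level lists have different lengths, so any code that mixes `dataframe.index` with the processed index has to be explicit about which one it uses.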

I agree it will be better if we can maintain the order from `df` to `xr.Dataset`, but I think we should never work with a copy of `idx` that is different from `dataframe.index`, as this will lead to hard-to-debug problems due to the ""surprising"" things `pandas` does.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651467248,https://api.github.com/repos/pydata/xarray/issues/4186,651467248,MDEyOklzc3VlQ29tbWVudDY1MTQ2NzI0OA==,1217238,2020-06-30T01:41:36Z,2020-06-30T01:41:36Z,MEMBER,"The sorting seems to be a separate matter, caused by `dataframe.set_index()` inside our `remove_unused_levels_categories` function. I think we can remove that, which will fix the sorting issue when removing unused levels. Then the result will be the desired:
```
df.to_xarray()
 <xarray.Dataset>
Dimensions:  (lev1: 2, lev2: 1)
Coordinates:
  * lev1     (lev1) object 'b' 'a'
  * lev2     (lev2) object 'foo'
Data variables:
    C1       (lev1, lev2) int64 0 2
    C2       (lev1, lev2) int64 1 3
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651458105,https://api.github.com/repos/pydata/xarray/issues/4186,651458105,MDEyOklzc3VlQ29tbWVudDY1MTQ1ODEwNQ==,1217238,2020-06-30T01:14:45Z,2020-06-30T01:14:45Z,MEMBER,"Actually, I realize now that this is basically the same issue as https://github.com/pydata/xarray/issues/2619

If I remove the use of `remove_unused_levels_categories` from `from_dataframe`, then I get the same behavior that we considered a bug in that issue:
```
In [5]: ds.isel(xy=ds['x'] < 4).to_pandas().to_xarray()
Out[5]:
<xarray.DataArray (x: 8, y: 5)>
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7
  * y        (y) int64 0 1 2 3 4
```

So maybe it is more consistent to keep calling `remove_unused_levels()`, which somewhat surprisingly sorts MultiIndex levels.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651454795,https://api.github.com/repos/pydata/xarray/issues/4186,651454795,MDEyOklzc3VlQ29tbWVudDY1MTQ1NDc5NQ==,6815844,2020-06-30T01:06:34Z,2020-06-30T01:06:34Z,MEMBER,I agree that it's better not to sort.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651453863,https://api.github.com/repos/pydata/xarray/issues/4186,651453863,MDEyOklzc3VlQ29tbWVudDY1MTQ1Mzg2Mw==,1217238,2020-06-30T01:03:40Z,2020-06-30T01:03:40Z,MEMBER,"I verified that #4184 fixes the tests added for #3953 even after removing the call to `remove_unused_levels_categories()`.

The main question is what behavior we want to have: Should `from_dataframe` preserve index levels exactly, or should it sort them first?

I think it's better not to sort (but of course it's better to sort than to get the wrong order).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651438776,https://api.github.com/repos/pydata/xarray/issues/4186,651438776,MDEyOklzc3VlQ29tbWVudDY1MTQzODc3Ng==,6815844,2020-06-30T00:21:43Z,2020-06-30T00:21:43Z,MEMBER,"I think #3953 fixes the case where the multiindex has unused levels.
I had no better idea than #3953, but if it works without #3953, it would be better ;)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651428394,https://api.github.com/repos/pydata/xarray/issues/4186,651428394,MDEyOklzc3VlQ29tbWVudDY1MTQyODM5NA==,1217238,2020-06-29T23:51:49Z,2020-06-29T23:51:49Z,MEMBER,"Thanks for clarifying!

This raises an interesting question for #4184: do we want to keep @fujiisoup's fix from #3953 or not?

If we remove @fujiisoup's fix, then the output we see is:
```
df.to_xarray()
 <xarray.Dataset>
Dimensions:  (lev1: 2, lev2: 1)
Coordinates:
  * lev1     (lev1) object 'b' 'a'
  * lev2     (lev2) object 'foo'
Data variables:
    C1       (lev1, lev2) int64 0 2
    C2       (lev1, lev2) int64 1 3
```

This is also *correct* -- coordinates match up with values -- but the order of the result is different from what is currently on master.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651424721,https://api.github.com/repos/pydata/xarray/issues/4186,651424721,MDEyOklzc3VlQ29tbWVudDY1MTQyNDcyMQ==,15720911,2020-06-29T23:40:41Z,2020-06-29T23:41:45Z,NONE,"Hi @shoyer, sorry I confused you; I should have run your code in the first place. Your code removes the problematic `dataframe.reindex` in `Dataset._set_numpy_data_from_dataframe`, but there is indeed another place causing the problem, which is actually already fixed (but not released yet) by https://github.com/pydata/xarray/pull/3953/files#diff-921db548d18a549f6381818ed08298c9L4607-L4608

Using pzhlobi's example `df` with xarray 0.15.1 (incorrect result):
```
df.to_xarray()
<xarray.Dataset>
Dimensions:  (lev1: 2, lev2: 1)
Coordinates:
  * lev1     (lev1) object 'b' 'a'
  * lev2     (lev2) object 'foo'
Data variables:
    C1       (lev1, lev2) int64 2 0
    C2       (lev1, lev2) int64 3 1
```

Using the same `df` with both #3953 and #4184 (correct result):
```
df.to_xarray()
<xarray.Dataset>
Dimensions:  (lev1: 2, lev2: 1)
Coordinates:
  * lev1     (lev1) object 'a' 'b'
  * lev2     (lev2) object 'foo'
Data variables:
    C1       (lev1, lev2) int64 2 0
    C2       (lev1, lev2) int64 3 1
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651402838,https://api.github.com/repos/pydata/xarray/issues/4186,651402838,MDEyOklzc3VlQ29tbWVudDY1MTQwMjgzOA==,1217238,2020-06-29T22:28:00Z,2020-06-29T22:28:00Z,MEMBER,"Hi @pzhlobi @Li9htmare -- thanks for raising this issue.

Could you kindly clarify for me exactly what behavior you think xarray *should* have? The results are indeed reordered currently, but as far as I can tell the pairing between coordinates and values remains consistent.

When I test this myself, I see the same behavior (documented in the first post) either with or without my changes from #4184.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-650738680,https://api.github.com/repos/pydata/xarray/issues/4186,650738680,MDEyOklzc3VlQ29tbWVudDY1MDczODY4MA==,15720911,2020-06-28T11:37:20Z,2020-06-28T11:37:20Z,NONE,"It seems the problem here is that in `Dataset.from_dataframe` the `dims` and `coords` are created from `df.index.levels`, which is unsorted: https://github.com/pydata/xarray/blob/732750a06aef2025b206ba6ff765f5acc53bfa25/xarray/core/dataset.py#L4642-L4643

Then in `Dataset._set_numpy_data_from_dataframe`, the `pd.MultiIndex.from_product` and `dataframe.reindex` unintentionally sort the `dataframe` by index:
https://github.com/pydata/xarray/blob/732750a06aef2025b206ba6ff765f5acc53bfa25/xarray/core/dataset.py#L4588-L4589

Besides the perf improvement it provides, #4184 also seems to have the nice side effect of fixing this issue.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560