html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4186#issuecomment-652064154,https://api.github.com/repos/pydata/xarray/issues/4186,652064154,MDEyOklzc3VlQ29tbWVudDY1MjA2NDE1NA==,15720911,2020-06-30T21:48:33Z,2020-06-30T21:48:33Z,NONE,The intention of the variables used in constructing the Dataset looks a lot clearer now. Many thanks Stephan!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651984472,https://api.github.com/repos/pydata/xarray/issues/4186,651984472,MDEyOklzc3VlQ29tbWVudDY1MTk4NDQ3Mg==,15720911,2020-06-30T19:02:28Z,2020-06-30T19:02:28Z,NONE,"Sorry @shoyer, I didn't notice you had pushed new commits to #4184 and thought you meant to just remove the `DataFrame.set_index`. Your latest commits indeed give the correct result. My concern was that someone else working on this later might not have the context that `idx` can differ from `dataframe.index`, so new bugs could be introduced. Considering the limited scope in which we maintain both `idx` and `dataframe`, though, I guess it should be fine.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651674763,https://api.github.com/repos/pydata/xarray/issues/4186,651674763,MDEyOklzc3VlQ29tbWVudDY1MTY3NDc2Mw==,15720911,2020-06-30T09:24:13Z,2020-06-30T09:24:13Z,NONE,"Hi @shoyer, without `dataframe.set_index()`, `dataframe.index` can differ from the `idx` returned by `remove_unused_levels_categories`, which will lead to other problems. One example is the following `df`:
```
df = pd.DataFrame(
    {
        'lev1': pd.Series(
            ['b', 'a'], dtype=pd.CategoricalDtype(['c', 'b', 'a'], ordered=True)
        ),
        'lev2': 'foo',
        'C1': [0, 2],
        'C2': [1, 3],
    }
).set_index(['lev1', 'lev2'])
```
I agree it would be better if we could preserve the order from `df` to `xr.Dataset`, but I think we should never work with a copy of `idx` that differs from `dataframe.index`, as this will lead to hard-to-debug problems caused by ""surprising"" `pandas` behavior.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-651424721,https://api.github.com/repos/pydata/xarray/issues/4186,651424721,MDEyOklzc3VlQ29tbWVudDY1MTQyNDcyMQ==,15720911,2020-06-29T23:40:41Z,2020-06-29T23:41:45Z,NONE,"Hi @shoyer, sorry for the confusion; I should have run your code in the first place. Your code removes the problematic `dataframe.reindex` in `Dataset._set_numpy_data_from_dataframe`, but there is indeed another place causing the problem, which is already fixed (but not yet released) by https://github.com/pydata/xarray/pull/3953/files#diff-921db548d18a549f6381818ed08298c9L4607-L4608
Using pzhlobi's example `df` with xarray 0.15.1 (incorrect result):
```
df.to_xarray()
Dimensions: (lev1: 2, lev2: 1)
Coordinates:
* lev1 (lev1) object 'b' 'a'
* lev2 (lev2) object 'foo'
Data variables:
C1 (lev1, lev2) int64 2 0
C2 (lev1, lev2) int64 3 1
```
Using the same `df` with both #3953 and #4184 (correct result):
```
df.to_xarray()
Dimensions: (lev1: 2, lev2: 1)
Coordinates:
* lev1 (lev1) object 'a' 'b'
* lev2 (lev2) object 'foo'
Data variables:
C1 (lev1, lev2) int64 2 0
C2 (lev1, lev2) int64 3 1
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560
https://github.com/pydata/xarray/issues/4186#issuecomment-650738680,https://api.github.com/repos/pydata/xarray/issues/4186,650738680,MDEyOklzc3VlQ29tbWVudDY1MDczODY4MA==,15720911,2020-06-28T11:37:20Z,2020-06-28T11:37:20Z,NONE,"It seems the problem here is that in `Dataset.from_dataframe` the `dims` and `coords` are created from `df.index.levels`, which is unsorted: https://github.com/pydata/xarray/blob/732750a06aef2025b206ba6ff765f5acc53bfa25/xarray/core/dataset.py#L4642-L4643
Then in `Dataset._set_numpy_data_from_dataframe`, the `pd.MultiIndex.from_product` and `dataframe.reindex` unintentionally sort the `dataframe` by index:
https://github.com/pydata/xarray/blob/732750a06aef2025b206ba6ff765f5acc53bfa25/xarray/core/dataset.py#L4588-L4589
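To illustrate the reordering step described above, here is a minimal pandas-only sketch. It assumes a made-up frame with plain object-dtype levels (not the categorical one from the report), and the names `full_idx` and `reindexed` are only for illustration:

```python
import pandas as pd

# Hypothetical frame: rows appear in the order 'b', 'a'.
df = pd.DataFrame(
    {'lev1': ['b', 'a'], 'lev2': ['foo', 'foo'], 'C1': [0, 2], 'C2': [1, 3]}
).set_index(['lev1', 'lev2'])

# Build the full product of the index levels.
full_idx = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Reindexing against the product reorders the rows to follow the levels.
reindexed = df.reindex(full_idx)

original_order = list(df.index.get_level_values('lev1'))
reindexed_order = list(reindexed.index.get_level_values('lev1'))
```

Here `original_order` is `['b', 'a']` while `reindexed_order` is `['a', 'b']`, so after `reindex` the rows follow the order of the index levels rather than the order of the original frame.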
Besides the perf improvement it provides, #4184 also seems to have the nice side effect of fixing this issue.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,646716560