html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1560#issuecomment-412407141,https://api.github.com/repos/pydata/xarray/issues/1560,412407141,MDEyOklzc3VlQ29tbWVudDQxMjQwNzE0MQ==,1217238,2018-08-13T04:44:09Z,2018-08-13T04:44:09Z,MEMBER,"@maahn yes, that would look fine to me. Please add an ASV benchmark so we can monitor this for regressions:
https://github.com/pydata/xarray/tree/master/asv_bench/benchmarks
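A benchmark along these lines might look like the minimal sketch below. It assumes asv's convention that `setup` runs before timing and that every method whose name starts with `time_` is timed. The class and method names are hypothetical. The reversed case is included so the slow reindex path stays covered as well.

```python
import numpy as np
import xarray as xr


class UnstackingSketch:
    # Hypothetical benchmark class; asv times each method prefixed with time_

    def setup(self):
        # 1 x 1000 x 1000 array stacked into a 1 x 1000000 array with a MultiIndex
        data = np.random.RandomState(0).randn(1, 1000, 1000)
        self.da = xr.DataArray(data).stack(flat_dim=['dim_1', 'dim_2'])

    def time_unstack_fast(self):
        # Stacked index is still in outer-product order: the case a fast path targets
        self.da.unstack('flat_dim')

    def time_unstack_slow(self):
        # Reversing the stacked dimension means the index must really be reindexed
        self.da[:, ::-1].unstack('flat_dim')
```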
It would be nice to push this optimization up into `reindex_variables`, but it's not necessary (and I'm not even sure it could be done as efficiently as the `equals` check in `unstack`).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,255989233
https://github.com/pydata/xarray/issues/1560#issuecomment-327896138,https://api.github.com/repos/pydata/xarray/issues/1560,327896138,MDEyOklzc3VlQ29tbWVudDMyNzg5NjEzOA==,1217238,2017-09-07T19:12:50Z,2017-09-07T19:12:50Z,MEMBER,Though possibly we should just be using `Index.reindex` directly inside `reindex_variables` (in `xarray/core/alignment.py`) instead of calling `get_indexer`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,255989233
https://github.com/pydata/xarray/issues/1560#issuecomment-327895477,https://api.github.com/repos/pydata/xarray/issues/1560,327895477,MDEyOklzc3VlQ29tbWVudDMyNzg5NTQ3Nw==,1217238,2017-09-07T19:10:03Z,2017-09-07T19:10:03Z,MEMBER,"@davidh-ssec Yes, but we need it for `MultiIndex.get_indexer`, not `MultiIndex.reindex`:
https://github.com/pandas-dev/pandas/blob/ee6185e2fb9461632949f3ba52a28b37a1f7296e/pandas/core/indexes/multi.py#L1781","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,255989233
https://github.com/pydata/xarray/issues/1560#issuecomment-327890644,https://api.github.com/repos/pydata/xarray/issues/1560,327890644,MDEyOklzc3VlQ29tbWVudDMyNzg5MDY0NA==,1217238,2017-09-07T18:50:50Z,2017-09-07T18:50:50Z,MEMBER,"The MultiIndex speed/memory improvements seem to be present already in pandas 0.20.3, the latest release, so definitely make sure your pandas install is up to date.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,255989233
https://github.com/pydata/xarray/issues/1560#issuecomment-327886998,https://api.github.com/repos/pydata/xarray/issues/1560,327886998,MDEyOklzc3VlQ29tbWVudDMyNzg4Njk5OA==,1217238,2017-09-07T18:36:48Z,2017-09-07T18:36:48Z,MEMBER,"This is still somewhat annoyingly slow, but for an 8000 x 9000 MultiIndex on pandas 0.21-dev, I measure 41 seconds for `get_indexer()` vs 3.8 seconds for `equals()`.
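One check that might avoid both calls: if a MultiIndex is the outer product of its levels in level order, its integer codes must match the codes of the full product, and those can be compared in one vectorized pass over integer arrays. This is a minimal sketch. It assumes pandas 0.24 or later, where the attribute is `codes` (it was `labels` before). `is_outer_product` is a hypothetical helper. It only accepts indexes whose entries follow level order, as `MultiIndex.from_product` produces from sorted, unique inputs.

```python
import numpy as np
import pandas as pd


def is_outer_product(idx):
    # Hypothetical helper: True if idx lists every combination of its levels
    # exactly once, in the order MultiIndex.from_product would produce.
    sizes = [len(level) for level in idx.levels]
    if len(idx) != int(np.prod(sizes)):
        return False
    # Codes of the full product: row k holds the codes for level k (C order)
    expected = np.indices(sizes).reshape(len(sizes), -1)
    # idx.codes is a list of integer arrays (called labels before pandas 0.24)
    return all(np.array_equal(np.asarray(c), e) for c, e in zip(idx.codes, expected))
```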
So a fast-path might still be a good idea, but to get to truly interactive speeds, we might need a faster way to verify that a MultiIndex equals the outer product of its levels. Potentially we could save some metadata in `PandasIndexAdapter` as part of `stack()` to indicate that the levels are from an outer product:
https://github.com/pydata/xarray/blob/98a05f11c6f38489c82e86c9e9df796e7fb65fd2/xarray/core/indexing.py#L502-L505","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,255989233
https://github.com/pydata/xarray/issues/1560#issuecomment-327884467,https://api.github.com/repos/pydata/xarray/issues/1560,327884467,MDEyOklzc3VlQ29tbWVudDMyNzg4NDQ2Nw==,1217238,2017-09-07T18:27:27Z,2017-09-07T18:27:27Z,MEMBER,"Actually, the timings above were with pandas 0.19. It's still somewhat slow with the dev version of pandas, but `get_indexer()` is now more like 10x slower than `equals()` rather than 100x slower:
```
In [4]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
...: %time idx1.get_indexer(idx2)
...:
CPU times: user 215 ms, sys: 81.8 ms, total: 297 ms
Wall time: 319 ms
Out[4]: array([ 0, 1, 2, ..., 999997, 999998, 999999])
In [5]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
...: %time idx1.equals(idx2)
...:
CPU times: user 19.8 ms, sys: 9.29 ms, total: 29.1 ms
Wall time: 32.1 ms
Out[5]: True
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,255989233
https://github.com/pydata/xarray/issues/1560#issuecomment-327882144,https://api.github.com/repos/pydata/xarray/issues/1560,327882144,MDEyOklzc3VlQ29tbWVudDMyNzg4MjE0NA==,1217238,2017-09-07T18:19:03Z,2017-09-07T18:19:03Z,MEMBER,"Indeed, unstack does seem to be quite slow on large dimensions. For 1000x1000, I measure only 10ms to stack, but 4 seconds to unstack:
```
import numpy as np
from xarray import DataArray
%time arr = DataArray(np.empty([1, 1000, 1000])).stack(flat_dim=['dim_1', 'dim_2'])
%time arr.unstack('flat_dim')
```
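One way to see where the time goes is to profile just the unstack step. The sketch below uses the standard-library `cProfile` and `pstats` modules. It is an illustrative setup; the profiler actually used for the finding below isn't stated.

```python
import cProfile
import pstats

import numpy as np
from xarray import DataArray

arr = DataArray(np.empty([1, 1000, 1000])).stack(flat_dim=['dim_1', 'dim_2'])

# Profile only the unstack step
profiler = cProfile.Profile()
profiler.enable()
result = arr.unstack('flat_dim')
profiler.disable()

# Print the ten entries with the highest cumulative time
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)
```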
Profiling suggests the culprit is the `reindex` call in `unstack()`:
https://github.com/pydata/xarray/blob/98a05f11c6f38489c82e86c9e9df796e7fb65fd2/xarray/core/dataset.py#L1896
That, in turn, comes down to the call to `pandas.MultiIndex.get_indexer()`. To reproduce with pure pandas:
```
import numpy as np
import pandas as pd
idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
%time idx1.get_indexer(idx2)
# CPU times: user 4.1 s, sys: 128 ms, total: 4.23 s
# Wall time: 4.41 s
```
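The reindex is needed because a stacked index is not guaranteed to be the full outer product in order: entries may be permuted or missing. Here is a small pure-pandas illustration, with index sizes chosen arbitrarily:

```python
import numpy as np
import pandas as pd

# The full outer product that unstack reshapes into
full = pd.MultiIndex.from_product([np.arange(3), np.arange(2)])

# Same entries in a different order (e.g. after reversing the stacked dimension)
reordered = full[::-1]
print(reordered.get_indexer(full))  # positions needed to restore product order

# Some combinations missing (e.g. after dropping entries)
partial = full[:4]
print(partial.get_indexer(full))  # -1 marks combinations that must be filled
```

The first call returns the positions 5 down to 0. The second returns 0 to 3 followed by two -1 entries for the missing combinations.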
We do need this reindex for correctness, but we should have a separate fast-path of some sort (either here or in pandas) to speed it up when the two indexes are identical. For example, note how much cheaper `equals()` is:
```
idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
%time idx1.equals(idx2)
# CPU times: user 19 ms, sys: 0 ns, total: 19 ms
# Wall time: 18.5 ms
```
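The fast-path could be as simple as the sketch below. It assumes the index is unique, as a stacked index is. `get_indexer_with_fast_path` is a hypothetical name, not an existing xarray or pandas function.

```python
import numpy as np
import pandas as pd


def get_indexer_with_fast_path(index, target):
    # Hypothetical helper: when the indexes are identical, the indexer is
    # simply 0..n-1, so the expensive get_indexer call can be skipped.
    if index.equals(target):
        return np.arange(len(index))
    return index.get_indexer(target)


idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
indexer = get_indexer_with_fast_path(idx1, idx2)
```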
I'll file an issue on the pandas tracker.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,255989233