issue_comments: 327882144

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1560#issuecomment-327882144	https://api.github.com/repos/pydata/xarray/issues/1560	327882144	MDEyOklzc3VlQ29tbWVudDMyNzg4MjE0NA==	1217238	2017-09-07T18:19:03Z	2017-09-07T18:19:03Z	MEMBER	Indeed, unstack does seem to be quite slow on large dimensions. For 1000x1000, I measure only 10ms to stack, but 4 seconds to unstack: `%time arr = DataArray(np.empty([1, 1000, 1000])).stack(flat_dim=['dim_1', 'dim_2']) %time arr.unstack('flat_dim')` Profiling suggests the culprit is the `reindex` call in `unstack()`: https://github.com/pydata/xarray/blob/98a05f11c6f38489c82e86c9e9df796e7fb65fd2/xarray/core/dataset.py#L1896 And, in turn, the call to `pandas.MultiIndex.get_indexer()`. To reproduce with pure pandas: ``` idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) %time idx1.get_indexer(idx2) CPU times: user 4.1 s, sys: 128 ms, total: 4.23 s Wall time: 4.41 s ``` We do need this reindex for correctness, but we should have a separate fast-path of some sort (either here or in pandas) to speed this up when the two indexes are identical. For example, note: ``` idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) %time idx1.equals(idx2) CPU times: user 19 ms, sys: 0 ns, total: 19 ms Wall time: 18.5 ms ``` I'll file an issue on the pandas tracker.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		255989233
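The fast path suggested in the comment could look roughly like the sketch below. It is only an illustration: `get_indexer_with_fast_path` is a hypothetical helper, not an existing xarray or pandas function, and it assumes the source index has no duplicate entries (the case in which `get_indexer()` itself is defined). When the two indexes are equal, the indexer is the identity mapping, so the cheap `equals()` check lets the expensive `get_indexer()` call be skipped.

```python
import numpy as np
import pandas as pd


def get_indexer_with_fast_path(index, target):
    """Hypothetical helper: skip get_indexer() when both indexes are equal.

    Assumes `index` has no duplicate entries, as get_indexer() requires.
    """
    if index.equals(target):
        # Equal indexes map position i to position i.
        return np.arange(len(index), dtype=np.intp)
    # General case: fall back to the full (slow) lookup.
    return index.get_indexer(target)


# Small sizes keep the demonstration quick; the comment's timings used 1000 x 1000.
idx1 = pd.MultiIndex.from_product([np.arange(100), np.arange(100)])
idx2 = pd.MultiIndex.from_product([np.arange(100), np.arange(100)])

fast = get_indexer_with_fast_path(idx1, idx2)
slow = idx1.get_indexer(idx2)
print(np.array_equal(fast, slow))
```

Whether such a check belongs in xarray's `unstack()` or in pandas itself is left open in the comment; the sketch only shows that the equality test can stand in for the lookup when the indexes match.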