
issue_comments: 824459878


html_url: https://github.com/pydata/xarray/issues/5202#issuecomment-824459878
issue_url: https://api.github.com/repos/pydata/xarray/issues/5202
user: 1217238 (MEMBER)
created_at: 2021-04-22T00:57:56Z

> Do we have any ideas on how expensive the MultiIndex creation is as a share of stack?

It depends, but it can easily be 50% to nearly 100% of the runtime. stack() uses reshape() on data variables, which is either free (for arrays that are still contiguous and can use views) or can be delayed until compute-time (with dask). In contrast, the MultiIndex is always created eagerly.
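The view-vs-copy behavior of reshape() can be checked directly in NumPy (a minimal sketch; the array sizes are arbitrary and not from the original comment):

```python
import numpy as np

# C-contiguous array: flattening it is free, because NumPy can return a view.
c = np.ones((1000, 1000))
flat_c = c.reshape(-1)
print(flat_c.base is c)  # True -> a view, no data copied

# Fortran-ordered array: the same reshape must copy the data.
f = np.ones((1000, 1000), order='F')
flat_f = f.reshape(-1)
print(flat_f.base is f)  # False -> a fresh copy
```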

If we use Fortran-ordered arrays, we can get a rough lower bound on the time for MultiIndex creation, e.g., consider:

```python
import xarray
import numpy as np

a = xarray.DataArray(np.ones((5000, 5000), order='F'), dims=['x', 'y'])
%prun a.stack(z=['x', 'y'])
```

Not surprisingly, making the multi-index takes about half the runtime here.
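Outside IPython (no `%prun`), a rough self-contained comparison can be done with pandas and NumPy alone, timing the forced reshape copy against building the equivalent MultiIndex directly (sizes chosen arbitrarily, a sketch rather than a rigorous benchmark):

```python
import time
import numpy as np
import pandas as pd

n = 2000
data = np.ones((n, n), order='F')

# Fortran order means this flattening reshape must actually copy the data.
t0 = time.perf_counter()
flat = data.reshape(-1)
t_reshape = time.perf_counter() - t0

# Building the product MultiIndex that stack(z=['x', 'y']) would create.
t0 = time.perf_counter()
idx = pd.MultiIndex.from_product([np.arange(n), np.arange(n)], names=['x', 'y'])
t_index = time.perf_counter() - t0

print(f"reshape copy: {t_reshape:.4f}s  MultiIndex creation: {t_index:.4f}s")
```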

Pandas does delay creating the actual hash-table behind a MultiIndex until it's needed, so I guess the main expense here is just allocating the new coordinate arrays.
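The eager part is the levels and codes arrays that define the MultiIndex; the hash table only materializes once a lookup is performed. A small illustration (the exact lazy-engine behavior is a pandas internal, so this only shows the public surface):

```python
import numpy as np
import pandas as pd

# Creating the index allocates the levels (unique values per dimension)
# and the codes (integer positions into each level) up front.
idx = pd.MultiIndex.from_product([np.arange(3), ['a', 'b']], names=['x', 'y'])
print(idx.levels)
print(idx.codes)

# A positional lookup is what needs the hash table behind the index.
print(idx.get_loc((2, 'b')))  # -> 5, the last entry of the 3 x 2 product
```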
