html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/5202#issuecomment-935683056,https://api.github.com/repos/pydata/xarray/issues/5202,935683056,IC_kwDOAMm_X843xWPw,4160723,2021-10-06T07:50:59Z,2021-10-06T07:50:59Z,MEMBER,"From https://github.com/pydata/xarray/pull/5692#issuecomment-925718593:

> One change is that a multi-index is not always created with stack. It is created only if each of the dimensions to stack together has one and only one coordinate with a pandas index (this could be a non-dimension coordinate).
>
> This could maybe address #5202, since we could simply drop the indexes before stacking the dimensions in order to avoid the creation of a multi-index. I don't think it's a big breaking change either, unless there are users who rely on default multi-indexes with range (0, 1, 2...) levels. Looking at #5202, however, those default multi-indexes seem more problematic than really useful, but I might be wrong here. Also, range-based indexes can still be created explicitly before stacking the dimensions if needed.
>
> Another consequence is that stack is not always reversible, since unstack still requires a pandas multi-index (one and only one multi-index per dimension to unstack).

cc @pydata/xarray, as this is an improvement regarding this issue but also a sensitive change.

To ensure a smoother transition, we could maybe add a `create_index` option to `stack` which accepts these values:

- `True`: always create a multi-index
- `False`: never create a multi-index
- `None`: create a multi-index only if we can unambiguously pick one index for each of the dimensions to stack

We could default to `True` now to avoid breaking changes, and maybe later switch the default to `None`.
If we eventually add support for custom (non-pandas-backed) indexes, we could also allow passing an `xarray.Index` class.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,864249974
https://github.com/pydata/xarray/issues/5202#issuecomment-856296662,https://api.github.com/repos/pydata/xarray/issues/5202,856296662,MDEyOklzc3VlQ29tbWVudDg1NjI5NjY2Mg==,4160723,2021-06-07T22:10:15Z,2021-06-07T22:10:15Z,MEMBER,"> it seems like it could be a good idea to allow stack to skip creating a MultiIndex for the new dimension, via a new keyword argument such as `ds.stack(index=False)`

`Dataset.stack` might eventually accept any custom index (that supports it) if that makes sense. Would `index=None` be slightly better than `index=False` in that case? (Considering that the default value would be `index=PandasMultiIndex` or something like that.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,864249974
https://github.com/pydata/xarray/issues/5202#issuecomment-825494167,https://api.github.com/repos/pydata/xarray/issues/5202,825494167,MDEyOklzc3VlQ29tbWVudDgyNTQ5NDE2Nw==,5635139,2021-04-23T08:30:55Z,2021-04-23T08:30:55Z,MEMBER,"Great, this seems like a good idea — at the very least an `index=False` option","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,864249974
https://github.com/pydata/xarray/issues/5202#issuecomment-824459878,https://api.github.com/repos/pydata/xarray/issues/5202,824459878,MDEyOklzc3VlQ29tbWVudDgyNDQ1OTg3OA==,1217238,2021-04-22T00:57:56Z,2021-04-22T00:57:56Z,MEMBER,"> Do we have any ideas on how expensive the MultiIndex creation is as a share of `stack`?

It depends, but it can easily be 50% to nearly 100% of the runtime.
`stack()` uses `reshape()` on data variables, which is either free (for arrays that are still contiguous and can use views) or can be delayed until compute time (with dask). In contrast, the MultiIndex is always created eagerly.

If we use Fortran-order arrays, we can get a rough lower bound on the time spent on MultiIndex creation, e.g., consider:

```python
import xarray
import numpy as np

a = xarray.DataArray(np.ones((5000, 5000), order='F'), dims=['x', 'y'])

%prun a.stack(z=['x', 'y'])
```

Not surprisingly, making the multi-index takes about half the runtime here. Pandas does delay creating the actual hash table behind a MultiIndex until it's needed, so I guess the main expense here is just allocating the new coordinate arrays.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,864249974
https://github.com/pydata/xarray/issues/5202#issuecomment-824388578,https://api.github.com/repos/pydata/xarray/issues/5202,824388578,MDEyOklzc3VlQ29tbWVudDgyNDM4ODU3OA==,5635139,2021-04-21T22:05:53Z,2021-04-21T22:05:53Z,MEMBER,Do we have any ideas on how expensive the MultiIndex creation is as a share of `stack`?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,864249974
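The `create_index` option discussed in this thread can be sketched as follows. This is a minimal sketch, assuming a recent xarray release in which `stack` accepts the `create_index` keyword (it was added after this discussion, as part of the explicit-indexes refactor in #5692); behavior may differ in older versions.

```python
# Sketch of the `create_index` behavior proposed above; assumes an xarray
# release that implements the `create_index` keyword on stack().
import numpy as np
import pandas as pd
import xarray as xr

a = xr.DataArray(np.ones((3, 4)), dims=["x", "y"])

# Default (create_index=True): stacking eagerly builds a pandas MultiIndex
# with default range (0, 1, 2...) levels for the new dimension.
stacked = a.stack(z=["x", "y"])
assert isinstance(stacked.indexes["z"], pd.MultiIndex)

# create_index=False: no index is created for "z", so stacking only involves
# the cheap (or dask-delayed) reshape of the data.
no_index = a.stack(z=["x", "y"], create_index=False)
assert "z" not in no_index.indexes
```

With `create_index=None`, a multi-index would be created only when each stacked dimension has exactly one indexed coordinate, matching the behavior described in the first comment.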