Slow performance of `DataArray.unstack()` from checking `variable.data`

Issue #5902 (id 1037894157, node_id I_kwDOAMm_X8493QIN) · state: closed · author: user 1312546 (MEMBER) · created: 2021-10-27T21:54:48Z · closed: 2021-10-29T15:21:24Z · comments: 4

What happened:

Calling DataArray.unstack() spends time allocating an object-dtype NumPy array from values of the pandas MultiIndex.
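
To make the cost concrete, here is a minimal standalone illustration (not from the issue itself) of why materializing a MultiIndex as a NumPy array is expensive: the result is an object-dtype array of Python tuples, one tuple per row.

```python
import numpy as np
import pandas as pd

# Converting a pandas MultiIndex to a NumPy array produces an
# object-dtype array whose elements are Python tuples -- one tuple
# per index entry -- which is what makes the allocation costly at scale.
idx = pd.MultiIndex.from_product([range(2), range(3)], names=["a", "b"])
arr = np.asarray(idx)

print(arr.dtype)  # object
print(arr[0])     # (0, 0)
```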

What you expected to happen:

Faster unstack.

Minimal Complete Verifiable Example:

```python
import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range("2000", periods=2)
x = np.arange(1000)
y = np.arange(1000)
component = np.arange(4)

idx = pd.MultiIndex.from_product([t, y, x], names=["time", "y", "x"])

data = np.random.uniform(size=(len(idx), len(component)))
arr = xr.DataArray(
    data,
    coords={
        "pixel": xr.DataArray(idx, name="pixel", dims="pixel"),
        "component": xr.DataArray(component, name="component", dims="component"),
    },
    dims=("pixel", "component"),
)

%time _ = arr.unstack()
# CPU times: user 6.33 s, sys: 295 ms, total: 6.62 s
# Wall time: 6.62 s
```

Anything else we need to know?:

For this example, >99% of the time is spent on this line: https://github.com/pydata/xarray/blob/df7646182b17d829fe9b2199aebf649ddb2ed480/xarray/core/dataset.py#L4162, specifically on the call to `v.data` for the `pixel` coordinate, which is a pandas MultiIndex.

Just going by the comments, it does seem like accessing `v.data` is necessary to perform the check. I'm wondering if we could make `is_duck_dask_array` a bit smarter, to avoid unnecessarily allocating data?
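
As a sketch of what "smarter" could mean (a hypothetical helper, not xarray's actual `is_duck_dask_array`): short-circuit on pandas Index objects, which can never be dask collections, before falling back to a simplified duck-typing test that would otherwise require materialized data.

```python
import numpy as np
import pandas as pd

def is_duck_dask_array_cheap(value):
    """Hypothetical variant of the check: return False early for pandas
    Index/MultiIndex objects, which are never dask collections, instead
    of first materializing them into an object-dtype array."""
    if isinstance(value, pd.Index):
        return False
    # Simplified stand-in for the real duck-typing test: dask
    # collections implement the ``__dask_graph__`` protocol.
    return hasattr(value, "__dask_graph__")

idx = pd.MultiIndex.from_product([range(3), range(3)], names=["y", "x"])
print(is_duck_dask_array_cheap(idx))           # False, with no allocation
print(is_duck_dask_array_cheap(np.arange(3)))  # False
```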

Alternatively, if that's too difficult, perhaps we could add a flag to `unstack` to disable those checks and just take the "slow" path. In my actual use case, the slow `_unstack_full_reindex` path is necessary since I have large dask arrays. But even then, the unstack completes in less than 3 s, while I was getting OOM-killed on the `v.data` checks.

Environment:

Output of <tt>xr.show_versions()</tt>

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.12 | packaged by conda-forge | (default, Sep 29 2021, 19:52:28) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-1040-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.19.0
pandas: 1.3.3
numpy: 1.20.0
scipy: 1.7.1
netCDF4: 1.5.7
pydap: installed
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: 1.2.9
cfgrib: 0.9.9.0
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.3
cartopy: 0.20.0
seaborn: 0.11.2
numbagg: None
pint: 0.17
setuptools: 58.0.4
pip: 20.3.4
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
```
Reactions: +1 × 1 · state_reason: completed · type: issue
