id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1038531231,PR_kwDOAMm_X84tzEEk,5906,Avoid accessing slow .data in unstack,1312546,closed,0,,,4,2021-10-28T13:39:36Z,2021-10-29T15:29:39Z,2021-10-29T15:14:43Z,MEMBER,,0,pydata/xarray/pulls/5906,"- [x] Closes https://github.com/pydata/xarray/issues/5902
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5906/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1037894157,I_kwDOAMm_X8493QIN,5902,Slow performance of `DataArray.unstack()` from checking `variable.data`,1312546,closed,0,,,4,2021-10-27T21:54:48Z,2021-10-29T15:21:24Z,2021-10-29T15:21:24Z,MEMBER,,,,"**What happened**:

Calling `DataArray.unstack()` spends time allocating an object-dtype NumPy array from the values of the pandas MultiIndex.

**What you expected to happen**:

Faster unstack.
**Minimal Complete Verifiable Example**:

```python
import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range(""2000"", periods=2)
x = np.arange(1000)
y = np.arange(1000)
component = np.arange(4)

idx = pd.MultiIndex.from_product([t, y, x], names=[""time"", ""y"", ""x""])
data = np.random.uniform(size=(len(idx), len(component)))
arr = xr.DataArray(
    data,
    coords={
        ""pixel"": xr.DataArray(idx, name=""pixel"", dims=""pixel""),
        ""component"": xr.DataArray(component, name=""component"", dims=""component""),
    },
    dims=(""pixel"", ""component""),
)

%time _ = arr.unstack()
CPU times: user 6.33 s, sys: 295 ms, total: 6.62 s
Wall time: 6.62 s
```

**Anything else we need to know?**:

For this example, >99% of the time is spent on this line: https://github.com/pydata/xarray/blob/df7646182b17d829fe9b2199aebf649ddb2ed480/xarray/core/dataset.py#L4162, specifically on the call to `v.data` for the `pixel` array, which is a pandas MultiIndex. Just going by the comments, it does seem like accessing `v.data` is necessary to perform the check. I'm wondering if we could make `is_duck_dask_array` a bit smarter, to avoid unnecessarily allocating data.

Alternatively, if that's too difficult, perhaps we could add a flag to `unstack` to disable those checks and just take the ""slow"" path. In my actual use case, the slow `_unstack_full_reindex` is necessary since I have large Dask arrays. But even then, the unstack completes in less than 3 s, while I was getting OOM-killed on the `v.data` checks.

**Environment**:
Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.12 | packaged by conda-forge | (default, Sep 29 2021, 19:52:28) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-1040-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.19.0
pandas: 1.3.3
numpy: 1.20.0
scipy: 1.7.1
netCDF4: 1.5.7
pydap: installed
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: 1.2.9
cfgrib: 0.9.9.0
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.3
cartopy: 0.20.0
seaborn: 0.11.2
numbagg: None
pint: 0.17
setuptools: 58.0.4
pip: 20.3.4
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
```
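The cost described above can be reproduced outside xarray: converting a pandas MultiIndex to a NumPy array materializes one Python tuple per row, while a plain type check on the index allocates nothing. A minimal sketch in plain pandas/NumPy (not xarray internals, just an illustration of the distinction):

```python
import numpy as np
import pandas as pd

# A small MultiIndex standing in for the large pixel index in the example.
idx = pd.MultiIndex.from_product([range(3), range(4)])

# Materializing the values (what accessing .data triggers) builds an
# object-dtype array holding one Python tuple per row, which is what
# becomes slow and memory-hungry at scale.
values = np.asarray(idx)
assert values.dtype == object
assert values.shape == (12,)

# By contrast, a type check on the index object itself is O(1) and
# allocates nothing.
assert isinstance(idx, pd.MultiIndex)
```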
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5902/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue