html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/5902#issuecomment-953379569,https://api.github.com/repos/pydata/xarray/issues/5902,953379569,IC_kwDOAMm_X84402rx,1312546,2021-10-27T23:19:49Z,2021-10-27T23:19:49Z,MEMBER,"Thanks @dcherian, that seems to fix this performance problem. I'll see if the tests pass and will submit a PR.
I came across #5582 while searching, thanks :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1037894157
https://github.com/pydata/xarray/issues/5902#issuecomment-953351408,https://api.github.com/repos/pydata/xarray/issues/5902,953351408,IC_kwDOAMm_X8440vzw,2448579,2021-10-27T22:16:17Z,2021-10-27T22:18:33Z,MEMBER,"(warning: untested code)
Instead of looking at all of `self.variables`, we could check only the non-index variables:
``` python
nonindexes = set(self.variables) - set(self.indexes)
# or alternatively make a list of multiindex variable names and exclude those
# then the condition becomes
any(is_duck_dask_array(self.variables[v].data) for v in nonindexes)
``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1037894157
https://github.com/pydata/xarray/issues/5902#issuecomment-953352129,https://api.github.com/repos/pydata/xarray/issues/5902,953352129,IC_kwDOAMm_X8440v_B,2448579,2021-10-27T22:17:39Z,2021-10-27T22:17:39Z,MEMBER,PS: It doesn't seem like the bottleneck in your case but #5582 has an alternative proposal for unstacking dask arrays.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1037894157
https://github.com/pydata/xarray/issues/5902#issuecomment-953344052,https://api.github.com/repos/pydata/xarray/issues/5902,953344052,IC_kwDOAMm_X8440uA0,1312546,2021-10-27T22:02:58Z,2021-10-27T22:03:35Z,MEMBER,"Oh, hmm... I'm noticing now that `IndexVariable` (currently) eagerly loads data into memory, so that check will *always* be false for the problematic `IndexVariable`.
So perhaps a slight adjustment to `is_duck_dask_array` to handle `xarray.Variable`?
```diff
diff --git a/xarray/core/dataset.py b/xarray/core/dataset.py
index 550c3587..16637574 100644
--- a/xarray/core/dataset.py
+++ b/xarray/core/dataset.py
@@ -4159,14 +4159,14 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
# Dask arrays don't support assignment by index, which the fast unstack
# function requires.
# https://github.com/pydata/xarray/pull/4746#issuecomment-753282125
- any(is_duck_dask_array(v.data) for v in self.variables.values())
+ any(is_duck_dask_array(v) for v in self.variables.values())
# Sparse doesn't currently support (though we could special-case
# it)
# https://github.com/pydata/sparse/issues/422
- or any(
- isinstance(v.data, sparse_array_type)
- for v in self.variables.values()
- )
+ # or any(
+ # isinstance(v.data, sparse_array_type)
+ # for v in self.variables.values()
+ # )
or sparse
# Until https://github.com/pydata/xarray/pull/4751 is resolved,
# we check explicitly whether it's a numpy array. Once that is
@@ -4177,9 +4177,9 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
# # or any(
# # isinstance(v.data, pint_array_type) for v in self.variables.values()
# # )
- or any(
- not isinstance(v.data, np.ndarray) for v in self.variables.values()
- )
+ # or any(
+ # not isinstance(v.data, np.ndarray) for v in self.variables.values()
+ # )
):
result = result._unstack_full_reindex(dim, fill_value, sparse)
else:
diff --git a/xarray/core/pycompat.py b/xarray/core/pycompat.py
index d1649235..e9669105 100644
--- a/xarray/core/pycompat.py
+++ b/xarray/core/pycompat.py
@@ -44,6 +44,12 @@ class DuckArrayModule:
 def is_duck_dask_array(x):
+    from xarray.core.variable import IndexVariable, Variable
+    if isinstance(x, IndexVariable):
+        return False
+    elif isinstance(x, Variable):
+        x = x.data
+
     if DuckArrayModule(""dask"").available:
         from dask.base import is_dask_collection
```
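A rough, self-contained sketch of the same dispatch idea, using stand-in classes rather than xarray's real ones. Every name here (`FakeDaskArray`, `is_lazy`, the `data_accesses` counter) is hypothetical and only there to show that index variables get skipped before `.data` is ever touched:

```python
# Stand-in classes only; these are NOT xarray's real Variable/IndexVariable.
class FakeDaskArray:
    '''Hypothetical placeholder for a lazy (dask-like) array.'''


class Variable:
    def __init__(self, data):
        self._data = data
        self.data_accesses = 0

    @property
    def data(self):
        # Count accesses so the cost of touching .data is visible.
        self.data_accesses += 1
        return self._data


class IndexVariable(Variable):
    '''Stand-in for an index variable whose .data is assumed expensive.'''


def is_lazy(x):
    # Hypothetical analogue of is_duck_dask_array for the stand-in types.
    if isinstance(x, IndexVariable):
        return False  # assumed to be loaded eagerly, so never lazy
    if isinstance(x, Variable):
        x = x.data
    return isinstance(x, FakeDaskArray)


variables = {
    'index': IndexVariable([0, 1, 2]),
    'eager': Variable([1.0, 2.0, 3.0]),
    'lazy': Variable(FakeDaskArray()),
}

# Same shape as the condition in the diff: pass the variable, not v.data.
needs_full_reindex = any(is_lazy(v) for v in variables.values())
```

With this shape the index variable's `.data` is never read, which is the expensive access the check is trying to avoid.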
The diff above completely ignores the accesses to `v.data` for the sparse and pint checks (it just comments them out), which don't look quite as easy to solve.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1037894157