html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/5902#issuecomment-953379569,https://api.github.com/repos/pydata/xarray/issues/5902,953379569,IC_kwDOAMm_X84402rx,1312546,2021-10-27T23:19:49Z,2021-10-27T23:19:49Z,MEMBER,"Thanks @dcherian, that seems to fix this performance problem. I'll see if the tests pass and will submit a PR.

I came across #5582 while searching, thanks :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1037894157
https://github.com/pydata/xarray/issues/5902#issuecomment-953351408,https://api.github.com/repos/pydata/xarray/issues/5902,953351408,IC_kwDOAMm_X8440vzw,2448579,2021-10-27T22:16:17Z,2021-10-27T22:18:33Z,MEMBER,"(warning: untested code)

Instead of looking at all of `self.variables` we could

``` python
nonindexes = set(self.variables) - set(self.indexes)
# or alternatively make a list of multiindex variables names and exclude those
# then the condition becomes
any(is_duck_dask_array(self.variables[v].data) for v in nonindexes)
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1037894157
https://github.com/pydata/xarray/issues/5902#issuecomment-953352129,https://api.github.com/repos/pydata/xarray/issues/5902,953352129,IC_kwDOAMm_X8440v_B,2448579,2021-10-27T22:17:39Z,2021-10-27T22:17:39Z,MEMBER,PS: It doesn't seem like the bottleneck in your case but #5582 has an alternative proposal for unstacking dask arrays.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1037894157
https://github.com/pydata/xarray/issues/5902#issuecomment-953344052,https://api.github.com/repos/pydata/xarray/issues/5902,953344052,IC_kwDOAMm_X8440uA0,1312546,2021-10-27T22:02:58Z,2021-10-27T22:03:35Z,MEMBER,"Oh, hmm... I'm noticing now that `IndexVariable` (currently) eagerly loads data into memory, so that check will *always* be false for the problematic IndexVariable variable.

So perhaps a slight adjustment to `is_duck_dask_array` to handle `xarray.Variable` ?

```diff
diff --git a/xarray/core/dataset.py b/xarray/core/dataset.py
index 550c3587..16637574 100644
--- a/xarray/core/dataset.py
+++ b/xarray/core/dataset.py
@@ -4159,14 +4159,14 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
                 # Dask arrays don't support assignment by index, which the fast unstack
                 # function requires.
                 # https://github.com/pydata/xarray/pull/4746#issuecomment-753282125
-                any(is_duck_dask_array(v.data) for v in self.variables.values())
+                any(is_duck_dask_array(v) for v in self.variables.values())
                 # Sparse doesn't currently support (though we could special-case
                 # it)
                 # https://github.com/pydata/sparse/issues/422
-                or any(
-                    isinstance(v.data, sparse_array_type)
-                    for v in self.variables.values()
-                )
+                # or any(
+                #     isinstance(v.data, sparse_array_type)
+                #     for v in self.variables.values()
+                # )
                 or sparse
                 # Until https://github.com/pydata/xarray/pull/4751 is resolved,
                 # we check explicitly whether it's a numpy array. Once that is
@@ -4177,9 +4177,9 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
                 # # or any(
                 # #     isinstance(v.data, pint_array_type) for v in self.variables.values()
                 # # )
-                or any(
-                    not isinstance(v.data, np.ndarray) for v in self.variables.values()
-                )
+                # or any(
+                #     not isinstance(v.data, np.ndarray) for v in self.variables.values()
+                # )
             ):
                 result = result._unstack_full_reindex(dim, fill_value, sparse)
             else:
diff --git a/xarray/core/pycompat.py b/xarray/core/pycompat.py
index d1649235..e9669105 100644
--- a/xarray/core/pycompat.py
+++ b/xarray/core/pycompat.py
@@ -44,6 +44,12 @@ class DuckArrayModule:
 
 
 def is_duck_dask_array(x):
+    from xarray.core.variable import IndexVariable, Variable
+    if isinstance(x, IndexVariable):
+        return False
+    elif isinstance(x, Variable):
+        x = x.data
+
     if DuckArrayModule(""dask"").available:
         from dask.base import is_dask_collection
 
```

That's completely ignoring the accesses to `v.data` for the sparse and pint checks, which don't look quite as easy to solve.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1037894157