issue_comments


4 rows where issue = 1037894157 sorted by updated_at descending




issue 1

  • Slow performance of `DataArray.unstack()` from checking `variable.data` · 4

953379569 · TomAugspurger · MEMBER · 2021-10-27T23:19:49Z
https://github.com/pydata/xarray/issues/5902#issuecomment-953379569

Thanks @dcherian, that seems to fix this performance problem. I'll see if the tests pass and will submit a PR.

I came across #5582 while searching, thanks :)

953351408 · dcherian · MEMBER · 2021-10-27T22:16:17Z (updated 2021-10-27T22:18:33Z)
https://github.com/pydata/xarray/issues/5902#issuecomment-953351408

(warning: untested code)

Instead of looking at all of `self.variables` we could

```python
nonindexes = set(self.variables) - set(self.indexes)
```

or alternatively make a list of multi-index variable names and exclude those.

Then the condition becomes

```python
any(is_duck_dask_array(self.variables[v].data) for v in nonindexes)
```
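The point of the filtering above can be exercised with a small standalone sketch. Everything in it (the `LazyVariable` stand-in, the string-based `is_duck_dask_array`) is hypothetical scaffolding invented for illustration, not xarray's real code; it only demonstrates that excluding index names keeps `.data` on index variables from ever being touched:

```python
class LazyVariable:
    """Stand-in for an xarray Variable whose .data is expensive to access."""

    def __init__(self, name, is_dask=False):
        self.name = name
        self.accessed = False
        self._is_dask = is_dask

    @property
    def data(self):
        self.accessed = True  # record that .data was materialized
        return "dask-array" if self._is_dask else "np-array"


def is_duck_dask_array(x):
    # Toy substitute for the real dask check.
    return x == "dask-array"


variables = {"x": LazyVariable("x"), "temp": LazyVariable("temp", is_dask=True)}
indexes = {"x": object()}  # "x" is an index coordinate

# Only non-index variables are inspected, so the expensive index data
# is never loaded just to answer this question:
nonindexes = set(variables) - set(indexes)
needs_full_reindex = any(is_duck_dask_array(variables[v].data) for v in nonindexes)
```

After running this, `variables["x"].accessed` stays `False`: the check decided the unstack path without ever materializing the index variable's data, which is exactly the cost being avoided.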

953352129 · dcherian · MEMBER · 2021-10-27T22:17:39Z
https://github.com/pydata/xarray/issues/5902#issuecomment-953352129

PS: It doesn't seem like the bottleneck in your case but #5582 has an alternative proposal for unstacking dask arrays.

953344052 · TomAugspurger · MEMBER · 2021-10-27T22:02:58Z (updated 2021-10-27T22:03:35Z)
https://github.com/pydata/xarray/issues/5902#issuecomment-953344052

Oh, hmm... I'm noticing now that `IndexVariable` (currently) eagerly loads its data into memory, so that check will always be false for the problematic `IndexVariable`.

So perhaps a slight adjustment to `is_duck_dask_array` to handle `xarray.Variable`?

```diff
diff --git a/xarray/core/dataset.py b/xarray/core/dataset.py
index 550c3587..16637574 100644
--- a/xarray/core/dataset.py
+++ b/xarray/core/dataset.py
@@ -4159,14 +4159,14 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
             # Dask arrays don't support assignment by index, which the fast unstack
             # function requires.
             # https://github.com/pydata/xarray/pull/4746#issuecomment-753282125
-            any(is_duck_dask_array(v.data) for v in self.variables.values())
+            any(is_duck_dask_array(v) for v in self.variables.values())
             # Sparse doesn't currently support (though we could special-case
             # it)
             # https://github.com/pydata/sparse/issues/422
-            or any(
-                isinstance(v.data, sparse_array_type)
-                for v in self.variables.values()
-            )
+            # or any(
+            #     isinstance(v.data, sparse_array_type)
+            #     for v in self.variables.values()
+            # )
             or sparse
             # Until https://github.com/pydata/xarray/pull/4751 is resolved,
             # we check explicitly whether it's a numpy array. Once that is
@@ -4177,9 +4177,9 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
             # # or any(
             # #     isinstance(v.data, pint_array_type) for v in self.variables.values()
             # # )
-            or any(
-                not isinstance(v.data, np.ndarray) for v in self.variables.values()
-            )
+            # or any(
+            #     not isinstance(v.data, np.ndarray) for v in self.variables.values()
+            # )
         ):
             result = result._unstack_full_reindex(dim, fill_value, sparse)
         else:
diff --git a/xarray/core/pycompat.py b/xarray/core/pycompat.py
index d1649235..e9669105 100644
--- a/xarray/core/pycompat.py
+++ b/xarray/core/pycompat.py
@@ -44,6 +44,12 @@ class DuckArrayModule:
 
 
 def is_duck_dask_array(x):
+    from xarray.core.variable import IndexVariable, Variable
+
+    if isinstance(x, IndexVariable):
+        return False
+    elif isinstance(x, Variable):
+        x = x.data
+
     if DuckArrayModule("dask").available:
         from dask.base import is_dask_collection
```

That's completely ignoring the accesses to `v.data` for the sparse and pint checks, which don't look quite as easy to solve.
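The dispatch proposed in the `pycompat.py` hunk can be condensed into a runnable sketch. The `Variable`, `IndexVariable`, and `FakeDaskArray` classes below are stand-ins invented for illustration (the real check delegates to `dask.base.is_dask_collection`); the point is only the dispatch order: short-circuit on `IndexVariable`, unwrap a plain `Variable` to its underlying array, then test that array:

```python
class Variable:
    """Stand-in for xarray.Variable: wraps an underlying array as .data."""

    def __init__(self, data):
        self.data = data


class IndexVariable(Variable):
    """Stand-in for xarray.IndexVariable: data is always loaded eagerly."""


class FakeDaskArray:
    """Stand-in for a dask array."""


def is_duck_dask_array(x):
    # Index variables eagerly load their data, so they can never hold a
    # dask array; returning early avoids touching .data at all.
    if isinstance(x, IndexVariable):
        return False
    # For a plain Variable, unwrap to the underlying array first.
    elif isinstance(x, Variable):
        x = x.data
    # The real implementation asks dask.base.is_dask_collection here;
    # this sketch just tests against the stand-in type.
    return isinstance(x, FakeDaskArray)
```

With this ordering, `IndexVariable` wins even when its stand-in data happens to be dask-like, which mirrors the early `return False` in the diff.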


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.498ms · About: xarray-datasette