issue_comments


2 rows where issue = 1037894157 and user = 1312546 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
953379569 https://github.com/pydata/xarray/issues/5902#issuecomment-953379569 https://api.github.com/repos/pydata/xarray/issues/5902 IC_kwDOAMm_X84402rx TomAugspurger 1312546 2021-10-27T23:19:49Z 2021-10-27T23:19:49Z MEMBER

Thanks @dcherian, that seems to fix this performance problem. I'll see if the tests pass and will submit a PR.

I came across #5582 while searching, thanks :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of `DataArray.unstack()` from checking `variable.data` 1037894157
953344052 https://github.com/pydata/xarray/issues/5902#issuecomment-953344052 https://api.github.com/repos/pydata/xarray/issues/5902 IC_kwDOAMm_X8440uA0 TomAugspurger 1312546 2021-10-27T22:02:58Z 2021-10-27T22:03:35Z MEMBER

Oh, hmm... I'm noticing now that `IndexVariable` (currently) eagerly loads data into memory, so that check will always be false for the problematic `IndexVariable` variable.

So perhaps a slight adjustment to `is_duck_dask_array` to handle `xarray.Variable`?

```diff
diff --git a/xarray/core/dataset.py b/xarray/core/dataset.py
index 550c3587..16637574 100644
--- a/xarray/core/dataset.py
+++ b/xarray/core/dataset.py
@@ -4159,14 +4159,14 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
                 # Dask arrays don't support assignment by index, which the fast unstack
                 # function requires.
                 # https://github.com/pydata/xarray/pull/4746#issuecomment-753282125
-                any(is_duck_dask_array(v.data) for v in self.variables.values())
+                any(is_duck_dask_array(v) for v in self.variables.values())
                 # Sparse doesn't currently support (though we could special-case
                 # it)
                 # https://github.com/pydata/sparse/issues/422
-                or any(
-                    isinstance(v.data, sparse_array_type)
-                    for v in self.variables.values()
-                )
+                # or any(
+                #     isinstance(v.data, sparse_array_type)
+                #     for v in self.variables.values()
+                # )
                 or sparse
                 # Until https://github.com/pydata/xarray/pull/4751 is resolved,
                 # we check explicitly whether it's a numpy array. Once that is
@@ -4177,9 +4177,9 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
                 # # or any(
                 # #     isinstance(v.data, pint_array_type) for v in self.variables.values()
                 # # )
-                or any(
-                    not isinstance(v.data, np.ndarray) for v in self.variables.values()
-                )
+                # or any(
+                #     not isinstance(v.data, np.ndarray) for v in self.variables.values()
+                # )
             ):
                 result = result._unstack_full_reindex(dim, fill_value, sparse)
             else:
diff --git a/xarray/core/pycompat.py b/xarray/core/pycompat.py
index d1649235..e9669105 100644
--- a/xarray/core/pycompat.py
+++ b/xarray/core/pycompat.py
@@ -44,6 +44,12 @@ class DuckArrayModule:

 def is_duck_dask_array(x):
+    from xarray.core.variable import IndexVariable, Variable
+    if isinstance(x, IndexVariable):
+        return False
+    elif isinstance(x, Variable):
+        x = x.data
+
     if DuckArrayModule("dask").available:
         from dask.base import is_dask_collection
```

That's completely ignoring the accesses to `v.data` for the sparse and pint checks, which don't look quite as easy to solve.
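
To make the idea in the diff concrete, here is a minimal, self-contained sketch of the same dispatch. It does not import xarray or dask: `Variable`, `IndexVariable`, `FakeDaskArray`, and `is_lazy_array` are hypothetical stand-ins written only for this illustration, and the `data_reads` counter is an assumption added so the number of `.data` accesses can be observed.

```python
class Variable:
    """Hypothetical stand-in for xarray.Variable; counts reads of .data."""

    def __init__(self, values):
        self._values = values
        self.data_reads = 0

    @property
    def data(self):
        # Assumption for illustration: reading .data is the costly step.
        self.data_reads += 1
        return self._values


class IndexVariable(Variable):
    """Hypothetical stand-in for an index variable that is already in memory."""


class FakeDaskArray:
    """Hypothetical stand-in for a dask-backed array."""


def is_lazy_array(x):
    # Stand-in for the underlying duck-array test.
    return isinstance(x, FakeDaskArray)


def is_duck_dask_array(x):
    # Same shape as the proposed change: inspect the wrapper type first.
    if isinstance(x, IndexVariable):
        return False  # answered without reading x.data
    if isinstance(x, Variable):
        x = x.data
    return is_lazy_array(x)


index = IndexVariable([0, 1, 2])
lazy = Variable(FakeDaskArray())
eager = Variable([1.0, 2.0])

results = [is_duck_dask_array(v) for v in (index, lazy, eager)]
print(results, index.data_reads, lazy.data_reads)
```

In this sketch the index variable is classified without its `.data` ever being read, while ordinary variables are still unwrapped once. The sparse and pint checks mentioned above are not covered.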

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of `DataArray.unstack()` from checking `variable.data` 1037894157


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
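
As a rough illustration of how the row filter on this page ("issue = 1037894157 and user = 1312546 sorted by updated_at descending") maps onto the schema above, the following sketch runs an equivalent query with Python's `sqlite3` module against an in-memory database. The exact SQL that Datasette generates is an assumption here, the table is trimmed to a few columns, the foreign-key references are dropped, and the third row is hypothetical filler added only so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed version of the schema above (assumption: only the columns the
# filter and sort need, without the REFERENCES clauses).
conn.execute(
    """
    CREATE TABLE [issue_comments] (
        [id] INTEGER PRIMARY KEY,
        [user] INTEGER,
        [issue] INTEGER,
        [updated_at] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])")
conn.execute("CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user])")

conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [issue], [updated_at]) VALUES (?, ?, ?, ?)",
    [
        (953379569, 1312546, 1037894157, "2021-10-27T23:19:49Z"),
        (953344052, 1312546, 1037894157, "2021-10-27T22:03:35Z"),
        (1, 0, 0, "2021-01-01T00:00:00Z"),  # hypothetical non-matching row
    ],
)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
rows = conn.execute(
    """
    SELECT [id], [updated_at]
    FROM [issue_comments]
    WHERE [issue] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    (1037894157, 1312546),
).fetchall()

print(rows)
```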