issue_comments

7 rows where issue = 68759727 sorted by updated_at descending

user 4

  • shoyer 3
  • mrocklin 2
  • jhamman 1
  • stale[bot] 1

author_association 2

  • MEMBER 6
  • NONE 1

issue 1

  • Non-aggregating grouped operations on dask arrays are painfully slow to construct · 7
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
459898582 https://github.com/pydata/xarray/issues/392#issuecomment-459898582 https://api.github.com/repos/pydata/xarray/issues/392 MDEyOklzc3VlQ29tbWVudDQ1OTg5ODU4Mg== jhamman 2443309 2019-02-01T23:06:35Z 2019-02-01T23:06:35Z MEMBER

I reran @shoyer's original benchmark. I think this can be closed. Constructing the graphs for the example shown above (on slightly different data) took 84.7 ms and 241 ms respectively. Much better than before.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Non-aggregating grouped operations on dask arrays are painfully slow to construct 68759727
459892186 https://github.com/pydata/xarray/issues/392#issuecomment-459892186 https://api.github.com/repos/pydata/xarray/issues/392 MDEyOklzc3VlQ29tbWVudDQ1OTg5MjE4Ng== stale[bot] 26384082 2019-02-01T22:38:31Z 2019-02-01T22:38:31Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Non-aggregating grouped operations on dask arrays are painfully slow to construct 68759727
94085685 https://github.com/pydata/xarray/issues/392#issuecomment-94085685 https://api.github.com/repos/pydata/xarray/issues/392 MDEyOklzc3VlQ29tbWVudDk0MDg1Njg1 shoyer 1217238 2015-04-17T22:05:10Z 2015-04-17T22:05:10Z MEMBER

Yeah, like I said in the other issue, I don't think this is a blocker (we can add a disclaimer to the docs).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Non-aggregating grouped operations on dask arrays are painfully slow to construct 68759727
94084031 https://github.com/pydata/xarray/issues/392#issuecomment-94084031 https://api.github.com/repos/pydata/xarray/issues/392 MDEyOklzc3VlQ29tbWVudDk0MDg0MDMx mrocklin 306380 2015-04-17T21:56:13Z 2015-04-17T21:56:13Z MEMBER

Hrm, not that much faster. It's 25% overhead on the apply. Worth checking out. I might not get to this before release though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Non-aggregating grouped operations on dask arrays are painfully slow to construct 68759727
94079544 https://github.com/pydata/xarray/issues/392#issuecomment-94079544 https://api.github.com/repos/pydata/xarray/issues/392 MDEyOklzc3VlQ29tbWVudDk0MDc5NTQ0 shoyer 1217238 2015-04-17T21:31:56Z 2015-04-17T21:31:56Z MEMBER

The good news about that timing info is that dask is still much faster for calculating the graph than doing the actual computation. But it's still not ideal from an interactivity perspective.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Non-aggregating grouped operations on dask arrays are painfully slow to construct 68759727
94079391 https://github.com/pydata/xarray/issues/392#issuecomment-94079391 https://api.github.com/repos/pydata/xarray/issues/392 MDEyOklzc3VlQ29tbWVudDk0MDc5Mzkx shoyer 1217238 2015-04-17T21:30:47Z 2015-04-17T21:30:47Z MEMBER

Here's the timing info:

```
%time res = ds.t2m.groupby('time.month').mean('time').sum()
CPU times: user 133 ms, sys: 6.39 ms, total: 140 ms
Wall time: 145 ms

%time res.load_data()
CPU times: user 2min 47s, sys: 1min, total: 3min 48s
Wall time: 1min 19s

%time res = ds.t2m.groupby('time.month').apply(lambda x: x - x.mean()).sum()
CPU times: user 49.1 s, sys: 6.39 s, total: 55.5 s
Wall time: 55.1 s

%time res.load_data()
CPU times: user 6min 17s, sys: 2min 20s, total: 8min 38s
Wall time: 3min 25s
```

Blocks shape is {'latitude': (256,), 'longitude': (512,), 'time': (124, 124, 112, ..., 124, 120, 124)}

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Non-aggregating grouped operations on dask arrays are painfully slow to construct 68759727
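The wall times quoted in the comment above can be turned into construction-to-compute ratios. The following is a minimal sketch, not part of the original thread: the helper `to_seconds` and the dictionary names are hypothetical, and the only inputs are the four wall times reported above. Under that reading, graph construction for the grouped mean is roughly 0.2% of its compute time, while for the grouped `apply` it is roughly 27%, which is in line with the "25% overhead on the apply" remark in the reply.

```python
import re

def to_seconds(text):
    """Convert an IPython-style duration such as '3min 25s' or '145 ms' to seconds.

    Hypothetical helper for illustration; it only handles min, s and ms.
    """
    units = {"min": 60.0, "s": 1.0, "ms": 1e-3}
    total = 0.0
    # "ms" and "min" are tried before "s" so that "ms" is not read as "m" + "s".
    for value, unit in re.findall(r"([\d.]+)\s*(min|ms|s)", text):
        total += float(value) * units[unit]
    return total

# (graph construction wall time, load_data wall time), copied from the comment.
wall_times = {
    "groupby mean": ("145 ms", "1min 19s"),
    "groupby apply": ("55.1 s", "3min 25s"),
}

# Ratio of time spent building the dask graph to time spent computing it.
ratios = {
    name: to_seconds(build) / to_seconds(compute)
    for name, (build, compute) in wall_times.items()
}

for name, ratio in ratios.items():
    print(f"{name}: construction is {ratio:.1%} of compute time")
```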
93532129 https://github.com/pydata/xarray/issues/392#issuecomment-93532129 https://api.github.com/repos/pydata/xarray/issues/392 MDEyOklzc3VlQ29tbWVudDkzNTMyMTI5 mrocklin 306380 2015-04-15T18:58:11Z 2015-04-15T18:58:11Z MEMBER

Interesting. Looks like the overhead of managing the graph is becoming non-trivial.

How long do these operations take to execute? What are your blockdims?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Non-aggregating grouped operations on dask arrays are painfully slow to construct 68759727

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
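The listing at the top ("rows where issue = 68759727 sorted by updated_at descending") corresponds to a simple query against this schema. Below is a minimal sketch using Python's standard `sqlite3` module, assuming an in-memory database rather than the real one. The three sample rows reuse ids, user ids, and timestamps from the table above with bodies omitted, and the fourth row (id 1, issue 12345) is a made-up row added only to show the filter at work.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
conn.executescript(SCHEMA)

# (id, user, created_at, updated_at, author_association, issue)
sample_rows = [
    (93532129, 306380, "2015-04-15T18:58:11Z", "2015-04-15T18:58:11Z", "MEMBER", 68759727),
    (459898582, 2443309, "2019-02-01T23:06:35Z", "2019-02-01T23:06:35Z", "MEMBER", 68759727),
    (94085685, 1217238, "2015-04-17T22:05:10Z", "2015-04-17T22:05:10Z", "MEMBER", 68759727),
    # Hypothetical comment on a different issue, included only to exercise the WHERE clause.
    (1, 306380, "2020-01-01T00:00:00Z", "2020-01-01T00:00:00Z", "NONE", 12345),
]
conn.executemany(
    "INSERT INTO [issue_comments] "
    "([id], [user], [created_at], [updated_at], [author_association], [issue]) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    sample_rows,
)

# ISO 8601 UTC timestamps stored as TEXT sort chronologically as plain strings.
result = conn.execute(
    "SELECT [id], [updated_at] FROM [issue_comments] "
    "WHERE [issue] = ? ORDER BY [updated_at] DESC",
    (68759727,),
).fetchall()

for comment_id, updated_at in result:
    print(comment_id, updated_at)
```

The `idx_issue_comments_issue` index is what lets a filter on `issue` avoid a full table scan once the table grows beyond a handful of rows.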
Powered by Datasette · Queries took 1114.776ms · About: xarray-datasette