issue_comments
3 rows where author_association = "NONE", issue = 1277437106 and user = 3309802 sorted by updated_at descending
id: 1165001097
html_url: https://github.com/pydata/xarray/issues/6709#issuecomment-1165001097
issue_url: https://api.github.com/repos/pydata/xarray/issues/6709
node_id: IC_kwDOAMm_X85FcIGJ
user: gjoseph92 (3309802)
created_at: 2022-06-23T23:15:19Z
updated_at: 2022-06-23T23:15:19Z
author_association: NONE
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
issue: Means of zarr arrays cause a memory overload in dask workers (1277437106)
body:
I took a little bit more of a look at this, and I don't think root task overproduction is the (only) problem here. Intuitively, this operation shouldn't require holding so many root tasks around at once, but the graph dask is building, or the way it's ordering it, doesn't seem to work that way. We can see that the ordering is pretty bad. When we actually run it (on https://github.com/dask/distributed/pull/6614, with overproduction fixed), you can see that dask has to keep tons of the input chunks in memory, because they're going to be needed by a future task that isn't able to run yet (not all of its inputs have been computed). It's possible that the order in which dask is executing the input tasks is bad, but I think it's more likely that I haven't thought about the problem enough and there's an obvious reason why the graph is structured like this.

id: 1164690164
html_url: https://github.com/pydata/xarray/issues/6709#issuecomment-1164690164
issue_url: https://api.github.com/repos/pydata/xarray/issues/6709
node_id: IC_kwDOAMm_X85Fa8L0
user: gjoseph92 (3309802)
created_at: 2022-06-23T17:37:59Z
updated_at: 2022-06-23T17:37:59Z
author_association: NONE
reactions: { "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
issue: Means of zarr arrays cause a memory overload in dask workers (1277437106)
body:
FYI @robin-cls: I would be a bit surprised if there is anything you can do on your end to fix things here with off-the-shelf dask. What @dcherian mentioned in https://github.com/dask/distributed/issues/6360#issuecomment-1129484190 is probably the only thing that might work. Otherwise you'll need to run one of my experimental branches.

id: 1164660225
html_url: https://github.com/pydata/xarray/issues/6709#issuecomment-1164660225
issue_url: https://api.github.com/repos/pydata/xarray/issues/6709
node_id: IC_kwDOAMm_X85Fa04B
user: gjoseph92 (3309802)
created_at: 2022-06-23T17:05:12Z
updated_at: 2022-06-23T17:05:12Z
author_association: NONE
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
issue: Means of zarr arrays cause a memory overload in dask workers (1277437106)
body:
Thanks @dcherian, yeah, this is definitely root task overproduction. I think your case is somewhat similar to @TomNicholas's https://github.com/dask/distributed/issues/6571 (that one might even be a little simpler, actually). There's some prototyping going on to address this, but FYI, I'd say "soon" is probably on the couple-month timescale right now. https://github.com/dask/distributed/pull/6598 or https://github.com/dask/distributed/pull/6614 will probably make this work. I'm hoping to benchmark these against some real workloads in the next couple of days, so I'll probably add yours. Thanks for the MVCE!
See https://github.com/dask/distributed/issues/6360#issuecomment-1129434333 and the linked issues for why this happens.
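The workload these comments discuss is a reduction over a large chunked (zarr-backed) array, where the scheduler materializes root (input) chunks faster than the reduction tasks consume them. The shape and chunking below are illustrative assumptions scaled down to run anywhere, not the reproducer from the issue; this is only a minimal sketch of that class of workload using dask.array:

```python
import dask.array as da

# A chunked array standing in for the zarr-backed data in the issue
# (shape and chunk sizes are made-up, small enough to run locally).
x = da.ones((1000, 1000), chunks=(100, 100))

# Taking a mean over one axis: each output chunk depends on a whole
# column of input chunks. If the scheduler runs root (input) tasks
# faster than the reductions that consume them, those chunks pile up
# in worker memory -- the "root task overproduction" described above.
result = x.mean(axis=0).compute()
print(result.shape)  # → (1000,)
```

At this small size the computation fits in memory easily; the memory-overload behavior in the issue appears when the input chunks are far larger than the cluster's aggregate memory.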
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
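The filtered view at the top of this page (author_association = "NONE", issue = 1277437106, user = 3309802, sorted by updated_at descending) corresponds to a plain SQL query against this schema. Here is a self-contained sketch using Python's built-in sqlite3 module with one sample row; the REFERENCES clauses are dropped because the [users] and [issues] tables are not reproduced here:

```python
import sqlite3

# Build the issue_comments table from the schema above (in-memory sketch,
# foreign-key clauses omitted since the referenced tables are absent).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

# One sample row taken from the table on this page (body elided).
conn.execute(
    "INSERT INTO issue_comments (id, user, issue, author_association, updated_at, body) "
    "VALUES (1165001097, 3309802, 1277437106, 'NONE', '2022-06-23T23:15:19Z', '...')"
)

# The query behind this page's filters and sort order.
rows = conn.execute(
    """
    SELECT id, updated_at FROM issue_comments
    WHERE author_association = 'NONE' AND issue = 1277437106 AND user = 3309802
    ORDER BY updated_at DESC
    """
).fetchall()
print(rows)  # → [(1165001097, '2022-06-23T23:15:19Z')]
```

The idx_issue_comments_issue and idx_issue_comments_user indexes let SQLite satisfy the issue and user equality filters without a full table scan.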