issue_comments
6 rows where issue = 355264812 and user = 1217238 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
419218306 | https://github.com/pydata/xarray/issues/2389#issuecomment-419218306 | https://api.github.com/repos/pydata/xarray/issues/2389 | MDEyOklzc3VlQ29tbWVudDQxOTIxODMwNg== | shoyer 1217238 | 2018-09-06T19:46:03Z | 2018-09-06T19:46:03Z | MEMBER | Removing the self-references to the dask graphs in #2261 seems to resolve the performance issue on its own. I would be interested if https://github.com/pydata/xarray/pull/2391 still improves performance in any real world use cases -- perhaps it helps when working with a real cluster or on large datasets? I can't see any difference in my local benchmarks using dask-distributed. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Large pickle overhead in ds.to_netcdf() involving dask.delayed functions 355264812 |
417380229 | https://github.com/pydata/xarray/issues/2389#issuecomment-417380229 | https://api.github.com/repos/pydata/xarray/issues/2389 | MDEyOklzc3VlQ29tbWVudDQxNzM4MDIyOQ== | shoyer 1217238 | 2018-08-30T16:24:07Z | 2018-08-30T16:24:07Z | MEMBER | OK, so it seems like the complete solution here should involve refactoring our backend classes to avoid any references to objects storing dask graphs. This is a cleaner solution regardless of the pickle overhead because it allows us to eliminate all state stored in backend classes. I'll get on that in #2261. | { "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Large pickle overhead in ds.to_netcdf() involving dask.delayed functions 355264812 |
417176707 | https://github.com/pydata/xarray/issues/2389#issuecomment-417176707 | https://api.github.com/repos/pydata/xarray/issues/2389 | MDEyOklzc3VlQ29tbWVudDQxNzE3NjcwNw== | shoyer 1217238 | 2018-08-30T03:18:33Z | 2018-08-30T03:18:33Z | MEMBER | Give https://github.com/pydata/xarray/pull/2391 a try -- in my testing, it speeds up both examples to only take about 3 seconds each. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Large pickle overhead in ds.to_netcdf() involving dask.delayed functions 355264812 |
417076301 | https://github.com/pydata/xarray/issues/2389#issuecomment-417076301 | https://api.github.com/repos/pydata/xarray/issues/2389 | MDEyOklzc3VlQ29tbWVudDQxNzA3NjMwMQ== | shoyer 1217238 | 2018-08-29T19:29:56Z | 2018-08-29T19:29:56Z | MEMBER | If I understand the heuristics used by dask's schedulers correctly, a data dependency might actually be a good idea here because it would encourage colocating write tasks on the same machines. We should probably give this a try. On Wed, Aug 29, 2018 at 12:15 PM Matthew Rocklin notifications@github.com wrote: … | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Large pickle overhead in ds.to_netcdf() involving dask.delayed functions 355264812 |
417066100 | https://github.com/pydata/xarray/issues/2389#issuecomment-417066100 | https://api.github.com/repos/pydata/xarray/issues/2389 | MDEyOklzc3VlQ29tbWVudDQxNzA2NjEwMA== | shoyer 1217238 | 2018-08-29T18:55:39Z | 2018-08-29T18:55:39Z | MEMBER | This seems plausible to me, though the situation is likely improved with #2261. It would be nice if dask had a way to consolidate the serialization of these objects, rather than separately serializing them in each task. It's not obvious to me how to do that in xarray short of manually building task graphs. CC @mrocklin in case he has thoughts here | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Large pickle overhead in ds.to_netcdf() involving dask.delayed functions 355264812 |
417047186 | https://github.com/pydata/xarray/issues/2389#issuecomment-417047186 | https://api.github.com/repos/pydata/xarray/issues/2389 | MDEyOklzc3VlQ29tbWVudDQxNzA0NzE4Ng== | shoyer 1217238 | 2018-08-29T17:59:24Z | 2018-08-29T17:59:24Z | MEMBER | Offhand, I don't know why. I'm not super familiar with profiling dask, but it might be worth looking at dask's diagnostics tools (http://dask.pydata.org/en/latest/understanding-performance.html) to understand what's going on here. The appearance of … It would also be interesting to see if this changes with the xarray backend refactor from https://github.com/pydata/xarray/pull/2261. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Large pickle overhead in ds.to_netcdf() involving dask.delayed functions 355264812 |
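The overhead discussed in the comments above comes from each write task separately serializing references to the backend/store objects. A minimal sketch of how one might measure this (not code from the issue; it assumes a netCDF backend plus dask and cloudpickle are installed, and the file name and chunk sizes are illustrative):

```python
import cloudpickle  # the serializer dask relies on for tasks
import numpy as np
import xarray as xr

# Build a small dask-backed dataset so the write graph contains many tasks.
ds = xr.Dataset(
    {"x": (("a", "b"), np.zeros((1000, 1000)))}
).chunk({"a": 100, "b": 100})

# compute=False returns a dask.delayed object wrapping the write graph.
write = ds.to_netcdf("out.nc", compute=False)

# Serialize each task individually -- roughly analogous to what a distributed
# scheduler does -- to see how much repeated per-task overhead accumulates.
graph = dict(write.__dask_graph__())
sizes = [len(cloudpickle.dumps(task)) for task in graph.values()]
print(f"{len(graph)} tasks, ~{sum(sizes) / 1e6:.1f} MB serialized in total")
```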
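The comment from 2018-08-29T17:59:24Z points at dask's diagnostics tools for understanding where the time goes. A hedged sketch of that approach, reusing the `write` object from the previous snippet; note that these profilers only instrument dask's local schedulers, while dask-distributed has its own dashboard:

```python
from dask.diagnostics import Profiler, ResourceProfiler, visualize

# Record per-task timings and CPU/memory usage while the write graph executes
# on a local scheduler.
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
    write.compute()

visualize([prof, rprof])  # renders an interactive bokeh plot of the profiles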
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
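For reference, the view described at the top of this page ("6 rows where issue = 355264812 and user = 1217238 sorted by updated_at descending") corresponds to a query like the sketch below, assuming the table lives in a local SQLite file; the filename `github.db` is illustrative.

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical path to the database file
rows = conn.execute(
    """
    SELECT id, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 355264812 AND [user] = 1217238
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, updated_at, _association, _body in rows:
    print(comment_id, updated_at)
```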