
issue_comments: 417047186


html_url: https://github.com/pydata/xarray/issues/2389#issuecomment-417047186
issue_url: https://api.github.com/repos/pydata/xarray/issues/2389
id: 417047186
node_id: MDEyOklzc3VlQ29tbWVudDQxNzA0NzE4Ng==
user: 1217238
created_at: 2018-08-29T17:59:24Z
updated_at: 2018-08-29T17:59:24Z
author_association: MEMBER

Offhand, I don't know why dask.delayed should be adding this much overhead. One possibility is that when tasks are pickled (as dask-distributed does), they become much larger because the delayed function gets serialized into each task. Pickling does seem to add a significant amount of overhead in some cases when using xarray with dask: https://github.com/pangeo-data/pangeo/issues/266
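The per-task serialization cost can be sketched with a toy example using plain `pickle` (this is not dask's actual task format; the `payload` bytes stand in for a serialized delayed function, and the task tuples are hypothetical):

```python
import pickle

# Toy illustration (NOT dask's actual task format): a large payload standing
# in for a serialized delayed function. A distributed scheduler pickles each
# task separately when shipping it to a worker, so a payload embedded in
# every task is re-serialized for every task.
payload = b"x" * 100_000
n_tasks = 10

# Variant 1: every task embeds the payload directly.
embedded_tasks = [("task-%d" % i, payload) for i in range(n_tasks)]
size_embedded = sum(len(pickle.dumps(t)) for t in embedded_tasks)

# Variant 2: tasks reference the payload by key; it is shipped only once.
shared_tasks = [("task-%d" % i, "payload-key") for i in range(n_tasks)]
size_shared = len(pickle.dumps(payload)) + sum(
    len(pickle.dumps(t)) for t in shared_tasks
)

print("embedded:", size_embedded, "bytes")
print("shared:  ", size_shared, "bytes")
```

Under these assumptions the embedded variant grows linearly with the number of tasks, while the shared variant pays the payload cost once, which is the kind of difference that could explain the overhead seen here.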

I'm not super familiar with profiling dask, but it might be worth looking at dask's diagnostic tools (http://dask.pydata.org/en/latest/understanding-performance.html) to understand what's going on here. The appearance of _thread.lock at the top of these profiles is a good indication that we aren't measuring where most of the computation is actually happening.
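A minimal sketch of why _thread.lock dominating a profile can be misleading (pure stdlib, not dask's diagnostics): if the real work runs on a worker thread, the profiled main thread spends its time blocked on a lock, so the profiler attributes the elapsed time to lock acquisition rather than to the computation.

```python
import cProfile
import io
import pstats
import threading

# Sketch: the real work runs on a worker thread, while the profiled main
# thread merely waits. cProfile only observes the thread it runs in, so the
# profile is dominated by lock acquisition, not by the computation itself.

def worker(done):
    sum(i * i for i in range(200_000))  # the actual computation
    done.set()

def main():
    done = threading.Event()
    t = threading.Thread(target=worker, args=(done,))
    t.start()
    done.wait()  # blocks on _thread.lock.acquire under the hood
    t.join()

prof = cProfile.Profile()
prof.enable()
main()
prof.disable()

buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(15)
print(buf.getvalue())
```

The printed stats show the lock's acquire method near the top by cumulative time, while `worker` never appears at all, which mirrors seeing _thread.lock dominate a dask profile taken from the wrong thread.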

It would also be interesting to see if this changes with the xarray backend refactor from https://github.com/pydata/xarray/pull/2261.

reactions: none
issue: 355264812