
issue_comments: 417252006


html_url: https://github.com/pydata/xarray/issues/2389#issuecomment-417252006
issue_url: https://api.github.com/repos/pydata/xarray/issues/2389
id: 417252006
node_id: MDEyOklzc3VlQ29tbWVudDQxNzI1MjAwNg==
user: 1882397
created_at: 2018-08-30T09:23:20Z
updated_at: 2018-08-30T09:48:40Z
author_association: NONE

It seems the xarray object that is sent to the workers contains a reference to the complete graph:

```python
import pickle

import dask
import dask.array as da
import xarray as xr

vals = da.random.random((5, 1), chunks=(1, 1))
ds = xr.Dataset({'vals': (['a', 'b'], vals)})
write = ds.to_netcdf('file2.nc', compute=False)

key = [val for val in write.dask.keys() if isinstance(val, str) and val.startswith('NetCDF')][0]
wrapper = write.dask[key]
len(pickle.dumps(wrapper))
# 14652

delayed_store = wrapper.datastore.delayed_store
len(pickle.dumps(delayed_store))
# 14652

dask.visualize(delayed_store)
```
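
To make that concrete, here is a small sketch (not from the original comment) comparing the wrapper's pickled size with that of the whole graph; it assumes the `write` and `wrapper` objects from the snippet above and an xarray/dask version where the datastore still appears as a graph key:

```python
import pickle

# Serialize the whole task graph and the single wrapper task. If the
# wrapper holds a reference back to the graph, the two sizes end up close.
graph_size = len(pickle.dumps(dict(write.dask)))
wrapper_size = len(pickle.dumps(wrapper))
print(f"wrapper: {wrapper_size} bytes, full graph: {graph_size} bytes")
```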

The pickled size jumps to about 1.3 MB if I use 500 chunks again.
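
To see how the wrapper's pickled size scales with the chunk count, a minimal sketch (the `wrapper_size` helper is hypothetical, and it assumes the same xarray/dask versions as above):

```python
import pickle

import dask.array as da
import xarray as xr

def wrapper_size(n_chunks):
    # Build an n_chunks-task write graph and pickle its NetCDF wrapper task.
    vals = da.random.random((n_chunks, 1), chunks=(1, 1))
    ds = xr.Dataset({'vals': (['a', 'b'], vals)})
    write = ds.to_netcdf(f'file_{n_chunks}.nc', compute=False)
    key = [k for k in write.dask.keys()
           if isinstance(k, str) and k.startswith('NetCDF')][0]
    return len(pickle.dumps(write.dask[key]))

for n in (5, 50, 500):
    print(n, wrapper_size(n))
```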

The warning about the large object in the graph disappears if we delete that reference before we execute the graph:

```python
key = [val for val in write.dask.keys() if isinstance(val, str) and val.startswith('NetCDF')][0]
wrapper = write.dask[key]
del wrapper.datastore.delayed_store
```

It doesn't seem to change the runtime, though.
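
For reference, a sketch (not part of the original comment) of triggering the write on a distributed scheduler after dropping the reference; whether the "large object" warning appears depends on the dask.distributed version:

```python
from dask.distributed import Client

client = Client()  # local cluster; workers receive the pickled tasks
write.compute()    # with delayed_store deleted, the large-object warning
                   # from the scheduler should no longer appear
client.close()
```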
