html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1464#issuecomment-350113818,https://api.github.com/repos/pydata/xarray/issues/1464,350113818,MDEyOklzc3VlQ29tbWVudDM1MDExMzgxOA==,2443309,2017-12-07T22:27:08Z,2017-12-07T22:27:08Z,MEMBER,"> The place to start is probably to write an integration test for this functionality. I notice now that our current tests only check reading netCDF files with dask-distributed:
We should probably also write some tests for saving datasets with `save_mfdataset` and distributed; a rough sketch of such a test is below.
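Something along these lines could serve as a starting point (a minimal sketch, not the actual test suite: the test name and layout are assumptions, and it presumes pytest's `tmpdir` fixture plus an installed dask.distributed):
```python
import os

import xarray as xr
from dask.distributed import Client


def test_save_mfdataset_distributed(tmpdir):
    with Client(processes=False):  # in-process cluster keeps the test lightweight
        ds = xr.Dataset({'x': ('t', [1, 2, 3, 4])}).chunk({'t': 2})
        datasets = [ds.isel(t=slice(0, 2)), ds.isel(t=slice(2, 4))]
        paths = [str(tmpdir.join('part0.nc')), str(tmpdir.join('part1.nc'))]
        xr.save_mfdataset(datasets, paths)
        assert all(os.path.exists(p) for p in paths)
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,238284894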
https://github.com/pydata/xarray/issues/1464#issuecomment-341329662,https://api.github.com/repos/pydata/xarray/issues/1464,341329662,MDEyOklzc3VlQ29tbWVudDM0MTMyOTY2Mg==,1217238,2017-11-02T06:29:38Z,2017-11-02T06:29:38Z,MEMBER,"I did a little bit of digging here, using @mrocklin's `Client(processes=False)` trick.
The problem seems to be that the arrays that we add to the writer in `AbstractWritableDataStore.set_variables` are not pickleable. To be more concrete, consider these lines:
https://github.com/pydata/xarray/blob/f83361c76b6aa8cdba8923080bb6b98560cf3a96/xarray/backends/common.py#L221-L232
`target` is currently a `netCDF4.Variable` object (or whatever the appropriate backend type is). Anything added to the writer eventually ends up as an argument to `dask.array.store` and hence gets put into the dask graph. When dask-distributed tries to pickle the dask graph, it fails on the `netCDF4.Variable`.
What we need to do instead is wrap these `target` arrays in appropriate array wrappers, e.g., `NetCDF4ArrayWrapper`, adding `__setitem__` methods to the wrappers where needed. Unlike most backend array types, our array wrappers are pickleable, which is essential for use with dask-distributed.
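As a rough illustration of the idea (a minimal sketch, not xarray's actual wrapper; the `PickleableArrayWrapper` name and its lazy-reopen strategy are assumptions), the point is that only strings end up in the pickled state, never an open `netCDF4.Variable`:
```python
import netCDF4


class PickleableArrayWrapper:
    # Hypothetical wrapper: store only the file path and variable name, and
    # reopen the file lazily, so pickling never touches a netCDF4.Variable.
    def __init__(self, filename, variable_name):
        self.filename = filename
        self.variable_name = variable_name
        self._dataset = None  # opened on first use; never pickled

    @property
    def _variable(self):
        if self._dataset is None:
            self._dataset = netCDF4.Dataset(self.filename, mode='a')
        return self._dataset.variables[self.variable_name]

    def __setitem__(self, key, value):
        self._variable[key] = value  # this is what dask.array.store calls

    def __getstate__(self):
        # drop the unpickleable file handle before pickling
        return (self.filename, self.variable_name)

    def __setstate__(self, state):
        self.__init__(*state)
```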
If anyone's curious, here's the traceback and code I used to debug this:
https://gist.github.com/shoyer/4564971a4d030cd43bba8241d3b36c73","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,238284894
https://github.com/pydata/xarray/issues/1464#issuecomment-311122109,https://api.github.com/repos/pydata/xarray/issues/1464,311122109,MDEyOklzc3VlQ29tbWVudDMxMTEyMjEwOQ==,1217238,2017-06-26T17:10:07Z,2017-06-26T17:10:07Z,MEMBER,"I'm a little surprised that this doesn't work, because I thought we made all our xarray datastore objects pickleable.
The place to start is probably to write an integration test for this functionality. I notice now that our current tests only check *reading* netCDF files with dask-distributed:
https://github.com/pydata/xarray/blob/master/xarray/tests/test_distributed.py
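A write-side counterpart might look roughly like this (an illustrative sketch under the same caveats: hypothetical test name, pytest fixtures, dask.distributed installed):
```python
import xarray as xr
from dask.distributed import Client


def test_to_netcdf_distributed(tmpdir):
    with Client():  # multi-process cluster, so task pickling is actually exercised
        path = str(tmpdir.join('out.nc'))
        ds = xr.Dataset({'x': ('t', [1, 2, 3])}).chunk({'t': 1})
        ds.to_netcdf(path)
        with xr.open_dataset(path) as actual:
            xr.testing.assert_identical(ds.compute(), actual.load())
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,238284894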
https://github.com/pydata/xarray/issues/1464#issuecomment-311114129,https://api.github.com/repos/pydata/xarray/issues/1464,311114129,MDEyOklzc3VlQ29tbWVudDMxMTExNDEyOQ==,306380,2017-06-26T16:39:24Z,2017-06-26T16:39:24Z,MEMBER,"Presumably there is some object in the task graph that we don't know how to serialize. This can be fixed either in XArray, by not including such an object in the graph (recreating it each time, or wrapping it in a serializable proxy), or in Dask, by teaching it how to (de)serialize that object.
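On the Dask side, distributed exposes hooks for exactly this; a hedged sketch, where `NetCDFHandle` is a hypothetical stand-in for the problematic object:
```python
from distributed.protocol import dask_serialize, dask_deserialize


class NetCDFHandle:
    # Hypothetical stand-in for an object holding an open netCDF file.
    def __init__(self, path):
        self.path = path


@dask_serialize.register(NetCDFHandle)
def _serialize(handle):
    header = {}
    frames = [handle.path.encode()]  # ship only what is needed to recreate it
    return header, frames


@dask_deserialize.register(NetCDFHandle)
def _deserialize(header, frames):
    return NetCDFHandle(frames[0].decode())  # recreated on the receiving side
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,238284894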
https://github.com/pydata/xarray/issues/1464#issuecomment-310943792,https://api.github.com/repos/pydata/xarray/issues/1464,310943792,MDEyOklzc3VlQ29tbWVudDMxMDk0Mzc5Mg==,6628425,2017-06-26T01:38:17Z,2017-06-26T01:38:17Z,MEMBER,@shoyer @mrocklin thanks for your quick responses; I can confirm that both the workarounds you suggested work in my case.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,238284894
https://github.com/pydata/xarray/issues/1464#issuecomment-310817771,https://api.github.com/repos/pydata/xarray/issues/1464,310817771,MDEyOklzc3VlQ29tbWVudDMxMDgxNzc3MQ==,306380,2017-06-24T06:17:52Z,2017-06-24T06:17:52Z,MEMBER,"It's failing to serialize *something* in the task graph; I'm not sure what (I'm also surprised that the except clause didn't trigger and log the input). My first guess is that there is an open netCDF file object floating around within the task graph. If so, then we should endeavor to avoid doing this (or have some file object proxy that *is* (de)serializable).
As a short-term workaround you might try starting a local cluster within the same process.
from dask.distributed import Client

client = Client(processes=False)  # scheduler and workers run in this same process
This *might* help you to avoid serialization issues. Generally we should resolve the issue regardless, though.
cc'ing @rabernat, who seems to have the most experience here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,238284894
https://github.com/pydata/xarray/issues/1464#issuecomment-310817117,https://api.github.com/repos/pydata/xarray/issues/1464,310817117,MDEyOklzc3VlQ29tbWVudDMxMDgxNzExNw==,1217238,2017-06-24T06:05:09Z,2017-06-24T06:05:09Z,MEMBER,"Hmm. Can you try using scipy as an engine to write the netCDF file?
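For example (illustrative file name; note that the scipy engine only writes netCDF3-format files):
```python
import xarray as xr

ds = xr.Dataset({'x': ('t', [1, 2, 3])})
ds.to_netcdf('out.nc', engine='scipy')  # bypasses the netCDF4 backend entirely
```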
Honestly I've barely used dask distributed. Possibly @mrocklin has ideas.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,238284894