issues: 567678992
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
567678992 | MDU6SXNzdWU1Njc2Nzg5OTI= | 3781 | to_netcdf() doesn't work with multiprocessing scheduler | 367900 | open | 0 | 4 | 2020-02-19T16:28:22Z | 2021-09-25T16:02:41Z | CONTRIBUTOR | If I create a chunked lazily-computed array, writing it to disk with

MCVE Code Sample

```python
import dask
import numpy as np
import xarray as xr

if __name__ == "__main__":
    # Simple worker function.
    def inner(ds):
        if sum(ds.dims.values()) == 0:
            return ds
        return ds**2
```

Expected Output

Complete netCDF files should be created from all three schedulers.

Problem Description

The thread pool and distributed local cluster schedulers result in a complete output. The process pool scheduler fails when trying to write (note that test-process.nc is created with the header and coordinate information, but no actual data is written). The traceback is:
With a bit of editing of the system multiprocessing module I was able to determine that the lock being reported by this exception was the first lock created. I then added a breakpoint to the Lock constructor to get a traceback of what was creating it:

| File | Line | Function |
|----------------------|------|---------------------------|
| core/dataset.py | 1535 | Dataset.to_netcdf |
| backends/api.py | 1071 | to_netcdf |
| backends/netCDF4_.py | 350 | open |
| backends/locks.py | 114 | get_write_lock |
| backends/locks.py | 39 | _get_multiprocessing_lock |

This last function creates the offending multiprocessing.Lock() object. Note that there are six Locks constructed, so it's possible that the later-created ones would also cause an issue.

The h5netcdf backend has the same problem with Lock. However, the SciPy backend gives a NotImplementedError for this:
I'm not sure how simple it would be to get this working with the multiprocessing scheduler, or how vital it is given that the distributed scheduler works. If nothing else, it would be good to get the same NotImplementedError as with the SciPy backend.
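
The lock behaviour described above can be illustrated without xarray or dask. The following is a minimal sketch using only the standard library. It assumes the failure is the usual error Python raises when a multiprocessing.Lock is pickled outside of process start-up, which is what would happen if a task holding such a lock were serialized for a worker process. The helper name `try_pickle_lock` is hypothetical and is not part of xarray or dask.

```python
import multiprocessing
import pickle


def try_pickle_lock():
    """Try to pickle a multiprocessing.Lock and report what happens.

    Returns the error message if pickling fails with RuntimeError,
    or None if pickling unexpectedly succeeds.
    """
    lock = multiprocessing.Lock()
    try:
        # Locks may only be handed to child processes while a process is
        # being spawned; pickling one at any other time is rejected.
        pickle.dumps(lock)
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_pickle_lock())
```

If this is indeed the mechanism, any object graph that contains such a lock cannot be shipped to a process pool as an ordinary pickled task, whereas a thread pool never needs to serialize the lock at all.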
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3781/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
13221727 | issue |