html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3781#issuecomment-927141507,https://api.github.com/repos/pydata/xarray/issues/3781,927141507,IC_kwDOAMm_X843Qw6D,488992,2021-09-25T16:01:02Z,2021-09-25T16:02:41Z,CONTRIBUTOR,"I'm currently studying this problem in depth and I noticed that while the threaded scheduler uses a lock that is defined as a function of the file name (used as the `key`):
https://github.com/pydata/xarray/blob/8d23032ecf20545cd320cfb552d8febef73cd69c/xarray/backends/locks.py#L24-L32
the process-based scheduler throws away the key:
https://github.com/pydata/xarray/blob/8d23032ecf20545cd320cfb552d8febef73cd69c/xarray/backends/locks.py#L35-L39
I'm not sure yet what the consequences and logical interpretation of that are, but I would like to re-raise @bcbnz's question above: should this scenario simply raise `NotImplementedError` because it cannot be supported?
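For context, here is a rough paraphrase of the asymmetry between the two linked code paths (simplified for illustration; the function names and the plain dict are mine, not the exact xarray source):
```python
import multiprocessing
import threading

# Paraphrase of the threaded path: locks are cached per key (the file name),
# so every task touching the same file shares the same lock object.
_file_locks = {}

def get_threaded_lock(key):
    if key not in _file_locks:
        _file_locks[key] = threading.Lock()
    return _file_locks[key]

# Paraphrase of the process-based path: the key is discarded and a brand-new
# multiprocessing.Lock is returned on every call, so nothing ties the lock
# to a particular file or coordinates writers of the same file.
def get_multiprocessing_lock(key):
    del key  # unused
    return multiprocessing.Lock()
```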
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,567678992
https://github.com/pydata/xarray/issues/3781#issuecomment-776160305,https://api.github.com/repos/pydata/xarray/issues/3781,776160305,MDEyOklzc3VlQ29tbWVudDc3NjE2MDMwNQ==,885575,2021-02-09T18:51:13Z,2021-02-09T18:51:13Z,NONE,"@lvankampenhout, I ran into your problem too. The OP's issue seems to actually be in `to_netcdf()`, but I think yours (and mine) is in Dask's lazy loading and is therefore unrelated.
In short, `ds` will contain some Dask arrays whose contents don't actually get loaded until you call `to_netcdf()`. By default, Dask loads them in parallel, and the default Dask scheduler chokes when you layer your own parallelism on top. In my case, I was able to get around it by calling the following at some point before the write:
```python
ds.load(scheduler='sync')
```
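For example, a minimal sketch of how this might look inside the worker (the input file, `outdir`, and the worker body are placeholders, not from your actual setup):
```python
import os
import xarray as xr

outdir = '/tmp/outdir'                        # placeholder output directory
ds = xr.open_dataset('input.nc', chunks={})   # placeholder dask-backed dataset

def do_work(n):
    # Force the synchronous Dask scheduler for this load, since this function
    # already runs inside a worker process of our own Pool.
    ds.load(scheduler='sync')
    ds.to_netcdf(os.path.join(outdir, f'{n}.nc'))
```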
If the load happens outside `do_work()`, I think you can skip the `scheduler='sync'` part, but inside `do_work()` it's required. This bypasses the parallelism in Dask, which is probably what you want if you're doing your own parallelism.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,567678992
https://github.com/pydata/xarray/issues/3781#issuecomment-713165400,https://api.github.com/repos/pydata/xarray/issues/3781,713165400,MDEyOklzc3VlQ29tbWVudDcxMzE2NTQwMA==,630436,2020-10-20T22:01:24Z,2020-10-20T22:01:24Z,NONE,I am also hitting the problem as described by @bcbnz ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,567678992
https://github.com/pydata/xarray/issues/3781#issuecomment-702348129,https://api.github.com/repos/pydata/xarray/issues/3781,702348129,MDEyOklzc3VlQ29tbWVudDcwMjM0ODEyOQ==,7933853,2020-10-01T19:24:48Z,2020-10-01T20:00:27Z,NONE,"I think I ran into a similar problem when combining Dask-chunked Datasets (originating from `open_mfdataset`) with Python's native `multiprocessing` package. I get no error message, and the headers of the files are created, but then the script hangs indefinitely. The use case is combining and resampling variables into ~1000 different NetCDF files, which I want to distribute over several processes using `multiprocessing`.
**MCVE Code Sample**
```python
import xarray as xr
from multiprocessing import Pool
import os

if (False):
    """"""
    Load data without using dask
    """"""
    ds = xr.open_dataset(""http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc"")
else:
    """"""
    Load data using dask
    """"""
    ds = xr.open_dataset(""http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc"", chunks={})

print(ds.nbytes / 1e6, 'MB')
print('chunks', ds.air.chunks)  # chunks is empty without dask

outdir = '/glade/scratch/lvank'  # change this to some temporary directory on your system

def do_work(n):
    print(n)
    ds.to_netcdf(os.path.join(outdir, f'{n}.nc'))

tasks = range(10)

with Pool(processes=2) as pool:
    pool.map(do_work, tasks)

print('done')
```
**Expected Output**
The NetCDF copies in `outdir` named `0.nc` to `9.nc` should be created for both cases (with and without Dask).
**Problem Description**
In the case with Dask (i.e. when the `if` condition is `False` and the `else` branch with `chunks={}` runs), the files are not created and the program hangs.
**Output of xr.show_versions()**
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1127.13.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 0.16.1
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.27.0
distributed: 2.28.0
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200925
pip: 20.2.2
conda: None
pytest: None
IPython: 7.18.1
sphinx: None
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,567678992