issues: 2118308210
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2118308210 | I_kwDOAMm_X85-QtFy | 8707 | Weird interaction between aggregation and multiprocessing on DaskArrays | 24508496 | closed | 0 | 10 | 2024-02-05T11:35:28Z | 2024-04-29T16:20:45Z | 2024-04-29T16:20:44Z | CONTRIBUTOR | What happened?
When I try to run a modified version of the example from the dropna documentation (see below), it creates a process that never terminates. To reproduce it, I added a rolling operation before dropping NaNs and then ran 4 processes using the standard library multiprocessing module.
What did you expect to happen?
I see no obvious reason why this wouldn't just work, unless there is a weird interaction between the Dask threads and the different processes. Using Xarray+Dask+Multiprocessing seems to work for me on other functions; it seems to be this particular combination that is problematic.
Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
from multiprocessing import Pool

datasets = [xr.Dataset(
    {
        "temperature": (
            ["time", "location"],
            [[23.4, 24.1], [np.nan if i>1 else 23.4, 22.1 if i<2 else np.nan], [21.8 if i<3 else np.nan, 24.2], [20.5, 25.3]],
        )
    },
    coords={"time": [1, 2, 3, 4], "location": ["A", "B"]},
).chunk(time=2) for i in range(4)]

def process(dataset):
    return dataset.rolling(dim={'time':2}).sum().dropna(dim="time", how="all").compute()

# This works as expected
dropped = []
for dataset in datasets:
    dropped.append(process(dataset))

# This seems to never finish
with Pool(4) as p:
    dropped = p.map(process, datasets)
```
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
I am still running on xarray 2023.08.0; see below for more details about the environment.
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 (main, Jan 25 2024, 20:42:03) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.8.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: 2024.1.1
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: 0.23
sparse: None
flox: 0.9.0
numpy_groupies: 0.10.2
setuptools: 69.0.3
pip: 23.2.1
conda: None
pytest: 8.0.0
mypy: None
IPython: 8.18.1
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8707/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |