home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 2118308210

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2118308210 I_kwDOAMm_X85-QtFy 8707 Weird interaction between aggregation and multiprocessing on DaskArrays 24508496 closed 0     10 2024-02-05T11:35:28Z 2024-04-29T16:20:45Z 2024-04-29T16:20:44Z CONTRIBUTOR      

What happened?

When I try to run a modified version of the example from the dropna documentation (see below), it creates a never terminating process. To reproduce it I added a rolling operation before dropping nans and then run 4 processes using the standard library multiprocessing Pool class on DaskArrays. Running the rolling + dropna in a for loop finishes as expectedly in no time.

What did you expect to happen?

There is nothing obvious to me why this wouldn't just work unless there is a weird interaction between the Dask threads and the different processes. Using Xarray+Dask+Multiprocessing seems to work for me on other functions, it seems to be this particular combination that is problematic.

Minimal Complete Verifiable Example

```Python import xarray as xr import numpy as np from multiprocessing import Pool

datasets = [xr.Dataset( { "temperature": ( ["time", "location"], [[23.4, 24.1], [np.nan if i>1 else 23.4, 22.1 if i<2 else np.nan], [21.8 if i<3 else np.nan, 24.2], [20.5, 25.3]], ) }, coords={"time": [1, 2, 3, 4], "location": ["A", "B"]}, ).chunk(time=2) for i in range(4)]

def process(dataset): return dataset.rolling(dim={'time':2}).sum().dropna(dim="time", how="all").compute()

This works as expected

dropped = [] for dataset in datasets: dropped.append(process(dataset))

This seems to never finish

with Pool(4) as p: dropped = p.map(process, datasets) ```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I am still running on 2023.08.0 see below for more details about the environment

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 (main, Jan 25 2024, 20:42:03) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-124-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2023.8.0 pandas: 2.1.4 numpy: 1.26.3 scipy: 1.12.0 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2024.1.1 distributed: 2024.1.1 matplotlib: 3.8.2 cartopy: 0.22.0 seaborn: None numbagg: None fsspec: 2023.12.2 cupy: None pint: 0.23 sparse: None flox: 0.9.0 numpy_groupies: 0.10.2 setuptools: 69.0.3 pip: 23.2.1 conda: None pytest: 8.0.0 mypy: None IPython: 8.18.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8707/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 4 rows from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 319.703ms · About: xarray-datasette