
issues: 1333650265


  • id: 1333650265
  • node_id: I_kwDOAMm_X85PfeNZ
  • number: 6904
  • title: `sel` behaving randomly when applying to a dataset with multiprocessing
  • user: 12760310
  • state: open
  • locked: 0
  • comments: 12
  • created_at: 2022-08-09T18:43:06Z
  • updated_at: 2022-08-10T16:48:53Z
  • author_association: NONE

What happened?

I have a script structured like this

```python
import xarray as xr


def main():
    global ds
    ds = xr.open_dataset(file)
    for point in points:
        compute(point)


def compute(point):
    ds_point = ds.sel(lat=point['latitude'], lon=point['longitude'], method='nearest')
    print(ds_point.var.mean())
    # do something with ds_point and other data...


if __name__ == "__main__":
    main()
```

This works as expected. However, if I try to parallelize `compute` by calling it with

```python
process_map(compute, points, max_workers=5, chunksize=1)
```

The printed results are completely different from the serial run, and they change every time I run the script. It seems that `sel` returns a different part of the dataset when multiple processes are running in parallel.

If I move the `open_dataset` call inside `compute`, the parallel case gives the same results as the serial one. Likewise, if I load the dataset at the beginning, i.e. `ds = xr.open_dataset(file).load()`, the results are reproducible.
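
The first workaround (opening the file inside the worker) can be arranged so that each worker process opens the file once, rather than once per point. Below is a minimal, stdlib-only sketch of that structure, not the code from my script. It makes several assumptions: `concurrent.futures.ProcessPoolExecutor` stands in for `process_map`, a plain binary file of float64 records stands in for the NetCDF file, and `init_worker`, `write_records`, `run_parallel` and the record layout are hypothetical names chosen for illustration. With xarray, `init_worker` would presumably call `xr.open_dataset(file)` and `compute` would call `ds.sel(...)`.

```python
import os
import struct
import tempfile
from concurrent.futures import ProcessPoolExecutor

# Hypothetical record layout: one little-endian float64 per record.
RECORD = struct.Struct("<d")

# Per-process file handle; stays None until init_worker runs in that process.
_handle = None


def init_worker(path):
    """Open the file inside the worker, so no handle is inherited from the parent."""
    global _handle
    _handle = open(path, "rb")


def compute(index):
    """Read record `index` through this process's own handle."""
    _handle.seek(index * RECORD.size)
    (value,) = RECORD.unpack(_handle.read(RECORD.size))
    return value


def write_records(path, values):
    """Write the stand-in data file (illustration only)."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def run_parallel(path, indices, max_workers=5):
    """Map compute over indices; every worker opens `path` once at start-up."""
    with ProcessPoolExecutor(
        max_workers=max_workers, initializer=init_worker, initargs=(path,)
    ) as pool:
        return list(pool.map(compute, indices, chunksize=1))


if __name__ == "__main__":
    values = [i * 1.5 for i in range(20)]
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "records.bin")
        write_records(path, values)
        parallel = run_parallel(path, range(len(values)))
    # Compare the parallel results against the values that were written.
    print(parallel == values)
```

The point of the initializer is only that the handle is created after the worker process exists. Whether this is the right fix for the xarray case is an assumption on my part.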

Is this supposed to happen? I really don't understand how.

What did you expect to happen?

The behaviour of `sel` should be the same in parallel or serial execution.

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-229.1.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 2022.3.0
pandas: 1.2.3
numpy: 1.20.3
scipy: 1.8.1
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.1
cfgrib: None
iris: None
bottleneck: None
dask: 2022.7.1
distributed: 2022.7.1
matplotlib: 3.5.2
cartopy: 0.18.0
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: 0.19.2
sparse: None
setuptools: 59.8.0
pip: 22.2
conda: 4.13.0
pytest: None
IPython: 8.4.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6904/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • repo: 13221727
  • type: issue
