issues: 1333650265


id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
1333650265	I_kwDOAMm_X85PfeNZ	6904	`sel` behaving randomly when applying to a dataset with multiprocessing	12760310	open	0			12	2022-08-09T18:43:06Z	2022-08-10T16:48:53Z		NONE				What happened? I have a script structured like this ```python def main(): global ds ds = xr.open_dataset(file) for point in points: compute(point) def compute(point): ds_point = ds.sel(lat=point['latitude'], lon=point['longitude'], method='nearest') print(ds_point.var.mean()) # do something with ds_point and other data... if __name__ == "__main__": main() ``` This works as expected. However, if I try to parallelize `compute` by calling it with `process_map(compute, points, max_workers=5, chunksize=1)`, the results of the print are completely different from the serial example and they change every time that I run the script. It seems that `sel` is giving back a different part of the dataset when there are multiple processes running in parallel. If I move the `open_dataset` statement inside `compute`, then the parallel case gives the same results as the serial one. Also, if I load the dataset at the beginning, i.e. `ds = xr.open_dataset(file).load()`, I also get reproducible results. Is this supposed to happen? I really don't understand how. What did you expect to happen? The behaviour of `sel` should be the same in parallel or serial execution. Minimal Complete Verifiable Example No response MVCE confirmation [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. [ ] Complete example — the example is self-contained, including all data and the text of any traceback. [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result. [ ] New issue — a search of GitHub Issues suggests this is not a duplicate. Relevant log output No response Anything else we need to know?
No response Environment INSTALLED VERSIONS ------------------ commit: None python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-229.1.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.utf8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 2022.3.0 pandas: 1.2.3 numpy: 1.20.3 scipy: 1.8.1 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.1 cfgrib: None iris: None bottleneck: None dask: 2022.7.1 distributed: 2022.7.1 matplotlib: 3.5.2 cartopy: 0.18.0 seaborn: 0.11.2 numbagg: None fsspec: 2022.5.0 cupy: None pint: 0.19.2 sparse: None setuptools: 59.8.0 pip: 22.2 conda: 4.13.0 pytest: None IPython: 8.4.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6904/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }			13221727	issue
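
The report says the results become reproducible when `open_dataset` is called inside the worker rather than once in the parent. A minimal sketch of that workaround follows. It opens the file once per worker process instead of once per point, so no open dataset is inherited from the parent. The names `_init_worker`, `select_means` and `run` are hypothetical and not part of the report. The sketch also swaps the report's `process_map` for the standard-library `ProcessPoolExecutor`, and assumes the file has `lat` and `lon` coordinates and numeric data variables. It illustrates the pattern only and does not diagnose the underlying cause.

```python
from concurrent.futures import ProcessPoolExecutor

# One dataset handle per worker process, set by _init_worker (hypothetical name).
_ds = None


def _init_worker(path):
    """Open the file inside the worker so no open dataset is shared with the parent."""
    global _ds
    # Imported lazily; assumes xarray and a netCDF backend are installed.
    import xarray as xr

    _ds = xr.open_dataset(path)


def select_means(ds, point):
    """Select the grid cell nearest to `point` and return the mean of each data variable."""
    ds_point = ds.sel(lat=point["latitude"], lon=point["longitude"], method="nearest")
    # Assumes every data variable is numeric.
    return {name: float(var.mean()) for name, var in ds_point.data_vars.items()}


def compute(point):
    """Worker entry point: uses the dataset opened by this process's initializer."""
    return select_means(_ds, point)


def run(path, points, max_workers=5):
    """Map `compute` over `points`, opening `path` once in each worker process."""
    with ProcessPoolExecutor(
        max_workers=max_workers, initializer=_init_worker, initargs=(path,)
    ) as pool:
        return list(pool.map(compute, points, chunksize=1))
```

Calling `run(file, points)` from under an `if __name__ == "__main__":` guard would stand in for the `process_map` call in the report. The other workaround mentioned there, `xr.open_dataset(file).load()`, reads the data into memory before the workers start, which may be simpler when the dataset fits in memory.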

Links from other tables

1 row from issues_id in issues_labels
12 rows from issue in issue_comments