issues: 2206243581

id: 2206243581
node_id: I_kwDOAMm_X86DgJr9
number: 8876
title: Possible race condition when appending to an existing zarr
user: 157591329
state: closed
locked: 0
comments: 4
created_at: 2024-03-25T16:59:52Z
updated_at: 2024-04-03T15:23:14Z
closed_at: 2024-03-29T14:35:52Z
author_association: NONE

### What happened?

When appending to an existing zarr along a dimension (`append_dim`), we sometimes end up with NaN values in the store instead of the data we appended.

### What did you expect to happen?

We would expect the zarr append to have the same behaviour as if we concatenated the datasets in memory (using `xr.concat`).

### Minimal Complete Verifiable Example

```Python
from distributed import Client, LocalCluster
import xarray as xr
import tempfile

ds1 = xr.Dataset({"a": ("x", [1., 1.])}, coords={'x': [1, 2]}).chunk({"x": 3})
ds2 = xr.Dataset({"a": ("x", [1., 1., 1., 1.])}, coords={'x': [3, 4, 5, 6]}).chunk({"x": 3})

with Client(LocalCluster(processes=False, n_workers=1, threads_per_worker=2)):
    # The issue happens only when: threads_per_worker > 1
    for i in range(0, 100):
        with tempfile.TemporaryDirectory() as store:
            print(store)
            ds1.to_zarr(store, mode="w")  # write first dataset
            ds2.to_zarr(store, mode="a", append_dim="x")  # append second dataset
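            # Illustrative check (an addition, not part of the original snippet):
            # every appended value is 1.0, so any NaN left in the store means a
            # partial-chunk write was lost; stop looping as soon as that happens.
            actual = xr.open_zarr(store).compute()
            if actual["a"].isnull().any():
                print(actual["a"].values)
                break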
```

### MVCE confirmation

### Relevant log output

### Anything else we need to know?

The example code snippet provided here reproduces the issue. Since the issue occurs randomly, we loop in the example a few times and stop when the issue occurs.

In the example, when `ds1` (two elements, chunked with `{"x": 3}`) is written, the zarr store is created with chunks of size 2 along `x` rather than 3. Side note: this behaviour in itself is not problematic in this case, but the fact that the chunking is silently changed made this issue harder to spot.

However, when we try to append the second dataset `ds2`, its dask chunks do not line up with the zarr chunks:

Zarr chunks (size 2 along `x`, after the append):
+ chunk1: x = [1, 2]
+ chunk2: x = [3, 4]
+ chunk3: x = [5, 6]

Dask chunks for `ds2` (size 3 along `x`):
+ chunk A: x = [3, 4, 5]
+ chunk B: x = [6]

Both dask chunks A and B are supposed to write to zarr chunk3, and depending on which one writes first, we can end up with NaN at x = 5 or at x = 6. The issue obviously happens only when dask tasks are run in parallel.
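To make the mismatch above concrete, this small sketch (assuming the `store` path and the `ds1`/`ds2` datasets from the example, plus the `zarr` package that backs the store) just prints both chunk layouts:

```Python
import zarr

# The store inherits its on-disk chunking from ds1's dask chunks (a single
# chunk holding 2 elements), not from the requested chunk size of 3.
print(zarr.open(store)["a"].chunks)  # (2,)

# ds2 keeps its own dask chunking of 3 along "x": chunks of sizes 3 and 1.
print(ds2["a"].chunks)               # ((3, 1),)

# After the append the store spans x = 1..6, so zarr chunk3 covers x = [5, 6]
# and receives writes from both dask chunks (x = [3, 4, 5]) and (x = [6]).
```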

We couldn't figure out from the documentation how to detect this kind of issue, or how to prevent it from happening (maybe using a synchronizer?).
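On the synchronizer question: `Dataset.to_zarr` does accept a `synchronizer` argument that is passed through to zarr, so one option to experiment with (for threads in a single process, as in the example above) is zarr's `ThreadSynchronizer`. This is only a sketch of the idea; whether it is enough to avoid the race described here is an assumption, not something verified in this report:

```Python
import zarr

# Per-chunk locking for threads within one process; writers in separate
# processes would need zarr.ProcessSynchronizer with an on-disk lock path.
sync = zarr.ThreadSynchronizer()

ds1.to_zarr(store, mode="w", synchronizer=sync)
ds2.to_zarr(store, mode="a", append_dim="x", synchronizer=sync)
```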

### Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.3
cartopy: None
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: 0.9.5
numpy_groupies: 0.10.2
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: None

reactions: { "url": "https://api.github.com/repos/pydata/xarray/issues/8876/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
state_reason: completed
repo: 13221727
type: issue