issues
1 row where repo = 13221727 and user = 157591329 sorted by updated_at descending
field | value
---|---
id | 2206243581
node_id | I_kwDOAMm_X86DgJr9
number | 8876
title | Possible race condition when appending to an existing zarr
user | rsemlal-murmuration (157591329)
state | closed
locked | 0
comments | 4
created_at | 2024-03-25T16:59:52Z
updated_at | 2024-04-03T15:23:14Z
closed_at | 2024-03-29T14:35:52Z
author_association | NONE
reactions | total_count: 0
state_reason | completed
repo | xarray (13221727)
type | issue

What happened?

When appending to an existing zarr along a dimension (`mode="a"` with an `append_dim`), parallel dask tasks can race while writing, and the appended data sometimes comes back with NaN values.

What did you expect to happen?

We would expect the zarr append to have the same behaviour as if we concatenated the datasets in memory (using `xr.concat`, for example).

Minimal Complete Verifiable Example

```python
from distributed import Client, LocalCluster
import xarray as xr
import tempfile

ds1 = xr.Dataset({"a": ("x", [1., 1.])}, coords={'x': [1, 2]}).chunk({"x": 3})
ds2 = xr.Dataset({"a": ("x", [1., 1., 1., 1.])}, coords={'x': [3, 4, 5, 6]}).chunk({"x": 3})

with Client(LocalCluster(processes=False, n_workers=1, threads_per_worker=2)):
    # The issue happens only when: threads_per_worker > 1
    for i in range(0, 100):
        with tempfile.TemporaryDirectory() as store:
            print(store)
            ds1.to_zarr(store, mode="w")                  # write first dataset
            ds2.to_zarr(store, mode="a", append_dim="x")  # append second dataset
```
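The loop is meant to stop as soon as the issue occurs, but the check itself did not survive in the captured body. A minimal sketch of such a check, to be called inside the loop right after the append (the helper name `append_corrupted` is hypothetical, not from the report):

```python
import xarray as xr

def append_corrupted(store, ds1, ds2):
    # Compare what was actually written to the store against the in-memory
    # concatenation the reporter expected; True means the race corrupted data.
    expected = xr.concat([ds1, ds2], dim="x").compute()
    actual = xr.open_zarr(store).compute()
    return not actual.identical(expected)
```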
MVCE confirmation
Relevant log output
Anything else we need to know?

The example code snippet provided here reproduces the issue. Since the issue occurs randomly, the example loops a few times and stops when the issue occurs.

In the example, when `ds1` is written, the store is created with zarr chunks of size 2 along `x` (the full length of `ds1`, which is smaller than the requested dask chunk size of 3). Side note: this behaviour in itself is not problematic in this case, but the fact that the chunking is silently changed made this issue harder to spot.

However, when we try to append the second dataset `ds2`, the chunk layouts no longer line up:
Zarr chunks:

+ chunk1: x = [1, 2]
+ chunk2: x = [3, 4]
+ chunk3: x = [5, 6]

Dask chunks for `ds2`:

+ chunk A: x = [3, 4, 5]
+ chunk B: x = [6]

Both dask chunks A and B are supposed to write to zarr chunk3.
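To make the overlap concrete, here is a small sketch (not from the original report) that recomputes which zarr chunks each dask chunk of `ds2` touches; it uses 0-based chunk indices, whereas the lists above use 1-based names:

```python
zarr_chunk_size = 2        # on-disk chunk size inherited from writing ds1
append_offset = 2          # ds1 already occupies x indices 0..1
dask_chunk_sizes = [3, 1]  # ds2.chunk({"x": 3}) on 4 elements -> chunks A and B

start = append_offset
for name, size in zip(["A", "B"], dask_chunk_sizes):
    stop = start + size
    zarr_chunks_touched = sorted({i // zarr_chunk_size for i in range(start, stop)})
    print(f"dask chunk {name} writes zarr chunks {zarr_chunks_touched}")
    start = stop
# dask chunk A writes zarr chunks [1, 2]
# dask chunk B writes zarr chunks [2]   <- both touch the last zarr chunk
```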
And depending on which one writes first, we can end up with NaN at x = 5 or at x = 6. The issue obviously happens only when dask tasks are run in parallel: with `threads_per_worker=1` it does not reproduce. We couldn't figure out from the documentation how to detect this kind of issue, or how to prevent it from happening (maybe using a synchronizer?).
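One possible mitigation, sketched here rather than taken from the xarray docs: rechunk `ds2` so its dask chunks line up with the store's on-disk chunks, so that no two tasks touch the same zarr chunk. The synchronizer route the reporter mentions also exists: `to_zarr` accepts a `synchronizer` argument, and zarr-python 2.x provides `zarr.ThreadSynchronizer` for the threads-in-one-process case. The helper name `append_aligned` is hypothetical:

```python
import xarray as xr
import zarr

def append_aligned(store, ds):
    # Read the on-disk chunk size of "a" along x and rechunk to match it, so
    # each dask task writes whole zarr chunks. This lines up here because the
    # existing length (2) is a multiple of the on-disk chunk size (2).
    on_disk = zarr.open(store)["a"].chunks[0]
    ds.chunk({"x": on_disk}).to_zarr(store, mode="a", append_dim="x")

# Alternatively, serialize conflicting writes within one process:
# ds2.to_zarr(store, mode="a", append_dim="x",
#             synchronizer=zarr.ThreadSynchronizer())
```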
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.3
cartopy: None
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: 0.9.5
numpy_groupies: 0.10.2
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: None