issues: 1960332384
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1960332384 | I_kwDOAMm_X8502Exg | 8371 | Writing to regions with unaligned chunks can lose data | 5635139 | closed | 0 | 20 | 2023-10-25T01:17:59Z | 2024-03-29T14:35:51Z | 2024-03-29T14:35:51Z | MEMBER | What happened?

Writing with I've recreated an example below. While it's unlikely that folks are passing different values to

(FWIW, this was fairly painful, and I managed to lose a lot of time by not noticing this, and then not really considering this could happen as I was trying to debug. I think we should really strive to ensure that we don't lose data / incorrectly report that we've successfully written data...)

What did you expect to happen?

If there's a risk of data loss, raise an error...

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

ds = xr.DataArray(np.arange(120).reshape(4, 3, -1), dims=list("abc")).rename('var1').to_dataset().chunk(2)
ds
# <xarray.Dataset>
# Dimensions: (a: 4, b: 3, c: 10)
# Dimensions without coordinates: a, b, c
# Data variables:
#     var1 (a, b, c) int64 dask.array<chunksize=(2, 2, 2), meta=np.ndarray>

def write(ds):
    # Create the store from the dataset chunked with chunk(5); compute=False defers writing the data.
    ds.chunk(5).to_zarr('foo.zarr', compute=False, mode='w')
    # Write each index along `a` as its own region, but chunked with chunk(3) instead.
    for r in range(ds.sizes['a']):
        ds.chunk(3).isel(a=[r]).to_zarr('foo.zarr', region=dict(a=slice(r, r+1)))

def read(ds):
    result = xr.open_zarr('foo.zarr')
    assert result.compute().identical(ds)
    print(result.chunksizes, ds.chunksizes)

write(ds); read(ds)
# AssertionError

xr.open_zarr('foo.zarr').compute()['var1']
# <xarray.DataArray 'var1' (a: 4, b: 3, c: 10)>
# array([[[ 0, 0, 0, 3, 4, 5, 0, 0, 0, 9],
#         [ 0, 0, 0, 13, 14, 15, 0, 0, 0, 19],
#         [ 0, 0, 0, 23, 24, 25, 0, 0, 0, 29]],
# Dimensions without coordinates: a, b, c
```

MVCE confirmation

Relevant log output

No response

Anything else we need to know?

No response

Environment
INSTALLED VERSIONS
------------------
commit: ccc8f9987b553809fb6a40c52fa1a8a8095c8c5f
python: 3.9.18 (main, Aug 24 2023, 21:19:58)
[Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.10.2.dev10+gccc8f998
pandas: 2.1.1
numpy: 1.25.2
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.4.0
distributed: 2023.7.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: 0.2.3.dev30+gd26e29e
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: 0.9.19
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.6.0
IPython: 8.15.0
sphinx: 4.3.2
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8371/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
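
The data loss in the example above appears to come from dask chunk edges that fall inside a chunk of the on-disk store, so that more than one dask chunk writes into the same stored chunk. As a rough, pure-Python sketch of the kind of check that could raise an error instead (the function `misaligned_edges` and its parameters are hypothetical, and are not part of xarray or zarr):

```python
from itertools import accumulate


def misaligned_edges(dask_chunks, store_chunk, region_start=0):
    """Return interior dask chunk edges that fall inside a stored chunk.

    dask_chunks: sizes of the dask chunks along one dimension
    store_chunk: chunk size of the on-disk array along that dimension
    region_start: offset of the written region along that dimension
    """
    # Absolute positions where each dask chunk ends.
    edges = [region_start + stop for stop in accumulate(dask_chunks)]
    # The last edge is the end of the region, not a split between dask chunks.
    interior = edges[:-1]
    # An edge that is not a multiple of the store chunk size splits a stored chunk.
    return [edge for edge in interior if edge % store_chunk != 0]


# Dimension `c` from the example: length 10, store chunks of 5, dask chunks of 3.
print(misaligned_edges((3, 3, 3, 1), store_chunk=5))  # [3, 6, 9]

# Matching chunk sizes leave no edge inside a stored chunk.
print(misaligned_edges((5, 5), store_chunk=5))  # []
```

A non-empty result would be the point at which to raise. This sketch only looks at edges between dask chunks within one write; it does not cover separate region writes that run concurrently and share a stored chunk.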