issues: 1960332384

id: 1960332384
node_id: I_kwDOAMm_X8502Exg
number: 8371
title: Writing to regions with unaligned chunks can lose data
user: 5635139
state: closed
locked: 0
comments: 20
created_at: 2023-10-25T01:17:59Z
updated_at: 2024-03-29T14:35:51Z
closed_at: 2024-03-29T14:35:51Z
author_association: MEMBER
body:

What happened?

Writing to a region with chunks that aren't aligned with the store's chunks can lose data.

I've put together a reproduction below. It's unlikely that folks deliberately pass different values to .chunk for the template and for the regions, but I was using "auto" chunking, which can pick different chunk sizes for each.

(FWIW, this was fairly painful, and I managed to lose a lot of time by not noticing this, and then not really considering this could happen as I was trying to debug. I think we should really strive to ensure that we don't lose data / incorrectly report that we've successfully written data...)

What did you expect to happen?

If there's a risk of data loss, raise an error...
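To make the request concrete, here is a minimal sketch of the kind of check that could raise before anything is written. It is not xarray's implementation: `check_region_chunks` and its parameters are hypothetical names, it looks at a single dimension, and it assumes each dask chunk is written by an independent task, so two dask chunks landing in the same stored chunk is treated as unsafe.

```python
def check_region_chunks(dask_chunks, store_chunk, region_start=0):
    """Raise ValueError if two dask chunks would write into the same stored chunk.

    Hypothetical helper for one dimension:
    - dask_chunks: sizes of the dask chunks being written, e.g. (3, 3, 3, 1)
    - store_chunk: chunk size of the on-disk array along this dimension
    - region_start: offset of the region within the on-disk array
    """
    start = region_start
    prev_last = None  # index of the last stored chunk touched by the previous dask chunk
    for i, size in enumerate(dask_chunks):
        stop = start + size
        first = start // store_chunk
        last = (stop - 1) // store_chunk
        if prev_last is not None and first == prev_last:
            raise ValueError(
                f"dask chunks {i - 1} and {i} both write to stored chunk {first}; "
                "rechunk to match the store before writing"
            )
        prev_last = last
        start = stop


# Aligned: each dask chunk maps onto exactly one stored chunk.
check_region_chunks((5, 5), store_chunk=5)

# Unaligned, as in the example in this issue: raises instead of writing.
try:
    check_region_chunks((3, 3, 3, 1), store_chunk=5)
except ValueError as err:
    print(err)
```

A single dask chunk that spans several stored chunks passes this check, since only one task touches each stored chunk in that case.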

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

ds = (
    xr.DataArray(np.arange(120).reshape(4, 3, -1), dims=list("abc"))
    .rename('var1')
    .to_dataset()
    .chunk(2)
)

ds
# <xarray.Dataset>
# Dimensions:  (a: 4, b: 3, c: 10)
# Dimensions without coordinates: a, b, c
# Data variables:
#     var1     (a, b, c) int64 dask.array<chunksize=(2, 2, 2), meta=np.ndarray>

def write(ds):
    # Create the store from a template chunked at 5, without computing the data ...
    ds.chunk(5).to_zarr('foo.zarr', compute=False, mode='w')
    # ... then fill it one slice of `a` at a time, with the data chunked at 3.
    for r in range(ds.sizes['a']):
        ds.chunk(3).isel(a=[r]).to_zarr('foo.zarr', region=dict(a=slice(r, r + 1)))

def read(ds):
    result = xr.open_zarr('foo.zarr')
    assert result.compute().identical(ds)
    print(result.chunksizes, ds.chunksizes)

write(ds); read(ds)
# AssertionError

xr.open_zarr('foo.zarr').compute()['var1']
# <xarray.DataArray 'var1' (a: 4, b: 3, c: 10)>
# array([[[  0,   0,   0,   3,   4,   5,   0,   0,   0,   9],
#         [  0,   0,   0,  13,  14,  15,   0,   0,   0,  19],
#         [  0,   0,   0,  23,  24,  25,   0,   0,   0,  29]],
#
#        [[ 30,  31,  32,   0,   0,  35,  36,  37,  38,   0],
#         [ 40,  41,  42,   0,   0,  45,  46,  47,  48,   0],
#         [ 50,  51,  52,   0,   0,  55,  56,  57,  58,   0]],
#
#        [[ 60,  61,  62,   0,   0,  65,   0,   0,   0,  69],
#         [ 70,  71,  72,   0,   0,  75,   0,   0,   0,  79],
#         [ 80,  81,  82,   0,   0,  85,   0,   0,   0,  89]],
#
#        [[  0,   0,   0,  93,  94,  95,  96,  97,  98,   0],
#         [  0,   0,   0, 103, 104, 105, 106, 107, 108,   0],
#         [  0,   0,   0, 113, 114, 115, 116, 117, 118,   0]]])
# Dimensions without coordinates: a, b, c
```
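One way to see where the zeros might come from is to map each stored chunk along `c` to the dask chunks that write into it. The sketch below is illustrative only: `writers_per_store_chunk` is a hypothetical helper, and the chunk sizes are read off the example (stored chunks of 5 from `ds.chunk(5)`, dask chunks of (3, 3, 3, 1) from `ds.chunk(3)` on a dimension of length 10), assuming the store takes its chunking from the template.

```python
from collections import defaultdict


def writers_per_store_chunk(dask_chunks, store_chunk):
    """Map each stored chunk index to the dask chunk indices that write into it."""
    writers = defaultdict(list)
    start = 0
    for task, size in enumerate(dask_chunks):
        stop = start + size
        # Every stored chunk overlapped by [start, stop) receives a write from this task.
        for idx in range(start // store_chunk, (stop - 1) // store_chunk + 1):
            writers[idx].append(task)
        start = stop
    return dict(writers)


# Region writes from the example: several dask chunks share each stored chunk.
print(writers_per_store_chunk((3, 3, 3, 1), store_chunk=5))  # {0: [0, 1], 1: [1, 2, 3]}

# Aligned chunks: exactly one writer per stored chunk.
print(writers_per_store_chunk((5, 5), store_chunk=5))  # {0: [0], 1: [1]}
```

When more than one task does a partial write into the same stored chunk at the same time, one task's update can overwrite another's, which would be consistent with the missing runs in the output above.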

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: ccc8f9987b553809fb6a40c52fa1a8a8095c8c5f
python: 3.9.18 (main, Aug 24 2023, 21:19:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.10.2.dev10+gccc8f998
pandas: 2.1.1
numpy: 1.25.2
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.4.0
distributed: 2023.7.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: 0.2.3.dev30+gd26e29e
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: 0.9.19
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.6.0
IPython: 8.15.0
sphinx: 4.3.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8371/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
