issues: 887711474
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
887711474 | MDU6SXNzdWU4ODc3MTE0NzQ= | 5290 | Inconclusive error messages using to_zarr with regions | 5802846 | closed | 0 | 4 | 2021-05-11T15:54:39Z | 2023-11-05T06:28:39Z | 2023-11-05T06:28:39Z | CONTRIBUTOR |

**What happened**:
The idea is to use an xarray dataset (stored as a dummy zarr file) which is subsequently filled region by region via `to_zarr`.

It seems the current implementation is only designed either to store coordinates for the whole dataset and write them to disk, or to write without coordinates. I failed to understand this from the documentation: I tried to create a dataset without coordinates and fill it with a dataset subset that does have coordinates, and got some inconclusive errors depending on the actual code path (see below).
It might also be a bug, and it should in fact be possible to add a dataset with coordinates to a dummy dataset without coordinates; in that case there seems to be an issue in how the variables are handled while storing the region. ... or I might just have done it wrong, and I'm looking forward to suggestions.

**What you expected to happen**:

Either an error message telling me that I should use coordinates during creation of the dummy dataset, or, if this is a bug and it should be possible, it should just work.

**Minimal Complete Verifiable Example**:

```python
import dask.array
import numpy as np
import xarray as xr

error = 1  # choose between 0 (no error), 1, 2, 3

dummies = dask.array.zeros(30, chunks=10)
# chunks in coords are not taken into account while saving!?
coord_x = dask.array.zeros(30, chunks=10)
# or: coord_x = np.zeros((30,))
if error == 0:
    ds = xr.Dataset({"foo": ("x", dummies)}, coords={"x": coord_x})
else:
    ds = xr.Dataset({"foo": ("x", dummies)})
print(ds)

path = "./tmp/test.zarr"
ds.to_zarr(path, mode="w", compute=False, consolidated=True)

# create a new dataset to be written into a region
ds = xr.Dataset({"foo": ("x", np.arange(10))}, coords={"x": np.arange(10)})
if error == 1:
    ds.to_zarr(path, region={"x": slice(10, 20)})
    # ValueError: parameter 'value': expected array with shape (0,), got (10,)
elif error == 2:
    ds.to_zarr(path, region={"x": slice(0, 10)})
    ds.to_zarr(path, region={"x": slice(10, 20)})
    # ValueError: conflicting sizes for dimension 'x': length 10 on 'x' and length 30 on 'foo'
elif error == 3:
    ds.to_zarr(path, region={"x": slice(0, 10)})
    ds = xr.Dataset({"foo": ("x", np.arange(10))}, coords={"x": np.arange(10)})
    ds.to_zarr(path, region={"x": slice(10, 20)})
    # ValueError: parameter 'value': expected array with shape (0,), got (10,)
else:
    ds.to_zarr(path, region={"x": slice(10, 20)})

ds = xr.open_zarr(path)
print("reopen", ds["x"])
```

**Anything else we need to know?**:

**Environment**: Output of `xr.show_versions()`:
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-16-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None

xarray: 0.18.0
pandas: 1.2.3
numpy: 1.19.2
scipy: 1.6.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.1
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.04.0
distributed: None
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: None
sphinx: None
```
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5290/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |