issues: 868352536

  • id: 868352536
  • node_id: MDU6SXNzdWU4NjgzNTI1MzY=
  • number: 5219
  • title: Zarr encoding attributes persist after slicing data, raising error on `to_zarr`
  • user: 4801430
  • state: open
  • locked: 0
  • comments: 9
  • created_at: 2021-04-27T01:34:52Z
  • updated_at: 2022-12-06T16:16:20Z
  • author_association: CONTRIBUTOR

What happened: I opened a dataset using `open_zarr`, sliced it, and then tried to save the result to a new zarr store using `to_zarr`, which raised an error.

What you expected to happen: The dataset would save without my needing to explicitly modify any encoding dictionary values.

Minimal Complete Verifiable Example:

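```python
import xarray as xr

# Write a small dataset to zarr with chunks of size 2 along dimA.
ds = xr.Dataset({"data": (("dimA",), [10, 20, 30, 40])}, coords={"dimA": [1, 2, 3, 4]})
ds = ds.chunk({"dimA": 2})
ds.to_zarr("test.zarr", consolidated=True, mode="w")

# Reopen, select non-adjacent elements, and try to write the result.
ds2 = xr.open_zarr("test.zarr", consolidated=True).sel(dimA=[1, 3]).persist()
ds2.to_zarr("test2.zarr", consolidated=True, mode="w")  # raises NotImplementedError
```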

This raises:

    NotImplementedError: Specified zarr chunks encoding['chunks']=(2,) for variable named 'data' would overlap multiple dask chunks ((1, 1),). This is not implemented in xarray yet. Consider either rechunking using `chunk()` or instead deleting or modifying `encoding['chunks']`.

Anything else we need to know?:

Not sure if there is a good way around this (or perhaps this is even desired behavior?), but figured I would flag it as it seemed unexpected and took us a second to diagnose. Once you've loaded the data from a zarr store, I feel like the default behavior should probably be to forget the encodings used to save that zarr, treating the in-memory dataset object just like any other in-memory dataset object that could have been loaded from any source. But maybe I'm in the minority or missing some nuance about why you'd want the encoding to hang around.
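For anyone hitting the same error, here is a minimal sketch of the second workaround the message suggests: deleting `encoding['chunks']` before writing. The helper name `drop_chunk_encoding` and the function `roundtrip_example` are made up for illustration and are not part of xarray. The sketch assumes that `Dataset.copy()` gives each variable its own encoding dict, which appears to be the case but is worth checking on your version.

```python
def drop_chunk_encoding(ds):
    """Return a shallow copy of `ds` whose variables carry no 'chunks' encoding.

    Hypothetical helper, not an xarray API. It only relies on `ds.copy()`
    and on each variable exposing an `encoding` dict.
    """
    cleaned = ds.copy()  # shallow copy: the underlying data is not duplicated
    for var in cleaned.variables.values():
        # Drop the chunk layout remembered from the source store so that
        # to_zarr falls back to the current dask chunks.
        var.encoding.pop("chunks", None)
    return cleaned


def roundtrip_example(store_dir):
    """Repeat the example from this issue with the stale encoding removed.

    Requires xarray, zarr and dask; imported here so that the helper above
    stays dependency-free.
    """
    import os

    import xarray as xr

    src = os.path.join(store_dir, "test.zarr")
    dst = os.path.join(store_dir, "test2.zarr")

    ds = xr.Dataset({"data": (("dimA",), [10, 20, 30, 40])}, coords={"dimA": [1, 2, 3, 4]})
    ds.chunk({"dimA": 2}).to_zarr(src, consolidated=True, mode="w")

    ds2 = xr.open_zarr(src, consolidated=True).sel(dimA=[1, 3]).persist()
    drop_chunk_encoding(ds2).to_zarr(dst, consolidated=True, mode="w")
    return xr.open_zarr(dst, consolidated=True)
```

The other option named in the error message, rechunking with `chunk()` so that the dask chunks line up with the stored `encoding['chunks']`, should also work here. Either way, it is an extra step the user has to know about, which is the point of this issue.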

Environment:

```
INSTALLED VERSIONS

commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.89+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.2
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.7.1
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.2.2
cfgrib: 0.9.9.0
iris: 3.0.1
bottleneck: 1.3.2
dask: 2021.04.1
distributed: 2021.04.1
matplotlib: 3.4.1
cartopy: 0.19.0
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: 3.5.4
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5219/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
