issues: 1966264258

id: 1966264258
node_id: I_kwDOAMm_X851Ms_C
number: 8385
title: The method to_netcdf does not preserve chunks
user: 40218891
state: open
locked: 0
comments: 3
created_at: 2023-10-27T22:29:45Z
updated_at: 2023-10-31T18:51:45Z
author_association: NONE

What happened?

The to_zarr and to_netcdf methods behave inconsistently for chunked datasets. The latter does not preserve existing chunk information; the chunks must be specified explicitly in the encoding dictionary.

What did you expect to happen?

I expected the behaviour to be consistent across all to_XXX() methods.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import dask.array as da

rng = da.random.RandomState()
shape = (20, 20)
chunks = [10, 10]
dims = ["x", "y"]
z = rng.standard_normal(shape, chunks=chunks)
ds = xr.DataArray(z, dims=dims, name="z").to_dataset()
ds.chunks
# Frozen({'x': (10, 10), 'y': (10, 10)})

# This one is rechunked
ds.to_netcdf("/tmp/test1.nc", encoding={"z": {"chunksizes": (5, 5)}})
# This one is not rechunked, also original chunks are lost
ds.chunk({"x": 5, "y": 5}).to_netcdf("/tmp/test2.nc")
# This one is rechunked
ds.chunk({"x": 5, "y": 5}).to_zarr("/tmp/test2", mode="w")
# <xarray.backends.zarr.ZarrStore at 0x7f3669f1af80>

xr.open_mfdataset("/tmp/test1.nc").chunks
# Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
xr.open_mfdataset("/tmp/test2.nc").chunks
# Frozen({'x': (20,), 'y': (20,)})
xr.open_mfdataset("/tmp/test2", engine="zarr").chunks
# Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
```
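Until the behaviour is made consistent, one possible workaround is to derive the `chunksizes` encoding from the chunks the dataset already has. The following is only a sketch: `chunk_encoding` is a hypothetical helper, not part of xarray, and it assumes a mapping of dimension name to chunk-size tuple shaped like `ds.chunks`. Because on-disk netCDF chunks are uniform along a dimension, it takes the largest chunk per dimension.

```python
def chunk_encoding(var_dims, chunks):
    """Build a to_netcdf-style encoding dict from existing chunk sizes.

    var_dims: mapping of variable name -> tuple of dimension names
    chunks:   mapping of dimension name -> tuple of chunk sizes,
              shaped like the mapping returned by ``ds.chunks``
    (Hypothetical helper for illustration; not an xarray API.)
    """
    encoding = {}
    for name, dims in var_dims.items():
        # Skip scalars and variables with at least one unchunked dimension.
        if not dims or any(d not in chunks for d in dims):
            continue
        # One uniform chunk size per dimension: use the largest existing chunk.
        encoding[name] = {"chunksizes": tuple(max(chunks[d]) for d in dims)}
    return encoding


# Chunk layout of the 20x20 example dataset after ds.chunk({"x": 5, "y": 5}).
example = chunk_encoding({"z": ("x", "y")}, {"x": (5, 5, 5, 5), "y": (5, 5, 5, 5)})
print(example)
```

With a real dataset, the inputs could be built as `{name: var.dims for name, var in ds.data_vars.items()}` and `ds.chunks`, and the result passed as `encoding=` to `ds.to_netcdf(...)`.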

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I got the same results with the h5netcdf and scipy backends, so I am not sure whether this is a bug. The above code is a modified version of #2198.

A suggestion: the documentation provides only examples of encoding styles. It would be helpful to provide links to a full specification.
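To illustrate what such a specification would need to cover, here is a sketch of a per-variable encoding dictionary for the netCDF4 backend. The keys are ones that backend accepts as far as I know, the list is not exhaustive, and the values are arbitrary placeholders chosen for the example dataset above.

```python
# Illustrative encoding for the variable "z" from the example above.
# The values are placeholders, not recommendations.
encoding = {
    "z": {
        "dtype": "float32",      # on-disk data type
        "_FillValue": -9999.0,   # value written for missing data
        "zlib": True,            # enable zlib compression
        "complevel": 4,          # compression level (1-9)
        "shuffle": True,         # byte-shuffle filter applied before compression
        "chunksizes": (5, 5),    # on-disk chunk shape, one entry per dimension
    }
}

# Each variable name maps to its own dictionary of settings.
for name, settings in encoding.items():
    print(name, sorted(settings))
```

A dictionary of this shape is what is passed as `encoding=` in the `to_netcdf` call in the example.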

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: 0.5.1
fsspec: 2023.10.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8385/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
