Issue 8358: Writing to zarr archive fails on resampled dataset

id: 1956383344 · state: closed · comments: 1 · author_association: NONE · created: 2023-10-23T05:30:36Z · updated: 2023-10-23T15:46:20Z · closed: 2023-10-23T15:46:19Z

What happened?

I am not sure where this belongs: xarray, dask, or zarr. When a dataset is resampled to a semi-monthly frequency, the `to_zarr` method fails, complaining about invalid chunks.

What did you expect to happen?

I think this should work without having to rechunk the result before writing to the archive.

Minimal Complete Verifiable Example

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2001-01-01", freq="D", periods=365)
ds = xr.Dataset({"foo": ("time", np.arange(1, 366)), "time": time}).chunk(time=5)
dsr = ds.resample(time="SM").mean()
dsr.to_zarr('/tmp/foo', mode='w')
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[63], line 4
      2 ds = xr.Dataset({"foo": ("time", np.arange(1, 366)), "time": time}).chunk(time=5)
      3 dsr = ds.resample(time="SM").mean()
----> 4 dsr.to_zarr('/tmp/foo', mode='w')
      5 #dsr.isel(time=slice(0, -1)).to_zarr('/tmp/foo', mode='w')

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:2490, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   2358 """Write dataset contents to a zarr group.
   2359 
   2360 Zarr chunks are determined in the following way:
    (...)
   2486 The I/O user guide, with more details and examples.
   2487 """
   2488 from xarray.backends.api import to_zarr
-> 2490 return to_zarr(  # type: ignore[call-overload,misc]
   2491     self,
   2492     store=store,
   2493     chunk_store=chunk_store,
   2494     storage_options=storage_options,
   2495     mode=mode,
   2496     synchronizer=synchronizer,
   2497     group=group,
   2498     encoding=encoding,
   2499     compute=compute,
   2500     consolidated=consolidated,
   2501     append_dim=append_dim,
   2502     region=region,
   2503     safe_chunks=safe_chunks,
   2504     zarr_version=zarr_version,
   2505     write_empty_chunks=write_empty_chunks,
   2506     chunkmanager_store_kwargs=chunkmanager_store_kwargs,
   2507 )

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/api.py:1708, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   1706 writer = ArrayWriter()
   1707 # TODO: figure out how to properly handle unlimited_dims
-> 1708 dump_to_store(dataset, zstore, writer, encoding=encoding)
   1709 writes = writer.sync(
   1710     compute=compute, chunkmanager_store_kwargs=chunkmanager_store_kwargs
   1711 )
   1713 if compute:

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/api.py:1308, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1305 if encoder:
   1306     variables, attrs = encoder(variables, attrs)
-> 1308 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:631, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    628 self.set_attributes(attributes)
    629 self.set_dimensions(variables_encoded, unlimited_dims=unlimited_dims)
--> 631 self.set_variables(
    632     variables_encoded, check_encoding_set, writer, unlimited_dims=unlimited_dims
    633 )
    634 if self._consolidate_on_close:
    635     zarr.consolidate_metadata(self.zarr_group.store)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:687, in ZarrStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    684     zarr_array = self.zarr_group[name]
    685 else:
    686     # new variable
--> 687     encoding = extract_zarr_variable_encoding(
    688         v, raise_on_invalid=check, name=vn, safe_chunks=self._safe_chunks
    689     )
    690     encoded_attrs = {}
    691     # the magic for storing the hidden dimension data

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:281, in extract_zarr_variable_encoding(variable, raise_on_invalid, name, safe_chunks)
    278     if k not in valid_encodings:
    279         del encoding[k]
--> 281 chunks = _determine_zarr_chunks(
    282     encoding.get("chunks"), variable.chunks, variable.ndim, name, safe_chunks
    283 )
    284 encoding["chunks"] = chunks
    285 return encoding

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:138, in _determine_zarr_chunks(enc_chunks, var_chunks, ndim, name, safe_chunks)
    132     raise ValueError(
    133         "Zarr requires uniform chunk sizes except for final chunk. "
    134         f"Variable named {name!r} has incompatible dask chunks: {var_chunks!r}. "
    135         "Consider rechunking using chunk()."
    136     )
    137 if any((chunks[0] < chunks[-1]) for chunks in var_chunks):
--> 138     raise ValueError(
    139         "Final chunk of Zarr array must be the same size or smaller "
    140         f"than the first. Variable named {name!r} has incompatible Dask chunks {var_chunks!r}."
    141         "Consider either rechunking using chunk() or instead deleting "
    142         "or modifying encoding['chunks']."
    143     )
    144 # return the first chunk for each dimension
    145 return tuple(chunk[0] for chunk in var_chunks)

ValueError: Final chunk of Zarr array must be the same size or smaller than the first. Variable named 'foo' has incompatible Dask chunks ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2),).Consider either rechunking using chunk() or instead deleting or modifying encoding['chunks'].
```
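The two `ValueError` branches in `_determine_zarr_chunks` encode the rule xarray enforces before handing chunks to zarr: along each dimension, all chunks except the last must be uniform, and the final chunk must be the same size or smaller than the first. A minimal standalone sketch of that rule (`zarr_compatible_chunks` is a hypothetical helper for illustration, not xarray API, and is simplified from the real check):

```python
def zarr_compatible_chunks(var_chunks):
    """Simplified form of the check in xarray's _determine_zarr_chunks:
    for each dimension, all chunks but the last must be equal, and the
    final chunk must not be larger than the first."""
    for chunks in var_chunks:
        if len(set(chunks[:-1])) > 1:          # non-uniform interior chunks
            return False
        if chunks and chunks[0] < chunks[-1]:  # final chunk larger than first
            return False
    return True

# The chunks produced by the resampled dataset in this issue:
resampled = ((1,) * 23 + (2,),)
print(zarr_compatible_chunks(resampled))  # → False: last chunk (2) exceeds first (1)
```

This is why the semi-monthly resample trips the check: every output period becomes its own tiny dask chunk, and the trailing period happens to be one element longer than the rest.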

Anything else we need to know?

I can also achieve what I want without having to rechunk with `dsr = ds.resample(time="SM", closed="right", label="right").mean().isel(time=slice(0, -1))`.
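Alternatively, forcing uniform chunks on the resampled result before writing also satisfies the requirement stated in the traceback; for example `dsr.chunk(time=-1).to_zarr(...)` collapses the time dimension into a single chunk. A pure-Python sketch of why that always passes (no xarray needed; `is_valid` is a hypothetical stand-in for xarray's internal check, not its API):

```python
def is_valid(chunks):
    # xarray's requirement, simplified: uniform chunks except possibly
    # the last, which must not exceed the first.
    return len(set(chunks[:-1])) <= 1 and (not chunks or chunks[0] >= chunks[-1])

# Chunks from the failing resampled variable (23 ones followed by a 2):
bad = (1,) * 23 + (2,)
print(is_valid(bad))          # → False

# .chunk(time=-1) would merge everything into one full-length chunk:
print(is_valid((sum(bad),)))  # → True

# An explicit uniform rechunk, e.g. .chunk(time=10), also works:
def uniform(total, size):
    full, rem = divmod(total, size)
    return (size,) * full + ((rem,) if rem else ())

print(is_valid(uniform(sum(bad), 10)))  # → True: chunks (10, 10, 5)
```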

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: 0.5.1
fsspec: 2023.10.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
```