home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 1984961987

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1984961987 I_kwDOAMm_X852UB3D 8432 Writing a datetime coord ignores chunks 5635139 closed 0     5 2023-11-09T07:00:39Z 2024-01-29T19:12:33Z 2024-01-29T19:12:33Z MEMBER      

What happened?

When writing a coord with a datetime type, the chunking on the coord is ignored, and the whole coord is written as a single chunk. (or at least it can be, I haven't done enough to confirm whether it'll always be...)

This can be quite inconvenient. Any attempt to write to that dataset from a distributed process will have errors, since each process will be attempting to write another process's data, rather than only its region. And less severely, the chunks won't be unified.

Minimal Complete Verifiable Example

```Python ds = xr.tutorial.load_dataset('air_temperature')

( ds.chunk() .expand_dims(a=1000) .assign_coords( time2=lambda x: x.time, time_int=lambda x: (("time"), np.full(ds.sizes["time"], 1)), ) .chunk(time=10) .to_zarr("foo.zarr", mode="w") )

xr.open_zarr('foo.zarr')

Note the chunksize=(2920,) vs chunksize=(10,)!

<xarray.Dataset> Dimensions: (a: 1000, time: 2920, lat: 25, lon: 53) Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 time2 (time) datetime64[ns] dask.array<chunksize=(2920,), meta=np.ndarray> # here time_int (time) int64 dask.array<chunksize=(10,), meta=np.ndarray> # here Dimensions without coordinates: a Data variables: air (a, time, lat, lon) float32 dask.array<chunksize=(1000, 10, 25, 53), meta=np.ndarray> Attributes: Conventions: COARDS description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly... title: 4x daily NMC reanalysis (1948)

xr.open_zarr('foo.zarr').chunks

ValueError Traceback (most recent call last) Cell In[13], line 1 ----> 1 xr.open_zarr('foo.zarr').chunks

File /opt/homebrew/lib/python3.9/site-packages/xarray/core/dataset.py:2567, in Dataset.chunks(self) 2552 @property 2553 def chunks(self) -> Mapping[Hashable, tuple[int, ...]]: 2554 """ 2555 Mapping from dimension names to block lengths for this dataset's data, or None if 2556 the underlying data is not a dask array. (...) 2565 xarray.unify_chunks 2566 """ -> 2567 return get_chunksizes(self.variables.values())

File /opt/homebrew/lib/python3.9/site-packages/xarray/core/common.py:2013, in get_chunksizes(variables) 2011 for dim, c in v.chunksizes.items(): 2012 if dim in chunks and c != chunks[dim]: -> 2013 raise ValueError( 2014 f"Object has inconsistent chunks along dimension {dim}. " 2015 "This can be fixed by calling unify_chunks()." 2016 ) 2017 chunks[dim] = c 2018 return Frozen(chunks)

ValueError: Object has inconsistent chunks along dimension time. This can be fixed by calling unify_chunks().

```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.18 (main, Nov 2 2023, 16:51:22) [Clang 14.0.3 (clang-1403.0.22.14.1)] python-bits: 64 OS: Darwin OS-release: 22.6.0 machine: arm64 processor: arm byteorder: little LC_ALL: en_US.UTF-8 LANG: None LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: None xarray: 2023.10.1 pandas: 2.1.1 numpy: 1.26.1 scipy: 1.11.1 netCDF4: None pydap: None h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: 2.16.0 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2023.5.0 distributed: 2023.5.0 matplotlib: 3.6.0 cartopy: None seaborn: 0.12.2 numbagg: 0.6.0 fsspec: 2022.8.2 cupy: None pint: 0.22 sparse: 0.14.0 flox: 0.8.1 numpy_groupies: 0.9.22 setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: 7.4.0 mypy: 1.6.1 IPython: 8.14.0 sphinx: 5.2.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8432/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 0.74ms · About: xarray-datasette