
issues: 1177665302


id: 1177665302
node_id: I_kwDOAMm_X85GMb8W
number: 6401
title: Unnecessary warning when specifying `chunks` opening dataset with empty dimension
user: 4666753
state: closed
locked: 0
comments: 0
created_at: 2022-03-23T06:38:25Z
updated_at: 2022-04-09T20:27:40Z
closed_at: 2022-04-09T20:27:40Z
author_association: CONTRIBUTOR

What happened?

I get unnecessary warnings when I open Zarr datasets that contain empty dimensions/arrays and pass the `chunks` argument for a non-empty dimension.

If an array has zero size (because one of its dimensions is empty), it is saved as a single chunk regardless of the Dask chunking on its other dimensions (#5742). If the `chunks` parameter is then given for those other dimensions when loading the Zarr store (using the chunk sizes one would expect if the array were nonempty), xarray warns that splitting the single on-disk chunk could degrade performance.
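The following simplified sketch shows the kind of alignment test behind this warning. It is an assumption, not a quote from xarray: requested Dask chunks are treated as compatible with the on-disk chunks along a dimension only when every Dask chunk except the last is a whole multiple of the on-disk chunk size. `splits_disk_chunk` and `dims_with_split_chunks` are hypothetical names, not xarray functions.

```python
from typing import Dict, List, Tuple


def splits_disk_chunk(dask_chunks: Tuple[int, ...], disk_chunk: int) -> bool:
    """Return True if a boundary between Dask chunks falls inside an on-disk chunk.

    Boundaries line up with on-disk chunk boundaries exactly when every Dask
    chunk except the last is a whole multiple of the on-disk chunk size.
    """
    return any(size % disk_chunk != 0 for size in dask_chunks[:-1])


def dims_with_split_chunks(
    chunks: Dict[str, Tuple[int, ...]], preferred_chunks: Dict[str, int]
) -> List[str]:
    """List the dimensions whose requested Dask chunks cut through on-disk chunks."""
    return [
        dim
        for dim, dask_chunks in chunks.items()
        if dim in preferred_chunks
        and splits_disk_chunk(dask_chunks, preferred_chunks[dim])
    ]


# Values taken from the warning in this report: the empty array was written as
# one on-disk chunk of length 4 along `a`, but chunks={"a": 1} was requested.
print(dims_with_split_chunks({"a": (1, 1, 1, 1), "b": (0,)}, {"a": 4, "b": 1}))
# -> ['a']
```

Nothing in this sketched test looks at the array's size. A zero-size array whose single on-disk chunk spans all of `a` therefore trips it exactly as a nonempty one would.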

What did you expect to happen?

I expect no warning to be raised when there is no data:

  • any performance degradation on an empty array is negligible, since there is no data to read.
  • we don't always know whether one of the dimensions is empty until loading. We pass `chunks` for dimensions whose chunk sizes are consistent (to request a multiple of what's on disk), and that breaks when some other dimension is empty.
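Until the check itself changes, one user-side stopgap is to silence just this warning around the open call. This is not something proposed in the issue. The sketch assumes the warning text keeps starting with "Specified Dask chunks", as in the log output of this report. `call_ignoring_chunk_split_warning` is a hypothetical helper name.

```python
import warnings
from typing import Any, Callable


def call_ignoring_chunk_split_warning(
    func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Any:
    """Call func(*args, **kwargs) with only the chunk-splitting UserWarning silenced."""
    with warnings.catch_warnings():
        # The filter is scoped to this block and matches the start of the
        # message (case-insensitively), so unrelated warnings still surface.
        warnings.filterwarnings(
            "ignore", message=r"Specified Dask chunks", category=UserWarning
        )
        return func(*args, **kwargs)
```

For example: `call_ignoring_chunk_split_warning(xr.open_zarr, "tmp.zarr", chunks={"a": 1})`. The trade-off is that it also hides the warning for genuinely nonempty arrays, where the hint is useful. `warnings.catch_warnings` also changes process-global state, so it is not thread-safe.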

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

# each a is expected to be chunked separately
ds = xr.Dataset({"x": (("a", "b"), np.empty((4, 0)))}).chunk({"a": 1})

# but when we save it, it gets saved as a single chunk
ds.to_zarr("tmp.zarr")

# so if we open it up with expected chunk sizes (not knowing that b is empty):
ds2 = xr.open_zarr("tmp.zarr", chunks={"a": 1})

# we get a warning :(
```

Relevant log output

{...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410: UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disks chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)

Anything else we need to know?

This can be fixed by calling `_check_chunks_compatibility()` only when `var` is nonempty (PR forthcoming).
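Below is a sketch of the proposed guard. `check_chunks_if_nonempty` is a hypothetical stand-in written for illustration, not the code from the forthcoming PR. It reuses an assumed, simplified alignment rule: every Dask chunk except the last must be a multiple of the on-disk chunk size. It skips the check entirely when the array has zero elements.

```python
import math
import warnings
from typing import Dict, Tuple


def check_chunks_if_nonempty(
    shape: Tuple[int, ...],
    chunks: Dict[str, Tuple[int, ...]],
    preferred_chunks: Dict[str, int],
) -> bool:
    """Warn (and return True) if requested chunks split on-disk chunks of a nonempty array."""
    if math.prod(shape) == 0:
        # A zero-size array holds no data, so splitting its single on-disk
        # chunk is essentially free: skip the check.
        return False
    for dim, dask_chunks in chunks.items():
        disk_chunk = preferred_chunks.get(dim)
        if disk_chunk is None:
            continue
        # Simplified rule: all Dask chunks but the last must be multiples of
        # the on-disk chunk size for chunk boundaries to line up.
        if any(size % disk_chunk for size in dask_chunks[:-1]):
            warnings.warn(
                f"Specified Dask chunks {dask_chunks} would split on-disk chunks "
                f"of size {disk_chunk} along dimension {dim!r}.",
                UserWarning,
                stacklevel=2,
            )
            return True
    return False
```

The guard keys on the total element count rather than on one dimension. As a result, it also covers arrays where the empty dimension is not the one being chunked, as with `b` in the example above.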

Environment

INSTALLED VERSIONS

commit: None
python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 7.0.1
IPython: 8.1.1
sphinx: 4.4.0

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6401/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
