home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 770937642

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
770937642 MDU6SXNzdWU3NzA5Mzc2NDI= 4708 Potentially spurious warning in rechunk 1312546 closed 0     0 2020-12-18T14:37:32Z 2020-12-24T11:32:43Z 2020-12-24T11:32:43Z MEMBER      

What happened:

When reading an zarr dataset where the last chunk is smaller than the chunk size, users see a UserWarning that this may be inefficient, since the chunking differs from the chunking on disk. In general that's a good warning, but it shouldn't appear when the only difference between the on-disk chunking and the Dataset chunking is the last chunk.

What you expected to happen:

No warning.

Minimal Complete Verifiable Example:

```python

Create and write the data

import numpy as np import pandas as pd import xarray as xr

np.random.seed(0) temperature = 15 + 8 * np.random.randn(2, 2, 3) precipitation = 10 * np.random.rand(2, 2, 3) lon = [[-99.83, -99.32], [-99.79, -99.23]] lat = [[42.25, 42.21], [42.63, 42.59]] time = pd.date_range("2014-09-06", periods=3) reference_time = pd.Timestamp("2014-09-05") ds = xr.Dataset( data_vars=dict( temperature=(["x", "y", "time"], temperature), precipitation=(["x", "y", "time"], precipitation), ), coords=dict( lon=(["x", "y"], lon), lat=(["x", "y"], lat), time=time, reference_time=reference_time, ), attrs=dict(description="Weather related data."), ) ds2 = ds.chunk(chunks=dict(time=(2, 1))) ds2['temperature'].chunks

ds2.to_zarr("/tmp/test.zarr", mode="w") ```

Reading it produces a warning

python xr.open_zarr("/tmp/test.zarr") /mnt/c/Users/taugspurger/src/xarray/xarray/core/dataset.py:408: UserWarning: Specified Dask chunks (2, 1) would separate on disks chunk shape 2 for dimension time. This could degrade performance. Consider rechunking after loading instead. _check_chunks_compatibility(var, output_chunks, preferred_chunks)

Anything else we need to know?:

The check around https://github.com/pydata/xarray/blob/91318d2ee63149669404489be9198f230d877642/xarray/core/dataset.py#L371-L378 should probably ignore the very last chunk, since Zarr allows it to be different?

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.19.128-microsoft-standard machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.16.3.dev21+g96e1aea0 pandas: 1.1.4 numpy: 1.19.4 scipy: 1.5.4 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.6.2.dev9+dirty cftime: 1.3.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.30.0 distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20201009 pip: 20.2.4 conda: None pytest: 5.4.3 IPython: 7.19.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4708/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 1 row from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 0.755ms · About: xarray-datasette