# Issue #4708: Potentially spurious warning in rechunk

**Repo:** pydata/xarray · **Author association:** MEMBER · **State:** closed (completed) · **Created:** 2020-12-18 · **Closed:** 2020-12-24

**What happened**:

When reading a Zarr dataset whose last chunk is smaller than the chunk size, users see a `UserWarning` that this may be inefficient, since the chunking differs from the chunking on disk. In general that's a good warning, but it shouldn't appear when the only difference between the on-disk chunking and the Dataset chunking is the last chunk.

**What you expected to happen**:

No warning.

**Minimal Complete Verifiable Example**:

```python
# Create and write the data
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
precipitation = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
time = pd.date_range("2014-09-06", periods=3)
reference_time = pd.Timestamp("2014-09-05")

ds = xr.Dataset(
    data_vars=dict(
        temperature=(["x", "y", "time"], temperature),
        precipitation=(["x", "y", "time"], precipitation),
    ),
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat),
        time=time,
        reference_time=reference_time,
    ),
    attrs=dict(description="Weather related data."),
)

ds2 = ds.chunk(chunks=dict(time=(2, 1)))
ds2["temperature"].chunks
ds2.to_zarr("/tmp/test.zarr", mode="w")
```

Reading it back produces a warning:

```python
xr.open_zarr("/tmp/test.zarr")

/mnt/c/Users/taugspurger/src/xarray/xarray/core/dataset.py:408: UserWarning: Specified Dask chunks (2, 1) would separate on disks chunk shape 2 for dimension time. This could degrade performance. Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```

**Anything else we need to know?**:

The check around https://github.com/pydata/xarray/blob/91318d2ee63149669404489be9198f230d877642/xarray/core/dataset.py#L371-L378 should probably ignore the very last chunk, since Zarr allows it to be different? A sketch of what that could look like is below.
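A minimal sketch of the idea, using a hypothetical helper `check_chunks_compatible(dim, specified_chunks, preferred_chunk_size)` rather than xarray's actual `_check_chunks_compatibility`: all chunks except the trailing one must match the on-disk chunk size, and the trailing chunk may be a smaller remainder without triggering the warning.

```python
# Hypothetical sketch, not xarray's implementation: warn only when a
# non-trailing chunk differs from the on-disk (Zarr) chunk size.
import warnings


def check_chunks_compatible(dim, specified_chunks, preferred_chunk_size):
    """specified_chunks: Dask chunk sizes along `dim`, e.g. (2, 1).
    preferred_chunk_size: on-disk chunk size along `dim`, e.g. 2."""
    *interior, last = specified_chunks
    # Interior chunks must equal the on-disk chunk size; the last chunk
    # may be smaller, since Zarr allows a short final chunk.
    if any(c != preferred_chunk_size for c in interior) or last > preferred_chunk_size:
        warnings.warn(
            f"Specified Dask chunks {specified_chunks} would separate on-disk "
            f"chunk shape {preferred_chunk_size} for dimension {dim!r}. "
            "This could degrade performance. Consider rechunking after loading instead.",
            UserWarning,
        )


check_chunks_compatible("time", (2, 1), 2)  # no warning: only the last chunk differs
check_chunks_compatible("time", (1, 2), 2)  # warns: an interior chunk splits a disk chunk
```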
**Environment**:

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.128-microsoft-standard
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None

xarray: 0.16.3.dev21+g96e1aea0
pandas: 1.1.4
numpy: 1.19.4
scipy: 1.5.4
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.2.dev9+dirty
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.2.4
conda: None
pytest: 5.4.3
IPython: 7.19.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4708/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue