id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1804983457,I_kwDOAMm_X85rldyh,7987,Existing chunks not being respected on to_zarr(),99441529,closed,0,,,2,2023-07-14T14:30:20Z,2023-11-24T22:15:04Z,2023-11-24T22:15:04Z,NONE,,,,"### What is your issue?

Hi folks, I'm not sure what I'm doing wrong here.

Context: I have a dataset, adding a coordinate variable with some specified chunking, and then that chunking is reset when writing to_zarr() and opening from disk. Even if I call unify_chunks() before writing or explicitly set the `preferred_chunks` encoding, the chunks get reset upon write.

I have managed to force the chunks by explicitly setting `ds.end_date.encoding['chunks']= (2,)` before writing (note that before this, _no encoding was set_, so this isn't the old issue of previous encoding overwriting the dask chunks). It took a while to find this, and I don't understand why the existing behavior disregards the existing chunks on single-dimension variables or coordinates.

The issue is that with this default behavior, the dataset written to disk has inconsistent dimensions on read, throws an error, and requires a call to unify_chunks() to be usable.
Reproducer below:

versions:
xarray 2023.6.0
dask 2023.6.0
zarr 2.14.2

```
import pandas as pd
import numpy as np
import xarray as xr
from datetime import datetime
import dask.array as da
from pandas.tseries.offsets import MonthEnd

# create toy dataset
dates = [datetime(2021, 1, 1), datetime(2021, 2, 1), datetime(2021, 3, 1), datetime(2021, 4, 1)]
ds = xr.Dataset(
    data_vars=dict(
        tas=(
            [""time"", ""lat"", ""lon"",],
            np.array([300]) * np.ones((len(dates), 4, 4)),
        ),
    ),
    coords=dict(
        time=dates,
        lat=np.array([10, 11, 12, 13]),
        lon=np.array([20, 21, 22, 23]),
    ),
)

# chunk dataset
ds = ds.chunk(chunks={""lat"": 2, ""lon"": 2, 'time': 2})

# add end date coordinate
time_df = pd.DataFrame({""time"": list(ds['time'].values)})
time_df['end_date'] = time_df['time'] + MonthEnd(0)
ds = ds.assign_coords(
    end_date=(""time"", da.from_array(time_df['end_date'].values)))

print(""---data before unify chunks--- \n"", ds.end_date.data)
ds = ds.unify_chunks()  # surely you'll respect my chunking
print(""---data before writing to s3--- \n"", ds.end_date.data)

ds.time.encoding['preferred_chunks']= {'time': 2}  # please use this chunking?
ds.end_date.encoding['preferred_chunks']= {'time': 2}  # pretty please use this chunking?

# write data to s3
data_path = ""/tmp/mydata.zarr""
ds.to_zarr(data_path, mode='w')

# read data back from s3
ds_check = xr.open_zarr(data_path)
print(""---data after writing to disk and reading back in--- \n"", ds_check.end_date.data)
```

output from code above:

```
---data before unify chunks---
 dask.array
---data before writing to s3---
 dask.array
---data after writing to disk and reading back in---
 dask.array
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7987/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue