issues: 1804983457
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1804983457 | I_kwDOAMm_X85rldyh | 7987 | Existing chunks not being respected on to_zarr() | 99441529 | closed | 0 |  |  | 2 | 2023-07-14T14:30:20Z | 2023-11-24T22:15:04Z | 2023-11-24T22:15:04Z | NONE |  |  |  | (see body below) | (see reactions below) |  | completed | 13221727 | issue

body:

What is your issue?

Hi folks, I'm not sure what I'm doing wrong here.

Context: I have a dataset to which I add a coordinate variable with some specified chunking, and that chunking is reset when writing with `to_zarr()` and opening from disk. This happens even if I call `unify_chunks()` before writing or explicitly set the `preferred_chunks` encoding.

Reproducer below.

versions: xarray 2023.6.0, dask 2023.6.0, zarr 2.14.2

```python
import pandas as pd
import numpy as np
import xarray as xr
from datetime import datetime
import dask.array as da
from pandas.tseries.offsets import MonthEnd

# create toy dataset
dates = [datetime(2021, 1, 1), datetime(2021, 2, 1),
         datetime(2021, 3, 1), datetime(2021, 4, 1)]
ds = xr.Dataset(
    data_vars=dict(
        tas=(
            ["time", "lat", "lon"],
            np.array([300]) * np.ones((len(dates), 4, 4)),
        ),
    ),
    coords=dict(
        time=dates,
        lat=np.array([10, 11, 12, 13]),
        lon=np.array([20, 21, 22, 23]),
    ),
)

# chunk dataset
ds = ds.chunk(chunks={"lat": 2, "lon": 2, "time": 2})

# add end date coordinate
time_df = pd.DataFrame({"time": list(ds["time"].values)})
time_df["end_date"] = time_df["time"] + MonthEnd(0)
ds = ds.assign_coords(
    end_date=("time", da.from_array(time_df["end_date"].values)))
print("---data before unify chunks--- \n", ds.end_date.data)
ds = ds.unify_chunks()  # surely you'll respect my chunking
print("---data before writing to s3--- \n", ds.end_date.data)
ds.time.encoding["preferred_chunks"] = {"time": 2}  # please use this chunking?
ds.end_date.encoding["preferred_chunks"] = {"time": 2}  # pretty please use this chunking?

# write data to s3
data_path = "/tmp/mydata.zarr"
ds.to_zarr(data_path, mode="w")

# read data back from s3
ds_check = xr.open_zarr(data_path)
print("---data after writing to disk and reading back in--- \n", ds_check.end_date.data)
```

output from code above:

reactions:

{ "url": "https://api.github.com/repos/pydata/xarray/issues/7987/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }