issues: 1983891070
id | node_id | number | title | user | state | locked | comments | created_at | updated_at | closed_at | author_association | state_reason | repo | type
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1983891070 | I_kwDOAMm_X852P8Z- | 8427 | Ambiguous behavior with coordinates when appending to Zarr store with append_dim | 1197350 | closed | 0 | 4 | 2023-11-08T15:40:19Z | 2023-12-01T03:58:56Z | 2023-12-01T03:58:55Z | MEMBER | completed | 13221727 | issue

### What happened?

There are two quite different scenarios covered by "append" with Zarr:

- adding new variables to an existing store, and
- extending existing arrays along a particular dimension (`append_dim`).
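For concreteness, a minimal sketch of the two scenarios, assuming an in-memory store; the names `base` and `extra` are illustrative and not from the original report:

```python
import numpy as np
import xarray as xr
import zarr

store = zarr.storage.MemoryStore()
base = xr.Dataset({"foo": ("time", np.arange(3))})
base.to_zarr(store, consolidated=False)

# scenario 1: add a brand-new variable to the existing store
base.assign(extra=2.0).drop_vars("foo").to_zarr(store, mode="a", consolidated=False)

# scenario 2: extend an existing array along a dimension
xr.Dataset({"foo": ("time", np.arange(3, 5))}).to_zarr(
    store, mode="a", append_dim="time", consolidated=False
)
assert xr.open_zarr(store, consolidated=False).sizes["time"] == 5
```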
This issue is about what should happen when using `append_dim`. Here's the current behavior:

```python
import numpy as np
import xarray as xr
import zarr

ds1 = xr.DataArray(
    np.array([1, 2, 3]).reshape(3, 1, 1),
    dims=("time", "y", "x"),
    coords={"x": [1], "y": [2]},
    name="foo",
).to_dataset()
ds2 = xr.DataArray(
    np.array([4, 5]).reshape(2, 1, 1),
    dims=("time", "y", "x"),
    coords={"x": [-1], "y": [-2]},
    name="foo",
).to_dataset()

# how concat works: data are aligned
ds_concat = xr.concat([ds1, ds2], dim="time")
assert ds_concat.dims == {"time": 5, "y": 2, "x": 2}

# now do a Zarr append
store = zarr.storage.MemoryStore()
ds1.to_zarr(store, consolidated=False)
# we do not check that the coordinates are aligned--just that they
# have the same shape and dtype
ds2.to_zarr(store, append_dim="time", consolidated=False)
ds_append = xr.open_zarr(store, consolidated=False)

# the coordinate data have been overwritten...
assert ds_append.dims == {"time": 5, "y": 1, "x": 1}
# ...with the latest values
assert ds_append.x.data[0] == -1
```

Currently, we always write all data variables in this scenario. That includes overwriting the coordinates every time we append, which makes appending more expensive than it needs to be. I don't think that is the behavior most users want or expect.

### What did you expect to happen?

There are a couple of different options we could consider for how to handle this "extending" situation (with `append_dim`):

1. For variables that do not include `append_dim` (the `x` and `y` coordinates above):
    a. overwrite them with the values from the appended dataset
    b. do not write them at all

We currently do 1a. I propose to switch to 1b. I think it is closer to what users want, and it requires less I/O.
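For comparison, a sketch of how 1b can be approximated from the user side today, assuming a fresh `store` that contains only `ds1` from the example above:

```python
# approximate option 1b: leave `x` and `y` out of the appended dataset;
# variables absent from the append are not rewritten in the store
ds2.drop_vars(["x", "y"]).to_zarr(store, append_dim="time", consolidated=False)

ds_append = xr.open_zarr(store, consolidated=False)
assert ds_append.dims == {"time": 5, "y": 1, "x": 1}
# the original coordinate values survive the append
assert ds_append.x.data[0] == 1
```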
### Anything else we need to know?

No response

### Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.176-157.645.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.10.1
pandas: 2.1.2
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.1
distributed: 2023.10.1
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: 0.13.0
numbagg: 0.6.0
fsspec: 2023.10.0
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 8.16.1
sphinx: None
```
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8427/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |