id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1876858952,I_kwDOAMm_X85v3phI,8134,Unable to append data in s3 bucket with to_zarr() and append mode ,27021858,closed,0,,,2,2023-09-01T06:57:32Z,2023-09-01T16:03:50Z,2023-09-01T16:03:49Z,NONE,,,,"### What happened? I updated my packages and now xarray+zarr are unable to append data to an existing Zarr store in S3. ### What did you expect to happen? That data will be appended to an existing Zarr store. ### Minimal Complete Verifiable Example ```Python import s3fs import xarray import zarr import numpy as np from datetime import datetime from s3fs import S3FileSystem append_dim = 'dt_calc' consolidated = True ds = xarray.Dataset( {'temp': (('dt_calc', 'y', 'x'), np.array([[[1., 2., 3., 4.], [3., 4., 5., 6.]]]))}, coords={'lon': ('y', np.array([50., 51.])), 'lat': ('x', np.array([4., 5., 6., 7.])), 'dt_calc': ('dt_calc', [datetime(2022, 1, 1)])} ) ds_2 = xarray.Dataset( {'temp': (('dt_calc', 'y', 'x'), np.array([[[1., 2., 3., 4.], [3., 4., 5., 6.]]]))}, coords={'lon': ('y', np.array([50., 51.])), 'lat': ('x', np.array([4., 5., 6., 7.])), 'dt_calc': ('dt_calc', [datetime(2022, 1, 1, 1)])} ) s3_out = S3FileSystem( anon=False, s3_additional_kwargs={""StorageClass"": storage_class}, ) store_out = s3fs.S3Map( root=f""s3:///{bucket_name}/{dataset_name}.zarr"", s3=s3_out, check=False ) ds.to_zarr( store_out, mode=""w-"", compute=True, consolidated=consolidated ) try: ds_2.to_zarr( store_out, mode=""w-"", compute=True, consolidated=consolidated ) except zarr.errors.ContainsGroupError: ds_2.to_zarr( store_out, mode=""a"", append_dim=append_dim, compute=True, consolidated=consolidated, ) ``` ### MVCE confirmation - [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. 
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback. - [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [ ] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output ```Python In [6]: xarray.open_zarr(store_out, consolidated=True) Out[6]: Dimensions: (dt_calc: 1, x: 4, y: 2) Coordinates: * dt_calc (dt_calc) datetime64[ns] 2022-01-01 lat (x) float64 dask.array lon (y) float64 dask.array Dimensions without coordinates: x, y Data variables: temp (dt_calc, y, x) float64 dask.array In [7]: dataset.to_zarr( ...: store_out, ...: mode=""a"", ...: append_dim=append_dim, ...: compute=True, ...: consolidated=consolidated, ...: ) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[7], line 1 ----> 1 dataset.to_zarr( 2 store_out, 3 mode=""a"", 4 append_dim=append_dim, 5 compute=True, 6 consolidated=consolidated, 7 ) File /usr/local/lib/python3.9/site-packages/xarray/core/dataset.py:2461, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs) 2329 """"""Write dataset contents to a zarr group. 2330 2331 Zarr chunks are determined in the following way: (...) 2457 The I/O user guide, with more details and examples. 
2458 """""" 2459 from xarray.backends.api import to_zarr -> 2461 return to_zarr( # type: ignore[call-overload,misc] 2462 self, 2463 store=store, 2464 chunk_store=chunk_store, 2465 storage_options=storage_options, 2466 mode=mode, 2467 synchronizer=synchronizer, 2468 group=group, 2469 encoding=encoding, 2470 compute=compute, 2471 consolidated=consolidated, 2472 append_dim=append_dim, 2473 region=region, 2474 safe_chunks=safe_chunks, 2475 zarr_version=zarr_version, 2476 write_empty_chunks=write_empty_chunks, 2477 chunkmanager_store_kwargs=chunkmanager_store_kwargs, 2478 ) File /usr/local/lib/python3.9/site-packages/xarray/backends/api.py:1670, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs) 1668 existing_dims = zstore.get_dimensions() 1669 if append_dim not in existing_dims: -> 1670 raise ValueError( 1671 f""append_dim={append_dim!r} does not match any existing "" 1672 f""dataset dimensions {existing_dims}"" 1673 ) 1674 existing_var_names = set(zstore.zarr_group.array_keys()) 1675 for var_name in existing_var_names: ValueError: append_dim='dt_calc' does not match any existing dataset dimensions {} In [8]: dataset Out[8]: Dimensions: (dt_calc: 1, y: 2, x: 4) Coordinates: lon (y) float64 50.0 51.0 lat (x) float64 4.0 5.0 6.0 7.0 * dt_calc (dt_calc) datetime64[ns] 2022-01-01T01:00:00 Dimensions without coordinates: y, x Data variables: temp (dt_calc, y, x) float64 1.0 2.0 3.0 4.0 3.0 4.0 5.0 6.0 In [9]: ``` ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.9.10 (main, Mar 2 2022, 04:31:58) [GCC 10.2.1 20210110] python-bits: 64 OS: Linux OS-release: 6.2.0-26-generic machine: x86_64 processor: byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2023.8.0 pandas: 2.1.0 numpy: 1.25.2 scipy: 1.10.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: None nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.8.1 distributed: 2023.8.1 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2023.6.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 53.0.0 pip: 21.2.4 conda: None pytest: 6.1.1 mypy: None IPython: 8.12.0 sphinx: None
boto3==1.26.45 aiobotocore==2.5.0 botocore==1.29.76 s3fs==2023.6.0 zarr==2.16.1 xarray==2023.8.0 dask==2023.8.1 dask[distributed]==2023.8.1 dask-cloudprovider==2022.10.0","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8134/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 545297609,MDU6SXNzdWU1NDUyOTc2MDk=,3661,How to handle 2d latitude and longitude coordinates within DataArray creation,27021858,closed,0,,,3,2020-01-04T15:31:43Z,2022-07-09T21:54:21Z,2020-01-04T19:44:15Z,NONE,,,,"#### MCVE Code Sample ```python frames_z.shape -> (25, 1100, 900) grid[:, :, 1].shape -> (1100, 900) grid[:, :, 0].shape -> (1100, 900) xarray.DataArray(frames_z, coords={'time': timestamps, 'latitude':grid[:, :, 1],'longitude': grid[:, :, 0]}, dims=['time', 'latitude', 'longitude']) ``` #### Expected Output An array with latitude and longitude as coordinates. #### Problem Description I am receiving the following error message: ``` MissingDimensionsError: cannot set variable 'latitude' with 2-dimensional data without explicit dimension names. Pass a tuple of (dims, data) instead. ``` I tried several ways of defining coords and dims, but it always fails. #### Output of ``xr.show_versions()``
xarray==0.11.3
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3661/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 947796627,MDU6SXNzdWU5NDc3OTY2Mjc=,5620,`xr.where()` does not work like `np.where()`on meshgrids,27021858,closed,0,,,6,2021-07-19T15:55:10Z,2021-07-20T11:32:45Z,2021-07-20T07:58:18Z,NONE,,,,"In case of selecting 2D data, the `xarray.where()` does not work like `numpy.where()` . In the documentation you have mentioned that `np.where()` is the corresponding function for `xr.where()` but it seems that they are working totally different. Here is my code: ```python data = xarray.open_dataset('path_to_attached_file') minLat, minLon, maxLat, maxLon = (45.08903556483102, 5.625000000000013, 48.92249926375824, 11.249999999999993) latitudes = data.lat.values longitudes = data.lon.values slice_mask = np.where( (latitudes <= maxLat) & (latitudes > minLat) & (longitudes <= maxLon) & (longitudes > minLon) ) _sliced_data = data.where( (data.lat <= maxLat) & (data.lat > minLat) & (data.lon <= maxLon) & (data.lon > minLon), drop=True, ) _sliced_data.latitude.values.max() # 49.305596 latitudes[slice_mask].max() # 48.922172 ``` I have also tried to translate the numpy result into a boolen DataArray: ```python mask_array = data.copy() mask_array.update({'air_temperature_2m': ((""y"", ""x""), (latitudes <= maxLat) & (latitudes > minLat) & (longitudes <= maxLon) & (longitudes > minLon))}) _sliced_data = data.where( mask_array.air_temperature_2m, drop=True, ) ``` There we have the same result. It seems that the masking step does not really works correctly. **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.9.1 (default, Feb 9 2021, 07:55:26) [GCC 8.3.0] python-bits: 64 OS: Linux OS-release: 5.10.0-1033-oem machine: x86_64 processor: byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.16.2 pandas: 1.2.1 numpy: 1.20.0 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: 0.8.0 h5py: 3.2.1 Nio: None zarr: None cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.4 iris: None bottleneck: None dask: 2021.05.0 distributed: None matplotlib: 3.3.4 cartopy: None seaborn: None numbagg: None pint: None setuptools: 53.0.0 pip: 21.0.1 conda: None pytest: 6.1.1 IPython: 7.21.0 sphinx: None
[harmonie_knmi_grid_fixture.zip](https://github.com/pydata/xarray/files/6842196/harmonie_knmi_grid_fixture.zip) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5620/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 545324954,MDU6SXNzdWU1NDUzMjQ5NTQ=,3662,Select on 2 dimensional indices/How to define indices correctly?,27021858,closed,0,,,2,2020-01-04T19:49:16Z,2020-09-01T02:48:44Z,2020-09-01T02:48:44Z,NONE,,,,"#### MCVE Code Sample ```python import xarray ds = xarray.tutorial.open_dataset('rasm').load() ds.sel(yc=50, xc=50, method='nearest') ``` #### Expected Output An interpolated time series of the data closest to (50, 50). #### Problem Description How do I define the indices so that I can select on multi-dimensional indices? #### Output of ``xr.show_versions()``
xarray==0.11.3
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3662/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue