# Issue #8694: Error while saving an altered dataset to NetCDF when loaded from a file

**State:** open · **Comments:** 4 · **Created:** 2024-02-02T14:18:03Z · **Updated:** 2024-02-07T13:38:40Z
**Id:** 2115049090 (`I_kwDOAMm_X85-ERaC`) · **User:** 12544636 · **Author association:** NONE

### What happened?

When saving an altered xarray Dataset to a NetCDF file with the `to_netcdf` method, an error occurs if the original dataset was loaded from a file. The error does not occur when the dataset is created directly in memory; it occurs only after the dataset has been round-tripped through a file.

### What did you expect to happen?

The altered xarray Dataset is saved as a NetCDF file by the `to_netcdf` method.

### Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.Dataset(
    data_vars=dict(
        win_1=("attempt", [True, False, True, False, False, True]),
        win_2=("attempt", [False, True, False, True, False, False]),
    ),
    coords=dict(
        attempt=[1, 2, 3, 4, 5, 6],
        player_1=("attempt", ["paper", "paper", "scissors", "scissors", "paper", "paper"]),
        player_2=("attempt", ["rock", "scissors", "paper", "rock", "paper", "rock"]),
    ),
)

ds.to_netcdf("dataset.nc")
ds_from_file = xr.load_dataset("dataset.nc")

ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)
ds_altered.to_netcdf("dataset_altered.nc")
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

```Python
Traceback (most recent call last):
  File "example.py", line 20, in <module>
    ds_altered.to_netcdf("dataset_altered.nc")
  File ".../python3.9/site-packages/xarray/core/dataset.py", line 2303, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
  File ".../python3.9/site-packages/xarray/backends/api.py", line 1315, in to_netcdf
    dump_to_store(
  File ".../python3.9/site-packages/xarray/backends/api.py", line 1362, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File ".../python3.9/site-packages/xarray/backends/common.py", line 356, in store
    self.set_variables(
  File ".../python3.9/site-packages/xarray/backends/common.py", line 398, in set_variables
    writer.add(source, target)
  File ".../python3.9/site-packages/xarray/backends/common.py", line 243, in add
    target[...] = source
  File ".../python3.9/site-packages/xarray/backends/scipy_.py", line 78, in __setitem__
    data[key] = value
  File ".../python3.9/site-packages/scipy/io/_netcdf.py", line 1019, in __setitem__
    self.data[index] = data
ValueError: could not broadcast input array from shape (4,5) into shape (4,8)
```

### Anything else we need to know?

**Findings:** The issue is related to the encoding information of the dataset becoming invalid after filtering the data with the `where` method: `to_netcdf` uses the stored encoding information instead of considering the actual shape of the data. In the provided example, the maximum length of the strings stored in "player_1" and "player_2" is originally 8 characters. After filtering with `where`, the maximum string length becomes 5 in "player_1" but remains 8 in "player_2".
However, the encoding information of the variables still records a length of 8, in particular via the `char_dim_name` attribute.

**Workaround:** Call the `drop_encoding` method on the dataset before saving it with `to_netcdf`. This removes the stale encoding information, so `to_netcdf` is forced to derive the on-disk shapes from the actual data, which prevents the broadcasting error.

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.14 (main, Aug 24 2023, 14:01:46) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.3.1-060301-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
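The shape mismatch in the traceback can be reproduced with the findings above and no xarray at all: the first `to_netcdf` sizes the character dimension to the longest string across the example's coordinates (8, for `"scissors"`), while after `where(..., drop=True)` the data in `player_1` only needs 5 characters even though the stale encoding still advertises 8. A minimal plain-Python sketch of that arithmetic (the variable names here are illustrative, not xarray API):

```python
# Coordinate values from the MVCE dataset.
player_1 = ["paper", "paper", "scissors", "scissors", "paper", "paper"]
player_2 = ["rock", "scissors", "paper", "rock", "paper", "rock"]

# On the first to_netcdf, the char dimension is sized to the longest
# string seen when writing: len("scissors") == 8.
encoded_char_dim = max(len(s) for s in player_1 + player_2)

# where(..., drop=True) keeps only the attempts where player_1 == "paper".
kept = [i for i, s in enumerate(player_1) if s == "paper"]
filtered_1 = [player_1[i] for i in kept]  # four copies of "paper"
filtered_2 = [player_2[i] for i in kept]

# After filtering, player_1 only needs 5 characters, player_2 still 8.
actual_len_1 = max(len(s) for s in filtered_1)
actual_len_2 = max(len(s) for s in filtered_2)

# scipy's netcdf writer then tries to broadcast a (4, 5) char array for
# player_1 into the (4, 8) slot recorded in the stale encoding:
print(f"could not broadcast ({len(kept)},{actual_len_1}) "
      f"into ({len(kept)},{encoded_char_dim})")
```

This matches the `(4,5)` vs `(4,8)` shapes in the `ValueError`. In the real dataset, calling `ds_altered.drop_encoding()` before `to_netcdf` (the workaround above) discards the stale char-dimension sizing so the writer re-derives the length from the data it is actually given.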
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8694/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue