id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2132686003,PR_kwDOAMm_X85mxZFT,8744,Update docs on view / copies,16925278,closed,0,,,4,2024-02-13T16:14:40Z,2024-03-25T20:35:23Z,2024-03-25T20:35:19Z,CONTRIBUTOR,,0,pydata/xarray/pulls/8744,"- Add a reference to the numpy docs on views / copies in the corresponding section of the xarray docs, to help clarify pydata#8728.
- Add a note that other xarray operations also return views rather than copies in the Copies vs. Views section of the docs.
- Add a note that `da.values()` returns a view in the header for `da.values()`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8744/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
2132593768,I_kwDOAMm_X85_HMxo,8743,`.reset_index()`/`.reset_coords()` maintain MultiIndex status,16925278,closed,0,,,6,2024-02-13T15:32:47Z,2024-03-01T16:05:11Z,2024-03-01T16:05:11Z,CONTRIBUTOR,,,,"### What happened?

Trying to save a dataset to NetCDF using `ds.to_netcdf()` fails when one of the coordinates is a MultiIndex. The error message suggests using `.reset_index()` to remove the MultiIndex. However, saving still fails after resetting the index, and even after moving the offending coordinates to be data variables using `.reset_coords()`.

### What did you expect to happen?

After calling `.reset_index()`, and especially after calling `.reset_coords()`, the save should succeed. As shown in the example below, a dataset that `xr.testing.assert_identical()` considers identical to the failing dataset saves without a problem. (This also points to a current workaround: recreate the Dataset from scratch.)

### Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

# Create random dataset
ds = xr.Dataset({'test': (['lat', 'lon'], np.random.rand(2, 3))},
                coords={'lat': (['lat'], [0, 1]),
                        'lon': (['lon'], [0, 1, 2])})

# Create multiindex by stacking
ds = ds.stack(locv=('lat', 'lon'))

# The index shows up as a MultiIndex
print(ds.indexes)

# Try to export (this fails as expected, since multiindex)
# ds.to_netcdf('test.nc')

# Now, get rid of multiindex by resetting coords (i.e.,
# turning coordinates into data variables)
ds = ds.reset_index('locv').reset_coords()

# The index is no longer a MultiIndex
print(ds.indexes)

# Try to export - this also fails!
# ds.to_netcdf('test.nc')

# A reference comparison dataset that is successfully asserted
# as identical
ds_compare = xr.Dataset({k: (['locv'], ds[k].values) for k in ds})
xr.testing.assert_identical(ds_compare, ds)

# Try exporting (this succeeds)
ds_compare.to_netcdf('test.nc')
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
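As a stopgap, here is a sketch of a workaround that avoids recreating the Dataset by hand (this is my own assumption about the mechanism, not an official API: it rebuilds every variable from a plain `np.ndarray`, so nothing is left backed by the pandas MultiIndex machinery that the serialization check appears to trip on):

```Python
import numpy as np
import xarray as xr

# Reproduce the dataset from the MVCE above
ds = xr.Dataset({'test': (['lat', 'lon'], np.random.rand(2, 3))},
                coords={'lat': (['lat'], [0, 1]),
                        'lon': (['lon'], [0, 1, 2])})
ds = ds.stack(locv=('lat', 'lon'))
ds = ds.reset_index('locv').reset_coords()

# Hypothetical workaround: reassign each variable from a plain ndarray,
# keeping the same dims and attrs, so no variable is still backed by
# the MultiIndex adapter.
clean = ds.copy()
for name, var in ds.variables.items():
    clean[name] = (var.dims, np.asarray(var.values), var.attrs)

# clean compares identical to ds, but should serialize cleanly
xr.testing.assert_identical(clean, ds)
```

On my end this rebuilt `clean` saves via `to_netcdf()` without hitting the `ensure_not_multiindex` error, which matches the `ds_compare` observation above.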
### Relevant log output

```Python
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[109], line 1
----> 1 ds.to_netcdf('test.nc')

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/core/dataset.py:2303, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2300     encoding = {}
   2301 from xarray.backends.api import to_netcdf
-> 2303 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2304     self,
   2305     path,
   2306     mode=mode,
   2307     format=format,
   2308     group=group,
   2309     engine=engine,
   2310     encoding=encoding,
   2311     unlimited_dims=unlimited_dims,
   2312     compute=compute,
   2313     multifile=False,
   2314     invalid_netcdf=invalid_netcdf,
   2315 )

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1315, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1310 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1311 # to avoid this mess of conditionals
   1312 try:
   1313     # TODO: allow this work (setting up the file for writing array data)
   1314     # to be parallelized with dask
-> 1315     dump_to_store(
   1316         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1317     )
   1318     if autoclose:
   1319         store.close()

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1362, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1359 if encoder:
   1360     variables, attrs = encoder(variables, attrs)
-> 1362 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:352, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    349 if writer is None:
    350     writer = ArrayWriter()
--> 352 variables, attributes = self.encode(variables, attributes)
    354 self.set_attributes(attributes)
    355 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:441, in WritableCFDataStore.encode(self, variables, attributes)
    438 def encode(self, variables, attributes):
    439     # All NetCDF files get CF encoded by default, without this attempting
    440     # to write times, for example, would fail.
--> 441     variables, attributes = cf_encoder(variables, attributes)
    442     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    443     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:791, in cf_encoder(variables, attributes)
    788 # add encoding for time bounds variables if present.
    789 _update_bounds_encoding(variables)
--> 791 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    793 # Remove attrs from bounds variables (issue #2921)
    794 for var in new_vars.values():

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:179, in encode_cf_variable(var, needs_copy, name)
    157 def encode_cf_variable(
    158     var: Variable, needs_copy: bool = True, name: T_Name = None
    159 ) -> Variable:
    160     """"""
    161     Converts a Variable into a Variable which follows some
    162     of the CF conventions:
    (...)
    177     A variable which has been encoded as described above.
    178     """"""
--> 179     ensure_not_multiindex(var, name=name)
    181     for coder in [
    182         times.CFDatetimeCoder(),
    183         times.CFTimedeltaCoder(),
    (...)
    190         variables.BooleanCoder(),
    191     ]:
    192         var = coder.encode(var, name=name)

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:88, in ensure_not_multiindex(var, name)
     86 def ensure_not_multiindex(var: Variable, name: T_Name = None) -> None:
     87     if isinstance(var._data, indexing.PandasMultiIndexingAdapter):
---> 88         raise NotImplementedError(
     89             f""variable {name!r} is a MultiIndex, which cannot yet be ""
     90             ""serialized. Instead, either use reset_index() ""
     91             ""to convert MultiIndex levels into coordinate variables instead ""
     92             ""or use https://cf-xarray.readthedocs.io/en/latest/coding.html.""
     93         )

NotImplementedError: variable 'lat' is a MultiIndex, which cannot yet be serialized. Instead, either use reset_index() to convert MultiIndex levels into coordinate variables instead or use https://cf-xarray.readthedocs.io/en/latest/coding.html.
```

### Anything else we need to know?

This is a recent error that came up in some automated tests; an older environment still works, so `xarray v2023.1.0` does not have this issue.

Given that saving works with a dataset that `xr.testing.assert_identical()` asserts is identical to the dataset that fails, and that `ds.indexes` no longer shows a MultiIndex on the dataset that fails, perhaps the issue is in the check itself, i.e. [in](https://github.com/pydata/xarray/blob/4806412c7f08480d283f1565262f040bdc74d11d/xarray/conventions.py#L86) `xarray.conventions.ensure_not_multiindex`? It looks like this check was added recently in https://github.com/pydata/xarray/commit/f9f4c730254073f0f5a8fce65f4bbaa0eefec5fd to address another bug.

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:05:03) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 24.0
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.21.0
sphinx: None
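For reference, the suspicion about `ensure_not_multiindex` above can be probed directly by looking at what backs each variable after the resets. This is only a sketch that relies on the private `_data` attribute, which is an implementation detail (so what it prints may vary across xarray versions):

```Python
import numpy as np
import xarray as xr

# Same construction as the MVCE: stack, then reset the index and coords
ds = xr.Dataset({'test': (['lat', 'lon'], np.random.rand(2, 3))},
                coords={'lat': (['lat'], [0, 1]),
                        'lon': (['lon'], [0, 1, 2])})
ds = ds.stack(locv=('lat', 'lon')).reset_index('locv').reset_coords()

# In xarray 2024.1.1, the former MultiIndex level variables appear to
# still be backed by a PandasMultiIndexingAdapter here, which is exactly
# what the ensure_not_multiindex check in conventions.py rejects.
for name, var in ds.variables.items():
    print(name, type(var._data).__name__)
```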
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8743/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue