issue #8743: `.reset_index()`/`.reset_coords()` maintain MultiIndex status

state: closed · comments: 6 · author_association: CONTRIBUTOR · created: 2024-02-13T15:32:47Z · closed: 2024-03-01T16:05:11Z

What happened?

Trying to save a dataset to NetCDF using `ds.to_netcdf()` fails when one of the coordinates is a MultiIndex. The error message suggests using `.reset_index()` to remove the MultiIndex. However, saving still fails after resetting the index, and even after moving the offending coordinates to data variables with `.reset_coords()`.

What did you expect to happen?

After calling `.reset_index()`, and especially after calling `.reset_coords()`, the save should succeed.

As shown in the example below, a dataset that `xr.testing.assert_identical()` reports as identical to the failing dataset saves without a problem. (This also points to a current workaround: recreate the Dataset from scratch.)
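The workaround mentioned above can be sketched as follows. This is my own illustration, not part of the original report, and it assumes a dataset whose variables all live on the stacked `locv` dimension:

```python
import numpy as np
import xarray as xr

# Build a dataset that reproduces the failing state: stack into a
# MultiIndex, then reset it so no MultiIndex remains in ds.indexes.
ds = xr.Dataset(
    {'test': (['lat', 'lon'], np.random.rand(2, 3))},
    coords={'lat': (['lat'], [0, 1]), 'lon': (['lon'], [0, 1, 2])},
)
ds = ds.stack(locv=('lat', 'lon'))
ds = ds.reset_index('locv').reset_coords()

# Workaround: rebuild the Dataset from plain numpy arrays, which drops
# any MultiIndex machinery still attached to the underlying variables.
ds_clean = xr.Dataset({k: (['locv'], ds[k].values) for k in ds.data_vars})
```

On the affected versions, `ds_clean.to_netcdf('test.nc')` then succeeds even though `ds.to_netcdf('test.nc')` raises.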

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

# Create random dataset
ds = xr.Dataset(
    {'test': (['lat', 'lon'], np.random.rand(2, 3))},
    coords={'lat': (['lat'], [0, 1]), 'lon': (['lon'], [0, 1, 2])},
)

# Create multiindex by stacking
ds = ds.stack(locv=('lat', 'lon'))

# The index shows up as a MultiIndex
print(ds.indexes)

# Try to export (this fails as expected, since multiindex)
ds.to_netcdf('test.nc')

# Now, get rid of multiindex by resetting coords
# (i.e., turning coordinates into data variables)
ds = ds.reset_index('locv').reset_coords()

# The index is no longer a MultiIndex
print(ds.indexes)

# Try to export - this also fails!
ds.to_netcdf('test.nc')

# A reference comparison dataset that is successfully asserted as identical
ds_compare = xr.Dataset({k: (['locv'], ds[k].values) for k in ds})
xr.testing.assert_identical(ds_compare, ds)

# Try exporting (this succeeds)
ds_compare.to_netcdf('test.nc')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```python
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[109], line 1
----> 1 ds.to_netcdf('test.nc')

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/core/dataset.py:2303, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2300     encoding = {}
   2301 from xarray.backends.api import to_netcdf
-> 2303 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2304     self,
   2305     path,
   2306     mode=mode,
   2307     format=format,
   2308     group=group,
   2309     engine=engine,
   2310     encoding=encoding,
   2311     unlimited_dims=unlimited_dims,
   2312     compute=compute,
   2313     multifile=False,
   2314     invalid_netcdf=invalid_netcdf,
   2315 )

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1315, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1310 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1311 # to avoid this mess of conditionals
   1312 try:
   1313     # TODO: allow this work (setting up the file for writing array data)
   1314     # to be parallelized with dask
-> 1315     dump_to_store(
   1316         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1317     )
   1318     if autoclose:
   1319         store.close()

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1362, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1359 if encoder:
   1360     variables, attrs = encoder(variables, attrs)
-> 1362 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:352, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    349 if writer is None:
    350     writer = ArrayWriter()
--> 352 variables, attributes = self.encode(variables, attributes)
    354 self.set_attributes(attributes)
    355 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:441, in WritableCFDataStore.encode(self, variables, attributes)
    438 def encode(self, variables, attributes):
    439     # All NetCDF files get CF encoded by default, without this attempting
    440     # to write times, for example, would fail.
--> 441     variables, attributes = cf_encoder(variables, attributes)
    442     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    443     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:791, in cf_encoder(variables, attributes)
    788 # add encoding for time bounds variables if present.
    789 _update_bounds_encoding(variables)
--> 791 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    793 # Remove attrs from bounds variables (issue #2921)
    794 for var in new_vars.values():

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:179, in encode_cf_variable(var, needs_copy, name)
    157 def encode_cf_variable(
    158     var: Variable, needs_copy: bool = True, name: T_Name = None
    159 ) -> Variable:
    160     """
    161     Converts a Variable into a Variable which follows some
    162     of the CF conventions:
    (...)
    177     A variable which has been encoded as described above.
    178     """
--> 179 ensure_not_multiindex(var, name=name)
    181 for coder in [
    182     times.CFDatetimeCoder(),
    183     times.CFTimedeltaCoder(),
    (...)
    190     variables.BooleanCoder(),
    191 ]:
    192     var = coder.encode(var, name=name)

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:88, in ensure_not_multiindex(var, name)
     86 def ensure_not_multiindex(var: Variable, name: T_Name = None) -> None:
     87     if isinstance(var._data, indexing.PandasMultiIndexingAdapter):
---> 88         raise NotImplementedError(
     89             f"variable {name!r} is a MultiIndex, which cannot yet be "
     90             "serialized. Instead, either use reset_index() "
     91             "to convert MultiIndex levels into coordinate variables instead "
     92             "or use https://cf-xarray.readthedocs.io/en/latest/coding.html."
     93         )

NotImplementedError: variable 'lat' is a MultiIndex, which cannot yet be serialized. Instead, either use reset_index() to convert MultiIndex levels into coordinate variables instead or use https://cf-xarray.readthedocs.io/en/latest/coding.html.
```

Anything else we need to know?

This error came up recently in some automated tests; an older pinned environment still works, so xarray v2023.1.0 does not have this issue.

Given that saving works with a dataset that `xr.testing.assert_identical()` reports as identical to the failing one, and that `ds.indexes` no longer shows a MultiIndex on the failing dataset, perhaps the issue is in the check itself, i.e., in `xarray.conventions.ensure_not_multiindex`?
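One way to probe this hypothesis is to test the same condition that `ensure_not_multiindex` checks (per the traceback above) directly. This snippet is my own addition; `Variable._data` and `PandasMultiIndexingAdapter` are private xarray internals, so the result is version-dependent:

```python
import numpy as np
import xarray as xr
from xarray.core import indexing

# Reproduce the failing state from the MVCE.
ds = xr.Dataset(
    {'test': (['lat', 'lon'], np.random.rand(2, 3))},
    coords={'lat': (['lat'], [0, 1]), 'lon': (['lon'], [0, 1, 2])},
)
ds = ds.stack(locv=('lat', 'lon')).reset_index('locv').reset_coords()

# ds.indexes shows no MultiIndex here, yet on the affected versions the
# variable's private data wrapper may still be a PandasMultiIndexingAdapter,
# which is exactly the condition ensure_not_multiindex raises on.
leaked = isinstance(ds['lat'].variable._data, indexing.PandasMultiIndexingAdapter)
print(leaked)
```

If `leaked` prints `True` while `ds.indexes` is empty, the MultiIndex wrapper has outlived the index itself, which would explain why the save fails even after `.reset_index()`.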

It looks like this check was added recently in https://github.com/pydata/xarray/commit/f9f4c730254073f0f5a8fce65f4bbaa0eefec5fd to address another bug.

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:05:03) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 24.0
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.21.0
sphinx: None
```
