id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1831975171,I_kwDOAMm_X85tMbkD,8039,Update assign_coords with a MultiIndex to match new Coordinates API,20629530,closed,0,,,11,2023-08-01T20:22:41Z,2023-08-29T14:23:30Z,2023-08-29T14:23:30Z,CONTRIBUTOR,,,,"### What is your issue?

A pattern we used in `xclim` (and elsewhere) seems to be broken on master. See MWE:

```python3
import pandas as pd
import xarray as xr

da = xr.DataArray([1] * 730, coords={""time"": xr.date_range('1900-01-01', periods=730, freq='D', calendar='noleap')})
mulind = pd.MultiIndex.from_arrays((da.time.dt.year.values, da.time.dt.dayofyear.values), names=('year', 'doy'))

# Override previous time axis with new MultiIndex
da.assign_coords(time=mulind).unstack('time')
```

This works fine with both the current master and the latest release. However, if we chunk `da`, the last line now fails:

```python
da.chunk(time=50).assign_coords(time=mulind).unstack('time')
```

On master, this gives: `ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}`

Full traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[44], line 1
----> 1 da.chunk(time=50).assign_coords(time=mulind).unstack(""time"")

File ~/Projets/xarray/xarray/core/dataarray.py:2868, in DataArray.unstack(self, dim, fill_value, sparse)
2808 def unstack(
2809     self,
2810     dim: Dims = None,
2811     fill_value: Any = dtypes.NA,
2812     sparse: bool = False,
2813 ) -> DataArray:
2814     """"""
2815     Unstack existing dimensions corresponding to MultiIndexes into
2816     multiple new dimensions.
(...)
2866     DataArray.stack
2867     """"""
-> 2868     ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
2869     return self._from_temp_dataset(ds)

File ~/Projets/xarray/xarray/core/dataset.py:5481, in Dataset.unstack(self, dim, fill_value, sparse)
5479 for d in dims:
5480     if needs_full_reindex:
-> 5481         result = result._unstack_full_reindex(
5482             d, stacked_indexes[d], fill_value, sparse
5483         )
5484     else:
5485         result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)

File ~/Projets/xarray/xarray/core/dataset.py:5365, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse)
5362 else:
5363     # TODO: we may depreciate implicit re-indexing with a pandas.MultiIndex
5364     xr_full_idx = PandasMultiIndex(full_idx, dim)
-> 5365     indexers = Indexes(
5366         {k: xr_full_idx for k in index_vars},
5367         xr_full_idx.create_variables(index_vars),
5368     )
5369     obj = self._reindex(
5370         indexers, copy=False, fill_value=fill_value, sparse=sparse
5371     )
5373 for name, var in obj.variables.items():

File ~/Projets/xarray/xarray/core/indexes.py:1435, in Indexes.__init__(self, indexes, variables, index_type)
1433 unmatched_keys = set(indexes) ^ set(variables)
1434 if unmatched_keys:
-> 1435     raise ValueError(
1436         f""unmatched keys found in indexes and variables: {unmatched_keys}""
1437     )
1439 if any(not isinstance(idx, index_type) for idx in indexes.values()):
1440     index_type_str = f""{index_type.__module__}.{index_type.__name__}""

ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}

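
For reference, the check that raises at the end of the traceback is a symmetric difference between two sets of keys. Below is a minimal plain-Python sketch of that check. The function name `check_keys` and the input dictionaries are hypothetical stand-ins (not xarray objects), chosen only to reproduce the shape of the message. It does not claim to show how xarray actually ends up with mismatched keys.

```python
# Hypothetical stand-in for the key check seen in the traceback
# (Indexes.__init__); this is not xarray's actual class.
def check_keys(indexes, variables):
    # Symmetric difference: keys present in exactly one of the two mappings.
    unmatched_keys = set(indexes) ^ set(variables)
    if unmatched_keys:
        raise ValueError(
            f'unmatched keys found in indexes and variables: {unmatched_keys}'
        )


# Matching key sets pass silently.
check_keys({'time': 0, 'year': 0, 'doy': 0}, {'time': 1, 'year': 1, 'doy': 1})

# Made-up mismatch: the level names exist on one side only.
try:
    check_keys({'time': 0}, {'time': 1, 'year': 1, 'doy': 1})
except ValueError as err:
    message = str(err)
    print(message)
```

Any key that appears on only one side ends up in the reported set, which is why both level names are listed together in the error.
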
This seems related to PR #7368.

The reason for the title of this issue is that, in both versions, I now realize that `da.assign_coords(time=mulind)` prints as:

```
dask.array, shape=(730,), dtype=int64, chunksize=(50,), chunktype=numpy.ndarray>
Coordinates:
* time (time) object MultiIndex
```

Something's fishy, because the two ""sub"" indexes are not showing.

And indeed, with the current master, I can get this to work by doing (again changing the last line):

```python
da2 = xr.DataArray(da.data, coords=xr.Coordinates.from_pandas_multiindex(mulind, 'time'))
da2.chunk(time=50).unstack('time')
```

But it seems a bit odd to me that we need to reconstruct the DataArray to replace its coordinate with a ""MultiIndex"" one.

Thus, my questions are:

1. How does one properly _override_ a coordinate with a MultiIndex? Is there a way to use `assign_coords`? If not, then this issue would become a feature request.
2. Is this a regression? Or was I just ""lucky"" before?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8039/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue