id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1831975171,I_kwDOAMm_X85tMbkD,8039,Update assign_coords with a MultiIndex to match new Coordinates API,20629530,closed,0,,,11,2023-08-01T20:22:41Z,2023-08-29T14:23:30Z,2023-08-29T14:23:30Z,CONTRIBUTOR,,,,"### What is your issue?
A pattern we use in `xclim` (and elsewhere) seems to be broken on master.
See MWE:
```python3
import pandas as pd
import xarray as xr
da = xr.DataArray([1] * 730, coords={""time"": xr.date_range('1900-01-01', periods=730, freq='D', calendar='noleap')})
mulind = pd.MultiIndex.from_arrays((da.time.dt.year.values, da.time.dt.dayofyear.values), names=('year', 'doy'))
# Override previous time axis with new MultiIndex
da.assign_coords(time=mulind).unstack('time')
```
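For reference, `unstack('time')` is meant to produce one row per year and one column per day of year. That is the same reshaping pandas performs when unstacking a Series indexed by an equivalent MultiIndex. Below is a minimal sketch. It assumes two 365-day years built from plain integers, so cftime is not needed.

```python
import numpy as np
import pandas as pd

# Assumed toy data: two 365-day (noleap-like) years as plain integers, no cftime needed.
years = np.repeat([1900, 1901], 365)
doys = np.tile(np.arange(1, 366), 2)
mulind = pd.MultiIndex.from_arrays((years, doys), names=('year', 'doy'))

# One value per day, indexed by (year, doy).
series = pd.Series(np.arange(730), index=mulind)

# Unstacking the 'doy' level yields a year x doy table:
# rows are years, columns are days of year.
table = series.unstack('doy')
print(table.shape)
```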
This works with both the current master and the latest release. However, if `da` is chunked, the last line now fails:
```python
da.chunk(time=50).assign_coords(time=mulind).unstack('time')
```
On master, this raises: `ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}`
Full traceback:
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[44], line 1
----> 1 da.chunk(time=50).assign_coords(time=mulind).unstack(""time"")
File ~/Projets/xarray/xarray/core/dataarray.py:2868, in DataArray.unstack(self, dim, fill_value, sparse)
2808 def unstack(
2809 self,
2810 dim: Dims = None,
2811 fill_value: Any = dtypes.NA,
2812 sparse: bool = False,
2813 ) -> DataArray:
2814 """"""
2815 Unstack existing dimensions corresponding to MultiIndexes into
2816 multiple new dimensions.
(...)
2866 DataArray.stack
2867 """"""
-> 2868 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
2869 return self._from_temp_dataset(ds)
File ~/Projets/xarray/xarray/core/dataset.py:5481, in Dataset.unstack(self, dim, fill_value, sparse)
5479 for d in dims:
5480 if needs_full_reindex:
-> 5481 result = result._unstack_full_reindex(
5482 d, stacked_indexes[d], fill_value, sparse
5483 )
5484 else:
5485 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)
File ~/Projets/xarray/xarray/core/dataset.py:5365, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse)
5362 else:
5363 # TODO: we may depreciate implicit re-indexing with a pandas.MultiIndex
5364 xr_full_idx = PandasMultiIndex(full_idx, dim)
-> 5365 indexers = Indexes(
5366 {k: xr_full_idx for k in index_vars},
5367 xr_full_idx.create_variables(index_vars),
5368 )
5369 obj = self._reindex(
5370 indexers, copy=False, fill_value=fill_value, sparse=sparse
5371 )
5373 for name, var in obj.variables.items():
File ~/Projets/xarray/xarray/core/indexes.py:1435, in Indexes.__init__(self, indexes, variables, index_type)
1433 unmatched_keys = set(indexes) ^ set(variables)
1434 if unmatched_keys:
-> 1435 raise ValueError(
1436 f""unmatched keys found in indexes and variables: {unmatched_keys}""
1437 )
1439 if any(not isinstance(idx, index_type) for idx in indexes.values()):
1440 index_type_str = f""{index_type.__module__}.{index_type.__name__}""
ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}
```
This seems related to PR #7368.
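The error is raised by the key check in `Indexes.__init__`, at the bottom of the traceback. The pure-Python sketch below is a hypothetical stand-in for that check, not xarray's actual code. It assumes, as the error message suggests, that only `time` was registered among the index variables. It also assumes the rebuilt multi-index creates variables for `time`, `year` and `doy`.

```python
# Hypothetical stand-in for the key check in xarray's Indexes.__init__; not xarray's code.
def unmatched_keys(indexes, variables):
    # Symmetric difference: keys present in only one of the two mappings.
    return set(indexes) ^ set(variables)


def check_indexes(indexes, variables):
    unmatched = unmatched_keys(indexes, variables)
    if unmatched:
        raise ValueError(f'unmatched keys found in indexes and variables: {unmatched}')


# Assumption (inferred from the error): only 'time' is among the index variables...
index_vars = {'time': None}
indexes = {k: 'full_idx' for k in index_vars}
# ...while the multi-index creates variables for the dimension and for each level.
variables = {name: None for name in ('time', 'year', 'doy')}

try:
    check_indexes(indexes, variables)
except ValueError as err:
    print(err)

# If the level coordinates had been registered too, the keys would match and no error is raised.
complete = {k: 'full_idx' for k in variables}
check_indexes(complete, variables)
```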
The reason for the title of this issue is that I now realize that, in both versions, `da.assign_coords(time=mulind)` prints as:
```
dask.array, shape=(730,), dtype=int64, chunksize=(50,), chunktype=numpy.ndarray>
Coordinates:
* time (time) object MultiIndex
```
Something is off: the two ""sub"" indexes (`year` and `doy`) are not shown.
Indeed, with the current master, I can get this to work by building the coordinates with `xr.Coordinates.from_pandas_multiindex` (again changing the last line):
```python
da2 = xr.DataArray(da.data, coords=xr.Coordinates.from_pandas_multiindex(mulind, 'time'))
da2.chunk(time=50).unstack('time')
```
But it seems odd to me that we need to reconstruct the whole DataArray just to replace one of its coordinates with a ""MultiIndex"" one.
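For comparison, here is a minimal sketch of overriding the coordinate in place. It assumes an xarray version in which `assign_coords` accepts a `Coordinates` object, which should be checked against the installed version. It uses plain integer years and days instead of cftime dates, and it needs dask for `.chunk`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for the MWE: integer years/doys (noleap-like), so cftime is not required.
years = np.repeat([1900, 1901], 365)
doys = np.tile(np.arange(1, 366), 2)
mulind = pd.MultiIndex.from_arrays((years, doys), names=('year', 'doy'))

# An array with an existing (integer) time coordinate that we want to override.
da = xr.DataArray(np.ones(730), coords={'time': np.arange(730)}, dims=('time',))

# Assumption: assign_coords accepts a Coordinates object in this xarray version.
# from_pandas_multiindex builds the 'time' coordinate plus one coordinate per level.
midx_coords = xr.Coordinates.from_pandas_multiindex(mulind, 'time')
da_new = da.chunk(time=50).assign_coords(midx_coords)

out = da_new.unstack('time')
print(out.dims, out.shape)
```

If this holds, `year` and `doy` should appear as coordinates of `da_new` before unstacking, unlike in the repr shown above.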
Thus, my questions are:
1. How does one properly _override_ a coordinate with a MultiIndex? Is there a way to use `assign_coords`? If not, then this issue would become a feature request.
2. Is this a regression? Or was I just ""lucky"" before?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8039/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue