id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
603309899,MDU6SXNzdWU2MDMzMDk4OTk=,3985,xarray=1.15.1 regression: Groupby drop multi-index,8419157,closed,0,,,4,2020-04-20T15:05:51Z,2021-02-16T15:59:46Z,2021-02-16T15:59:46Z,NONE,,,,"I have written a function `process_stacked_groupby` that stacks all but one dimension of a dataset/dataarray and performs `groupby-apply-combine` on the stacked dimension. However, after upgrading to 0.15.1, the function ceased to work.

#### MCVE Code Sample

```python
import numpy as np
import xarray as xr

# Dimensions
N = xr.DataArray(np.arange(100), dims='N', name='N')
reps = xr.DataArray(np.arange(5), dims='reps', name='reps')
horizon = xr.DataArray([1, -1], dims='horizon', name='horizon')
horizon.attrs = {'long_name': 'Horizonal', 'units': 'H'}
vertical = xr.DataArray(np.arange(1, 4), dims='vertical', name='vertical')
vertical.attrs = {'long_name': 'Vertical', 'units': 'V'}

# Variables
x = xr.DataArray(np.random.randn(len(N), len(reps), len(horizon), len(vertical)),
                 dims=['N', 'reps', 'horizon', 'vertical'],
                 name='x')
y = x * 0.1
y.name = 'y'

# Merge x, y
data = xr.merge([x, y])

# Assign coords
data = data.assign_coords(reps=reps, vertical=vertical, horizon=horizon)

# Function that stacks all but one dimension and groups by the stacked dimension.
def process_stacked_groupby(ds, dim, func, *args):

    # Function to apply to stacked groupby
    def apply_fn(ds, dim, func, *args):

        # Get groupby dim
        groupby_dim = list(ds.dims)
        groupby_dim.remove(dim)
        groupby_var = ds[groupby_dim]

        # Unstack groupby dim
        ds2 = ds.unstack(groupby_dim).squeeze()

        # Perform function
        ds3 = func(ds2, *args)

        # Add multi-index groupby_var to result
        ds3 = (ds3
               .reset_coords(drop=True)
               .assign_coords(groupby_var)
               .expand_dims(groupby_dim)
               )
        return ds3

    # Get list of dimensions
    groupby_dims = list(ds.dims)

    # Remove dimension not grouped
    groupby_dims.remove(dim)

    # Stack all but one dimension
    stack_dim = '_'.join(groupby_dims)
    ds2 = ds.stack({stack_dim: groupby_dims})

    # Groupby and apply
    ds2 = ds2.groupby(stack_dim, squeeze=False).map(apply_fn, args=(dim, func, *args))

    # Unstack
    ds2 = ds2.unstack(stack_dim)

    # Restore attrs
    for dim in groupby_dims:
        ds2[dim].attrs = ds[dim].attrs

    return ds2

# Function to apply on groupby
def fn(ds):
    return ds

# Run groupby with applied function
data.pipe(process_stacked_groupby, 'N', fn)
```

#### Expected Output

Prior to xarray=0.15.0, the above code produced the result that I wanted.

The function should be able to

1. stack chosen dimensions
2. groupby the stacked dimension
3. apply a function on each group
   a. The function actually passes along another function with unstacked group coord
   b. Add multi-index stacked group coord back to the results of this function
4. combine the groups
5. Unstack stacked dimension

#### Problem Description

After upgrading to 0.15.1, the above code stopped working. The error occurred at the line

```
# Unstack
ds2 = ds2.unstack(stack_dim)
```

with `ValueError: cannot unstack dimensions that do not have a MultiIndex: ['horizon_reps_vertical']`. This is at the 5th step, where the resulting combined object was found not to contain any multi-index. Somewhere in the 4th step, the combination of groups has lost the multi-index stacked dimension.
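
A minimal sketch of a guard for the failing step, assuming the stacked dimension is expected to be backed by a pandas MultiIndex. The helper `has_multiindex` is hypothetical (it is not part of xarray), and the small array and its names are illustrative only:

```python
import numpy as np
import pandas as pd
import xarray as xr


def has_multiindex(obj, dim):
    # Hypothetical helper: True only if `dim` is backed by a pandas MultiIndex.
    return isinstance(obj.indexes.get(dim), pd.MultiIndex)


# Small illustrative array with two dimensions to stack
da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=['horizon', 'vertical'],
    coords={'horizon': [1, -1], 'vertical': [1, 2, 3]},
    name='x',
)

# Stacking creates a MultiIndex on the new dimension
stacked = da.stack(horizon_vertical=['horizon', 'vertical'])
stacked_ok = has_multiindex(stacked, 'horizon_vertical')

# A plain (unstacked) dimension has an ordinary index
plain_ok = has_multiindex(da, 'horizon')

# Only unstack when the MultiIndex is still present
roundtrip = stacked.unstack('horizon_vertical') if stacked_ok else stacked
```

Placed right after the `groupby(...).map(...)` call in `process_stacked_groupby`, a check like this might help confirm whether the combine step is where the MultiIndex is dropped.
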
#### Versions

0.15.1","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3985/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
748229907,MDU6SXNzdWU3NDgyMjk5MDc=,4598,Calling pd.to_datetime on cftime variable,17162724,closed,0,,,4,2020-11-22T12:14:27Z,2021-02-16T02:42:35Z,2021-02-16T02:42:35Z,CONTRIBUTOR,,,,"It would be nice to be able to convert cftime variables to pandas datetime to utilize the functionality there.

I understand this is an upstream issue as pandas probably isn't aware of cftime. However, I'm curious if a method could be added to cftime such as .to_dataframe().

I've found `pd.to_datetime(np.datetime64(date_cf))` is the best way to do this currently.

```
import xarray as xr
import numpy as np
import pandas as pd

date_str = '2020-01-01'
date_np = np.datetime64(date_str)
>>> date_np
numpy.datetime64('2020-01-01')

date_pd = pd.to_datetime(date_np)
>>> date_pd
Timestamp('2020-01-01 00:00:00')

date_cf = xr.cftime_range(start=date_str, periods=1)[0]
pd.to_datetime(date_cf)
>>> pd.to_datetime(date_cf)
Traceback (most recent call last):
  File """", line 1, in
  File ""/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/tools/datetimes.py"", line 830, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
  File ""/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/tools/datetimes.py"", line 459, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File ""/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py"", line 2044, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File ""pandas/_libs/tslib.pyx"", line 352, in pandas._libs.tslib.array_to_datetime
  File ""pandas/_libs/tslib.pyx"", line 579, in pandas._libs.tslib.array_to_datetime
  File ""pandas/_libs/tslib.pyx"", line 718, in pandas._libs.tslib.array_to_datetime_object
  File ""pandas/_libs/tslib.pyx"", line 552, in pandas._libs.tslib.array_to_datetime
TypeError: is not convertible to datetime
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4598/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue