html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2852#issuecomment-1100985429,https://api.github.com/repos/pydata/xarray/issues/2852,1100985429,IC_kwDOAMm_X85Bn7RV,26384082,2022-04-18T00:43:46Z,2022-04-18T00:43:46Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,425320466
https://github.com/pydata/xarray/issues/2852#issuecomment-652898319,https://api.github.com/repos/pydata/xarray/issues/2852,652898319,MDEyOklzc3VlQ29tbWVudDY1Mjg5ODMxOQ==,20053498,2020-07-02T09:29:32Z,2020-07-02T09:29:55Z,NONE,"I'm going to share a code snippet that might be useful to people reading this issue. I wanted to group my data by month and year, and take the mean for each group.
I did not want to use `resample`, because I wanted the resulting dimensions to be ('month', 'year') rather than ('time'). The obvious way to do this is to use a pd.MultiIndex to create a stacked 'year_month' coordinate, but I found its performance poor.
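(For context, one way to reshape monthly means into separate ('year', 'month') dimensions without `apply_ufunc` is `resample` followed by `set_index`/`unstack`. This is a sketch, not the code from this comment; the dimension names and random test data are illustrative.)

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data: two full years of daily values
times = pd.date_range('2000-01-01', '2001-12-31', freq='D')
da = xr.DataArray(
    np.random.rand(4, 5, times.size),
    dims=('lat', 'lon', 'time'),
    coords={'time': times})

# Monthly means along time, then reshape time -> (year, month)
monthly = da.resample(time='MS').mean()
monthly = monthly.assign_coords(
    year=('time', monthly.time.dt.year.values),
    month=('time', monthly.time.dt.month.values))
result = monthly.set_index(time=['year', 'month']).unstack('time')
```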
My solution was to use `xr.apply_ufunc`, as suggested above. I think it should be OK with dask-chunked data, provided the data are not chunked along the time dimension.
Here is the code:
```python
import numpy as np
import xarray as xr


def _grouped_mean(
        data: np.ndarray,
        months: np.ndarray,
        years: np.ndarray) -> np.ndarray:
    """"""similar to grouping by a year_month MultiIndex, but faster.
    Should be used wrapped by _wrapped_grouped_mean""""""
    unique_months = np.sort(np.unique(months))
    unique_years = np.sort(np.unique(years))
    old_shape = list(data.shape)
    new_shape = old_shape[:-1]
    new_shape.append(unique_months.shape[0])
    new_shape.append(unique_years.shape[0])
    output = np.zeros(new_shape)
    for i_month, j_year in np.ndindex(output.shape[2:]):
        # time indices belonging to this (month, year) combination
        indices = np.intersect1d(
            (months == unique_months[i_month]).nonzero(),
            (years == unique_years[j_year]).nonzero()
        )
        output[:, :, i_month, j_year] = \
            np.mean(data[:, :, indices], axis=-1)
    return output


def _wrapped_grouped_mean(da: xr.DataArray) -> xr.DataArray:
    """"""similar to grouping by a year_month MultiIndex, but faster.
    Wraps a numpy-style function with xr.apply_ufunc
    """"""
    Y = xr.apply_ufunc(
        _grouped_mean,
        da,
        da.time.dt.month,
        da.time.dt.year,
        input_core_dims=[['lat', 'lon', 'time'], ['time'], ['time']],
        output_core_dims=[['lat', 'lon', 'month', 'year']],
    )
    Y = Y.assign_coords(
        {'month': np.sort(np.unique(da.time.dt.month)),
         'year': np.sort(np.unique(da.time.dt.year))})
    return Y
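
# Example use (illustrative; assumes a DataArray called da with dims
# ('lat', 'lon', 'time') and a datetime-like 'time' coordinate):
#     result = _wrapped_grouped_mean(da)
#     # result has dims ('lat', 'lon', 'month', 'year')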
```","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,425320466