html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2852#issuecomment-1100985429,https://api.github.com/repos/pydata/xarray/issues/2852,1100985429,IC_kwDOAMm_X85Bn7RV,26384082,2022-04-18T00:43:46Z,2022-04-18T00:43:46Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,425320466
https://github.com/pydata/xarray/issues/2852#issuecomment-652898319,https://api.github.com/repos/pydata/xarray/issues/2852,652898319,MDEyOklzc3VlQ29tbWVudDY1Mjg5ODMxOQ==,20053498,2020-07-02T09:29:32Z,2020-07-02T09:29:55Z,NONE,"I'm going to share a code snippet that might be useful to people reading this issue. I wanted to group my data by month and year, and take the mean for each group. I did not want to use `resample`, as I wanted the dimensions to be ('month', 'year') rather than ('time').

The obvious way of doing this is to use a pd.MultiIndex to create a 'year_month' stacked coordinate; I found this did not have good performance. My solution was to use `xr.apply_ufunc`, as suggested above. I think it should be OK with dask chunked data, provided it is not chunked in time.

Here is the code:

```
def _grouped_mean(
        data: np.ndarray,
        months: np.ndarray,
        years: np.ndarray) -> np.ndarray:
    """"""similar to grouping by a year_month MultiIndex, but faster.

    Should be used wrapped by _wrapped_grouped_mean""""""
    unique_months = np.sort(np.unique(months))
    unique_years = np.sort(np.unique(years))
    old_shape = list(data.shape)
    new_shape = old_shape[:-1]
    new_shape.append(unique_months.shape[0])
    new_shape.append(unique_years.shape[0])
    output = np.zeros(new_shape)
    for i_month, j_year in np.ndindex(output.shape[2:]):
        indices = np.intersect1d(
            (months == unique_months[i_month]).nonzero(),
            (years == unique_years[j_year]).nonzero()
        )
        output[:, :, i_month, j_year] = \
            np.mean(data[:, :, indices], axis=-1)
    return output


def _wrapped_grouped_mean(da: xr.DataArray) -> xr.DataArray:
    """"""similar to grouping by a year_month MultiIndex, but faster.

    Wraps a numpy-style function with xr.apply_ufunc
    """"""
    Y = xr.apply_ufunc(
        _grouped_mean,
        da,
        da.time.dt.month,
        da.time.dt.year,
        input_core_dims=[['lat', 'lon', 'time'], ['time'], ['time']],
        output_core_dims=[['lat', 'lon', 'month', 'year']],
    )
    Y = Y.assign_coords(
        {'month': np.sort(np.unique(da.time.dt.month)),
         'year': np.sort(np.unique(da.time.dt.year))})
    return Y
```","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,425320466