html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2852#issuecomment-1100985429,https://api.github.com/repos/pydata/xarray/issues/2852,1100985429,IC_kwDOAMm_X85Bn7RV,26384082,2022-04-18T00:43:46Z,2022-04-18T00:43:46Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,425320466
https://github.com/pydata/xarray/issues/2852#issuecomment-652898319,https://api.github.com/repos/pydata/xarray/issues/2852,652898319,MDEyOklzc3VlQ29tbWVudDY1Mjg5ODMxOQ==,20053498,2020-07-02T09:29:32Z,2020-07-02T09:29:55Z,NONE,"I'm going to share a code snippet that might be useful to people reading this issue. I wanted to group my data by month and year, and take the mean for each group.
I did not want to use `resample`, because I wanted the resulting dimensions to be ('month', 'year') rather than ('time'). The obvious way to do this is to use a pd.MultiIndex to create a stacked 'year_month' coordinate, but I found its performance poor.
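(For context, one way to reshape monthly means into separate ('year', 'month') dimensions without `apply_ufunc` is `resample` followed by `set_index`/`unstack`. This is a sketch, not the code from this comment; the dimension names and random test data are illustrative.)

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data: two full years of daily values
times = pd.date_range('2000-01-01', '2001-12-31', freq='D')
da = xr.DataArray(
    np.random.rand(4, 5, times.size),
    dims=('lat', 'lon', 'time'),
    coords={'time': times})

# Monthly means along time, then reshape time -> (year, month)
monthly = da.resample(time='MS').mean()
monthly = monthly.assign_coords(
    year=('time', monthly.time.dt.year.values),
    month=('time', monthly.time.dt.month.values))
result = monthly.set_index(time=['year', 'month']).unstack('time')
```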
My solution was to use `xr.apply_ufunc`, as suggested above. I think it should be OK with dask-chunked data, provided the data are not chunked along the time dimension.
Here is the code:
```python
import numpy as np
import xarray as xr


def _grouped_mean(
        data: np.ndarray,
        months: np.ndarray,
        years: np.ndarray) -> np.ndarray:
    """"""similar to grouping by a year_month MultiIndex, but faster.
    Should be used wrapped by _wrapped_grouped_mean""""""
    unique_months = np.sort(np.unique(months))
    unique_years = np.sort(np.unique(years))
    old_shape = list(data.shape)
    new_shape = old_shape[:-1]
    new_shape.append(unique_months.shape[0])
    new_shape.append(unique_years.shape[0])
    output = np.zeros(new_shape)
    for i_month, j_year in np.ndindex(output.shape[2:]):
        # time indices belonging to this (month, year) combination
        indices = np.intersect1d(
            (months == unique_months[i_month]).nonzero(),
            (years == unique_years[j_year]).nonzero()
        )
        output[:, :, i_month, j_year] = \
            np.mean(data[:, :, indices], axis=-1)
    return output


def _wrapped_grouped_mean(da: xr.DataArray) -> xr.DataArray:
    """"""similar to grouping by a year_month MultiIndex, but faster.
    Wraps a numpy-style function with xr.apply_ufunc
    """"""
    Y = xr.apply_ufunc(
        _grouped_mean,
        da,
        da.time.dt.month,
        da.time.dt.year,
        input_core_dims=[['lat', 'lon', 'time'], ['time'], ['time']],
        output_core_dims=[['lat', 'lon', 'month', 'year']],
    )
    Y = Y.assign_coords(
        {'month': np.sort(np.unique(da.time.dt.month)),
         'year': np.sort(np.unique(da.time.dt.year))})
    return Y
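
# Example use (illustrative; assumes a DataArray called da with dims
# ('lat', 'lon', 'time') and a datetime-like 'time' coordinate):
#     result = _wrapped_grouped_mean(da)
#     # result has dims ('lat', 'lon', 'month', 'year')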
```","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,425320466