html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2525#issuecomment-435213658,https://api.github.com/repos/pydata/xarray/issues/2525,435213658,MDEyOklzc3VlQ29tbWVudDQzNTIxMzY1OA==,1217238,2018-11-01T22:51:55Z,2018-11-01T22:51:55Z,MEMBER,"skimage implements `block_reduce` via the `view_as_blocks` utility function: https://github.com/scikit-image/scikit-image/blob/62e29cd89dc858d8fb9d3578034a2f456f298ed3/skimage/util/shape.py#L9-L103

But given that it doesn't actually duplicate any elements and needs a C-order array to work, I think it's just equivalent to use `reshape` + `transpose`, e.g., `B = A.reshape(4, 1, 2, 2, 3, 2).transpose([0, 2, 4, 1, 3, 5])` reproduces `skimage.util.view_as_blocks(A, (1, 2, 2))` from the docstring example.

So the super-simple version of block-reduce looks like:

```python
def block_reduce(image, block_sizes, func=np.sum):
    # TODO: input validation
    # TODO: consider copying padding from skimage
    blocked_shape = []
    for existing_size, block_size in zip(image.shape, block_sizes):
        blocked_shape.extend([existing_size // block_size, block_size])
    blocked = np.reshape(image, tuple(blocked_shape))
    return func(blocked, axis=tuple(range(1, blocked.ndim, 2)))
```

This would work on dask arrays out of the box, but it's probably worth benchmarking whether you'd get better performance doing the operation chunk-wise (e.g., with `map_blocks`).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-435192382,https://api.github.com/repos/pydata/xarray/issues/2525,435192382,MDEyOklzc3VlQ29tbWVudDQzNTE5MjM4Mg==,1217238,2018-11-01T21:24:15Z,2018-11-01T21:24:15Z,MEMBER,"OK, so maybe `da.block({'lat': 2, 'lon': 2}).mean()` would be a good way to spell this, if that's not too confusing with `.chunk()`?
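For concreteness, here is a sketch of how a dim-to-block-size mapping like `{'lat': 2, 'lon': 2}` could drive the reshape trick on a plain numpy array; `block_mean` and its signature are hypothetical illustration, not an existing API:

```python
import numpy as np

def block_mean(arr, dims, blocks):
    # Hypothetical helper: `dims` names the axes of `arr` in order,
    # `blocks` maps dim names to block sizes, e.g. {'lat': 2, 'lon': 2}.
    blocked_shape = []
    for name, size in zip(dims, arr.shape):
        b = blocks.get(name, 1)
        assert size % b == 0, "block size must evenly divide the axis length"
        blocked_shape.extend([size // b, b])
    blocked = arr.reshape(blocked_shape)
    # reduce over every second axis, i.e. the within-block axes
    return blocked.mean(axis=tuple(range(1, blocked.ndim, 2)))

data = np.arange(16.0).reshape(4, 4)
coarse = block_mean(data, dims=('lat', 'lon'), blocks={'lat': 2, 'lon': 2})
# coarse has shape (2, 2); each entry is the mean of one 2x2 block
```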
Other possible method names: `groupby_block`, `blocked`?

We could call this something like `coarsen()` or `block_reduce()` with a `how='mean'` or maybe `func=mean` argument, but I like the consistency with resample/rolling/groupby.

We can save the full coordinate-based version for a later addition to `.resample()`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434705757,https://api.github.com/repos/pydata/xarray/issues/2525,434705757,MDEyOklzc3VlQ29tbWVudDQzNDcwNTc1Nw==,1217238,2018-10-31T14:22:07Z,2018-10-31T14:22:07Z,MEMBER,"block_reduce from skimage is indeed a small function using strides/reshape, if I remember correctly. We should certainly copy or implement it ourselves rather than adding a skimage dependency.

On Wed, Oct 31, 2018 at 12:36 AM Keisuke Fujii wrote:

> block_reduce sounds nice, but I am a little hesitant to add a
> soft dependency on scikit-image only for this function...
> It is using the stride trick, as we are doing in rolling.construct. Maybe
> we can implement it by ourselves.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434477550,https://api.github.com/repos/pydata/xarray/issues/2525,434477550,MDEyOklzc3VlQ29tbWVudDQzNDQ3NzU1MA==,1217238,2018-10-30T21:31:18Z,2018-10-30T21:31:18Z,MEMBER,"I'm +1 for adding this feature in some form as well. From an API perspective, should the window size be specified in terms of integers or coordinates?
- `rolling` is integer-based
- `resample` is coordinate-based

I would lean towards a coordinate-based representation, since it's a little more usable and more certain to be correct. It might even make sense to still call this `resample`, though obviously the time options would no longer apply. Also, we would almost certainly want a faster underlying implementation than what we currently use for `resample()`.

The API for resampling to a 2x2 degree latitude/longitude grid could look something like: `da.resample(lat=2, lon=2).mean()`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
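A minimal sketch of the coordinate-based variant from the last comment: bin each point by floor-dividing its coordinate, then average per bin. `coord_block_mean` is a hypothetical name for illustration, assuming numpy; xarray's real resample/groupby machinery is far more general:

```python
import numpy as np

def coord_block_mean(values, lat, lon, dlat=2.0, dlon=2.0):
    # Hypothetical sketch: average `values` (shape (len(lat), len(lon)))
    # onto a dlat x dlon degree grid by assigning each point to the
    # coordinate bin that contains it (works for irregular coordinates).
    lat_bin = np.floor(lat / dlat).astype(int)
    lon_bin = np.floor(lon / dlon).astype(int)
    lat_bin -= lat_bin.min()
    lon_bin -= lon_bin.min()
    sums = np.zeros((lat_bin.max() + 1, lon_bin.max() + 1))
    counts = np.zeros_like(sums)
    # unbuffered scatter-add handles multiple points per bin correctly
    np.add.at(sums, (lat_bin[:, None], lon_bin[None, :]), values)
    np.add.at(counts, (lat_bin[:, None], lon_bin[None, :]), 1)
    return sums / counts

lat = np.array([0.5, 1.5, 2.5, 3.5])
lon = np.array([0.5, 1.5, 2.5, 3.5])
values = np.arange(16.0).reshape(4, 4)
coarse = coord_block_mean(values, lat, lon)
# coarse has shape (2, 2): one cell per 2x2-degree bin
```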