html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2525#issuecomment-447545224,https://api.github.com/repos/pydata/xarray/issues/2525,447545224,MDEyOklzc3VlQ29tbWVudDQ0NzU0NTIyNA==,6815844,2018-12-15T07:28:13Z,2018-12-15T07:28:13Z,MEMBER,"Thinking about its API.
I like a `rolling`-like API. One option I have in mind is
```python
ds.coarsen(x=2, y=2, side='left', trim_excess=True).mean()
```
To apply a custom callable other than `np.mean` to a particular coordinate, it would probably look like
```python
ds.coarsen(x=2, y=2, side='left', trim_excess=True).mean(coordinate_apply={'surface_area': np.sum})
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-439892007,https://api.github.com/repos/pydata/xarray/issues/2525,439892007,MDEyOklzc3VlQ29tbWVudDQzOTg5MjAwNw==,14314623,2018-11-19T13:26:45Z,2018-11-19T13:26:45Z,CONTRIBUTOR,"I think mean would be a good default (thinking about cell center dimensions like longitude and latitude), but I would very much like it if other functions could be specified, e.g. for grid face dimensions (where min and max would be more appropriate) and other coordinates like surface area (where sum would be the most appropriate function).
> On Nov 18, 2018, at 11:13 PM, Ryan Abernathey wrote:
>
> What would the coordinates look like?
>
> 1. apply func also for coordinate
> 2. always apply mean to coordinate
> If I think about my applications, I would probably always want to apply mean to dimension coordinates, but would like to be able to choose for non-dimension coordinates.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-439766587,https://api.github.com/repos/pydata/xarray/issues/2525,439766587,MDEyOklzc3VlQ29tbWVudDQzOTc2NjU4Nw==,1197350,2018-11-19T04:13:37Z,2018-11-19T04:13:37Z,MEMBER,"> What would the coordinates look like?
>
> 1. apply `func` also for coordinate
> 2. always apply `mean` to coordinate
If I think about my applications, I would probably always want to apply `mean` to dimension coordinates, but would like to be able to choose for non-dimension coordinates.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-435272976,https://api.github.com/repos/pydata/xarray/issues/2525,435272976,MDEyOklzc3VlQ29tbWVudDQzNTI3Mjk3Ng==,2448579,2018-11-02T05:11:36Z,2018-11-02T05:11:36Z,MEMBER,"I like `coarsen` because it's a verb like resample, groupby.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-435268965,https://api.github.com/repos/pydata/xarray/issues/2525,435268965,MDEyOklzc3VlQ29tbWVudDQzNTI2ODk2NQ==,6815844,2018-11-02T04:37:35Z,2018-11-02T04:37:35Z,MEMBER,"+1 for `block`
What would the coordinates look like?
1. apply `func` also for coordinate
2. always apply `mean` to coordinate
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-435213658,https://api.github.com/repos/pydata/xarray/issues/2525,435213658,MDEyOklzc3VlQ29tbWVudDQzNTIxMzY1OA==,1217238,2018-11-01T22:51:55Z,2018-11-01T22:51:55Z,MEMBER,"skimage implements `block_reduce` via the `view_as_blocks` utility function: https://github.com/scikit-image/scikit-image/blob/62e29cd89dc858d8fb9d3578034a2f456f298ed3/skimage/util/shape.py#L9-L103
But given that it doesn't actually duplicate any elements and needs a C-order array to work, I think it's actually just equivalent to using `reshape` + `transpose`, e.g., `B = A.reshape(4, 1, 2, 2, 3, 2).transpose([0, 2, 4, 1, 3, 5])` reproduces `skimage.util.view_as_blocks(A, (1, 2, 2))` from the docstring example.
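As a quick check of that equivalence (a sketch, assuming the docstring's `A` of shape `(4, 4, 6)` and that scikit-image is installed):
```python
import numpy as np
from skimage.util import view_as_blocks

A = np.arange(4 * 4 * 6).reshape(4, 4, 6)
# split each axis into (n_blocks, block_size) pairs, then move the three
# block indices to the front and the three within-block indices to the back
B = A.reshape(4, 1, 2, 2, 3, 2).transpose([0, 2, 4, 1, 3, 5])
assert np.array_equal(B, view_as_blocks(A, (1, 2, 2)))  # both have shape (4, 2, 3, 1, 2, 2)
```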
So the super-simple version of block-reduce looks like:
```python
def block_reduce(image, block_size, func=np.sum):
    # TODO: input validation
    # TODO: consider copying padding from skimage
    blocked_shape = []
    for existing_size, bs in zip(image.shape, block_size):
        blocked_shape.extend([existing_size // bs, bs])
    blocked = np.reshape(image, tuple(blocked_shape))
    return func(blocked, axis=tuple(range(1, blocked.ndim, 2)))
```
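For instance, on a toy array (a hypothetical example, assuming `numpy` is imported as `np`):
```python
image = np.arange(16).reshape(4, 4)
block_reduce(image, (2, 2), func=np.mean)  # averages each 2x2 block
# -> array([[ 2.5,  4.5],
#           [10.5, 12.5]])
```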
This would work on dask arrays out of the box but it's probably worth benchmarking whether you'd get better performance doing the operation chunk-wise (e.g., with `map_blocks`).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-435201618,https://api.github.com/repos/pydata/xarray/issues/2525,435201618,MDEyOklzc3VlQ29tbWVudDQzNTIwMTYxOA==,14314623,2018-11-01T21:59:19Z,2018-11-01T21:59:19Z,CONTRIBUTOR,"My favorite would be `da.coarsen({'lat': 2, 'lon': 2}).mean()`, but all the others sound reasonable to me.
Also +1 for consistency with resample/rolling/groupby.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-435192382,https://api.github.com/repos/pydata/xarray/issues/2525,435192382,MDEyOklzc3VlQ29tbWVudDQzNTE5MjM4Mg==,1217238,2018-11-01T21:24:15Z,2018-11-01T21:24:15Z,MEMBER,"OK, so maybe `da.block({'lat': 2, 'lon': 2}).mean()` would be a good way to spell this, if that's not too confusing with `.chunk()`? Other possible method names: `groupby_block`, `blocked`?
We could call this something like `coarsen()` or `block_reduce()` with a `how='mean'` or maybe `func=mean` argument, but I like the consistency with resample/rolling/groupby.
We can save the full coordinate-based version for a later addition to `.resample()`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434705757,https://api.github.com/repos/pydata/xarray/issues/2525,434705757,MDEyOklzc3VlQ29tbWVudDQzNDcwNTc1Nw==,1217238,2018-10-31T14:22:07Z,2018-10-31T14:22:07Z,MEMBER,"`block_reduce` from skimage is indeed a small function using strides/reshape, if I remember correctly. We should certainly copy it or implement it ourselves rather than adding a skimage dependency.
On Wed, Oct 31, 2018 at 12:36 AM Keisuke Fujii wrote:
> block_reduce sounds nice, but I am a little hesitant to add a soft dependency on scikit-image only for this function...
> It is using the stride trick, as we are doing in rolling.construct. Maybe we can implement it ourselves.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434589377,https://api.github.com/repos/pydata/xarray/issues/2525,434589377,MDEyOklzc3VlQ29tbWVudDQzNDU4OTM3Nw==,6815844,2018-10-31T07:36:41Z,2018-10-31T07:36:41Z,MEMBER,"`block_reduce` sounds nice, but I am a little hesitant to add a soft dependency on scikit-image only for this function...
It is using the stride trick, as we are doing in `rolling.construct`. Maybe we can implement it ourselves.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434531970,https://api.github.com/repos/pydata/xarray/issues/2525,434531970,MDEyOklzc3VlQ29tbWVudDQzNDUzMTk3MA==,14314623,2018-10-31T01:46:19Z,2018-10-31T01:46:19Z,CONTRIBUTOR,"I agree with @rabernat, and favor the index-based approach.
For regular lon-lat grids it's easy enough to implement a weighted mean, and for irregularly spaced grids and other exotic grids the coordinate-based approach might lead to errors. To me, the resample API above might suggest to some users that proper regridding (a la xESMF) onto a regular lat/lon grid is performed.
`block_reduce` sounds good to me and seems appropriate for non-dask arrays. Does anyone have experience with how `dask.coarsen` compares performance-wise?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434480457,https://api.github.com/repos/pydata/xarray/issues/2525,434480457,MDEyOklzc3VlQ29tbWVudDQzNDQ4MDQ1Nw==,1197350,2018-10-30T21:41:17Z,2018-10-30T21:41:25Z,MEMBER,"> I would lean towards a coordinate-based representation since it's a little more usable and more certain to be correct.
I feel that this could become too complex in the case of irregularly spaced coordinates. I slightly favor the index-based approach (as in my function above), which one calls like
```python
aggregate_da(da, {'lat': 2, 'lon': 2})
```
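Under those index-based semantics, each block of points is reduced by position, regardless of coordinate spacing. A toy sketch of the underlying operation (assuming numpy and scikit-image are available):
```python
import numpy as np
from skimage.measure import block_reduce

arr = np.arange(24).reshape(4, 6)                    # stand-in for a small lat/lon field
block_reduce(arr, block_size=(2, 3), func=np.mean)   # averages each 2x3 block
# -> array([[ 4.,  7.],
#           [16., 19.]])
```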
If we do that, we can just use scikit-image's `block_reduce` function, which is vectorized and works great with `apply_ufunc`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434477550,https://api.github.com/repos/pydata/xarray/issues/2525,434477550,MDEyOklzc3VlQ29tbWVudDQzNDQ3NzU1MA==,1217238,2018-10-30T21:31:18Z,2018-10-30T21:31:18Z,MEMBER,"I'm +1 for adding this feature in some form as well.
From an API perspective, should the window size be specified in terms of integers or coordinates?
- `rolling` is integer-based
- `resample` is coordinate-based
I would lean towards a coordinate-based representation since it's a little more usable and more certain to be correct. It might even make sense to still call this `resample`, though obviously the time options would no longer apply. Also, we would almost certainly want a faster underlying implementation than what we currently use for `resample()`.
The API for resampling to a 2x2 degree latitude/longitude grid could look something like: `da.resample(lat=2, lon=2).mean()`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434294356,https://api.github.com/repos/pydata/xarray/issues/2525,434294356,MDEyOklzc3VlQ29tbWVudDQzNDI5NDM1Ng==,1197350,2018-10-30T13:10:16Z,2018-10-30T13:10:39Z,MEMBER,"FYI, I do this often in my work with this sort of function:
```python
import numpy as np
import xarray as xr
from skimage.measure import block_reduce

def aggregate_da(da, agg_dims, suf='_agg'):
    input_core_dims = list(agg_dims)
    n_agg = len(input_core_dims)
    core_block_size = tuple([agg_dims[k] for k in input_core_dims])
    block_size = (da.ndim - n_agg)*(1,) + core_block_size
    output_core_dims = [dim + suf for dim in input_core_dims]
    output_sizes = {(dim + suf): da.shape[da.get_axis_num(dim)]//agg_dims[dim]
                    for dim in input_core_dims}
    output_dtypes = da.dtype
    da_out = xr.apply_ufunc(block_reduce, da, kwargs={'block_size': block_size},
                            input_core_dims=[input_core_dims],
                            output_core_dims=[output_core_dims],
                            output_sizes=output_sizes,
                            output_dtypes=[output_dtypes],
                            dask='parallelized')
    for dim in input_core_dims:
        new_coord = block_reduce(da[dim].data, (agg_dims[dim],), func=np.mean)
        da_out.coords[dim + suf] = (dim + suf, new_coord)
    return da_out
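
# Example usage (a hypothetical toy case): for a 4x4 DataArray with dims
# ('lat', 'lon'), aggregate_da(da, {'lat': 2, 'lon': 2}) sums each 2x2 block
# (block_reduce's default func is np.sum) and block-averages the coordinates,
# returning a 2x2 result with dims ('lat_agg', 'lon_agg').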
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434294114,https://api.github.com/repos/pydata/xarray/issues/2525,434294114,MDEyOklzc3VlQ29tbWVudDQzNDI5NDExNA==,1197350,2018-10-30T13:09:25Z,2018-10-30T13:09:25Z,MEMBER,"This is being discussed in #1192 under a different name.
Yes, we need this feature.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758
https://github.com/pydata/xarray/issues/2525#issuecomment-434261896,https://api.github.com/repos/pydata/xarray/issues/2525,434261896,MDEyOklzc3VlQ29tbWVudDQzNDI2MTg5Ng==,6815844,2018-10-30T11:17:17Z,2018-10-30T11:17:17Z,MEMBER,"This is from a [thread at SO](https://stackoverflow.com/questions/52886703/xarray-multidimensional-binning-array-reduction-on-sample-dataset-of-4-x4-to/52981916?noredirect=1#comment93001872_52981916).
Does anyone have an opinion on adding a `bin` (or `rolling_bin`) method to compute the binning?
For the above example, currently we need to do
```python
dsa.rolling(x=2).construct('tmp').isel(x=slice(1, None, 2)).mean('tmp')
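# rolling(x=2).construct('tmp') stacks each length-2 window along a new 'tmp'
# dimension, isel(x=slice(1, None, 2)) keeps every second (complete) window,
# and mean('tmp') then averages within each window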
```
which is a little complex.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,375126758