issue_comments

13 rows where author_association = "MEMBER" and issue = 375126758 sorted by updated_at descending

fujiisoup (MEMBER) · 2018-12-15T07:28:13Z
https://github.com/pydata/xarray/issues/2525#issuecomment-447545224

Thinking about its API, I like a rolling-like API. One in my mind is

```python
ds.coarsen(x=2, y=2, side='left', trim_excess=True).mean()
```

To apply a customized callable other than np.mean to a particular coordinate, it would probably be

```python
ds.coarsen(x=2, y=2, side='left', trim_excess=True).mean(coordinate_apply={'surface_area': np.sum})
```
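
To make the proposed semantics concrete, here is a minimal 1-D sketch (not from the thread: `coarsen`, `side`, and `trim_excess` are the names proposed above, and this reshape-based reduction is only one plausible implementation):

```python
import numpy as np

def coarsen_mean_1d(values, window, side='left', trim_excess=True):
    """Block-average `values` over non-overlapping windows of `window` samples."""
    n = (len(values) // window) * window  # largest multiple of `window`
    if n != len(values) and not trim_excess:
        raise ValueError("array length is not a multiple of the window size")
    # side='left' keeps samples from the left edge and trims the excess on the right
    trimmed = values[:n] if side == 'left' else values[len(values) - n:]
    return trimmed.reshape(-1, window).mean(axis=1)

print(coarsen_mean_1d(np.arange(7.0), window=2))  # [0.5 2.5 4.5]; the 7th sample is trimmed
```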

Reactions: 👍 1

rabernat (MEMBER) · 2018-11-19T04:13:37Z
https://github.com/pydata/xarray/issues/2525#issuecomment-439766587

> What would the coordinates look like?
>
> 1. apply func also for coordinate
> 2. always apply mean to coordinate

If I think about my applications, I would probably always want to apply mean to dimension coordinates, but would like to be able to choose for non-dimension coordinates.
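
As an illustration of why the choice matters (not from the thread; the coordinate names are made up): a cell-area coordinate should be summed when cells merge, while the dimension coordinate itself should be averaged:

```python
import numpy as np

lat = np.array([0.5, 1.5, 2.5, 3.5])           # dimension coordinate (degrees)
surface_area = np.array([1.0, 1.0, 0.9, 0.9])  # non-dimension coordinate per cell

# Coarsening by a factor of 2: positions average, areas add up.
lat_coarse = lat.reshape(-1, 2).mean(axis=1)           # [1.0, 3.0]
area_coarse = surface_area.reshape(-1, 2).sum(axis=1)  # [2.0, 1.8]
print(lat_coarse, area_coarse)
```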

dcherian (MEMBER) · 2018-11-02T05:11:36Z
https://github.com/pydata/xarray/issues/2525#issuecomment-435272976

I like coarsen because it's a verb, like resample and groupby.

Reactions: 👍 1

fujiisoup (MEMBER) · 2018-11-02T04:37:35Z
https://github.com/pydata/xarray/issues/2525#issuecomment-435268965

+1 for block

What would the coordinates look like?

1. apply func also for coordinate
2. always apply mean to coordinate

shoyer (MEMBER) · 2018-11-01T22:51:55Z
https://github.com/pydata/xarray/issues/2525#issuecomment-435213658

skimage implements block_reduce via the view_as_blocks utility function: https://github.com/scikit-image/scikit-image/blob/62e29cd89dc858d8fb9d3578034a2f456f298ed3/skimage/util/shape.py#L9-L103

But given that it doesn't actually duplicate any elements and needs a C-order array to work, I think it's actually just equivalent to using reshape + transpose, e.g., `B = A.reshape(4, 1, 2, 2, 3, 2).transpose([0, 2, 4, 1, 3, 5])` reproduces `skimage.util.view_as_blocks(A, (1, 2, 2))` from the docstring example.

So the super-simple version of block_reduce looks like:

```python
import numpy as np

def block_reduce(image, block_size, func=np.sum):
    # TODO: input validation
    # TODO: consider copying padding from skimage
    blocked_shape = []
    for existing_size, bsize in zip(image.shape, block_size):
        blocked_shape.extend([existing_size // bsize, bsize])
    blocked = np.reshape(image, tuple(blocked_shape))
    # Reduce over the interleaved in-block axes (1, 3, 5, ...)
    return func(blocked, axis=tuple(range(1, blocked.ndim, 2)))
```

This would work on dask arrays out of the box, but it's probably worth benchmarking whether you'd get better performance doing the operation chunk-wise (e.g., with map_blocks).
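
As a quick sanity check of the reshape trick (not from the original comment), coarsening a 4×6 array by (2, 3) blocks:

```python
import numpy as np

image = np.arange(24.0).reshape(4, 6)
# Interleave the block axes: (rows // 2, 2, cols // 3, 3), then reduce the
# odd (in-block) axes, exactly as block_reduce above does.
blocked = image.reshape(2, 2, 2, 3)
reduced = blocked.mean(axis=(1, 3))
assert np.allclose(reduced[0, 0], image[:2, :3].mean())
assert np.allclose(reduced[1, 1], image[2:, 3:].mean())
```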

Reactions: 👍 1

shoyer (MEMBER) · 2018-11-01T21:24:15Z
https://github.com/pydata/xarray/issues/2525#issuecomment-435192382

OK, so maybe `da.block({'lat': 2, 'lon': 2}).mean()` would be a good way to spell this, if that's not too confusing with .chunk()? Other possible method names: groupby_block, blocked?

We could call this something like coarsen() or block_reduce() with a how='mean' or maybe func=mean argument, but I like the consistency with resample/rolling/groupby.

We can save the full coordinate-based version for a later addition to .resample().

shoyer (MEMBER) · 2018-10-31T14:22:07Z
https://github.com/pydata/xarray/issues/2525#issuecomment-434705757

block_reduce from skimage is indeed a small function using strides/reshape, if I remember correctly. We should certainly copy or implement it ourselves rather than adding a skimage dependency. On Wed, Oct 31, 2018 at 12:36 AM Keisuke Fujii wrote:

> block_reduce sounds nice, but I am a little hesitant to add a soft dependency on scikit-image only for this function... It is using the stride trick, as we are doing in rolling.construct. Maybe we can implement it by ourselves.

fujiisoup (MEMBER) · 2018-10-31T07:36:41Z
https://github.com/pydata/xarray/issues/2525#issuecomment-434589377

block_reduce sounds nice, but I am a little hesitant to add a soft dependency on scikit-image only for this function... It is using the stride trick, as we are doing in rolling.construct. Maybe we can implement it by ourselves.

rabernat (MEMBER) · 2018-10-30T21:41:17Z
https://github.com/pydata/xarray/issues/2525#issuecomment-434480457

> I would lean towards a coordinate based representation since it's a little more usable/certain to be correct.

I feel that this could become too complex in the case of irregularly spaced coordinates. I slightly favor the index-based approach (as in my function above), which one calls like

```python
aggregate_da(da, {'lat': 2, 'lon': 2})
```

If we do that, we can just use scikit-image's block_reduce function, which is vectorized and works great with apply_ufunc.

shoyer (MEMBER) · 2018-10-30T21:31:18Z
https://github.com/pydata/xarray/issues/2525#issuecomment-434477550

I'm +1 for adding this feature in some form as well.

From an API perspective, should the window size be specified in terms of integers or coordinates?

- rolling is integer based
- resample is coordinate based

I would lean towards a coordinate based representation since it's a little more usable/certain to be correct. It might even make sense to still call this resample, though obviously the time options would no longer apply. Also, we would almost certainly want a faster underlying implementation than what we currently use for resample().

The API for resampling to a 2x2 degree latitude/longitude grid could look something like:

```python
da.resample(lat=2, lon=2).mean()
```
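
For contrast, an illustrative sketch of the two existing styles (not from the comment): rolling counts samples, while resample works in coordinate units:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.arange(10.0), dims='time',
                  coords={'time': pd.date_range('2018-01-01', periods=10)})

da.rolling(time=2).mean()      # integer based: each window is 2 samples
da.resample(time='2D').mean()  # coordinate based: each window is 2 days
```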

rabernat (MEMBER) · 2018-10-30T13:10:16Z
https://github.com/pydata/xarray/issues/2525#issuecomment-434294356

FYI, I do this often in my work with this sort of function:

```python
import numpy as np
import xarray as xr
from skimage.measure import block_reduce

def aggregate_da(da, agg_dims, suf='_agg'):
    input_core_dims = list(agg_dims)
    n_agg = len(input_core_dims)
    core_block_size = tuple([agg_dims[k] for k in input_core_dims])
    block_size = (da.ndim - n_agg) * (1,) + core_block_size
    output_core_dims = [dim + suf for dim in input_core_dims]
    output_sizes = {(dim + suf): da.shape[da.get_axis_num(dim)] // agg_dims[dim]
                    for dim in input_core_dims}
    output_dtypes = da.dtype
    da_out = xr.apply_ufunc(block_reduce, da,
                            kwargs={'block_size': block_size},
                            input_core_dims=[input_core_dims],
                            output_core_dims=[output_core_dims],
                            output_sizes=output_sizes,
                            output_dtypes=[output_dtypes],
                            dask='parallelized')
    for dim in input_core_dims:
        new_coord = block_reduce(da[dim].data, (agg_dims[dim],), func=np.mean)
        da_out.coords[dim + suf] = (dim + suf, new_coord)
    return da_out
```
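
A usage sketch (not part of the original comment), assuming the aggregate_da above is in scope; the 1°-cell global grid here is made up:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(180, 360), dims=['lat', 'lon'],
                  coords={'lat': np.arange(-89.5, 90.0),
                          'lon': np.arange(0.5, 360.0)})
out = aggregate_da(da, {'lat': 2, 'lon': 2})
print(out.dims, out.shape)  # ('lat_agg', 'lon_agg') (90, 180)
```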

rabernat (MEMBER) · 2018-10-30T13:09:25Z
https://github.com/pydata/xarray/issues/2525#issuecomment-434294114

This is being discussed in #1192 under a different name.

Yes, we need this feature.

fujiisoup (MEMBER) · 2018-10-30T11:17:17Z
https://github.com/pydata/xarray/issues/2525#issuecomment-434261896

This is from a thread at SO.

Does anyone have an opinion on adding a bin (or rolling_bin) method to compute the binning? For the above example, currently we need to do

```python
dsa.rolling(x=2).construct('tmp').isel(x=slice(1, None, 2)).mean('tmp')
```

which is a little complex.
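
To see what that one-liner does (an illustration, not from the comment), on a small array it reduces to non-overlapping pair means:

```python
import numpy as np
import xarray as xr

dsa = xr.DataArray(np.arange(6.0), dims='x')
coarse = dsa.rolling(x=2).construct('tmp').isel(x=slice(1, None, 2)).mean('tmp')
print(coarse.values)  # [0.5 2.5 4.5] -- each value averages one non-overlapping pair
```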

