issue_comments


16 rows where issue = 375126758 sorted by updated_at descending


Issue: Multi-dimensional binning/resampling/coarsening (16 comments)

Commenters (user): rabernat 4, shoyer 4, fujiisoup 4, jbusecke 3, dcherian 1

Author association: MEMBER 13, CONTRIBUTOR 3
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
447545224 https://github.com/pydata/xarray/issues/2525#issuecomment-447545224 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQ0NzU0NTIyNA== fujiisoup 6815844 2018-12-15T07:28:13Z 2018-12-15T07:28:13Z MEMBER

Thinking about its API, I like a rolling-like API. One in my mind is

```python
ds.coarsen(x=2, y=2, side='left', trim_excess=True).mean()
```

To apply a customized callable other than np.mean to a particular coordinate, it would probably be

```python
ds.coarsen(x=2, y=2, side='left', trim_excess=True).mean(coordinate_apply={'surface_area': np.sum})
```
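
For context, a minimal sketch (mine, not part of this comment) of how such a rolling-like API looks with the argument names xarray eventually adopted for coarsen, boundary in place of trim_excess and coord_func in place of coordinate_apply:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"t": (("y", "x"), np.random.rand(4, 4))},
    coords={"surface_area": (("y", "x"), np.ones((4, 4)))},
)

# Block-average the data over 2x2 windows, trimming any excess cells,
# while summing (rather than averaging) the surface_area coordinate.
coarse = ds.coarsen(x=2, y=2, boundary="trim",
                    coord_func={"surface_area": "sum"}).mean()
```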

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
439892007 https://github.com/pydata/xarray/issues/2525#issuecomment-439892007 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzOTg5MjAwNw== jbusecke 14314623 2018-11-19T13:26:45Z 2018-11-19T13:26:45Z CONTRIBUTOR

I think mean would be a good default (thinking about cell-center dimensions like longitude and latitude), but I would very much like it if other functions could be specified, e.g. for grid-face dimensions (where min and max would be more appropriate) and for other coordinates like surface area (where sum would be the most appropriate function).

On Nov 18, 2018, at 11:13 PM, Ryan Abernathey notifications@github.com wrote:

What would the coordinates look like?

1. apply func also for coordinate
2. always apply mean to coordinate

If I think about my applications, I would probably always want to apply mean to dimension coordinates, but would like to be able to choose for non-dimension coordinates.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
439766587 https://github.com/pydata/xarray/issues/2525#issuecomment-439766587 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzOTc2NjU4Nw== rabernat 1197350 2018-11-19T04:13:37Z 2018-11-19T04:13:37Z MEMBER

What would the coordinates look like?

  1. apply func also for coordinate
  2. always apply mean to coordinate

If I think about my applications, I would probably always want to apply mean to dimension coordinates, but would like to be able to choose for non-dimension coordinates.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
435272976 https://github.com/pydata/xarray/issues/2525#issuecomment-435272976 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNTI3Mjk3Ng== dcherian 2448579 2018-11-02T05:11:36Z 2018-11-02T05:11:36Z MEMBER

I like coarsen because it's a verb like resample, groupby.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
435268965 https://github.com/pydata/xarray/issues/2525#issuecomment-435268965 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNTI2ODk2NQ== fujiisoup 6815844 2018-11-02T04:37:35Z 2018-11-02T04:37:35Z MEMBER

+1 for block

What would the coordinates look like?

1. apply func also for coordinate
2. always apply mean to coordinate
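
As a toy illustration (mine, not from the thread) of what the two options mean for a length-4 coordinate coarsened with a window of 2:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])  # a made-up 1-D coordinate

# option 2: always apply mean to the coordinate
x.reshape(2, 2).mean(axis=1)  # -> array([15., 35.])

# option 1: apply the same func as the data, e.g. np.sum for a cell-area coordinate
x.reshape(2, 2).sum(axis=1)   # -> array([30., 70.])
```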

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
435213658 https://github.com/pydata/xarray/issues/2525#issuecomment-435213658 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNTIxMzY1OA== shoyer 1217238 2018-11-01T22:51:55Z 2018-11-01T22:51:55Z MEMBER

skimage implements block_reduce via the view_as_blocks utility function: https://github.com/scikit-image/scikit-image/blob/62e29cd89dc858d8fb9d3578034a2f456f298ed3/skimage/util/shape.py#L9-L103

But given that it doesn't actually duplicate any elements and needs a C-order array to work, I think it's actually just equivalent to using reshape + transpose, e.g., B = A.reshape(4, 1, 2, 2, 3, 2).transpose([0, 2, 4, 1, 3, 5]) reproduces skimage.util.view_as_blocks(A, (1, 2, 2)) from the docstring example.

So the super-simple version of block_reduce looks like:

```python
def block_reduce(image, block_size, func=np.sum):
    # TODO: input validation
    # TODO: consider copying padding from skimage
    blocked_shape = []
    for existing_size, block_size in zip(image.shape, block_size):
        blocked_shape.extend([existing_size // block_size, block_size])
    blocked = np.reshape(image, tuple(blocked_shape))
    return func(blocked, axis=tuple(range(1, blocked.ndim, 2)))
```

This would work on dask arrays out of the box but it's probably worth benchmarking whether you'd get better performance doing the operation chunk-wise (e.g., with map_blocks).
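
Both points above can be checked on tiny arrays; here is a quick sketch (mine, not part of the comment):

```python
import numpy as np

# 1. The reshape + transpose trick reproduces view_as_blocks for a C-ordered
#    (4, 4, 6) array split into (1, 2, 2) blocks: each B[i, j, k] is one block.
A = np.arange(4 * 4 * 6).reshape(4, 4, 6)
B = A.reshape(4, 1, 2, 2, 3, 2).transpose([0, 2, 4, 1, 3, 5])
assert B.shape == (4, 2, 3, 1, 2, 2)
assert np.array_equal(B[1, 0, 2], A[1:2, 0:2, 4:6])  # one block, sliced by hand

# 2. What block_reduce above computes for a 4x4 array with 2x2 blocks and
#    func=np.mean: interleave (n_blocks, block_size) per axis, reduce odd axes.
img = np.arange(16.0).reshape(4, 4)
blocked = img.reshape(2, 2, 2, 2)
print(blocked.mean(axis=(1, 3)))
# [[ 2.5  4.5]
#  [10.5 12.5]]
```
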

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
435201618 https://github.com/pydata/xarray/issues/2525#issuecomment-435201618 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNTIwMTYxOA== jbusecke 14314623 2018-11-01T21:59:19Z 2018-11-01T21:59:19Z CONTRIBUTOR

My favorite would be da.coarsen({'lat': 2, 'lon': 2}).mean(), but all the others sound reasonable to me. Also +1 for consistency with resample/rolling/groupby.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
435192382 https://github.com/pydata/xarray/issues/2525#issuecomment-435192382 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNTE5MjM4Mg== shoyer 1217238 2018-11-01T21:24:15Z 2018-11-01T21:24:15Z MEMBER

OK, so maybe da.block({'lat': 2, 'lon': 2}).mean() would be a good way to spell this, if that's not too confusing with .chunk()? Other possible method names: groupby_block, blocked?

We could call this something like coarsen() or block_reduce() with a how='mean' or maybe func=mean argument, but I like the consistency with resample/rolling/groupby.

We can save the full coordinate-based version for a later addition to .resample().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
434705757 https://github.com/pydata/xarray/issues/2525#issuecomment-434705757 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNDcwNTc1Nw== shoyer 1217238 2018-10-31T14:22:07Z 2018-10-31T14:22:07Z MEMBER

block_reduce from skimage is indeed a small function using strides/reshape, if I remember correctly. We should certainly copy it or implement it ourselves rather than adding a skimage dependency.

On Wed, Oct 31, 2018 at 12:36 AM Keisuke Fujii notifications@github.com wrote:

block_reduce sounds nice, but I am a little hesitant to add a soft dependency on scikit-image only for this function... It uses the stride trick, as we do in rolling.construct. Maybe we can implement it ourselves.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
434589377 https://github.com/pydata/xarray/issues/2525#issuecomment-434589377 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNDU4OTM3Nw== fujiisoup 6815844 2018-10-31T07:36:41Z 2018-10-31T07:36:41Z MEMBER

block_reduce sounds nice, but I am a little hesitant to add a soft dependency on scikit-image only for this function... It uses the stride trick, as we do in rolling.construct. Maybe we can implement it ourselves.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
434531970 https://github.com/pydata/xarray/issues/2525#issuecomment-434531970 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNDUzMTk3MA== jbusecke 14314623 2018-10-31T01:46:19Z 2018-10-31T01:46:19Z CONTRIBUTOR

I agree with @rabernat and favor the index-based approach. For regular lon-lat grids it's easy enough to implement a weighted mean, and for irregularly spaced grids and other exotic grids the coordinate-based approach might lead to errors. To me, the resample API above might suggest to some users that proper regridding (a la xESMF) onto a regular lat/lon grid is performed.

block_reduce sounds good to me and sounds appropriate for non-dask arrays. Does anyone have experience with how dask.coarsen compares performance-wise?
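
For reference, a minimal sketch (mine, not from the thread) of the dask routine mentioned here; dask.array.coarsen applies a reduction over fixed-size blocks and requires chunk sizes divisible by the window:

```python
import dask.array as da
import numpy as np

x = da.from_array(np.arange(36.0).reshape(6, 6), chunks=(3, 3))

# Reduce 3x3 blocks with np.mean (the reduction must accept an axis keyword).
coarse = da.coarsen(np.mean, x, {0: 3, 1: 3})
coarse.compute()
# array([[ 7., 10.],
#        [25., 28.]])
```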

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
434480457 https://github.com/pydata/xarray/issues/2525#issuecomment-434480457 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNDQ4MDQ1Nw== rabernat 1197350 2018-10-30T21:41:17Z 2018-10-30T21:41:25Z MEMBER

I would lean towards a coordinate based representation since it's a little more usable/certain to be correct.

I feel that this could become too complex in the case of irregularly spaced coordinates. I slightly favor the index-based approach (as in my function above), which one calls like

```python
aggregate_da(da, {'lat': 2, 'lon': 2})
```

If we do that, we can just use scikit-image's block_reduce function, which is vectorized and works great with apply_ufunc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
434477550 https://github.com/pydata/xarray/issues/2525#issuecomment-434477550 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNDQ3NzU1MA== shoyer 1217238 2018-10-30T21:31:18Z 2018-10-30T21:31:18Z MEMBER

I'm +1 for adding this feature in some form as well.

From an API perspective, should the window size be specified in terms of integers or coordinates?

- rolling is integer based
- resample is coordinate based

I would lean towards a coordinate based representation since it's a little more usable/certain to be correct. It might even make sense to still call this resample, though obviously the time options would no longer apply. Also, we would almost certainly want a faster underlying implementation than what we currently use for resample().

The API for resampling to a 2x2 degree latitude/longitude grid could look something like: da.resample(lat=2, lon=2).mean()
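
As a rough illustration (my example, not from the thread) of the coordinate-based flavour, something similar can already be expressed one dimension at a time with groupby_bins:

```python
import numpy as np
import xarray as xr

# 1-degree latitude cells, binned into 2-degree bands by coordinate value.
da = xr.DataArray(np.random.rand(180), dims="lat",
                  coords={"lat": np.arange(-89.5, 90)})
coarse = da.groupby_bins("lat", np.arange(-90, 91, 2)).mean()
```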

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
434294356 https://github.com/pydata/xarray/issues/2525#issuecomment-434294356 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNDI5NDM1Ng== rabernat 1197350 2018-10-30T13:10:16Z 2018-10-30T13:10:39Z MEMBER

FYI, I do this often in my work with this sort of function:

```python
import numpy as np
import xarray as xr
from skimage.measure import block_reduce


def aggregate_da(da, agg_dims, suf='_agg'):
    input_core_dims = list(agg_dims)
    n_agg = len(input_core_dims)
    core_block_size = tuple([agg_dims[k] for k in input_core_dims])
    block_size = (da.ndim - n_agg)*(1,) + core_block_size
    output_core_dims = [dim + suf for dim in input_core_dims]
    output_sizes = {(dim + suf): da.shape[da.get_axis_num(dim)]//agg_dims[dim]
                    for dim in input_core_dims}
    output_dtypes = da.dtype
    da_out = xr.apply_ufunc(block_reduce, da,
                            kwargs={'block_size': block_size},
                            input_core_dims=[input_core_dims],
                            output_core_dims=[output_core_dims],
                            output_sizes=output_sizes,
                            output_dtypes=[output_dtypes],
                            dask='parallelized')
    for dim in input_core_dims:
        new_coord = block_reduce(da[dim].data, (agg_dims[dim],), func=np.mean)
        da_out.coords[dim + suf] = (dim + suf, new_coord)
    return da_out
```
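
A hypothetical usage example (names and sizes are mine, not part of the comment); note that block_reduce's default reduction is np.sum, so the data are block-summed while the coordinates are block-averaged:

```python
import numpy as np
import xarray as xr

sst = xr.DataArray(
    np.random.rand(180, 360),
    dims=("lat", "lon"),
    coords={"lat": np.arange(-89.5, 90), "lon": np.arange(0.5, 360)},
    name="sst",
)

coarse = aggregate_da(sst, {"lat": 2, "lon": 2})
# coarse has dims ("lat_agg", "lon_agg") with sizes (90, 180); each new
# coordinate value is the mean of the two original cell centres it replaces.
```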

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
434294114 https://github.com/pydata/xarray/issues/2525#issuecomment-434294114 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNDI5NDExNA== rabernat 1197350 2018-10-30T13:09:25Z 2018-10-30T13:09:25Z MEMBER

This is being discussed in #1192 under a different name.

Yes, we need this feature.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758
434261896 https://github.com/pydata/xarray/issues/2525#issuecomment-434261896 https://api.github.com/repos/pydata/xarray/issues/2525 MDEyOklzc3VlQ29tbWVudDQzNDI2MTg5Ng== fujiisoup 6815844 2018-10-30T11:17:17Z 2018-10-30T11:17:17Z MEMBER

This is from a thread on SO.

Does anyone have an opinion on whether we should add a bin (or rolling_bin) method to compute the binning? For the above example, currently we need to do

```python
dsa.rolling(x=2).construct('tmp').isel(x=slice(1, None, 2)).mean('tmp')
```

which is a little complex.
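
A small illustration (my example, not from the comment) of what that workaround computes for a 1-D array of length 6:

```python
import numpy as np
import xarray as xr

dsa = xr.DataArray(np.arange(6.0), dims="x")

# rolling(x=2).construct('tmp') builds overlapping length-2 windows; taking
# every second window (starting at index 1) makes them non-overlapping, and
# mean('tmp') reduces each window.
coarse = dsa.rolling(x=2).construct("tmp").isel(x=slice(1, None, 2)).mean("tmp")
# coarse.values -> array([0.5, 2.5, 4.5])
```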

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-dimensional binning/resampling/coarsening 375126758

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);