issue_comments

12 rows where issue = 58117200 sorted by updated_at descending

user 9

  • shoyer 3
  • jhamman 2
  • alimanfoo 1
  • dcherian 1
  • clarkfitzg 1
  • hottwaj 1
  • matthiasdemuzere 1
  • jjpr-mit 1
  • stale[bot] 1

author_association 3

  • MEMBER 7
  • NONE 4
  • CONTRIBUTOR 1

issue 1

  • Support multi-dimensional grouped operations and group_over · 12
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1054569287 https://github.com/pydata/xarray/issues/324#issuecomment-1054569287 https://api.github.com/repos/pydata/xarray/issues/324 IC_kwDOAMm_X84-23NH dcherian 2448579 2022-02-28T19:03:17Z 2022-02-28T19:03:17Z MEMBER

I have this almost ready in flox (needs more tests). So we should be able to do this soon.

In the meantime, note that we can view grouping over multiple variables as a "factorization" (group identification) problem for aggregations. That means you can:
1. use pd.factorize, pd.cut or np.searchsorted to convert each "by" variable to an integer code,
2. use np.ravel_multi_index to combine the codes into a single variable, idx,
3. group by idx and accumulate (for example with np.bincount),
4. use np.unravel_index (or just a simple np.reshape) to convert the single grouped dimension back into multiple dimensions, and
5. construct the output coordinate arrays.
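
The five steps above can be sketched in NumPy for two "by" variables along one dimension, computing a grouped mean. This is a minimal illustration, not flox's implementation: `grouped_mean_2d` is a hypothetical helper, and `np.unique(..., return_inverse=True)` stands in for `pd.factorize` in step 1.

```python
import numpy as np

def grouped_mean_2d(values, by1, by2):
    """Mean of `values` grouped by two 1-D "by" arrays along the same axis."""
    # Step 1: factorize each "by" variable into sorted unique labels + integer codes
    uniq1, codes1 = np.unique(by1, return_inverse=True)
    uniq2, codes2 = np.unique(by2, return_inverse=True)
    shape = (len(uniq1), len(uniq2))
    # Step 2: combine the two codes into a single flat group index
    idx = np.ravel_multi_index((codes1.ravel(), codes2.ravel()), shape)
    # Step 3: group by idx and accumulate sums and counts
    size = shape[0] * shape[1]
    sums = np.bincount(idx, weights=values, minlength=size)
    counts = np.bincount(idx, minlength=size)
    with np.errstate(invalid="ignore"):
        means = sums / counts  # combinations that never occur become NaN
    # Step 4: reshape the flat grouped dimension back into two dimensions
    # Step 5: the unique labels serve as the output coordinate arrays
    return means.reshape(shape), uniq1, uniq2
```

For example, values `[1, 2, 3, 4, 5]` with `by1 = ['a', 'a', 'b', 'b', 'b']` and `by2 = [0, 1, 0, 0, 1]` give a 2 × 2 result whose rows are `a`, `b` and whose columns are `0`, `1`.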

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
1054526670 https://github.com/pydata/xarray/issues/324#issuecomment-1054526670 https://api.github.com/repos/pydata/xarray/issues/324 IC_kwDOAMm_X84-2szO alimanfoo 703554 2022-02-28T18:10:02Z 2022-02-28T18:10:02Z CONTRIBUTOR

Still relevant; I would like to be able to group by multiple variables along a single dimension.

{
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
781427391 https://github.com/pydata/xarray/issues/324#issuecomment-781427391 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDc4MTQyNzM5MQ== matthiasdemuzere 6926916 2021-02-18T15:33:06Z 2021-02-18T15:33:06Z NONE

Still relevant for me too. I just wanted to group by half-hours, for which I'd need access to .groupby(['time.hour', 'time.minute'])
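
Until multi-variable groupby exists, one workaround for half-hourly grouping is to fold hour and minute into a single integer key, so a single-key groupby is enough. The NumPy sketch below assumes the timestamps are available as `datetime64` values; `half_hour_of_day` and `half_hourly_mean` are hypothetical helpers, not xarray functions.

```python
import numpy as np

def half_hour_of_day(times):
    """Map datetime64 values to a half-hour-of-day code in 0..47."""
    t = np.asarray(times, dtype="datetime64[m]")
    # Minutes elapsed since midnight of each timestamp's own day
    minutes = (t - t.astype("datetime64[D]")).astype(np.int64)
    hour, minute = minutes // 60, minutes % 60
    # Fold hour and minute into one key, so a single-key groupby is enough
    return hour * 2 + minute // 30

def half_hourly_mean(times, values):
    """Mean of `values` in each of the 48 half-hour slots of the day."""
    codes = half_hour_of_day(times)
    sums = np.bincount(codes, weights=values, minlength=48)
    counts = np.bincount(codes, minlength=48)
    with np.errstate(invalid="ignore"):
        return sums / counts  # slots with no data become NaN
```

Timestamps from different days that fall in the same half-hour slot share a code, so they are averaged together.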

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
531964854 https://github.com/pydata/xarray/issues/324#issuecomment-531964854 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDUzMTk2NDg1NA== shoyer 1217238 2019-09-16T21:26:21Z 2019-09-16T21:26:21Z MEMBER

Still relevant.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
531937119 https://github.com/pydata/xarray/issues/324#issuecomment-531937119 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDUzMTkzNzExOQ== stale[bot] 26384082 2019-09-16T20:08:04Z 2019-09-16T20:08:04Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
336983333 https://github.com/pydata/xarray/issues/324#issuecomment-336983333 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDMzNjk4MzMzMw== shoyer 1217238 2017-10-16T18:24:33Z 2017-10-16T18:24:33Z MEMBER

> Is use case 1 (Multiple groupby arguments along a single dimension) being held back for use case 2 (Multiple groupby arguments along different dimensions)? Use case 1 would be very useful by itself.

No, I think the biggest issue is that grouping variables into a MultiIndex on the result sort of works (with the current PR https://github.com/pydata/xarray/pull/924), but it's very easy to end up with weird conflicts between coordinates / MultiIndex levels that are hard to resolve right now within the xarray data model. Probably it would be best to resolve https://github.com/pydata/xarray/issues/1603 first, which will make this much easier.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
336925565 https://github.com/pydata/xarray/issues/324#issuecomment-336925565 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDMzNjkyNTU2NQ== jjpr-mit 25231875 2017-10-16T15:35:06Z 2017-10-16T15:35:06Z NONE

Is use case 1 (Multiple groupby arguments along a single dimension) being held back for use case 2 (Multiple groupby arguments along different dimensions)? Use case 1 would be very useful by itself.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
265462343 https://github.com/pydata/xarray/issues/324#issuecomment-265462343 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDI2NTQ2MjM0Mw== hottwaj 5629061 2016-12-07T14:35:01Z 2016-12-07T14:35:01Z NONE

In case it is of interest to anyone, the snippet below is a temporary and quite dirty solution I've used to do a multi-dimensional groupby...

It runs nested groupby-apply operations over each given dimension until no further grouping needs to be done, then applies the given function "apply_fn".

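    def nested_groupby_apply(dataarray, groupby, apply_fn):
        # GroupBy.map is the current name for the GroupBy.apply used originally
        if len(groupby) == 1:
            # Innermost level: apply the function to each final group
            return dataarray.groupby(groupby[0]).map(apply_fn)
        else:
            # Group on the first variable, then recurse on the remaining ones
            return dataarray.groupby(groupby[0]).map(
                nested_groupby_apply, groupby=groupby[1:], apply_fn=apply_fn
            )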

Obviously, performance can be quite poor. Passing the dimensions to group over in order of increasing length will reduce your cost a little.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
131891348 https://github.com/pydata/xarray/issues/324#issuecomment-131891348 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDEzMTg5MTM0OA== clarkfitzg 5356122 2015-08-17T17:04:44Z 2015-08-17T17:04:44Z MEMBER

For (2) I think it makes sense to extend the existing groupby to deal with multiple dimensions, i.e., let it take an iterable of dimension names:

```
darray.groupby(['lat', 'lon'])
```

Then we'd have something similar to the SQL groupby, which is a good thing.
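
To make the SQL analogy concrete, here is a plain-Python sketch of GROUP BY semantics over several keys: one output per distinct (lat, lon) combination, roughly what `SELECT lat, lon, AVG(t) ... GROUP BY lat, lon` computes. The function name and the record layout are hypothetical; this illustrates the semantics only and is not xarray's API.

```python
from collections import defaultdict
from statistics import mean

def groupby_mean(records, keys, value):
    """SQL-style GROUP BY over several keys, averaging one value column."""
    groups = defaultdict(list)
    for rec in records:
        # The group label is the tuple of all key values, as in GROUP BY lat, lon
        label = tuple(rec[k] for k in keys)
        groups[label].append(rec[value])
    # One output row per distinct combination of keys, like AVG(value)
    return {label: mean(vals) for label, vals in groups.items()}
```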

By the way, in #527 we were considering using this approach to make the faceted plots on both rows and columns.

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
131878081 https://github.com/pydata/xarray/issues/324#issuecomment-131878081 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDEzMTg3ODA4MQ== jhamman 2443309 2015-08-17T16:20:14Z 2015-08-17T16:20:14Z MEMBER

Agreed, we have two use cases here.

For (1), can we just use the pandas grouping infrastructure? We just need to allow xray.DataArray.groupby to support iterables and pandas.Grouper objects. I personally don't like the MultiIndex format and prefer to unstack the results of grouped operations when possible. In xray, I think we can justify going that route since we support N-D labeled dimensions much better than pandas does.

For (2), I'll need to think a bit more about how this would work. Do we add a groupby method to DataArrayGroupBy? That sounds messy. Maybe we need to write an N-D grouper object?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
131599877 https://github.com/pydata/xarray/issues/324#issuecomment-131599877 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDEzMTU5OTg3Nw== jhamman 2443309 2015-08-16T18:51:05Z 2015-08-17T16:07:41Z MEMBER

@shoyer -

I want to look into putting a PR together for this. I'm looking for the same functionality that you get with a pandas Series or DataFrame:

    data.groupby([lambda x: x.hour, lambda x: x.timetuple().tm_yday]).mean()

The motivation is making a Hovmöller diagram. What we need is this functionality:

    da.groupby(['time.hour', 'time.dayofyear']).mean().plot()
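
For reference, a minimal pandas sketch of the Hovmöller-style layout: grouping by hour and day of year gives a 1-D result with a MultiIndex, and unstacking one level turns it into a 2-D hour × day table. The data are synthetic, and `index.dayofyear` replaces the `timetuple().tm_yday` lambda; this shows pandas behaviour only, not what xarray does.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series covering three days (72 hourly values)
index = pd.date_range("2015-08-01", periods=72, freq=pd.Timedelta(hours=1))
data = pd.Series(np.arange(72, dtype=float), index=index)

# Two groupers along the same time axis give a 1-D result with a MultiIndex
by_hour_day = data.groupby([index.hour, index.dayofyear]).mean()

# Unstacking the day-of-year level turns it into a 2-D (hour x day) table,
# the layout a Hovmoller diagram needs
table = by_hour_day.unstack()
```

Here `table` has 24 rows (hours) and one column per day of year.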

If you can point me in the right direction, I'll see if I can put something together.

{
    "total_count": 7,
    "+1": 7,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
131644079 https://github.com/pydata/xarray/issues/324#issuecomment-131644079 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDEzMTY0NDA3OQ== shoyer 1217238 2015-08-17T00:13:47Z 2015-08-17T00:13:47Z MEMBER

@jhamman For your use case, both hour and dayofyear are along the time dimension, so arguably the result should be 1D with a MultiIndex instead of 2D. So it might make more sense to start with that, and then layer on stack/unstack or pivot functionality.

I guess there are two related use cases here:
1. Multiple groupby arguments along a single dimension (pandas does this one already)
2. Multiple groupby arguments along different dimensions (pandas doesn't do this one)
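
Use case 2 can be sketched in NumPy for a 2-D array: give each row a label (say, a latitude band) and each column a label (say, a longitude band), then average every (row group, column group) block. `grouped_mean_two_dims` is a hypothetical helper and assumes the array's axes line up with the two label arrays; it applies the same factorize-and-accumulate idea along two different axes and is not an xarray API.

```python
import numpy as np

def grouped_mean_two_dims(arr, row_labels, col_labels):
    """Mean of a 2-D array over blocks defined by row labels and column labels."""
    row_uniq, row_codes = np.unique(row_labels, return_inverse=True)
    col_uniq, col_codes = np.unique(col_labels, return_inverse=True)
    n_rows, n_cols = len(row_uniq), len(col_uniq)
    # Each cell's group is (its row's group, its column's group); broadcasting
    # the two 1-D code arrays gives a 2-D grid of flat (row-major) group indices
    idx = row_codes.ravel()[:, None] * n_cols + col_codes.ravel()[None, :]
    sums = np.bincount(idx.ravel(), weights=np.ravel(arr), minlength=n_rows * n_cols)
    counts = np.bincount(idx.ravel(), minlength=n_rows * n_cols)
    # Every row group and column group is non-empty, so no block is empty
    return (sums / counts).reshape(n_rows, n_cols), row_uniq, col_uniq
```

The output keeps one dimension per grouped input dimension, which is the shape use case 2 asks for.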

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette