issue_comments


7 rows where author_association = "MEMBER" and issue = 58117200 sorted by updated_at descending


user 4

  • shoyer 3
  • jhamman 2
  • dcherian 1
  • clarkfitzg 1

issue 1

  • Support multi-dimensional grouped operations and group_over · 7 ✖

author_association 1

  • MEMBER · 7 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1054569287 https://github.com/pydata/xarray/issues/324#issuecomment-1054569287 https://api.github.com/repos/pydata/xarray/issues/324 IC_kwDOAMm_X84-23NH dcherian 2448579 2022-02-28T19:03:17Z 2022-02-28T19:03:17Z MEMBER

I have this almost ready in flox (needs more tests). So we should be able to do this soon.

In the meantime, note that we can view grouping over multiple variables as a "factorization" (group identification) problem for aggregations. That means you can:

1. Use pd.factorize, pd.cut, np.searchsorted or np.bincount to convert each by variable to an integer code.
2. Use np.ravel_multi_index to combine the codes into a single variable idx.
3. Group by idx and accumulate.
4. Use np.unravel_index (or just a simple np.reshape) to convert the single grouped dimension to multiple dimensions.
5. Construct output coordinate arrays.
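The five steps above can be sketched with plain NumPy and pandas. This is only an illustration under assumptions, not how flox implements it: the sample arrays and the names `season`, `region`, `group_sums` and `group_counts` are hypothetical, and np.bincount is used here for the accumulation step (a sum and a count), which is one of several ways to accumulate.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one value per observation and two "by" variables.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
season = np.array(["DJF", "JJA", "DJF", "JJA", "DJF", "JJA"])
region = np.array(["north", "north", "south", "south", "north", "south"])

# 1. Convert each "by" variable to integer codes.
season_codes, season_labels = pd.factorize(season, sort=True)
region_codes, region_labels = pd.factorize(region, sort=True)
shape = (len(season_labels), len(region_labels))

# 2. Combine the per-variable codes into a single flat group index.
idx = np.ravel_multi_index((season_codes, region_codes), shape)

# 3. Group by idx and accumulate (here: per-group sum and count).
ngroups = shape[0] * shape[1]
sums = np.bincount(idx, weights=values, minlength=ngroups)
counts = np.bincount(idx, minlength=ngroups)

# 4. Reshape the single grouped dimension back to one dimension per variable.
group_sums = sums.reshape(shape)
group_counts = counts.reshape(shape)

# 5. Construct output coordinates (a labeled 2D table in this sketch).
result = pd.DataFrame(
    group_sums,
    index=pd.Index(season_labels, name="season"),
    columns=pd.Index(region_labels, name="region"),
)
print(result)
```

Passing `minlength` keeps empty groups in the output as zeros, so the reshape in step 4 always matches `shape` even when some combinations never occur.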

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
531964854 https://github.com/pydata/xarray/issues/324#issuecomment-531964854 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDUzMTk2NDg1NA== shoyer 1217238 2019-09-16T21:26:21Z 2019-09-16T21:26:21Z MEMBER

Still relevant.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
336983333 https://github.com/pydata/xarray/issues/324#issuecomment-336983333 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDMzNjk4MzMzMw== shoyer 1217238 2017-10-16T18:24:33Z 2017-10-16T18:24:33Z MEMBER

> Is use case 1 (Multiple groupby arguments along a single dimension) being held back for use case 2 (Multiple groupby arguments along different dimensions)? Use case 1 would be very useful by itself.

No, I think the biggest issue is that grouping variables into a MultiIndex on the result sort of works (with the current PR https://github.com/pydata/xarray/pull/924), but it's very easy to end up with weird conflicts between coordinates / MultiIndex levels that are hard to resolve right now within the xarray data model. Probably it would be best to resolve https://github.com/pydata/xarray/issues/1603 first, which will make this much easier.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
131891348 https://github.com/pydata/xarray/issues/324#issuecomment-131891348 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDEzMTg5MTM0OA== clarkfitzg 5356122 2015-08-17T17:04:44Z 2015-08-17T17:04:44Z MEMBER

For (2), I think it makes sense to extend the existing groupby to deal with multiple dimensions, i.e., let it take an iterable of dimension names.

```
darray.groupby(['lat', 'lon'])
```

Then we'd have something similar to the SQL groupby, which is a good thing.

By the way, in #527 we were considering using this approach to make the faceted plots on both rows and columns.

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
131878081 https://github.com/pydata/xarray/issues/324#issuecomment-131878081 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDEzMTg3ODA4MQ== jhamman 2443309 2015-08-17T16:20:14Z 2015-08-17T16:20:14Z MEMBER

Agreed, we have two use cases here.

For (1), can we just use the pandas grouping infrastructure? We just need to allow xray.DataArray.groupby to support an iterable and pandas.Grouper objects. I personally don't like the MultiIndex format and prefer to unstack the grouper operations when possible. In xray, I think we can justify going that route since we support N-D labeled dimensions much better than pandas.

For (2), I'll need to think a bit more about how this would work. Do we add a groupby method to DataArrayGroupBy? That sounds messy. Maybe we need to write an N-D grouper object?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
131599877 https://github.com/pydata/xarray/issues/324#issuecomment-131599877 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDEzMTU5OTg3Nw== jhamman 2443309 2015-08-16T18:51:05Z 2015-08-17T16:07:41Z MEMBER

@shoyer -

I want to look into putting a PR together for this. I'm looking for the same functionality that you get with a pandas Series or DataFrame:

    data.groupby([lambda x: x.hour, lambda x: x.timetuple().tm_yday]).mean()

The motivation comes in making a Hovmoller diagram. What we need is this functionality:

    da.groupby(['time.hour', 'time.dayofyear']).mean().plot()

If you can point me in the right direction, I'll see if I can put something together.

{
    "total_count": 7,
    "+1": 7,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
131644079 https://github.com/pydata/xarray/issues/324#issuecomment-131644079 https://api.github.com/repos/pydata/xarray/issues/324 MDEyOklzc3VlQ29tbWVudDEzMTY0NDA3OQ== shoyer 1217238 2015-08-17T00:13:47Z 2015-08-17T00:13:47Z MEMBER

@jhamman For your use case, both hour and dayofyear are along the time dimension, so arguably the result should be 1D with a MultiIndex instead of 2D. So it might make more sense to start with that, and then layer on stack/unstack or pivot functionality.

I guess there are two related use cases here:

1. Multiple groupby arguments along a single dimension (pandas does this one already).
2. Multiple groupby arguments along different dimensions (pandas doesn't do this one).
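Use case 1 can be illustrated with pandas alone. The sketch below uses a hypothetical hourly series (the names `series`, `grouped` and `table` are made up for the example): grouping by hour and day of year, both derived from the same time index, gives a 1D result with a MultiIndex, and unstack then produces the 2D layout a Hovmoller-style plot would need.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three days of hourly values along a single time dimension.
times = pd.date_range("2000-01-01", periods=72, freq=pd.Timedelta(hours=1))
series = pd.Series(np.arange(72, dtype=float), index=times)

# Both grouping keys are derived from the same (time) index.
hour = series.index.hour.rename("hour")
dayofyear = series.index.dayofyear.rename("dayofyear")

# The grouped result is 1D, indexed by a (hour, dayofyear) MultiIndex.
grouped = series.groupby([hour, dayofyear]).mean()

# Unstacking one level gives the 2D hour-by-day table.
table = grouped.unstack("dayofyear")
print(table.shape)
```

Starting from the 1D MultiIndex result and unstacking afterwards keeps the grouping step and the reshaping step separate, which matches the layering suggested in the comment.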

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);