issue_comments

2 rows where author_association = "MEMBER" and issue = 46768521 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
65114186 https://github.com/pydata/xarray/issues/268#issuecomment-65114186 https://api.github.com/repos/pydata/xarray/issues/268 MDEyOklzc3VlQ29tbWVudDY1MTE0MTg2 shoyer 1217238 2014-12-01T18:49:37Z 2014-12-01T18:49:37Z MEMBER

I finally got around to investigating this issue, and it turns out to be more subtle than I thought.

The collapsing of scalars occurs for two reasons:

1. By default, concat collapses constant variables (unless they are explicitly called out in concat_over).
2. Groupby's concatenate is intentionally agnostic about the input data, only looking at the transformed data.

The combination of these features means that the concat step of groupby has no way (currently) to tell the difference between a new variable that is only constant across the concatenated dimension by chance (e.g., because of the nature of the input data in this case) and a variable that is intentionally constant (e.g., because x was set to the scalar zero in the original dataset).
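
To make the ambiguity concrete, here is a minimal pure-Python sketch. It is a toy model and not xray's actual concat: datasets are plain dicts, and `toy_concat` with its handling of `concat_over` is a hypothetical stand-in written only for illustration.

```python
# Toy model, not xray's implementation: each "dataset" is a dict mapping
# variable names to plain Python values (one dict per group result).

def toy_concat(datasets, concat_over=()):
    """Concatenate per-group results along a new dimension.

    A variable whose value is identical in every dataset is collapsed to a
    single value, unless it is named in `concat_over`.
    """
    result = {}
    for name in datasets[0]:
        values = [ds[name] for ds in datasets]
        is_constant = all(v == values[0] for v in values)
        if is_constant and name not in concat_over:
            result[name] = values[0]  # collapsed to a scalar
        else:
            result[name] = values     # one entry per group
    return result


# Per-group reductions where "x" happens to be equal in every group.
by_chance = [{"x": 0.0, "y": 1.0}, {"x": 0.0, "y": 2.0}]

collapsed = toy_concat(by_chance)
explicit = toy_concat(by_chance, concat_over=("x",))
```

In this sketch `collapsed["x"]` ends up as a scalar while `collapsed["y"]` keeps one entry per group, even though both came out of the same reduction. The function only sees the transformed values, so it cannot tell whether "x" is constant by chance or by design.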

The scalar collapsing feature of concat is convenient in some cases (maybe not for groupby), but it really should be controllable and predictable. A few options:

1. Alleviate the consequences, e.g.,
    1. by fixing the issue with concatenating scalar with non-scalar variables (#243)
    2. adding a function for manually broadcasting to a given set of dimensions (I already have most of this in the xray.broadcast_arrays function, see #261)
2. Come up with some set of rules or heuristics for which variables are always concatenated by a groupby, e.g.,
    1. all variables that weren't constant across groups in the original objects, and/or
    2. all non-coordinate variables

It would probably be good to do both (1) and (2); the former will be useful regardless. I think using "all non-coordinate variables" (2 ii) might be a reasonable choice (it would be consistent with the current behavior for data arrays).
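
As a rough illustration of the "all non-coordinate variables" rule, the following pure-Python sketch always concatenates non-coordinate variables and only collapses constant coordinates. The names `toy_groupby_concat` and `coord_names` are hypothetical, and the dict-based datasets are an assumption made for brevity; this is not how xray represents its data.

```python
# Toy sketch of the "always concatenate non-coordinate variables" rule.
# Each group result is a dict of variable name -> plain Python value.

def toy_groupby_concat(group_results, coord_names):
    """Combine per-group results along the group dimension.

    Non-coordinate variables always get one entry per group. Coordinates
    are collapsed to a single value only when they are constant.
    """
    combined = {}
    for name in group_results[0]:
        values = [res[name] for res in group_results]
        is_constant = all(v == values[0] for v in values)
        if name in coord_names and is_constant:
            combined[name] = values[0]  # constant coordinate stays scalar
        else:
            combined[name] = values     # always stacked across groups
    return combined


# "x" is constant across groups by chance; "height" is a scalar coordinate.
results = [{"x": 0.0, "height": 10}, {"x": 0.0, "height": 10}]
combined = toy_groupby_concat(results, coord_names={"height"})
```

Under this rule "x" keeps one value per group regardless of its contents, so the shape of the output no longer depends on the input data.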

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby reduction sometimes collapses variables into scalars 46768521
60430393 https://github.com/pydata/xarray/issues/268#issuecomment-60430393 https://api.github.com/repos/pydata/xarray/issues/268 MDEyOklzc3VlQ29tbWVudDYwNDMwMzkz shoyer 1217238 2014-10-24T18:36:13Z 2014-10-24T18:36:13Z MEMBER

This definitely looks broken.

Thanks for the report!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby reduction sometimes collapses variables into scalars 46768521

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);