
issue_comments


3 rows where author_association = "MEMBER" and issue = 1473329967 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1338119644 https://github.com/pydata/xarray/issues/7350#issuecomment-1338119644 https://api.github.com/repos/pydata/xarray/issues/7350 IC_kwDOAMm_X85PwhXc dcherian 2448579 2022-12-05T20:22:43Z 2022-12-05T20:25:31Z MEMBER

One way to fix this would be to never associate non-dim coords when creating a DataArray for a dimension coordinate. But I think it may be better to encourage users to use .variable instead.

The behavior I'd advocate for is that a subsetting/selection operation should never add new coordinates that weren't previously present.

What do you mean by "previously present"? We'd have to track state of some sort, which doesn't seem ideal. Alternatively, the user could use .sel(time=[0]). That would preserve the identity of time as a dimension coordinate (of size 1), and it would not get associated when extracting lat.
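
The difference between scalar and list selection described above can be sketched as follows. This is a minimal illustration, not code from the issue: the dataset, the variable name `temp`, and the coordinate values are all made up for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: "time" and "lat" are both dimension coordinates.
ds = xr.Dataset(
    {"temp": (("time", "lat"), np.zeros((2, 3)))},
    coords={"time": [0, 1], "lat": [10.0, 20.0, 30.0]},
)

# Scalar selection drops the time dimension. time becomes a scalar
# coordinate and is attached to DataArrays extracted afterwards,
# including lat.
scalar_sel = ds.sel(time=0)
lat_scalar = scalar_sel["lat"]

# List selection keeps time as a dimension coordinate of size 1,
# so lat does not pick it up.
list_sel = ds.sel(time=[0])
lat_list = list_sel["lat"]

# .variable returns the underlying Variable, which carries dims and
# data but no coordinates at all.
lat_var = scalar_sel["lat"].variable

print("time" in lat_scalar.coords)
print("time" in lat_list.coords)
print(lat_var.dims)
```

Selecting with a list costs one extra dimension of length 1 on the data variables, which may need a later `squeeze` depending on what the calling code expects.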

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Coordinate variable gains coordinate on subset 1473329967
1338121102 https://github.com/pydata/xarray/issues/7350#issuecomment-1338121102 https://api.github.com/repos/pydata/xarray/issues/7350 IC_kwDOAMm_X85PwhuO shoyer 1217238 2022-12-05T20:23:46Z 2022-12-05T20:23:46Z MEMBER

IMO, it's not correctly implementing the rule as you phrased it. You said "still present", which isn't the case here since the coordinate wasn't present before.

Another way of describing the current behavior would be that xarray keeps around "every coordinate which could possibly still be valid," which is determined based upon dimension names.

The main challenge is that "Coordinate variables should not have their coordinates changed" doesn't really make sense in Xarray's data model. Only Dataset or DataArray objects have coordinates, which apply to the entire Dataset/DataArray.

Let me give an example of why we might want to keep scalar coordinates around. Consider a Dataset where lat and lon need to be represented as 2D arrays, along x and y dimensions. If we index out a single lat/lon point, i.e., ds.isel(x=0, y=0) it would have scalar coordinates "x", "y", "lat" and "lon." If we now convert any of these to a DataArray, arguably all the coordinates are still valid.
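
The 2D lat/lon case above might look like the sketch below. The variable name `temperature`, the grid shape, and all coordinate values are assumptions chosen only to make the example runnable; `x` and `y` are given explicit index coordinates so that they survive as scalar coordinates after indexing.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: lat and lon are 2D coordinates along x and y.
ds = xr.Dataset(
    {"temperature": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={
        "x": [0, 1],
        "y": [0, 1, 2],
        "lat": (("x", "y"), np.linspace(40.0, 45.0, 6).reshape(2, 3)),
        "lon": (("x", "y"), np.linspace(-10.0, -5.0, 6).reshape(2, 3)),
    },
)

# Index out a single point: x, y, lat and lon all become scalar coordinates.
point = ds.isel(x=0, y=0)
point_coords = sorted(point.coords)

# Converting any of them to a DataArray keeps every scalar coordinate.
temp_coords = sorted(point["temperature"].coords)
lat_coords = sorted(point["lat"].coords)

print(point_coords)
print(temp_coords)
print(lat_coords)
```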

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Coordinate variable gains coordinate on subset 1473329967
1336302962 https://github.com/pydata/xarray/issues/7350#issuecomment-1336302962 https://api.github.com/repos/pydata/xarray/issues/7350 IC_kwDOAMm_X85Ppl1y shoyer 1217238 2022-12-04T02:16:25Z 2022-12-04T02:16:25Z MEMBER

This was an intentional design choice, back in the early days of Xarray.

The rule Xarray uses for choosing which coordinates to associate with a DataArray created from a Dataset or DataArray is "every coordinate whose dimensions are still present on the new DataArray." This includes scalar coordinates, which are always kept around (because their dimensions are always included).

What rule would you suggest instead? I agree that the behavior in this case "feels" wrong, but keep in mind that once time becomes a scalar coordinate, Xarray doesn't have any way of knowing that it used to have its own dimension.
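
The rule quoted above can be modelled in a few lines of plain Python. This is a simplified sketch of the rule as stated in the comment, not Xarray's actual implementation; the helper `coords_to_keep` and the dictionaries of dimension names are hypothetical.

```python
def coords_to_keep(var_dims, coord_dims):
    """Return names of coordinates whose dimensions all appear in var_dims.

    coord_dims maps each coordinate name to its tuple of dimension names.
    A scalar coordinate has an empty tuple, so it always qualifies.
    """
    present = set(var_dims)
    return [name for name, dims in coord_dims.items() if set(dims) <= present]


# Before selection: time and lat are both 1D dimension coordinates.
before = {"time": ("time",), "lat": ("lat",)}

# After a scalar selection on time: time has no dimensions left.
after = {"time": (), "lat": ("lat",)}

lat_before = coords_to_keep(("lat",), before)
lat_after = coords_to_keep(("lat",), after)

print(lat_before)
print(lat_after)
```

Because the check looks only at dimension names, nothing in the second mapping records that time once had a dimension of its own, which is the limitation described in the comment.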

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Coordinate variable gains coordinate on subset 1473329967


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 14.529ms · About: xarray-datasette