issue_comments

3 rows where issue = 1495605827 and user = 4160723 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1352989233 https://github.com/pydata/xarray/issues/7376#issuecomment-1352989233 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85QpPox benbovy 4160723 2022-12-15T12:27:37Z 2022-12-15T12:27:37Z MEMBER

> Thanks @benbovy! Are you also aware of the issue with plain assign being slower on MultiIndex (comment above: https://github.com/pydata/xarray/issues/7376#issuecomment-1350446546)? Do you know what could be the issue there by any chance?

I see that in ds.assign(foo=~ds["d3"]), the coordinates of ~ds["d3"] are dropped (#2087), which triggers re-indexing of the multi-index when aligning ds with ~ds["d3"]. This is quite an expensive operation.

It is not clear to me what a clean fix would be (see, e.g., #2180), but we could probably optimize the alignment logic so that no re-indexing is performed when all unindexed dimension sizes match the indexed dimension sizes (as in your example).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
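
The optimization suggested in the comment above (skip re-indexing when every unindexed dimension already has the same size as the matching indexed dimension) can be sketched in plain Python. This is only an illustration of the condition, not xarray's alignment code: the function name `can_skip_reindex`, its arguments, and the example sizes are hypothetical.

```python
from typing import Mapping


def can_skip_reindex(
    indexed_sizes: Mapping[str, int],
    unindexed_sizes: Mapping[str, int],
) -> bool:
    """Return True when no unindexed dimension conflicts in size with an
    indexed dimension of the same name (hypothetical helper)."""
    return all(
        # A dimension with no index on the other side has nothing to
        # re-index against, so it never forces a re-index here.
        dim not in indexed_sizes or indexed_sizes[dim] == size
        for dim, size in unindexed_sizes.items()
    )


# Assumed sizes: "x" carries the multi-index in the dataset, while the
# array being assigned has lost its coordinates and so has no index on "x".
same_size = can_skip_reindex({"x": 400_000}, {"x": 400_000})
different_size = can_skip_reindex({"x": 400_000}, {"x": 10})
print(same_size, different_size)
```

In the first call the sizes agree, so under this sketch the expensive multi-index re-indexing would be skipped; in the second call they differ, so alignment would still have to handle the mismatch.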
1352318926 https://github.com/pydata/xarray/issues/7376#issuecomment-1352318926 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85Qmr_O benbovy 4160723 2022-12-14T22:43:11Z 2022-12-14T22:47:37Z MEMBER

> Are you aware of any workarounds for this issue with the current code (assuming I would like to preserve MultiIndex)?

Unfortunately I don't know of any workaround that would preserve the MultiIndex. Depending on how you use the multi-index, you could instead set two single indexes for "i1" and "i2" respectively (this is now supported via set_xindex()). I think that groupby will work well in that case. If you really need a multi-index, you could still build it afterwards from the groupby result.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
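
A minimal sketch of the workaround described in the comment above, assuming an xarray release that provides `set_xindex()`. The dataset contents, the dimension name "x", and the sizes are made up for illustration; only the names "i1", "i2", and "d3" come from the thread. Rebuilding a multi-index afterwards is not shown.

```python
import numpy as np
import xarray as xr

# Hypothetical data: 4 values of "i1" crossed with 5 values of "i2",
# both stored as plain (unindexed) coordinates along dimension "x".
i1 = np.repeat(np.arange(4), 5)
i2 = np.tile(np.arange(5), 4)

ds = xr.Dataset(
    {"d3": ("x", np.arange(20.0))},
    coords={"i1": ("x", i1), "i2": ("x", i2)},
)

# Two independent single indexes instead of one MultiIndex on "x".
ds = ds.set_xindex("i1").set_xindex("i2")

# Grouping by "i1" yields one group per distinct "i1" value.
group_means = ds.groupby("i1").mean()
print(group_means["d3"].values)
```

With this layout the number of groups equals the number of distinct "i1" values (4 here) rather than the number of elements along "x".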
1350738301 https://github.com/pydata/xarray/issues/7376#issuecomment-1350738301 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85QgqF9 benbovy 4160723 2022-12-14T09:40:57Z 2022-12-14T09:40:57Z MEMBER

Thanks for the report @ravwojdyla.

Since #5692, each multi-index level has its own coordinate variable, so copying takes a bit more time because more variables need to be created. Not sure what's happening with _maybe_cast_to_cftimeindex, though.

The real issue here, however, is the same as in #6836. In your example, .groupby("i1") creates 400 000 groups whereas it should create only 4 groups.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  groupby+map performance regression on MultiIndex dataset 1495605827
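
The gap between 400 000 groups and 4 groups mentioned in the comment above can be illustrated without xarray. The sketch below assumes, purely for illustration, 4 distinct "i1" values crossed with 100 000 distinct "i2" values; it contrasts grouping by the full (i1, i2) label with grouping by the "i1" level alone, and makes no claim about how xarray builds its groups internally.

```python
from itertools import product

# Assumed sizes, chosen so that the totals match the numbers in the comment.
n_i1, n_i2 = 4, 100_000

# One (i1, i2) label per element, as a multi-index would hold them.
labels = list(product(range(n_i1), range(n_i2)))

# Grouping by the full tuple gives one group per element...
groups_by_tuple = set(labels)
# ...whereas grouping by the "i1" level gives one group per "i1" value.
groups_by_level = {i1 for i1, _ in labels}

print(len(groups_by_tuple))  # 400000
print(len(groups_by_level))  # 4
```

Applying a function once per group is therefore 100 000 times more work in the first case, which is consistent with the slowdown reported in the issue.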

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);