
issue_comments


9 rows where author_association = "NONE" and user = 1419010 sorted by updated_at descending


issue 3

  • groupby+map performance regression on MultiIndex dataset 7
  • Flexible backends - Harmonise zarr chunking with other backends chunking 1
  • Allow chunk spec per variable 1

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1353074006 https://github.com/pydata/xarray/issues/7376#issuecomment-1353074006 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85QpkVW ravwojdyla 1419010 2022-12-15T13:33:44Z 2022-12-15T13:33:44Z NONE

@benbovy thanks for the context and the PR #7382, exciting to see the improvement!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
1352434183 https://github.com/pydata/xarray/issues/7376#issuecomment-1352434183 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85QnIIH ravwojdyla 1419010 2022-12-15T01:18:32Z 2022-12-15T01:18:32Z NONE

Thanks @benbovy! Are you also aware of the issue with plain assign being slower on MultiIndex (comment above: https://github.com/pydata/xarray/issues/7376#issuecomment-1350446546)? Do you know what could be the issue there by any chance?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
1351540523 https://github.com/pydata/xarray/issues/7376#issuecomment-1351540523 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85Qjt8r ravwojdyla 1419010 2022-12-14T14:40:18Z 2022-12-14T19:47:12Z NONE

👋 @benbovy thanks for the update. Looking at https://github.com/pydata/xarray/pull/5692, it must have been a huge effort, thank you for your work on that! Coming back to this issue: in the example above, version 2022.6.0 is about 600x slower, and in our internal code the computation would not finish in a reasonable time, which forced us to downgrade to 2022.3.0. Are you aware of any workarounds for this issue with the current code (assuming I would like to preserve the MultiIndex)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
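The workaround asked about above is not answered in this thread; a sketch of one possible approach (an assumption on my part, not confirmed by the maintainers) is to drop the MultiIndex before the expensive groupby+map step and restore it afterwards. The names `a`, `b`, `z`, and `d3` are illustrative, echoing the variable mentioned later in the thread:

```python
import numpy as np
import xarray as xr

# Build a small MultiIndex-ed dataset via stack (the report used ~4M elements).
ds = xr.Dataset(
    {"d3": (("a", "b"), np.arange(100.0).reshape(10, 10))},
    coords={"a": range(10), "b": range(10)},
).stack(z=("a", "b"))

flat = ds.reset_index("z")                # plain coords instead of a MultiIndex
out = flat.groupby("a").map(lambda g: g)  # the hot path, now without index rebuilding
out = out.set_index(z=["a", "b"])         # restore the MultiIndex afterwards
```

Whether this actually sidesteps the regression depends on where the time goes in a given workload; it only avoids per-group MultiIndex reconstruction, not the copy paths discussed below.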
1350446546 https://github.com/pydata/xarray/issues/7376#issuecomment-1350446546 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85Qfi3S ravwojdyla 1419010 2022-12-14T06:03:15Z 2022-12-14T06:06:02Z NONE

FYI (this might warrant a separate issue?): an assign of a new DataArray, e.g. ds.assign(foo=~ds["d3"]), is also several times slower since 2022.6.0 (same commit): on 4M elements, with the same keys as above, it is ~7x slower.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
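The assign pattern from the comment above can be reproduced as a minimal sketch; the dataset shape and the all-boolean data here are made up, only the `ds.assign(foo=~ds["d3"])` call mirrors the report:

```python
import numpy as np
import xarray as xr

# A MultiIndex-ed dataset with a boolean variable d3 (10k elements here;
# the report measured ~4M).
ds = xr.Dataset(
    {"d3": (("a", "b"), np.zeros((100, 100), dtype=bool))},
    coords={"a": range(100), "b": range(100)},
).stack(z=("a", "b"))

# The operation reported as ~7x slower on 2022.6.0+ with a MultiIndex present.
ds2 = ds.assign(foo=~ds["d3"])
```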
1350390046 https://github.com/pydata/xarray/issues/7376#issuecomment-1350390046 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85QfVEe ravwojdyla 1419010 2022-12-14T04:44:04Z 2022-12-14T04:52:26Z NONE

I also want to point out that the stack traces/profiles look very different between 2022.3.0 and main/latest. It looks like https://github.com/pydata/xarray/blob/021c73e12cccb06c017ce6420dd043a0cfbf9f08/xarray/core/indexes.py#L185 might be a fairly expensive operation. Separately, there seems to be quite a bit of time spent in the copy -> copy_indexes path (deep copy?).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
1350378571 https://github.com/pydata/xarray/issues/7376#issuecomment-1350378571 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85QfSRL ravwojdyla 1419010 2022-12-14T04:22:28Z 2022-12-14T04:27:52Z NONE

3ead17ea9e99283e2511b65b9d864d1c7b10b3c4 (https://github.com/pydata/xarray/pull/5692) seems to be the commit that introduced this regression (cc: @benbovy)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
1350366220 https://github.com/pydata/xarray/issues/7376#issuecomment-1350366220 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85QfPQM ravwojdyla 1419010 2022-12-14T04:04:16Z 2022-12-14T04:04:16Z NONE

I also recorded py-spy flamegraphs and exported them in speedscope format at https://gist.github.com/ravwojdyla/3b791debd3f97707d84748446dc07e39; you can view them at https://www.speedscope.app/

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
748052860 https://github.com/pydata/xarray/issues/4623#issuecomment-748052860 https://api.github.com/repos/pydata/xarray/issues/4623 MDEyOklzc3VlQ29tbWVudDc0ODA1Mjg2MA== ravwojdyla 1419010 2020-12-18T12:12:04Z 2020-12-18T12:12:35Z NONE

I thought through a couple of options, including simple value classes, but in the end they did not fit the current API. If we stick with the current style, it makes more sense to go in the direction of {dim: {var: chunk_spec}}, since there is already {dim: x}; should a user want variable-specific chunking, they would adjust it to {dim: {var: y, ...: x}}, with .../Ellipsis standing for "all other variables" within that dim. wdyt @shoyer?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow chunk spec per variable 753374426
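The {dim: {var: chunk_spec}} shape proposed above can be sketched as a plain dict; this is a hypothetical spec under discussion, not an API xarray ever accepted in this form, and the variable names are illustrative:

```python
# Proposed per-dimension chunk spec, optionally refined per variable.
# Ellipsis (...) stands for "all other variables" along that dimension.
chunks = {
    "x": 1000,                     # uniform chunking along x for every variable
    "y": {"Foo": 500, ...: 2000},  # Foo chunked at 500 along y, all others at 2000
}
```

Note that `...` (Ellipsis) is hashable, so it is a valid dict key in plain Python; no special machinery is needed to express the "all other variables" case.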
732486436 https://github.com/pydata/xarray/issues/4496#issuecomment-732486436 https://api.github.com/repos/pydata/xarray/issues/4496 MDEyOklzc3VlQ29tbWVudDczMjQ4NjQzNg== ravwojdyla 1419010 2020-11-23T23:31:25Z 2020-11-23T23:31:39Z NONE

Hi. I'm trying to find the issue closest to the problem I have, and this seems to be the best and most related one.

Say I have a zarr dataset with multiple variables Foo, Bar, and Baz (and potentially many more) and 2 dimensions, x and y (potentially more). Both Foo and Bar are large 2-D arrays with dims x, y; Baz is a relatively small 1-D array with dim y. I would like to read that dataset with xarray but increase the chunk size from the native zarr chunk size for x and y, and only for Foo and Bar; I would like to keep native chunking for Baz. afaiu I would currently do that with the chunks parameter to open_dataset/open_zarr, but if I do that via, say, dict(x=N, y=M), that will change chunking for all variables that use those dimensions, which isn't exactly what I need: I need those changed only for Foo and Bar. Is there a way to do that? Should that be part of the "harmonisation"? One could imagine xarray accepting a dict of dicts akin to {var: {dim: chunk_spec}} to specify chunking for specific variables.

Note that rechunking after reading is not what I want; I would like to specify chunking at the read op.

Let me know if you would prefer me to open a completely new issue for this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible backends - Harmonise zarr chunking with other backends chunking 717410970
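The {var: {dim: chunk_spec}} dict of dicts floated in the comment above would look like the following; this is a hypothetical shape, not an accepted xarray API, and the chunk sizes are made up:

```python
# Desired per-variable chunk spec: enlarge chunks only for the big 2-D
# arrays, leave Baz at its native zarr chunking by omitting it.
chunks = {
    "Foo": {"x": 4096, "y": 4096},
    "Bar": {"x": 4096, "y": 4096},
    # "Baz" intentionally absent: keep its native chunking
}

# The desired (hypothetical) call site would then be something like:
# ds = xr.open_zarr(store, chunks=chunks)
```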


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette