issue_comments

7 rows where author_association = "COLLABORATOR", issue = 1412895383 and user = 43316012 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1290349919 https://github.com/pydata/xarray/issues/7181#issuecomment-1290349919 https://api.github.com/repos/pydata/xarray/issues/7181 IC_kwDOAMm_X85M6S1f headtr1ck 43316012 2022-10-25T10:47:55Z 2022-10-25T10:47:55Z COLLABORATOR

Is this nesting now disallowed?

No, it is still allowed. But a deep copy will now also deep-copy the attrs, and therefore any DataArrays stored in there (a minimal sketch follows this row).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.10.0 much slower then 2022.6.0 1412895383
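
The point in the comment above, that a deep copy now also deep-copies attrs and therefore any DataArrays nested in them, can be illustrated with a minimal sketch. The attribute name background is an invented placeholder, and the behaviour shown assumes the post-change deep-copy semantics discussed in this thread:

import numpy as np
import xarray as xr

# A DataArray that carries another DataArray inside its attrs.
background = xr.DataArray(np.zeros(1_000), dims="x")
da = xr.DataArray(np.arange(1_000), dims="x", attrs={"background": background})

# With the newer behaviour, copy(deep=True) also deep-copies attrs,
# so the nested DataArray is duplicated as well.
deep = da.copy(deep=True)
print(deep.attrs["background"] is background)     # False: attrs were deep-copied

# A shallow copy keeps the original objects referenced from attrs.
shallow = da.copy(deep=False)
print(shallow.attrs["background"] is background)  # True
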
1290180708 https://github.com/pydata/xarray/issues/7181#issuecomment-1290180708 https://api.github.com/repos/pydata/xarray/issues/7181 IC_kwDOAMm_X85M5phk headtr1ck 43316012 2022-10-25T08:27:43Z 2022-10-25T08:29:28Z COLLABORATOR

You call DataArray.copy ~650 times, which sounds reasonable (you could check whether you really need deep copies!).

But then somehow DataArray._copy is called ~500k times. So somewhere there must be DataArrays in places where there should be none (in attrs, for example); the profiling sketch after this row shows one way to track that down.

There are some copy calls from alignment, but I don't see how that adds up to this number...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.10.0 much slower then 2022.6.0 1412895383
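
One way to obtain call counts like those quoted above, and to see where the unexpected deep copies originate, is to run the slow workload under cProfile. This is only a sketch; run_workload() is a stand-in for whatever code became slow after the upgrade:

import cProfile
import pstats

def run_workload():
    # Placeholder for the code that got slower with xarray 2022.10.0.
    ...

profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()

# Restrict the statistics to copy-related functions.  A large gap between the
# number of explicit DataArray.copy calls and the number of __deepcopy__/_copy
# calls suggests whole DataArrays are being deep-copied indirectly,
# e.g. because they sit inside attrs.
pstats.Stats(profiler).sort_stats("cumulative").print_stats("copy")
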
1290176737 https://github.com/pydata/xarray/issues/7181#issuecomment-1290176737 https://api.github.com/repos/pydata/xarray/issues/7181 IC_kwDOAMm_X85M5ojh headtr1ck 43316012 2022-10-25T08:24:37Z 2022-10-25T08:24:37Z COLLABORATOR

Every time I look at this graph I get more confused, haha. I still don't know why copy.deepcopy is called 24 million times.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.10.0 much slower then 2022.6.0 1412895383
1289611718 https://github.com/pydata/xarray/issues/7181#issuecomment-1289611718 https://api.github.com/repos/pydata/xarray/issues/7181 IC_kwDOAMm_X85M3enG headtr1ck 43316012 2022-10-24T21:01:19Z 2022-10-24T21:02:42Z COLLABORATOR

@dschwoerer could you test whether you see any improvement with this PR: #7209?

The graph is not super helpful, since most calls to deep_copy come indirectly from deep_copy itself. I tried my best to find some points of improvement.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.10.0 much slower then 2022.6.0 1412895383
1283747988 https://github.com/pydata/xarray/issues/7181#issuecomment-1283747988 https://api.github.com/repos/pydata/xarray/issues/7181 IC_kwDOAMm_X85MhHCU headtr1ck 43316012 2022-10-19T10:04:10Z 2022-10-19T10:04:10Z COLLABORATOR

I see; I think the problem is that Variable.to_index_variable is deep-copying. Probably the fault of deep=True being the default for copy... (a timing sketch follows this row).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.10.0 much slower then 2022.6.0 1412895383
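
The cost of deep=True being the default for copy can be seen with a quick timing comparison. The array size and repetition count are arbitrary, chosen only to make the difference visible:

import timeit

import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.rand(1_000, 1_000),
    dims=("x", "y"),
    coords={"x": np.arange(1_000)},
)

# DataArray.copy() defaults to deep=True, so the underlying data (and, with
# the newer behaviour, the attrs) are physically duplicated on every call.
t_deep = timeit.timeit(lambda: da.copy(), number=100)
t_shallow = timeit.timeit(lambda: da.copy(deep=False), number=100)
print(f"deep copies:    {t_deep:.3f} s")
print(f"shallow copies: {t_shallow:.3f} s")
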
1283533731 https://github.com/pydata/xarray/issues/7181#issuecomment-1283533731 https://api.github.com/repos/pydata/xarray/issues/7181 IC_kwDOAMm_X85MgSuj headtr1ck 43316012 2022-10-19T07:07:51Z 2022-10-19T07:07:51Z COLLABORATOR

You call DataArray.copy 658 times, which seems fine for a test suite. But I don't understand why it calls DataArray.__deepcopy__ 650k times. There must be something that deep-copies your arrays all over the place (roughly illustrated after this row)...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.10.0 much slower then 2022.6.0 1412895383
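
A rough sketch of how a few hundred explicit copies can balloon into hundreds of thousands of __deepcopy__ calls once DataArrays accumulate inside attrs. The sizes here are invented purely to show the multiplication; they are not taken from the profile discussed above:

import numpy as np
import xarray as xr

# Hypothetical setup: every array drags 100 small DataArrays along in its attrs.
nested = {f"aux_{i}": xr.DataArray(np.zeros(3)) for i in range(100)}
da = xr.DataArray(np.arange(10), dims="x", attrs=nested)

# 50 explicit deep copies then trigger on the order of 50 * 100 = 5,000 nested
# DataArray deep copies under the hood -- the same multiplication that can turn
# ~650 copy() calls into hundreds of thousands of __deepcopy__ calls.
for _ in range(50):
    da.copy(deep=True)
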
1282402639 https://github.com/pydata/xarray/issues/7181#issuecomment-1282402639 https://api.github.com/repos/pydata/xarray/issues/7181 IC_kwDOAMm_X85Mb-lP headtr1ck 43316012 2022-10-18T13:38:31Z 2022-10-18T13:38:31Z COLLABORATOR

Yes, the behavior of deep copy was changed. But I don't see why it should affect runtime, since we now only pass on the memo dict, which is required for recursive structures (sketched after this row).

Maybe there is some bug in the logic, or maybe the old code was not doing true deep copies?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.10.0 much slower then 2022.6.0 1412895383
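
The memo dict mentioned above is the standard mechanism copy.deepcopy uses to handle recursive structures. The following is a generic illustration with a made-up Node class, not xarray's actual __deepcopy__ implementation:

import copy

class Node:
    """Tiny container showing why __deepcopy__ receives a memo dict."""

    def __init__(self, value, other=None):
        self.value = value
        self.other = other  # may point back at another Node, forming a cycle

    def __deepcopy__(self, memo):
        new = Node.__new__(Node)
        # Register the copy *before* recursing: memo maps id(original) -> copy
        # and is what stops infinite recursion on cyclic references.
        memo[id(self)] = new
        new.value = copy.deepcopy(self.value, memo)
        new.other = copy.deepcopy(self.other, memo)
        return new

a, b = Node(1), Node(2)
a.other, b.other = b, a        # recursive structure
a2 = copy.deepcopy(a)
print(a2.other.other is a2)    # True: the cycle survives in the copy
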

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
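
For reference, the filtered view at the top of this page ("7 rows where ...") corresponds to a query along the following lines. This sketch assumes a local SQLite copy of the database; the filename github.db is a placeholder:

import sqlite3

# Placeholder path to a local copy of the Datasette database.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'COLLABORATOR'
      AND issue = 1412895383
      AND [user] = 43316012
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 7 for the filter shown above
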