
issue_comments


6 rows where issue = 995207525 sorted by updated_at descending
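For reference, the page above corresponds to a query along these lines; a minimal sqlite3 sketch, assuming a local copy of the database named github.db (the filename is hypothetical; Datasette serves the same table over its JSON API):

import sqlite3

# Hypothetical local copy of the database behind this page.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    "select id, user, created_at, author_association, body "
    "from issue_comments "
    "where issue = 995207525 "
    "order by updated_at desc"
).fetchall()
print(len(rows))  # expect 6, matching the row count above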




user (5 distinct values)

  • zachglee (2)
  • shoyer (1)
  • dcherian (1)
  • max-sixty (1)
  • andersy005 (1)

author_association (2 distinct values)

  • MEMBER (4)
  • NONE (2)

issue (1 distinct value)

  • combining 2 arrays with xr.merge() causes temporary spike in memory usage ~3x the combined size of the arrays (6 comments)
Comment 1008223776 · andersy005 (MEMBER) · 2022-01-09T03:49:32Z
https://github.com/pydata/xarray/issues/5790#issuecomment-1008223776

> @shoyer Tried Dask as you suggested, and it helped significantly! Thanks for the suggestion!

@zachglee, should we close this issue?

Reactions: none
Comment 920145030 · zachglee (NONE) · 2021-09-15T15:53:49Z
https://github.com/pydata/xarray/issues/5790#issuecomment-920145030

@shoyer Tried Dask as you suggested, and it helped significantly! Thanks for the suggestion!

Reactions: +1 (1)
Comment 919153582 · zachglee (NONE) · 2021-09-14T13:30:30Z
https://github.com/pydata/xarray/issues/5790#issuecomment-919153582

@dcherian Agreed -- I just used this example for the MVCE, but unfortunately we have no guarantees that our data will always fit nicely into this concat-friendly form :(

Reactions: none
Comment 918735708 · dcherian (MEMBER) · 2021-09-14T02:12:17Z
https://github.com/pydata/xarray/issues/5790#issuecomment-918735708

You're using merge to concatenate along `C`. That is indeed inefficient. Can you use xr.concat([da1, da2], dim="C") instead?

Reactions: none
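To illustrate the suggestion above, a minimal sketch contrasting the two calls; the names, shapes, and coordinates are invented, loosely following the issue's MVCE:

import numpy as np
import xarray as xr

# Two arrays sharing dim "x" but covering disjoint ranges of "C"
# (hypothetical sizes, not taken from the issue).
da1 = xr.DataArray(np.ones((100, 100)), dims=("x", "C"),
                   coords={"C": np.arange(100)})
da2 = xr.DataArray(np.ones((100, 100)), dims=("x", "C"),
                   coords={"C": np.arange(100, 200)})

# merge first aligns each input to the union of the "C" coordinates,
# temporarily inflating both arrays to the size of the final output:
merged = xr.merge([da1.rename("v"), da2.rename("v")])

# concat stacks the blocks along "C" directly, avoiding that inflation:
stacked = xr.concat([da1, da2], dim="C")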
Comment 918697549 · shoyer (MEMBER) · 2021-09-14T00:39:03Z
https://github.com/pydata/xarray/issues/5790#issuecomment-918697549

> I have a hunch that all arrays get aligned to the final merged coordinate space (which is much bigger), before they are combined, which means at some point in the middle of the process we have a bunch of arrays in memory that have been inflated to the size of the final output array.

Yes, I'm pretty sure this is the case.

> If that's the case, it seems like it should be possible to make this operation more efficient by creating just one inflated array and adding the data from the input arrays to it in-place? Or is this an expected and unavoidable behavior with merging? (fwiw this also affects several other combination methods, presumably because they use merge() under the hood?)

Yes, I imagine this could work.

But on the other hand, the implementation would get more complex. For example, it's nice to be able to use np.concatenate() so things automatically work with other array backends like Dask.

By the way, if you haven't tried Dask already, I would recommend it for this use case. It can do streaming operations that can result in significant memory savings.

Reactions: none
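And a rough sketch of the Dask route recommended above; the chunk sizes are illustrative, and in practice the inputs would be opened lazily (e.g. with open_dataset(..., chunks=...)) rather than built from in-memory NumPy arrays:

import numpy as np
import xarray as xr

# Chunked (dask-backed) inputs; requires dask to be installed.
# Shapes and chunk sizes here are made up for illustration.
da1 = xr.DataArray(np.ones((1000, 1000)), dims=("x", "C")).chunk({"C": 250})
da2 = xr.DataArray(np.ones((1000, 1000)), dims=("x", "C")).chunk({"C": 250})

lazy = xr.concat([da1, da2], dim="C")  # builds a task graph; no big allocation
result = lazy.compute()                # evaluated chunk by chunk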
Comment 918583565 · max-sixty (MEMBER) · 2021-09-13T21:15:56Z
https://github.com/pydata/xarray/issues/5790#issuecomment-918583565

I'll let others respond, but temporary memory usage of ~3x sounds within expectations, albeit towards the higher end.

If we can reduce it, that would be great, but it probably needs someone to work on this fairly methodically.

Reactions: none


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
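As an aside, the user and issue columns above are foreign keys and can be resolved with a join; a sketch assuming the companion users table carries a login column (that table is not shown in this schema):

import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical filename, as above
rows = conn.execute(
    "select users.login, c.created_at, c.body "
    "from issue_comments as c "
    "join users on users.id = c.user "   # users.login is assumed
    "where c.issue = 995207525 "
    "order by c.updated_at desc"
).fetchall()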