issue_comments

5 rows where author_association = "MEMBER" and issue = 944996552, sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue

id 881641897 · node_id IC_kwDOAMm_X840jMmp
html_url https://github.com/pydata/xarray/issues/5604#issuecomment-881641897
issue_url https://api.github.com/repos/pydata/xarray/issues/5604
user max-sixty (5635139) · created_at 2021-07-16T18:36:45Z · updated_at 2021-07-16T18:36:45Z · author_association MEMBER

The memory usage does seem high. Not having the indexes aligned makes this an expensive operation, and I would vote to have that fail by default (ref: https://github.com/pydata/xarray/discussions/5499#discussioncomment-929765).

Can the input files be aligned before attempting to combine the data? Or are you not in control of the input files?

To debug the memory, you probably need to do something like use memory_profiler, and try with varying numbers of files; unfortunately it's a complex problem, and just looking at htop gives very coarse information (see the sketch after this comment).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Extremely Large Memory usage for a very small variable (944996552)
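
A minimal sketch of the two suggestions above, assuming a directory of hypothetical NetCDF files data/part_*.nc (requires memory_profiler, xarray, and a NetCDF backend such as netcdf4):

from memory_profiler import memory_usage
import glob

import xarray as xr

paths = sorted(glob.glob("data/part_*.nc"))  # hypothetical input files

def combine(n):
    # join="exact" makes misaligned indexes raise instead of triggering an
    # expensive outer-join alignment, one candidate for the memory blow-up.
    ds = xr.open_mfdataset(paths[:n], join="exact")
    ds.load()
    ds.close()

# Peak memory (MiB) versus number of files: roughly linear growth points at
# the data itself, strongly superlinear growth points at the combine step.
for n in (1, 2, 4, 8):
    peak = max(memory_usage((combine, (n,)), interval=0.1))
    print(f"{n:3d} files -> peak {peak:.0f} MiB")
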
id 881111321 · node_id IC_kwDOAMm_X840hLEZ
html_url https://github.com/pydata/xarray/issues/5604#issuecomment-881111321
issue_url https://api.github.com/repos/pydata/xarray/issues/5604
user max-sixty (5635139) · created_at 2021-07-16T01:29:19Z · updated_at 2021-07-16T01:29:19Z · author_association MEMBER

Again — where are you seeing this 1000GB or 1000x number?

(Also, have a look at the GitHub docs on how to format code in comments.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Extremely Large Memory usage for a very small variable (944996552)
id 880853826 · node_id MDEyOklzc3VlQ29tbWVudDg4MDg1MzgyNg==
html_url https://github.com/pydata/xarray/issues/5604#issuecomment-880853826
issue_url https://api.github.com/repos/pydata/xarray/issues/5604
user TomNicholas (35968931) · created_at 2021-07-15T16:44:32Z · updated_at 2021-07-15T16:44:32Z · author_association MEMBER

An example that we can reproduce locally would be most helpful, if possible!

On Thu, 15 Jul 2021, 12:42, tommy307507 wrote:

> > I also don't understand how the chunksize of v2d_time is 59 instead of 1
> >
> > Is v2d_time one of the dimensions being concatenated along by open_mfdataset?
>
> Yes, I will try the above tomorrow and post it back here. I did try to pass concat_dim = ["v2d_time", "v3d_time"], but that still causes the problem.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Extremely Large Memory usage for a very small variable (944996552)
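
A minimal sketch of the concat_dim pattern from the quoted reply, with hypothetical file names; passing a list of dimensions to concat_dim requires combine="nested" and a correspondingly nested list of paths:

import xarray as xr

# The outer list level is concatenated along "v2d_time", the inner level
# along "v3d_time"; the .nc paths are hypothetical placeholders.
paths = [
    ["run0_t0.nc", "run0_t1.nc"],
    ["run1_t0.nc", "run1_t1.nc"],
]
ds = xr.open_mfdataset(
    paths,
    combine="nested",
    concat_dim=["v2d_time", "v3d_time"],
)
print(ds.chunks)  # expect one chunk per input file along each concat dimension
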
id 880809255 · node_id MDEyOklzc3VlQ29tbWVudDg4MDgwOTI1NQ==
html_url https://github.com/pydata/xarray/issues/5604#issuecomment-880809255
issue_url https://api.github.com/repos/pydata/xarray/issues/5604
user TomNicholas (35968931) · created_at 2021-07-15T15:51:18Z · updated_at 2021-07-15T15:51:18Z · author_association MEMBER

> I also don't understand how the chunksize of v2d_time is 59 instead of 1

Is v2d_time one of the dimensions being concatenated along by open_mfdataset? (See the sketch after this comment.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Extremely Large Memory usage for a very small variable (944996552)
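
A small sketch of checking that, assuming hypothetical files data/part_*.nc: open_mfdataset normally yields one dask chunk per input file along the dimension it concatenates over, so the per-dimension chunking shows which dimension that was:

import xarray as xr

ds = xr.open_mfdataset("data/part_*.nc")  # hypothetical input files
# A concatenated dimension should show many per-file chunks; a single
# 59-wide chunk on v2d_time would suggest it was not concatenated along.
print(ds.chunks)       # mapping of dimension name -> tuple of chunk sizes
print(ds["v2d_time"])  # inspect the suspect coordinate directly
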
id 880500336 · node_id MDEyOklzc3VlQ29tbWVudDg4MDUwMDMzNg==
html_url https://github.com/pydata/xarray/issues/5604#issuecomment-880500336
issue_url https://api.github.com/repos/pydata/xarray/issues/5604
user max-sixty (5635139) · created_at 2021-07-15T08:24:12Z · updated_at 2021-07-15T08:24:12Z · author_association MEMBER

This will likely need much more detail. Though to start: what's the source of the 1000x number? And what happens if you pass compat="identical", coords="minimal" to open_mfdataset (see the sketch after this comment)? If that fails, the opening operation may be doing some expensive alignment.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Extremely Large Memory usage for a very small variable (944996552)
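
A minimal sketch of that call, with hypothetical file names. compat="identical" turns silent merging of non-identical variables into an error, and coords="minimal" avoids concatenating coordinates that do not carry the concatenation dimension:

import xarray as xr

# If this raises, the inputs differ in ways that would otherwise push
# open_mfdataset into an expensive merge/alignment step.
ds = xr.open_mfdataset(
    "data/part_*.nc",  # hypothetical input files
    compat="identical",
    coords="minimal",
)
print(ds)
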

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
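
The page's filter is plain SQL, so it can be reproduced against a local copy of the database; a minimal sketch, assuming the SQLite file is named github.db:

import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy of this database
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association
    FROM issue_comments
    WHERE author_association = 'MEMBER' AND issue = 944996552
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, user_id, created, updated, assoc in rows:
    print(comment_id, user_id, updated, assoc)
conn.close()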