
issue_comments

7 rows where author_association = "NONE", issue = 944996552 and user = 49487505 sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
881131177 https://github.com/pydata/xarray/issues/5604#issuecomment-881131177 https://api.github.com/repos/pydata/xarray/issues/5604 IC_kwDOAMm_X840hP6p tommy307507 49487505 2021-07-16T02:29:33Z 2021-07-16T02:29:44Z NONE

> Again — where are you seeing this 1000GB or 1000x number?
>
> (also have a look at the GitHub docs on how to format the code)

Sorry, I think the 1000x was a confusion on my part from not reading the numbers correctly and from a poor understanding of how memory units work, but I will explain it again. In top, the process draws all ~100 GiB of memory and then starts using swap, which causes the system to kill it automatically. The ubar variable should only need 59 × 1100 × 1249 × 8 = 648,480,800 bytes of memory, which is only about 0.65 GB. However, top shows it using 92.5 GiB of memory plus all 16 GiB of swap, so the program actually draws about 109 GiB (that is all that is available before it gets killed), which is in fact only about 168× what is really needed (a quick back-of-the-envelope check of this arithmetic is sketched below).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extremely Large Memory usage for a very small variable  944996552
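As a back-of-the-envelope check of the arithmetic in the comment above, here is a minimal Python sketch; the shape and dtype come from the dask repr quoted in the next comment, and nothing else is assumed:

import numpy as np

shape = (59, 1100, 1249)                  # ubar's shape, from the dask repr quoted below
itemsize = np.dtype("float64").itemsize   # 8 bytes per float64 value

nbytes = int(np.prod(shape)) * itemsize
print(f"{nbytes:,} bytes")                                   # 648,480,800 bytes
print(f"{nbytes / 1e9:.2f} GB / {nbytes / 2**30:.2f} GiB")   # 0.65 GB / 0.60 GiB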
881106553 https://github.com/pydata/xarray/issues/5604#issuecomment-881106553 https://api.github.com/repos/pydata/xarray/issues/5604 IC_kwDOAMm_X840hJ55 tommy307507 49487505 2021-07-16T01:13:11Z 2021-07-16T01:13:11Z NONE

For ubar it says `dask.array<where, shape=(59, 1100, 1249), dtype=float64, chunksize=(59, 1100, 1249), chunktype=numpy.ndarray>`, but for u it says `dask.array<concatenate, shape=(59, 35, 1100, 1249), dtype=float64, chunksize=(1, 1, 1100, 1249), chunktype=numpy.ndarray>`. Those are very different operations; is that the reason for the 1000 GB consumption? (A short sketch for comparing the two variables' chunking follows below.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extremely Large Memory usage for a very small variable  944996552
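A minimal sketch for comparing how dask chunked the two variables after open_mfdataset; the file pattern is a placeholder, and only the variable names ubar and u come from this thread:

import xarray as xr

# Open the same set of files (pattern is a guess) and inspect the dask arrays.
ds = xr.open_mfdataset("ocean_his_*.nc")

for name in ("ubar", "u"):
    arr = ds[name].data              # the underlying dask array
    print(name, arr.shape, arr.chunksize, f"{arr.nbytes / 2**30:.2f} GiB")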
881100884 https://github.com/pydata/xarray/issues/5604#issuecomment-881100884 https://api.github.com/repos/pydata/xarray/issues/5604 IC_kwDOAMm_X840hIhU tommy307507 49487505 2021-07-16T00:55:25Z 2021-07-16T00:57:19Z NONE

> This will likely need much more detail. Though to start: what's the source of the 1000x number? What happens if you pass `compat="identical", coords="minimal"` to open_mfdataset? If that fails, the opening operation may be doing some expensive alignment.

Trying this gives me `conflicting values for variable 'ubar' on objects to be combined`. Actually that makes sense, as `identical` requires the values to be the same, right? (A sketch of these open_mfdataset options follows below.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extremely Large Memory usage for a very small variable  944996552
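For reference, a hedged sketch of the open_mfdataset options being discussed above. The keyword values are standard xarray options, but the file pattern is a placeholder, and whether these options actually help in this case is exactly what the thread is trying to establish:

import xarray as xr

# compat="identical" compares variable values across files and raises the
# "conflicting values for variable ..." error seen above when they differ;
# compat="override" skips that comparison and keeps the values from the
# first dataset instead.
ds = xr.open_mfdataset(
    "ocean_his_*.nc",        # placeholder file pattern
    coords="minimal",
    data_vars="minimal",     # only concatenate variables that contain the concat dimension
    compat="override",
)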
880857434 https://github.com/pydata/xarray/issues/5604#issuecomment-880857434 https://api.github.com/repos/pydata/xarray/issues/5604 MDEyOklzc3VlQ29tbWVudDg4MDg1NzQzNA== tommy307507 49487505 2021-07-15T16:49:33Z 2021-07-15T16:49:33Z NONE

> An example which we can reproduce locally would be the most helpful, if possible!

Thanks for your quick reply, but I am not at work right now since it's 1 am over here. I will test tomorrow how few files are needed for this to happen; I am trying to merge 59 files right now, so I might try fewer files to find the lower limit, as passing 20 GB of files around would be quite hard.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extremely Large Memory usage for a very small variable  944996552
880854744 https://github.com/pydata/xarray/issues/5604#issuecomment-880854744 https://api.github.com/repos/pydata/xarray/issues/5604 MDEyOklzc3VlQ29tbWVudDg4MDg1NDc0NA== tommy307507 49487505 2021-07-15T16:45:17Z 2021-07-15T16:45:17Z NONE

My temporary workaround is to call open_dataset on each of the files, store u and ubar in two separate lists, run xr.concat on both lists, and then save the result to a file. They can be concatenated just fine, the output file is about the expected size of 23 GB, and the operation also uses a comparable amount of memory (a sketch of this workaround follows below).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extremely Large Memory usage for a very small variable  944996552
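A rough sketch of the workaround described in the comment above, assuming ubar is concatenated along v2d_time and u along v3d_time (the file pattern and output filename are placeholders):

import glob
import xarray as xr

files = sorted(glob.glob("ocean_his_*.nc"))    # placeholder file pattern

u_parts, ubar_parts = [], []
for path in files:
    with xr.open_dataset(path) as ds:
        u_parts.append(ds["u"].load())         # read into memory so the file can be closed
        ubar_parts.append(ds["ubar"].load())

combined = xr.Dataset(
    {
        "u": xr.concat(u_parts, dim="v3d_time"),
        "ubar": xr.concat(ubar_parts, dim="v2d_time"),
    }
)
combined.to_netcdf("u_ubar_combined.nc")       # placeholder output name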
880851062 https://github.com/pydata/xarray/issues/5604#issuecomment-880851062 https://api.github.com/repos/pydata/xarray/issues/5604 MDEyOklzc3VlQ29tbWVudDg4MDg1MTA2Mg== tommy307507 49487505 2021-07-15T16:42:18Z 2021-07-15T16:42:18Z NONE

> I also don't understand how the chunksize of v2d_time is 59 instead of 1
>
> Is v2d_time one of the dimensions being concatenated along by open_mfdataset?

Yes, I will try the above tomorrow and post back here. I did try passing `concat_dim=["v2d_time", "v3d_time"]`, but that still causes the problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extremely Large Memory usage for a very small variable  944996552
880410159 https://github.com/pydata/xarray/issues/5604#issuecomment-880410159 https://api.github.com/repos/pydata/xarray/issues/5604 MDEyOklzc3VlQ29tbWVudDg4MDQxMDE1OQ== tommy307507 49487505 2021-07-15T05:36:50Z 2021-07-15T05:36:50Z NONE

The variable can be combined using xr.concat if I open the individual files with xr.open_dataset, and that takes only about 1.1 GB of memory, so I think the issue is somewhere inside open_mfdataset. I also don't understand why the chunk size along v2d_time is 59 instead of 1.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extremely Large Memory usage for a very small variable  944996552

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
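The filter shown at the top of this page can also be run directly against a local copy of the database; a minimal sqlite3 sketch, where the database filename is an assumption:

import sqlite3

conn = sqlite3.connect("github.db")   # assumed filename for a local copy of this database
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'NONE' AND issue = 944996552 AND [user] = 49487505
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, created_at, updated_at, body in rows:
    print(comment_id, created_at, body[:60])

conn.close()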