issue_comments

2 rows where issue = 479190812 and user = 6213168, sorted by updated_at descending

Comment 520136799 on issue "open_mfdataset memory leak, very simple case. v0.12" (#3200)
crusaderky (MEMBER) · created 2019-08-10T10:10:11Z · updated 2019-08-10T10:11:18Z
https://github.com/pydata/xarray/issues/3200#issuecomment-520136799

Oh but first and foremost - CPython memory management is designed so that, when PyMem_Free() is invoked, CPython holds on to the freed memory rather than returning it to the operating system with free(), hoping to reuse it on the next PyMem_Alloc(). An increase in RAM usage from 160 to 200MB could very well be explained by this. Try increasing the number of loops in your test 100-fold and see if you get a correspondingly large increase in memory usage too (from 160MB to 1.2GB). If yes, it's a real leak; if the growth remains much more contained, it's normal CPython behaviour.
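
A minimal sketch of that check (hypothetical: the file glob, loop count, and print interval are placeholders, not from the original comment; resource is Unix-only):

import resource
import xarray as xr

def max_rss_mib():
    # ru_maxrss is reported in KiB on Linux (bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

for i in range(100):  # e.g. 100x the original loop count
    ds = xr.open_mfdataset("data/*.nc")  # placeholder pattern
    ds.close()
    if i % 10 == 0:
        print(f"loop {i}: peak RSS ~ {max_rss_mib():.0f} MiB")

# A real leak grows roughly linearly with the loop count;
# allocator caching plateaus after the first few iterations.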

Comment 520136482 on issue "open_mfdataset memory leak, very simple case. v0.12" (#3200)
crusaderky (MEMBER) · created 2019-08-10T10:06:07Z
https://github.com/pydata/xarray/issues/3200#issuecomment-520136482

Hi,

xarray doesn't have any global objects that I know of that could cause the leak - I'm willing to bet on the underlying libraries.

  1. Given your installed packages, open_mfdataset should be defaulting to netCDF4. Please repeat your measurement after setting the engine explicitly: open_mfdataset(..., engine='netcdf4').
  2. See if the problem disappears when you pass engine='h5netcdf' instead.
  3. Once you have confirmed which underlying library is responsible, try using it directly, without xarray, in your ReadFiles test: for every file returned by glob, open it with the netCDF4 package and load all the coords (not the data) into memory - see the sketch after this list.
  4. If netCDF4 is confirmed to be the culprit, it would be great if you could rewrite the test (only the read part) in C using the netCDF C library, to figure out whether the leak is in the library itself or in the Python wrapper.
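
A rough sketch of steps 1-3 (the glob pattern is a placeholder, and coordinate variables are assumed to share their dimension's name, the usual netCDF convention):

import glob

import netCDF4
import xarray as xr

files = sorted(glob.glob("data/*.nc"))  # placeholder pattern

# Steps 1-2: pin each engine explicitly and compare memory behaviour
for engine in ("netcdf4", "h5netcdf"):
    ds = xr.open_mfdataset(files, engine=engine)
    ds.close()

# Step 3: bypass xarray and drive the suspected library directly,
# loading only the coordinate variables (not the data) into memory
for path in files:
    with netCDF4.Dataset(path) as nc:
        for name in nc.dimensions:
            if name in nc.variables:
                nc.variables[name][:]  # materialise coordinate values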

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
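
For reference, the filter shown at the top of this page can be reproduced against a local SQLite copy of the table (the database filename here is a placeholder):

import sqlite3

conn = sqlite3.connect("github.db")  # placeholder path
rows = conn.execute(
    """
    select id, created_at, updated_at, body
    from issue_comments
    where issue = ? and "user" = ?
    order by updated_at desc
    """,
    (479190812, 6213168),
).fetchall()
print(len(rows))  # expect 2, matching the page header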