issue_comments

2 rows where author_association = "CONTRIBUTOR" and issue = 326533369, sorted by updated_at descending

id: 1046665303
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-1046665303
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186
node_id: IC_kwDOAMm_X84-YthX
user: lumbric (691772)
created_at: 2022-02-21T09:41:00Z
updated_at: 2022-02-21T09:41:00Z
author_association: CONTRIBUTOR
body:

I just stumbled across the same issue and created a minimal example similar to @lkilcher's. I am using xr.open_dataarray() with chunks and do some simple computation. After that, 800 MB of RAM is used, no matter whether I close the file explicitly, delete the xarray objects, or invoke the Python garbage collector.

What seems to work: do not use the threading Dask scheduler. The issue does not seem to occur with the single-threaded or the processes scheduler. Also, setting MALLOC_MMAP_MAX_=40960 seems to solve the issue, as suggested above (disclaimer: I don't fully understand the details here).
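
As an illustration only, a minimal sketch of this workaround; the file name, chunk size, and computation below are placeholders, not taken from the original report:

import dask
import xarray as xr

# Reported workaround: avoid the threaded Dask scheduler; the "processes"
# (or "single-threaded") scheduler did not show the growing memory usage.
dask.config.set(scheduler="processes")

# Placeholder file name and chunking, not from the original report.
da = xr.open_dataarray("data.nc", chunks={"time": 1000})
result = (da * 2).mean().compute()  # stand-in for "some simple computation"
da.close()

# Alternative reported workaround: tune glibc malloc before starting Python:
#   MALLOC_MMAP_MAX_=40960 python my_script.py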

If I understand things correctly, this indicates that the issue is a consequence of dask/dask#3530. I am not sure if there is anything to be fixed on the xarray side or what the best workaround would be. I will try to use the processes scheduler.

I can create a new (xarray) ticket with all details about the minimal example, if anyone thinks that this might be helpful (to collect workarounds or discuss fixes on the xarray side).

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)
id: 393846595
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-393846595
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186
node_id: MDEyOklzc3VlQ29tbWVudDM5Mzg0NjU5NQ==
user: Karel-van-de-Plassche (6404167)
created_at: 2018-06-01T10:57:09Z
updated_at: 2018-06-01T10:57:09Z
author_association: CONTRIBUTOR
body:

@meridionaljet I might've run into the same issue, but I'm not 100% sure. In my case I'm looping over a Dataset containing variables from 3 different files, all of them with a .sel and some of them with a more complicated (dask) calculation (still, mostly sums and divisions). The leak seems to happen mostly for the variables with the calculation.

Can you see what happens when using the distributed client? Put client = dask.distributed.Client() in front of your code. For me this leads to many distributed.utils_perf - WARNING - full garbage collections took 40% CPU time recently (threshold: 10%) messages, which indeed points to something garbage-collection related.
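
A minimal sketch of that suggestion; the file name and the toy computation are placeholders, not the original code:

import dask.distributed
import xarray as xr

# Start a local distributed cluster; its logs surface the garbage-collection
# warnings quoted above.
client = dask.distributed.Client()

ds = xr.open_dataset("data.nc", chunks={"time": 1000})  # placeholder file
for varname in ds.data_vars:
    total = ds[varname].sum().compute()  # toy stand-in for the real calculation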

Also, for me the memory behaviour looks very different between the threaded and the multi-process scheduler, although they both leak (I'm not sure if leaking is the right term here). Maybe you can try memory_profiler?

I've tried the following without success (a rough sketch of these attempts follows below):
  • explicitly deleting ds[varname] and running gc.collect()
  • explicitly clearing the dask cache with client.cancel and client.restart
  • moving the leaky code into its own function (this should not matter, but I seem to remember that it sometimes helps garbage collection in edge cases)
  • explicitly triggering computation with either dask persist or xarray load and then explicitly deleting the result
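
A rough sketch of those attempts, assuming ds, varname, and client exist as in the snippet above; per the comment, none of this released the memory:

import gc

# Trigger the computation explicitly, then drop the concrete result.
tmp = ds[varname].persist()   # or: ds[varname].load()
del tmp
gc.collect()

# Clear state held by the distributed cluster.
future = client.compute(ds[varname].sum())  # hypothetical submitted task
client.cancel(future)
client.restart()

# Finally, delete the variable itself and force another collection pass.
del ds[varname]
gc.collect()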

For my messy and very much work-in-progress code, look here: https://github.com/Karel-van-de-Plassche/QLKNN-develop/blob/master/qlknn/dataset/hypercube_to_pandas.py

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
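
For reference, the filter described at the top of this page can be reproduced against a local SQLite copy of this database; the file name github.db below is an assumption:

import sqlite3

# Reproduce the page's query: CONTRIBUTOR comments on issue 326533369,
# newest updated_at first.
conn = sqlite3.connect("github.db")  # assumed local copy of the database
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'CONTRIBUTOR' AND issue = ?
    ORDER BY updated_at DESC
    """,
    (326533369,),
).fetchall()
print(len(rows))  # expect 2 for this page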