issue_comments

2 rows where author_association = "CONTRIBUTOR" and issue = 326533369, sorted by updated_at descending

id: 1046665303
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-1046665303
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186
node_id: IC_kwDOAMm_X84-YthX
user: lumbric (691772)
created_at: 2022-02-21T09:41:00Z
updated_at: 2022-02-21T09:41:00Z
author_association: CONTRIBUTOR
body:

I just stumbled across the same issue and created a minimal example similar to @lkilcher's. I am using xr.open_dataarray() with chunks and do some simple computation. After that, 800 MB of RAM is used, no matter whether I close the file explicitly, delete the xarray objects, or invoke the Python garbage collector.

What seems to work: do not use the threading Dask scheduler. The issue does not seem to occur with the single-threaded or the processes scheduler. Also, setting MALLOC_MMAP_MAX_=40960 seems to solve the issue, as suggested above (disclaimer: I don't fully understand the details here).
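
As an illustration only, a minimal sketch of this workaround; the file name, chunk size, and computation below are placeholders, not taken from the original report:

import dask
import xarray as xr

# Reported workaround: avoid the threaded Dask scheduler; the "processes"
# (or "single-threaded") scheduler did not show the growing memory usage.
dask.config.set(scheduler="processes")

# Placeholder file name and chunking, not from the original report.
da = xr.open_dataarray("data.nc", chunks={"time": 1000})
result = (da * 2).mean().compute()  # stand-in for "some simple computation"
da.close()

# Alternative reported workaround: tune glibc malloc before starting Python:
#   MALLOC_MMAP_MAX_=40960 python my_script.py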

If I understand things correctly, this indicates that the issue is a consequence of dask/dask#3530. I am not sure if there is anything to be fixed on the xarray side or what the best workaround would be. I will try to use the processes scheduler.

I can create a new (xarray) ticket with all details about the minimal example, if anyone thinks that this might be helpful (to collect workarounds or discuss fixes on the xarray side).

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)
id: 393846595
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-393846595
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186
node_id: MDEyOklzc3VlQ29tbWVudDM5Mzg0NjU5NQ==
user: Karel-van-de-Plassche (6404167)
created_at: 2018-06-01T10:57:09Z
updated_at: 2018-06-01T10:57:09Z
author_association: CONTRIBUTOR
body:

@meridionaljet I might've run into the same issue, but I'm not 100% sure. In my case I'm looping over a Dataset containing variables from 3 different files, all of them with a .sel and some of them with a more complicated (dask) calculation (still, mostly sums and divisions). The leak seems to happen mostly for the variables with the calculation.

Can you see what happens when using the distributed client? Put client = dask.distributed.Client() in front of your code. For me this leads to many distributed.utils_perf - WARNING - full garbage collections took 40% CPU time recently (threshold: 10%) messages, which indeed points to something garbage-collection related.
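
A minimal sketch of that suggestion; the file name and the toy computation are placeholders, not the original code:

import dask.distributed
import xarray as xr

# Start a local distributed cluster; its logs surface the garbage-collection
# warnings quoted above.
client = dask.distributed.Client()

ds = xr.open_dataset("data.nc", chunks={"time": 1000})  # placeholder file
for varname in ds.data_vars:
    total = ds[varname].sum().compute()  # toy stand-in for the real calculation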

Also, for me the memory behaviour looks very different between the threaded and the multi-process scheduler, although they both leak (I'm not sure if leaking is the right term here). Maybe you can try memory_profiler?

I've tried the following without success (a rough sketch of these attempts follows below):
  • explicitly deleting ds[varname] and running gc.collect()
  • explicitly clearing the dask cache with client.cancel and client.restart
  • moving the leaky code into its own function (this should not matter, but I seem to remember that it sometimes helps garbage collection in edge cases)
  • explicitly triggering computation with either dask persist or xarray load and then explicitly deleting the result
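
A rough sketch of those attempts, assuming ds, varname, and client exist as in the snippet above; per the comment, none of this released the memory:

import gc

# Trigger the computation explicitly, then drop the concrete result.
tmp = ds[varname].persist()   # or: ds[varname].load()
del tmp
gc.collect()

# Clear state held by the distributed cluster.
future = client.compute(ds[varname].sum())  # hypothetical submitted task
client.cancel(future)
client.restart()

# Finally, delete the variable itself and force another collection pass.
del ds[varname]
gc.collect()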

For my messy and very much work-in-progress code, look here: https://github.com/Karel-van-de-Plassche/QLKNN-develop/blob/master/qlknn/dataset/hypercube_to_pandas.py

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
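
For reference, the filter described at the top of this page can be reproduced against a local SQLite copy of this database; the file name github.db below is an assumption:

import sqlite3

# Reproduce the page's query: CONTRIBUTOR comments on issue 326533369,
# newest updated_at first.
conn = sqlite3.connect("github.db")  # assumed local copy of the database
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'CONTRIBUTOR' AND issue = ?
    ORDER BY updated_at DESC
    """,
    (326533369,),
).fetchall()
print(len(rows))  # expect 2 for this page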