issue_comments
13 rows where issue = 326533369 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1061602285 | https://github.com/pydata/xarray/issues/2186#issuecomment-1061602285 | https://api.github.com/repos/pydata/xarray/issues/2186 | IC_kwDOAMm_X84_RsPt | hmkhatri 17830036 | 2022-03-08T10:00:07Z | 2022-03-08T10:00:07Z | NONE | Hello, I am facing the same memory leak issue. I am using

```python
from dask.distributed import Client

client = Client()

# main code goes here
ds = xr.open_mfdataset("*nc")
for i in range(0, len(ds.time)):
    ds1 = ds.isel(time=i)
    # perform some computations here

ds.close()
```

I have tried the following:
- explicit ds.close() calls on datasets
- gc.collect()
- client.cancel(vars)

None of the solutions worked for me. I have also tried increasing RAM, but that didn't help either. I was wondering if anyone has found a workaround for this problem. @lumbric @shoyer @lkilcher I am using |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
1046665303 | https://github.com/pydata/xarray/issues/2186#issuecomment-1046665303 | https://api.github.com/repos/pydata/xarray/issues/2186 | IC_kwDOAMm_X84-YthX | lumbric 691772 | 2022-02-21T09:41:00Z | 2022-02-21T09:41:00Z | CONTRIBUTOR | I just stumbled across the same issue and created a minimal example similar to @lkilcher. I am using What seems to work: do not use the If I understand things correctly, this indicates that the issue is a consequence of dask/dask#3530. Not sure if there is anything to be fixed on the xarray side or what would be the best workaround. I will try to use the processes scheduler. I can create a new (xarray) ticket with all details about the minimal example, if anyone thinks that this might be helpful (to collect workarounds or discuss fixes on the xarray side). |
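The "processes scheduler" workaround mentioned in this comment can be sketched as follows. This is a minimal illustration under assumptions, not the commenter's actual code: the small synthetic chunked Dataset stands in for the netCDF files the original used.

```python
import dask
import numpy as np
import xarray as xr

# Synthetic stand-in for a file-backed, chunked dataset (an assumption
# for illustration; the original comment used open_mfdataset on files).
ds = xr.Dataset(
    {"datavar": (("time", "x"), np.arange(20.0).reshape(4, 5))}
).chunk({"time": 1})

# Use dask's multiprocessing scheduler instead of the default threaded
# one: each chunk is realized in a worker process, and that process's
# memory is returned to the OS when the pool is torn down.
with dask.config.set(scheduler="processes"):
    for i in range(len(ds.time)):
        ds1 = ds.isel(time=i).compute()
```

Because fragmentation accumulates per-process, isolating the heavy allocations in short-lived worker processes sidesteps the growth in the main interpreter.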
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
1035611864 | https://github.com/pydata/xarray/issues/2186#issuecomment-1035611864 | https://api.github.com/repos/pydata/xarray/issues/2186 | IC_kwDOAMm_X849ui7Y | shoyer 1217238 | 2022-02-10T22:49:40Z | 2022-02-10T22:50:01Z | MEMBER | For what it's worth, the recommended way to do this is to explicitly close the Dataset with `ds.close()`. Or with a context manager, e.g.,
|
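The two cleanup patterns this comment recommends might look like the following self-contained sketch; the file name `example.nc` is a placeholder created here for illustration, not from the original comment.

```python
import numpy as np
import xarray as xr

# Create a small scratch file so the example is self-contained.
xr.Dataset({"a": ("x", np.arange(3.0))}).to_netcdf("example.nc")

# Explicit close:
ds = xr.open_dataset("example.nc")
mean = float(ds.a.mean())
ds.close()

# Or the context-manager form, which closes the file on exit
# even if the body raises:
with xr.open_dataset("example.nc") as ds:
    mean = float(ds.a.mean())
```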
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
1035593183 | https://github.com/pydata/xarray/issues/2186#issuecomment-1035593183 | https://api.github.com/repos/pydata/xarray/issues/2186 | IC_kwDOAMm_X849ueXf | lkilcher 2273361 | 2022-02-10T22:24:37Z | 2022-02-10T22:24:37Z | NONE | Hey folks, I ran into a similar memory leak issue. In my case I had the following:
For some reason (maybe having to do with the |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
454162269 | https://github.com/pydata/xarray/issues/2186#issuecomment-454162269 | https://api.github.com/repos/pydata/xarray/issues/2186 | MDEyOklzc3VlQ29tbWVudDQ1NDE2MjI2OQ== | max-sixty 5635139 | 2019-01-14T21:09:36Z | 2019-01-14T21:09:36Z | MEMBER | In an effort to reduce the issue backlog, I'll close this, but please reopen if you disagree |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
393996561 | https://github.com/pydata/xarray/issues/2186#issuecomment-393996561 | https://api.github.com/repos/pydata/xarray/issues/2186 | MDEyOklzc3VlQ29tbWVudDM5Mzk5NjU2MQ== | shoyer 1217238 | 2018-06-01T20:13:18Z | 2018-06-01T20:13:18Z | MEMBER | This might be the same issue as https://github.com/dask/dask/issues/3530 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
393846595 | https://github.com/pydata/xarray/issues/2186#issuecomment-393846595 | https://api.github.com/repos/pydata/xarray/issues/2186 | MDEyOklzc3VlQ29tbWVudDM5Mzg0NjU5NQ== | Karel-van-de-Plassche 6404167 | 2018-06-01T10:57:09Z | 2018-06-01T10:57:09Z | CONTRIBUTOR | @meridionaljet I might've run into the same issue, but I'm not 100% sure. In my case I'm looping over a Dataset containing variables from 3 different files, all of them with a Can you see what happens when using the distributed client? Put Also, for me the memory behaviour looks very different between the threaded and multi-process scheduler, although they both leak. (I'm not sure if leaking is the right term here). Maybe you can try I've tried without success:
- explicitly deleting For my messy and very much work-in-progress code, look here: https://github.com/Karel-van-de-Plassche/QLKNN-develop/blob/master/qlknn/dataset/hypercube_to_pandas.py |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
392235000 | https://github.com/pydata/xarray/issues/2186#issuecomment-392235000 | https://api.github.com/repos/pydata/xarray/issues/2186 | MDEyOklzc3VlQ29tbWVudDM5MjIzNTAwMA== | meridionaljet 12929327 | 2018-05-26T04:11:18Z | 2018-05-26T04:11:18Z | NONE | Using Thanks for the explanation of |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
392234293 | https://github.com/pydata/xarray/issues/2186#issuecomment-392234293 | https://api.github.com/repos/pydata/xarray/issues/2186 | MDEyOklzc3VlQ29tbWVudDM5MjIzNDI5Mw== | shoyer 1217238 | 2018-05-26T03:58:14Z | 2018-05-26T03:58:14Z | MEMBER | I might try experimenting with setting Memory growth with |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
392226004 | https://github.com/pydata/xarray/issues/2186#issuecomment-392226004 | https://api.github.com/repos/pydata/xarray/issues/2186 | MDEyOklzc3VlQ29tbWVudDM5MjIyNjAwNA== | meridionaljet 12929327 | 2018-05-26T01:35:36Z | 2018-05-26T01:35:36Z | NONE | I've discovered that setting the environment variable MALLOC_MMAP_MAX_ to a reasonably small value can partially mitigate this memory fragmentation. Performing 4 iterations over dataset slices of shape ~(5424, 5424) without this tweak was yielding >800MB of memory usage (an increase of ~400MB over the first iteration). Setting MALLOC_MMAP_MAX_=40960 yielded ~410 MB of memory usage (an increase of only ~130MB over the first iteration). This level of fragmentation is still offensive, but this does suggest the problem may lie deeper within the entire unix, glibc, Python, xarray, dask ecosystem. |
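The glibc tunable described in this comment has to be set in the environment before the Python process starts, since glibc reads it at startup. A minimal sketch (the `python -c` line is just a stand-in for launching the real script):

```shell
# Limit the number of mmap-backed allocations glibc malloc will use,
# reducing heap fragmentation; 40960 is the value the comment reports.
export MALLOC_MMAP_MAX_=40960
python -c 'import os; print(os.environ["MALLOC_MMAP_MAX_"])'
```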
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
392217441 | https://github.com/pydata/xarray/issues/2186#issuecomment-392217441 | https://api.github.com/repos/pydata/xarray/issues/2186 | MDEyOklzc3VlQ29tbWVudDM5MjIxNzQ0MQ== | meridionaljet 12929327 | 2018-05-26T00:03:59Z | 2018-05-26T00:03:59Z | NONE | I'm now wondering if this issue is in dask land, based on this issue: https://github.com/dask/dask/issues/3247 It has been suggested in other places to get around the memory accumulation by running each loop iteration in a forked process:

```python
def worker(ds, k):
    print('accessing data')
    data = ds.datavar[k, :, :].values
    print('data acquired')

for k in range(ds.dims['t']):
    p = multiprocessing.Process(target=worker, args=(ds, k))
    p.start()
    p.join()
```

But apparently one can't access dask-wrapped xarray datasets in subprocesses without a deadlock. I don't know enough about the internals to understand why. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
392110253 | https://github.com/pydata/xarray/issues/2186#issuecomment-392110253 | https://api.github.com/repos/pydata/xarray/issues/2186 | MDEyOklzc3VlQ29tbWVudDM5MjExMDI1Mw== | meridionaljet 12929327 | 2018-05-25T16:23:55Z | 2018-05-25T16:24:33Z | NONE | Yes, I understand the garbage collection. The problem I'm struggling with here is that normally when working with arrays, maintaining only one reference to an array and replacing the data that reference points to within a loop does not result in memory accumulation because GC is triggered on the prior, now dereferenced array from the previous iteration. Here, it seems that under the hood, references to arrays have been created other than my "data" variable that are not being dereferenced when I reassign to "data," so stuff is accumulating in memory. |
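The "hidden references" this comment suspects can be demonstrated with a toy CPython sketch (names here are illustrative only, not from the thread): rebinding a variable frees the old array only if nothing else still points at it.

```python
import sys
import numpy as np

data = np.zeros(1_000_000)
alias = data                  # a second, "hidden" reference, analogous to
                              # references held internally by dask/xarray
refs = sys.getrefcount(data)  # counts: data, alias, and getrefcount's own
                              # argument -> 3 in CPython

data = np.ones(1_000_000)     # rebinding 'data' does NOT free the zeros
                              # array, because 'alias' still points at it
```

If `alias` did not exist, the reassignment would drop the refcount of the zeros array to zero and CPython would free it immediately; with the extra reference, the memory stays live, which is exactly the accumulation pattern described above.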
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 | |
392108417 | https://github.com/pydata/xarray/issues/2186#issuecomment-392108417 | https://api.github.com/repos/pydata/xarray/issues/2186 | MDEyOklzc3VlQ29tbWVudDM5MjEwODQxNw== | rabernat 1197350 | 2018-05-25T16:17:15Z | 2018-05-25T16:17:15Z | MEMBER | The memory management here is handled by python, not xarray. Python decides when to perform garbage collection. I know that doesn't help solve your problem... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Memory leak while looping through a Dataset 326533369 |
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```