
issue_comments


6 rows where author_association = "NONE" and issue = 326533369 sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue
id 1061602285 · user hmkhatri (17830036) · created_at 2022-03-08T10:00:07Z · updated_at 2022-03-08T10:00:07Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-1061602285
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186 · node_id: IC_kwDOAMm_X84_RsPt

Hello,

I am facing the same memory leak issue. I am using mpirun and dask-mpi on a Slurm batch submission (see below). I am running through a time loop to perform some computations. After a few iterations, the code blows up because of an out-of-memory issue. This does not happen if I execute the same code as a serial job.

```python
from dask_mpi import initialize
initialize()

from dask.distributed import Client
client = Client()

# main code goes here

ds = xr.open_mfdataset("*nc")

for i in range(0, len(ds.time)):
    ds1 = ds.isel(time=i)
    # perform some computations here
    ds1.close()

ds.close()
```

I have tried the following:
- explicit `ds.close()` calls on datasets
- `gc.collect()`
- `client.cancel(vars)`

None of these solutions worked for me. I have also tried increasing the RAM, but that didn't help either. I was wondering if anyone has found a workaround for this problem. @lumbric @shoyer @lkilcher

I am using dask 2022.2.0, dask-mpi 2021.11.0, and xarray 0.21.1.
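For concreteness, here is a minimal sketch (not the original script) of the loop with all three attempted mitigations applied together; the file glob, the computation, and the variable names are placeholders, and a plain `Client()` stands in for the dask-mpi setup:

```python
import gc

import xarray as xr
from dask.distributed import Client

client = Client()                   # stands in for the dask-mpi initialize()/Client() setup
ds = xr.open_mfdataset("*.nc")      # placeholder glob

for i in range(len(ds.time)):
    ds1 = ds.isel(time=i)
    result = ds1.mean().compute()   # stand-in for the real computation
    ds1.close()                     # attempted fix: explicit close() on the slice
    client.cancel(ds1)              # attempted fix: drop any distributed references
    del ds1, result
    gc.collect()                    # attempted fix: force a garbage-collection pass

ds.close()
client.close()
```

Whether `client.cancel()` actually releases anything here depends on whether the objects hold distributed futures; in this report, none of the three steps was sufficient.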

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)
id 1035593183 · user lkilcher (2273361) · created_at 2022-02-10T22:24:37Z · updated_at 2022-02-10T22:24:37Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-1035593183
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186 · node_id: IC_kwDOAMm_X849ueXf

Hey folks, I ran into a similar memory leak issue. In my case I had the following:

for num in range(100):
    ds = xr.open_dataset('data.{}.nc'.format(num)) # This data was compressed with zlib, not sure if that matters

    # do some stuff, but NOT assigning any data in ds to new variables

    del ds

For some reason (maybe having to do with the `# do some stuff`), `ds` wasn't actually getting cleared. I was able to fix the problem by manually triggering garbage collection (`import gc`, and `gc.collect()` after the `del ds` statement). Perhaps this will help others who end up here...
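A minimal sketch of the fix described above, with the file names and the per-file work as placeholders:

```python
import gc

import xarray as xr

for num in range(100):
    ds = xr.open_dataset('data.{}.nc'.format(num))  # placeholder file names
    # ... do some stuff, but do NOT assign any data in ds to new variables ...
    del ds
    gc.collect()  # manual collection after `del ds`; this is what released the memory here
```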

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)
id 392235000 · user meridionaljet (12929327) · created_at 2018-05-26T04:11:18Z · updated_at 2018-05-26T04:11:18Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-392235000
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186 · node_id: MDEyOklzc3VlQ29tbWVudDM5MjIzNTAwMA==

Using `autoclose=True` doesn't seem to make a difference. My test only uses 4 files anyway.

Thanks for the explanation of `open_dataset()` - that makes sense.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)
id 392226004 · user meridionaljet (12929327) · created_at 2018-05-26T01:35:36Z · updated_at 2018-05-26T01:35:36Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-392226004
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186 · node_id: MDEyOklzc3VlQ29tbWVudDM5MjIyNjAwNA==

I've discovered that setting the environment variable `MALLOC_MMAP_MAX_` to a reasonably small value can partially mitigate this memory fragmentation.

Performing 4 iterations over dataset slices of shape ~(5424, 5424) without this tweak was yielding >800 MB of memory usage (an increase of ~400 MB over the first iteration).

Setting `MALLOC_MMAP_MAX_=40960` yielded ~410 MB of memory usage (an increase of only ~130 MB over the first iteration).

This level of fragmentation is still offensive, but it does suggest the problem may lie deeper in the Unix/glibc/Python/xarray/dask ecosystem.
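glibc consults `MALLOC_MMAP_MAX_` when its allocator is initialised, so in practice the variable has to be in the environment before the Python process starts, e.g. `MALLOC_MMAP_MAX_=40960 python script.py`. The re-exec pattern below is one way to enforce that from inside a script; it is an illustrative sketch, not something taken from this thread:

```python
import os
import sys

# glibc reads MALLOC_MMAP_MAX_ only at start-up, so set it and re-execute
# the interpreter if it is not already present in the environment.
if "MALLOC_MMAP_MAX_" not in os.environ:
    os.environ["MALLOC_MMAP_MAX_"] = "40960"  # value reported above
    os.execv(sys.executable, [sys.executable] + sys.argv)
```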

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)
id 392217441 · user meridionaljet (12929327) · created_at 2018-05-26T00:03:59Z · updated_at 2018-05-26T00:03:59Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-392217441
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186 · node_id: MDEyOklzc3VlQ29tbWVudDM5MjIxNzQ0MQ==

I'm now wondering if this issue is in dask land, based on this issue: https://github.com/dask/dask/issues/3247

It has been suggested in other places to get around the memory accumulation by running each loop iteration in a forked process:

```python
def worker(ds, k):
    print('accessing data')
    data = ds.datavar[k,:,:].values
    print('data acquired')

for k in range(ds.dims['t']):
    p = multiprocessing.Process(target=worker, args=(ds, k))
    p.start()
    p.join()
```

But apparently one can't access dask-wrapped xarray datasets in subprocesses without a deadlock. I don't know enough about the internals to understand why.
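A commonly suggested variant of this fork-per-iteration workaround is to pass the child only a file path and an index and open the dataset inside the worker, so the dask-backed object never crosses the process boundary. The sketch below is an illustration under assumptions (the file name, the variable name `datavar`, and the dimension name `t` are placeholders), not something verified in this thread:

```python
import multiprocessing

import xarray as xr

def worker(path, k):
    # Open the file inside the child so no dask-backed objects are shared with the parent.
    with xr.open_dataset(path) as ds:
        data = ds['datavar'][k, :, :].values  # placeholder variable name
        print('iteration', k, 'loaded array of shape', data.shape)

if __name__ == '__main__':
    path = 'data.nc'                          # placeholder file
    with xr.open_dataset(path) as ds:
        nt = ds.sizes['t']                    # 't' mirrors the dimension used above
    for k in range(nt):
        p = multiprocessing.Process(target=worker, args=(path, k))
        p.start()
        p.join()
```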

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)
id 392110253 · user meridionaljet (12929327) · created_at 2018-05-25T16:23:55Z · updated_at 2018-05-25T16:24:33Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/2186#issuecomment-392110253
issue_url: https://api.github.com/repos/pydata/xarray/issues/2186 · node_id: MDEyOklzc3VlQ29tbWVudDM5MjExMDI1Mw==

Yes, I understand the garbage collection. What I'm struggling with is that, normally, when working with arrays, keeping only one reference to an array and rebinding that reference inside a loop does not cause memory to accumulate, because the now-dereferenced array from the previous iteration gets collected.

Here, it seems that under the hood references to the arrays are being created somewhere other than my `data` variable, and they are not released when I reassign `data`, so memory keeps accumulating.
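For comparison, the plain-NumPy behaviour being described: rebinding a single name each iteration drops the only reference to the previous array, so it is reclaimed immediately and resident memory stays roughly flat. A minimal sketch with an arbitrary array size:

```python
import numpy as np

data = None
for k in range(50):
    # Rebinding `data` drops the only reference to the previous array, which
    # CPython then frees immediately via reference counting.
    data = np.random.rand(2048, 2048)
    data.mean()  # stand-in for the per-iteration work
```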

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Memory leak while looping through a Dataset (326533369)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);