github: issue_comments: 4 rows where author_association = "MEMBER" and issue = 479190812 sorted by updated

4 rows where author_association = "MEMBER" and issue = 479190812 sorted by updated_at descending

Search:

descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
520182257	https://github.com/pydata/xarray/issues/3200#issuecomment-520182257	https://api.github.com/repos/pydata/xarray/issues/3200	MDEyOklzc3VlQ29tbWVudDUyMDE4MjI1Nw==	shoyer 1217238	2019-08-10T21:53:39Z	2019-08-10T21:53:39Z	MEMBER	Also, if you're having memory issues I also would definitely recommend upgrading to a newer version of xarray. There was a recent fix that helps ensure that files get automatically closed when they are garbage collected, even if you don't call `close()` or use a context manager explicitly.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset memory leak, very simple case. v0.12 479190812
520182139	https://github.com/pydata/xarray/issues/3200#issuecomment-520182139	https://api.github.com/repos/pydata/xarray/issues/3200	MDEyOklzc3VlQ29tbWVudDUyMDE4MjEzOQ==	shoyer 1217238	2019-08-10T21:51:25Z	2019-08-10T21:52:24Z	MEMBER	Thanks for the profiling script. I ran a few permutations of this: - `xarray.open_mfdataset` with `engine='netcdf4'` (default) - `xarray.open_mfdataset` with `engine='h5netcdf'` - `xarray.open_dataset` with `engine='netcdf4'` (default) - `xarray.open_dataset` with `engine='h5netcdf'` Here are some plots: `xarray.open_mfdataset` with `engine='netcdf4'`: pretty noticeable memory leak, about 0.5 MB / `open_mfdataset` call: `xarray.open_mfdataset` with `engine='h5netcdf'`: looks like a small memory leak, about 0.1 MB / `open_mfdataset` call: `xarray.open_dataset` with `engine='netcdf4'` (default): definitely has a memory leak: `xarray.open_dataset` with `engine='h5netcdf'`: does not appear to have a memory leak: So in conclusion, it looks like there are memory leaks: 1. when using netCDF4-Python (I was also able to confirm these without using xarray at all, just using `netCDF4.Dataset`) 2. when using `xarray.open_mfdataset` (1) looks like by far the bigger issue, which you can work around by switching to scipy or h5netcdf to read your files. (2) is an issue for xarray. We do do some caching, specifically with our backend file manager, but given that issues only seem to appear when using `open_mfdataset`, I suspect it may have more to do with the interaction with Dask, though to be honest I'm not exactly sure how. Note: I modified your script to xarray's file cache size to 1, which helps smooth out the memory usage: ```python def CreateTestFiles(): # create a bunch of files xlen = int(1e2) ylen = int(1e2) xdim = np.arange(xlen) ydim = np.arange(ylen) `nfiles = 100 for i in range(nfiles): data = np.random.rand(xlen, ylen, 1) datafile = xr.DataArray(data, coords=[xdim, ydim, [i]], dims=['x', 'y', 'time']) datafile.to_netcdf('testfile_{}.nc'.format(i))` @profile def ReadFiles(): # for i in range(100): # ds = xr.open_dataset('testfile_{}.nc'.format(i), engine='netcdf4') # ds.close() ds = xr.open_mfdataset(glob.glob('testfile_'), engine='h5netcdf', concat_dim='time') ds.close() if name* == 'main': # write out files for testing CreateTestFiles() `xr.set_options(file_cache_maxsize=1) # loop thru file read step for i in range(100): ReadFiles()` ```	{ "total_count": 2, "+1": 1, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset memory leak, very simple case. v0.12 479190812
520136799	https://github.com/pydata/xarray/issues/3200#issuecomment-520136799	https://api.github.com/repos/pydata/xarray/issues/3200	MDEyOklzc3VlQ29tbWVudDUyMDEzNjc5OQ==	crusaderky 6213168	2019-08-10T10:10:11Z	2019-08-10T10:11:18Z	MEMBER	Oh but first and foremost - CPython memory management is designed so that, when PyMem_Free() is invoked, CPython will hold on to it and not invoke the underlying free() syscall, hoping to reuse it on the next PyMem_Alloc(). An increase in RAM usage from 160 to 200MB could very well be explained by this. Try increasing the number of loops in your test 100-fold and see if you get a 100-fold increase in memory usage too (from 160MB to 1.2GB). If yes, it's a real leak; if it remains much more contained, it's normal CPython behaviour.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset memory leak, very simple case. v0.12 479190812
520136482	https://github.com/pydata/xarray/issues/3200#issuecomment-520136482	https://api.github.com/repos/pydata/xarray/issues/3200	MDEyOklzc3VlQ29tbWVudDUyMDEzNjQ4Mg==	crusaderky 6213168	2019-08-10T10:06:07Z	2019-08-10T10:06:07Z	MEMBER	Hi, xarray doesn't have any global objects that I know of that can cause the leak - I'm willing to bet on the underlying libraries. given your installed packages, open_mfdataset should be defaulting NetCDF4. Please try your measure again after setting it explicitly `open_mfdataset(..., engine='netcdf4')` See if the problem disappears if you pass `engine='h5netcdf'` Once you have confirmed the actual underlying library, try using it directly without xarray in your ReadFiles test: for every file returned by glob, open it with the netCDF4 package and load into memory all coords (not the data). Once NetCDF4 is confirmed to be the culprit, if you can it would be great if you could rewrite the test (only the read part) in C using the NetCDF C library to figure out if the leak is in it or in the Python wrapper.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset memory leak, very simple case. v0.12 479190812

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);