issue_comments: 520182139
| field | value |
|---|---|
| html_url | https://github.com/pydata/xarray/issues/3200#issuecomment-520182139 |
| issue_url | https://api.github.com/repos/pydata/xarray/issues/3200 |
| id | 520182139 |
| node_id | MDEyOklzc3VlQ29tbWVudDUyMDE4MjEzOQ== |
| user | 1217238 |
| created_at | 2019-08-10T21:51:25Z |
| updated_at | 2019-08-10T21:52:24Z |
| author_association | MEMBER |
| reactions | { "total_count": 2, "+1": 1, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
| performed_via_github_app | |
| issue | 479190812 |

body:

Thanks for the profiling script. I ran a few permutations of it (reading the files with and without xarray, and with different backend engines).
Here are some plots: *(plot images omitted)*
So in conclusion, it looks like there are memory leaks:
1. when using netCDF4-Python (I was also able to confirm these without using xarray at all, just using the netCDF4 library directly)
2. when using xarray's `open_mfdataset`

(1) looks like by far the bigger issue, which you can work around by switching to scipy or h5netcdf to read your files (see the engine-comparison sketch after the script below).

(2) is an issue for xarray. We do do some caching, specifically with our backend file manager, but the issue only seems to appear when using `open_mfdataset`.

Note: I modified your script to set xarray's file cache size to 1, which helps smooth out the memory usage:

```python
import glob

import numpy as np
import xarray as xr
from memory_profiler import profile

# as noted above: keep at most one file open in xarray's file cache
xr.set_options(file_cache_maxsize=1)

def CreateTestFiles():
    # create a bunch of files
    xlen = int(1e2)
    ylen = int(1e2)
    xdim = np.arange(xlen)
    ydim = np.arange(ylen)
    # assumption: the file-writing loop was truncated in the original; this is a plausible reconstruction
    for i in range(100):
        data = np.random.rand(xlen, ylen, 1)
        da = xr.DataArray(data, coords=[xdim, ydim, [i]], dims=['x', 'y', 'time'])
        da.to_netcdf('testfile_{}.nc'.format(i))

@profile
def ReadFiles():
    # for i in range(100):
    #     ds = xr.open_dataset('testfile_{}.nc'.format(i), engine='netcdf4')
    #     ds.close()
    # note: newer xarray versions also require combine='nested' when passing concat_dim
    ds = xr.open_mfdataset(glob.glob('testfile_*'), engine='h5netcdf', concat_dim='time')
    ds.close()

if __name__ == '__main__':
    # write out files for testing
    CreateTestFiles()
    # assumption: repeatedly re-read the files to watch memory usage (truncated in the original)
    for i in range(100):
        ReadFiles()
```
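
For reference, here is a minimal sketch of the engine-switching workaround mentioned above. It is not part of the original comment: the `ENGINES` list and the `read_all` helper are assumptions for illustration, and `'scipy'` is left out because that engine can only read netCDF3-format files.

```python
import glob

import xarray as xr

# engines to compare; add 'scipy' only if the test files are written in netCDF3 format
ENGINES = ['netcdf4', 'h5netcdf']

def read_all(engine):
    # open the test files produced by the script above with the given backend engine
    with xr.open_mfdataset(sorted(glob.glob('testfile_*.nc')), engine=engine) as ds:
        ds.load()  # force the data to actually be read before the dataset is closed

if __name__ == '__main__':
    for engine in ENGINES:
        for _ in range(10):  # repeat so per-engine memory growth becomes visible
            read_all(engine)
```

Profiling each variant (for example with memory_profiler, as in the script above) is what separates the backend-level leak from the `open_mfdataset` one.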
{ "total_count": 2, "+1": 1, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
479190812 |