html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3200#issuecomment-520182257,https://api.github.com/repos/pydata/xarray/issues/3200,520182257,MDEyOklzc3VlQ29tbWVudDUyMDE4MjI1Nw==,1217238,2019-08-10T21:53:39Z,2019-08-10T21:53:39Z,MEMBER,"Also, if you're having memory issues I also would definitely recommend upgrading to a newer version of xarray. There was a recent fix that helps ensure that files get automatically closed when they are garbage collected, even if you don't call `close()` or use a context manager explicitly.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479190812
https://github.com/pydata/xarray/issues/3200#issuecomment-520182139,https://api.github.com/repos/pydata/xarray/issues/3200,520182139,MDEyOklzc3VlQ29tbWVudDUyMDE4MjEzOQ==,1217238,2019-08-10T21:51:25Z,2019-08-10T21:52:24Z,MEMBER,"Thanks for the profiling script. I ran a few permutations of this:
- `xarray.open_mfdataset` with `engine='netcdf4'` (default)
- `xarray.open_mfdataset` with `engine='h5netcdf'`
- `xarray.open_dataset` with `engine='netcdf4'` (default)
- `xarray.open_dataset` with `engine='h5netcdf'`
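Each permutation is just a small variation on the read step from the script; roughly (a minimal sketch, not the exact harness):
```python
import glob

import xarray as xr

engine = 'netcdf4'  # or 'h5netcdf'

# open_mfdataset variant: one call over all of the files
ds = xr.open_mfdataset(glob.glob('testfile_*'), engine=engine, concat_dim='time')
ds.close()

# open_dataset variant: open and close each file individually
for path in glob.glob('testfile_*'):
    ds = xr.open_dataset(path, engine=engine)
    ds.close()
```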
Here are some plots:
`xarray.open_mfdataset` with `engine='netcdf4'`: pretty noticeable memory leak, about 0.5 MB / `open_mfdataset` call:

`xarray.open_mfdataset` with `engine='h5netcdf'`: looks like a small memory leak, about 0.1 MB / `open_mfdataset` call:

`xarray.open_dataset` with `engine='netcdf4'` (default): definitely has a memory leak:

`xarray.open_dataset` with `engine='h5netcdf'`: does not appear to have a memory leak:

So in conclusion, it looks like there are memory leaks:
1. when using netCDF4-Python (I was also able to confirm this without using xarray at all, just using `netCDF4.Dataset`; see the sketch right after this list)
2. when using `xarray.open_mfdataset`
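For concreteness, the `netCDF4.Dataset`-only check is along these lines (a sketch of the kind of loop I mean, not the exact code I ran; the function name is just a placeholder):
```python
import glob

import netCDF4
from memory_profiler import profile


@profile
def read_files_netcdf4_only():
    # repeatedly open and close each file with netCDF4-python directly;
    # memory use still creeps up, with no xarray involved at all
    for path in glob.glob('testfile_*'):
        ds = netCDF4.Dataset(path)
        ds.close()


if __name__ == '__main__':
    for _ in range(100):
        read_files_netcdf4_only()
```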
(1) looks like by far the bigger issue, which you can work around by switching to scipy or h5netcdf to read your files.
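For example, the workaround is just a different `engine` argument (h5netcdf shown below; the scipy backend is another option, though it only reads netCDF3-format files):
```python
import glob

import xarray as xr

# avoid the netCDF4-python backend entirely
ds = xr.open_mfdataset(glob.glob('testfile_*'), engine='h5netcdf',
                       concat_dim='time')
ds.close()
```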
(2) is an issue for xarray. We do some caching, specifically with our [backend file manager](https://github.com/pydata/xarray/blob/befc72f052a189c114695b4ae9a1c66533617336/xarray/backends/file_manager.py), but given that the issue only seems to appear when using `open_mfdataset`, I suspect it has more to do with the interaction with Dask, though to be honest I'm not sure of the exact mechanism.
Note: I modified your script to set xarray's file cache size to 1, which helps smooth out the memory usage:
```python
import glob

import numpy as np
import xarray as xr
from memory_profiler import profile  # provides the @profile decorator


def CreateTestFiles():
    # create a bunch of small test files
    xlen = int(1e2)
    ylen = int(1e2)
    xdim = np.arange(xlen)
    ydim = np.arange(ylen)
    nfiles = 100
    for i in range(nfiles):
        data = np.random.rand(xlen, ylen, 1)
        datafile = xr.DataArray(data, coords=[xdim, ydim, [i]],
                                dims=['x', 'y', 'time'])
        datafile.to_netcdf('testfile_{}.nc'.format(i))


@profile
def ReadFiles():
    # for i in range(100):
    #     ds = xr.open_dataset('testfile_{}.nc'.format(i), engine='netcdf4')
    #     ds.close()
    ds = xr.open_mfdataset(glob.glob('testfile_*'), engine='h5netcdf',
                           concat_dim='time')
    ds.close()


if __name__ == '__main__':
    # write out files for testing
    CreateTestFiles()
    xr.set_options(file_cache_maxsize=1)
    # loop thru file read step
    for i in range(100):
        ReadFiles()
```
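For what it's worth, the `@profile` decorator comes from memory_profiler; a script like this can be run with `mprof run` and the resulting memory trace plotted with `mprof plot` (or run with `python -m memory_profiler` for per-line output).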
","{""total_count"": 2, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479190812