id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
326533369,MDU6SXNzdWUzMjY1MzMzNjk=,2186,Memory leak while looping through a Dataset,12929327,closed,0,,,13,2018-05-25T13:53:31Z,2022-03-08T10:00:07Z,2019-01-14T21:09:36Z,NONE,,,,"I'm encountering a detrimental memory leak when simply accessing data from a Dataset repeatedly within a loop. I'm opening netCDF files concatenated in time and looping through time to create plots. In this case the x-y slices are about 5000 x 5000 in size.

```python
import xarray as xr
import os, psutil

process = psutil.Process(os.getpid())

ds = xr.open_mfdataset('*.nc', chunks={'x': 4000, 'y': 4000}, concat_dim='t')
for k in range(ds.dims['t']):
    data = ds.datavar[k, :, :].values
    print('memory=', process.memory_info().rss)

>>> memory= 566484992
>>> memory= 823836672
>>> memory= 951439360
>>> memory= 1039261696
```

I tried explicitly dereferencing the array by calling `del data` at the end of each iteration, which reduces the memory growth a little, but not much. Strangely, in this simplified example I can greatly reduce the memory growth by using much smaller chunk sizes, but in my real-world example opening all the data with smaller chunk sizes does not mitigate the problem. Either way, it's not clear to me why the memory usage should grow at all, for any chunk size.

```python
ds = xr.open_mfdataset('*.nc', chunks={'x': 1000, 'y': 1000}, concat_dim='t')

>>> memory= 514043904
>>> memory= 499363840
>>> memory= 502509568
>>> memory= 522133504
```

I can also generate memory growth when cutting dask out entirely with `open_dataset(chunks=None)` and simply looping through different variables in the Dataset:

```python
ds = xr.open_dataset('data.nc', chunks=None)  # x-y dataset 5424 x 5424
for var in ['var1', 'var2', ... , 'var15']:
    data = ds[var].values
    print('memory =', process.memory_info().rss)

>>> memory = 246087680
>>> memory = 280604672
>>> memory = 285810688
>>> memory = 315834368
>>> memory = 344510464
>>> memory = 374530048
>>> memory = 403742720
>>> memory = 403804160
>>> memory = 404140032
>>> memory = 403660800
>>> memory = 404262912
>>> memory = 403513344
>>> memory = 404115456
>>> memory = 403636224
```

Though you can see that, strangely, the growth stops after several iterations. This isn't always true; sometimes it plateaus for a few iterations and then begins growing again.

I feel like I'm missing something fundamental about xarray memory management. It seems like a great impediment that arrays (or something) read from a Dataset are not garbage collected while looping through that Dataset, which rather defeats the purpose of only accessing and working with the data you need in the first place. I have to access rather large chunks of data at a time, so being able to discard one slice of data and move on to the next without filling up the RAM is a big deal. Any ideas what's going on? Or what am I missing?
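For reference, here is a bare version of the `del`-and-collect experiment, with a synthetic NumPy array standing in for `ds.datavar[k,:,:].values` (the shape and iteration count are only illustrative). This should show whether plain per-iteration allocations are released, independent of xarray and the netCDF backend:

```python
import gc
import os

import numpy as np
import psutil

process = psutil.Process(os.getpid())

def rss_mb():
    # Current resident set size of this process, in MiB.
    return process.memory_info().rss / 2**20

# One ~95 MiB float32 slice per iteration, released explicitly each time.
baseline = rss_mb()
for k in range(5):
    data = np.ones((5000, 5000), dtype=np.float32)  # stand-in for one time slice
    del data      # drop the only reference to the slice
    gc.collect()  # collect any cycles the slice participated in
    print('iteration', k, 'rss delta (MiB):', round(rss_mb() - baseline, 1))
```

If the deltas stay near zero here but grow in the real loop, the retained memory is presumably being held somewhere in the file-reading path rather than by the loop variable itself.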
```
print(ds)  # from open_mfdataset()

Dimensions:  (band: 1, number_of_image_bounds: 2, number_of_time_bounds: 2, t: 4, x: 5424, y: 5424)
Coordinates:
  * y        (y) float32 0.151844 0.151788 ...
  * x        (x) float32 -0.151844 -0.151788 ...
  * t        (t) datetime64[ns] 2018-05-25T00:36:02.796268032 ...
Data variables:
    data     (t, y, x) float32 dask.array
```

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.8-300.fc28.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.5.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.4
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: None
IPython: 6.4.0
sphinx: None
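In case it helps with diagnosis, allocation growth per call site can be inspected with the stdlib `tracemalloc` module; recent NumPy reports its buffers to tracemalloc, so anything retained across iterations should show up attributed to the code that allocated it. Again a synthetic array stands in for the netCDF read, so the shapes here are illustrative only:

```python
import gc
import tracemalloc

import numpy as np

tracemalloc.start()
before = tracemalloc.take_snapshot()

for k in range(3):
    data = np.ones((1000, 1000), dtype=np.float32)  # stand-in for one time slice
    del data
gc.collect()

after = tracemalloc.take_snapshot()
# Top allocation sites by retained-size growth across the loop; a leak in
# the read path would appear here as growth attributed to that code.
for stat in after.compare_to(before, 'lineno')[:5]:
    print(stat)
```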
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2186/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue