issues: 326533369
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
326533369 | MDU6SXNzdWUzMjY1MzMzNjk= | 2186 | Memory leak while looping through a Dataset | 12929327 | closed | 0 |  |  | 13 | 2018-05-25T13:53:31Z | 2022-03-08T10:00:07Z | 2019-01-14T21:09:36Z | NONE |  |  |  | (full issue body below) | { "url": "https://api.github.com/repos/pydata/xarray/issues/2186/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |  | completed | 13221727 | issue |

I'm encountering a detrimental memory leak when simply accessing data from a Dataset repeatedly within a loop. I'm opening netCDF files concatenated in time and looping through time to create plots; in this case the x-y slices are about 5000 x 5000 in size.

```python
import xarray as xr
import os, psutil

# Report this process's resident memory as we loop.
process = psutil.Process(os.getpid())

ds = xr.open_mfdataset('*.nc', chunks={'x': 4000, 'y': 4000}, concat_dim='t')

for k in range(ds.dims['t']):
    data = ds.datavar[k, :, :].values
    print('memory =', process.memory_info().rss)
```
Strangely, in this simplified example I can greatly reduce the memory growth by using much smaller chunk sizes, but in my real-world case opening all the data with smaller chunks does not mitigate the problem. Either way, it's not clear to me why memory usage should grow at all, for any chunk size.

```python
ds = xr.open_mfdataset('*.nc', chunks={'x': 1000, 'y': 1000}, concat_dim='t')
```
I can also generate memory growth when cutting dask out entirely:

```python
ds = xr.open_dataset('data.nc', chunks=None)  # single x-y dataset, 5424 x 5424

for var in ['var1', 'var2', ... , 'var15']:
    data = ds[var].values
    print('memory =', process.memory_info().rss)
```
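As a point of comparison, here is a minimal diagnostic sketch, not from the original report, that reopens the file for each variable instead of keeping one Dataset open (the filename `data.nc` and the variable names `var1`…`var15` are the same placeholders as above). If memory stays flat in this variant while it grows in the loop above, that would suggest the growth comes from state held by the open Dataset (e.g. backend caches) rather than from the extracted NumPy arrays themselves.

```python
import os

import psutil
import xarray as xr

process = psutil.Process(os.getpid())

# Placeholder variable names, matching the snippet above.
varnames = ['var1', 'var2', 'var15']

for var in varnames:
    # Open the file, pull one variable into memory, and close it again
    # before the next iteration.
    with xr.open_dataset('data.nc') as ds:
        data = ds[var].values
    print(var, 'memory =', process.memory_info().rss)
```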
Strangely, though, the growth stops after several iterations. This isn't always the case: sometimes it plateaus for a few iterations and then begins growing again. I feel like I'm missing something fundamental about xarray memory management. It seems like a great impediment if arrays (or something) read from a Dataset are not garbage collected while looping through that Dataset, which rather defeats the purpose of only accessing and working with the data you need. I have to access rather large chunks of data at a time, so being able to discard each slice and move on to the next one without filling up RAM is a big deal. Any ideas what's going on? Or what am I missing?
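One way to test the garbage-collection hypothesis directly, a sketch added here for illustration rather than something from the original report, is to drop the only reference to each slice and force a collection on every iteration. If RSS still climbs, the memory is being retained by something other than uncollected slice arrays, e.g. caches inside the netCDF backend or dask.

```python
import gc
import os

import psutil
import xarray as xr

process = psutil.Process(os.getpid())
ds = xr.open_mfdataset('*.nc', chunks={'x': 4000, 'y': 4000}, concat_dim='t')

for k in range(ds.dims['t']):
    data = ds.datavar[k, :, :].values  # 'datavar' is the placeholder name used above
    del data       # drop the only reference to this slice
    gc.collect()   # force a full collection pass
    print('memory =', process.memory_info().rss)
```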
Output of `print(ds)` for the dataset returned by `open_mfdataset()`:

```
<xarray.Dataset>
Dimensions:  (band: 1, number_of_image_bounds: 2, number_of_time_bounds: 2, t: 4, x: 5424, y: 5424)
Coordinates:
  * y        (y) float32 0.151844 0.151788 ...
  * x        (x) float32 -0.151844 -0.151788 ...
  * t        (t) datetime64[ns] 2018-05-25T00:36:02.796268032 ...
Data variables:
    data     (t, y, x) float32 dask.array<shape=(4, 5424, 5424), chunksize=(1, 4000, 4000)>
```
Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.8-300.fc28.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.5.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.4
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: None
IPython: 6.4.0
sphinx: None
```