issue_comments: 393846595

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/2186#issuecomment-393846595	https://api.github.com/repos/pydata/xarray/issues/2186	393846595	MDEyOklzc3VlQ29tbWVudDM5Mzg0NjU5NQ==	6404167	2018-06-01T10:57:09Z	2018-06-01T10:57:09Z	CONTRIBUTOR	@meridionaljet I might've run into the same issue, but I'm not 100% sure. In my case I'm looping over a Dataset containing variables from 3 different files, all of them with a `.sel` and some of them with a more complicated (dask) calculation (still, mostly sums and divisions). The leak seems to happen mostly for the variables with the calculation. Can you see what happens when using the distributed client? Put `client = dask.distributed.Client()` in front of your code. For me this leads to many `distributed.utils_perf - WARNING - full garbage collections took 40% CPU time recently (threshold: 10%)` messages, which indeed points to something garbage-collecty. Also, for me the memory behaviour looks very different between the threaded and multi-process schedulers, although they both leak. (I'm not sure if leaking is the right term here.) Maybe you can try `memory_profiler`? I've tried the following without success: - explicitly deleting `ds[varname]` and running `gc.collect()` - explicitly clearing the dask cache with `client.cancel` and `client.restart` - moving the leaky code into its own function (should not matter, but I seem to remember that it sometimes helps garbage collection in edge cases) - explicitly triggering computation with either dask `persist` or xarray `load` and then explicitly deleting the result. For my messy and very much work-in-progress code, look here: https://github.com/Karel-van-de-Plassche/QLKNN-develop/blob/master/qlknn/dataset/hypercube_to_pandas.py	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		326533369
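
To make the "delete the result and run `gc.collect()`" check from the comment above reproducible, here is a minimal sketch that records traced memory after each loop iteration. It uses only the standard library (`tracemalloc` and `gc`) rather than `memory_profiler`. The helper name `measure_growth` and the `step` callback are hypothetical and not taken from the comment or the linked code. In a real test, `step` would wrap the per-variable `.sel` plus calculation. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries would not show up here.

```python
import gc
import tracemalloc


def measure_growth(step, n_iter=5):
    """Run step(i) n_iter times and return traced memory (bytes) after each run.

    `step` is a hypothetical stand-in for one loop iteration, for example
    selecting and computing a single variable of a Dataset.
    """
    tracemalloc.start()
    readings = []
    try:
        for i in range(n_iter):
            result = step(i)
            del result    # explicitly drop the result, as tried in the comment
            gc.collect()  # force a full garbage collection
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings


if __name__ == "__main__":
    retained = []  # simulates a cache that keeps references alive

    def leaky_step(i):
        data = [float(x) for x in range(100_000)]
        retained.append(data)  # deliberate leak, for illustration only
        return sum(data)

    def clean_step(i):
        data = [float(x) for x in range(100_000)]
        return sum(data)

    print("leaky:", measure_growth(leaky_step))
    print("clean:", measure_growth(clean_step))
```

If the readings keep climbing even though the result is deleted and collected each time, something is still holding references, which would be consistent with the behaviour described above. Flat readings alongside growing process memory would instead suggest the memory is held outside Python's allocator.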