html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2186#issuecomment-393846595,https://api.github.com/repos/pydata/xarray/issues/2186,393846595,MDEyOklzc3VlQ29tbWVudDM5Mzg0NjU5NQ==,6404167,2018-06-01T10:57:09Z,2018-06-01T10:57:09Z,CONTRIBUTOR,"@meridionaljet I might've run into the same issue, but I'm not 100% sure. In my case I'm looping over a Dataset containing variables from 3 different files, all of them with a `.sel` and some of them with a more complicated (dask) calculation (still mostly sums and divisions). The leak seems to happen mostly for the variables with the calculation.
Can you see what happens when using the distributed client? Put `client = dask.distributed.Client()` in front of your code. For me this leads to many `distributed.utils_perf - WARNING - full garbage collections took 40% CPU time recently (threshold: 10%)` messages, which indeed points to something garbage-collection-related.
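Roughly what I mean (the file name, chunking, and selection are placeholders for my setup, not your actual code):

```python
import dask.distributed
import xarray as xr

# Start a local distributed scheduler first, so its diagnostics
# (dashboard, GC warnings) are active for everything below.
client = dask.distributed.Client()

# Placeholder dataset and loop, just to show where the Client goes.
ds = xr.open_dataset('data.nc', chunks={'time': 100})
for varname in ds.data_vars:
    result = ds[varname].sel(time=0).sum().compute()
```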
Also, for me the memory behaviour looks very different between the threaded and multi-process schedulers, although they both leak (I'm not sure if leaking is the right term here). Maybe you can try `memory_profiler`?
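A minimal sketch of how I'd use it, with the function body just a stand-in for the real per-variable work:

```python
from memory_profiler import profile
import xarray as xr

@profile  # prints line-by-line memory usage each time the function is called
def process_variable(ds, varname):
    # Stand-in for the real per-variable computation.
    return ds[varname].sel(time=0).mean().load()

ds = xr.open_dataset('data.nc', chunks={'time': 100})
for varname in ds.data_vars:
    process_variable(ds, varname)
```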
I've tried the following without success (sketched after this list):
- Explicitly deleting `ds[varname]` and running `gc.collect()`
- Explicitly clearing the dask cache with `client.cancel` and `client.restart`
- Moving the leaky code into its own function (this shouldn't matter, but I seem to remember it sometimes helps garbage collection in edge cases)
- Explicitly triggering computation with either dask `persist` or xarray `load` and then explicitly deleting the result
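Roughly what those attempts looked like (variable names are illustrative, and passing a persisted collection to `client.cancel` is my reading of the intended usage):

```python
import gc

result = ds[varname].sum()   # lazy dask graph
result = result.persist()    # or xarray .load() to force computation up front
# ... write result somewhere ...
client.cancel(result)        # drop the futures held by the scheduler
del result
del ds[varname]              # drop the variable itself
gc.collect()                 # force a collection pass
# client.restart() as a last resort, to clear all worker state
```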
For my messy and very much work-in-progress code, see: https://github.com/Karel-van-de-Plassche/QLKNN-develop/blob/master/qlknn/dataset/hypercube_to_pandas.py
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326533369