issues: 326533369

id: 326533369
node_id: MDU6SXNzdWUzMjY1MzMzNjk=
number: 2186
title: Memory leak while looping through a Dataset
user: 12929327
state: closed (state_reason: completed)
locked: 0
assignee: (none)
milestone: (none)
comments: 13
created_at: 2018-05-25T13:53:31Z
updated_at: 2022-03-08T10:00:07Z
closed_at: 2019-01-14T21:09:36Z
author_association: NONE
reactions: 0
repo: 13221727
type: issue

I'm encountering a detrimental memory leak when simply accessing data from a Dataset repeatedly within a loop. I'm opening netCDF files concatenated in time and looping through time to create plots. In this case the x-y slices are about 5000 x 5000 in size.

```python
import os
import psutil
import xarray as xr

# handle to the current process, for reading resident set size (RSS)
process = psutil.Process(os.getpid())

ds = xr.open_mfdataset('*.nc', chunks={'x': 4000, 'y': 4000}, concat_dim='t')
for k in range(ds.dims['t']):
    data = ds.datavar[k, :, :].values
    print('memory=', process.memory_info().rss)
```

```
memory= 566484992
memory= 823836672
memory= 951439360
memory= 1039261696
```

I tried explicitly dereferencing the array by calling `del data` at the end of each iteration, which reduces the memory growth a little, but not much.
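For reference, the dereferencing variant of the loop looks like this; the `gc.collect()` call is an extra step (not part of my original test) to rule out objects that are merely awaiting collection:

```python
import gc

for k in range(ds.dims['t']):
    data = ds.datavar[k, :, :].values
    # ... make the plot from `data` ...
    del data      # drop the only reference to this slice
    gc.collect()  # force a full collection pass (added for this sketch)
    print('memory=', process.memory_info().rss)
```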

Strangely, in this simplified example I can greatly reduce the memory growth by using much smaller chunk sizes, but in my real-world example, opening all data with smaller chunk sizes does not mitigate the problem. Either way, it's not clear to me why the memory usage should grow for any chunk size at all.

```python
ds = xr.open_mfdataset('*.nc', chunks={'x': 1000, 'y': 1000}, concat_dim='t')
```

```
memory= 514043904
memory= 499363840
memory= 502509568
memory= 522133504
```

I can also generate memory growth when cutting dask out entirely with `open_dataset(chunks=None)` and simply looping through different variables in the Dataset:

```python
ds = xr.open_dataset('data.nc', chunks=None)  # x-y dataset, 5424 x 5424
for var in ['var1', 'var2', ..., 'var15']:
    data = ds[var].values
    print('memory =', process.memory_info().rss)
```

```
memory = 246087680
memory = 280604672
memory = 285810688
memory = 315834368
memory = 344510464
memory = 374530048
memory = 403742720
memory = 403804160
memory = 404140032
memory = 403660800
memory = 404262912
memory = 403513344
memory = 404115456
memory = 403636224
```

Though you can see that, strangely, the growth stops after several iterations. This isn't always the case: sometimes it plateaus for a few iterations and then begins growing again.
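One way to narrow down whether the growth comes from Python-level allocations at all would be to compare tracemalloc snapshots around the loop (a diagnostic sketch, not output from the runs above; note that allocations made inside C libraries such as netCDF won't show up here):

```python
import tracemalloc

import xarray as xr

tracemalloc.start()
ds = xr.open_dataset('data.nc')

before = tracemalloc.take_snapshot()
for var in ['var1', 'var2']:  # same placeholder variable names as above
    data = ds[var].values
after = tracemalloc.take_snapshot()

# list the allocation sites responsible for the largest growth
for stat in after.compare_to(before, 'lineno')[:10]:
    print(stat)
```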

I feel like I'm missing something fundamental about xarray memory management. It seems like a great impediment that arrays (or something) read from a Dataset are not garbage collected while looping through that Dataset, which rather defeats the purpose of only accessing and working with the data you need in the first place. I have to access rather large chunks of data at a time, so being able to discard each slice and move on to the next one without filling up the RAM is a big deal.
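One workaround I may try, sketched below (untested; it trades speed for memory by reopening the files every iteration, so that anything tied to the open files is released when the dataset is closed):

```python
import xarray as xr

for k in range(4):  # 4 time steps, matching the dataset printed below
    # the with-block closes the dataset (and its files) after each slice
    with xr.open_mfdataset('*.nc', chunks={'x': 4000, 'y': 4000},
                           concat_dim='t') as ds:
        data = ds.datavar[k, :, :].values
        # ... make the plot from `data` ...
```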

Any ideas what's going on? Or what I'm missing?

```
print(ds)  # from open_mfdataset()

<xarray.Dataset>
Dimensions:  (band: 1, number_of_image_bounds: 2, number_of_time_bounds: 2,
              t: 4, x: 5424, y: 5424)
Coordinates:
  * y        (y) float32 0.151844 0.151788 ...
  * x        (x) float32 -0.151844 -0.151788 ...
  * t        (t) datetime64[ns] 2018-05-25T00:36:02.796268032 ...
Data variables:
    data     (t, y, x) float32 dask.array<shape=(4, 5424, 5424), chunksize=(1, 4000, 4000)>
```

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.8-300.fc28.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.5.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.4
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: None
IPython: 6.4.0
sphinx: None
```
