Issue #6013 (pydata/xarray): Memory leak with `open_zarr` default chunking option

  • State: closed (completed)
  • Author association: CONTRIBUTOR
  • Comments: 3
  • Created: 2021-11-22T15:06:33Z
  • Updated: 2023-11-10T03:08:35Z
  • Closed: 2023-11-10T02:32:49Z

What happened: I've been using xarray to open zarr datasets within a Flask app and spent some time debugging a memory leak. What I found is that open_zarr() defaults to chunks='auto', rather than chunks=None, which is the default for open_dataset(). As a result, open_zarr() ends up calling _maybe_chunk() on the dataset's variables by default.

For whatever reason, this function generates dask objects that are not easily cleared from memory within the context of a Flask route, so memory usage continues to grow within my app, at least towards some plateau. This memory growth isn't reproducible outside of a Flask route, so it's a bit of a niche problem.

My first proposal would be to simply align the default chunks argument between open_zarr() and open_dataset(); I'm happy to submit a PR there if this makes sense to others. The other, more challenging piece would be to figure out what's going on in _maybe_chunk() to cause the memory growth. The problem is specific to this function rather than to any particular storage backend (other than the difference in default chunk arguments).
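In the meantime, the workaround is to pass chunks=None explicitly, which matches open_dataset()'s default and avoids the memory growth described here (a minimal sketch of the call, using the same local test.zarr store as the example below):

```python
import xarray as xr

# chunks=None skips the dask chunking that the default chunks='auto'
# applies via _maybe_chunk(); with this option, memory stays flat
# across repeated requests in the Flask example below.
ds = xr.open_zarr('test.zarr', chunks=None)
```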

What you expected to happen: Memory usage should not grow when opening a zarr dataset within a Flask route.

Minimal Complete Verifiable Example:

```python
from flask import Flask
import xarray as xr
import gc
import dask.array as da

# save a test dataset to zarr locally
ds_test = xr.Dataset(
    {"foo": (["x", "y", "z"], da.random.random(size=(300, 300, 300)))}
)
ds_test.to_zarr('test.zarr', mode='w')

app = Flask(__name__)

# ping this route repeatedly to see memory increase
@app.route('/open_zarr')
def open_zarr():
    # with the default chunks='auto', memory grows; with chunks=None, memory is ok
    ds = xr.open_zarr('test.zarr', chunks='auto').compute()
    # try to explicitly clear memory, but this doesn't help
    del ds
    gc.collect()
    return 'check memory'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080, debug=True)
```
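To make the growth visible without an external profiler, a variant of the route above can report its own resident set size on each request. This is an illustrative sketch, not part of the original report: it assumes psutil is installed, the route name open_zarr_rss is made up, and the snippet is meant to be appended to the example script above (it reuses app, xr, and gc from there).

```python
import os

import psutil

@app.route('/open_zarr_rss')
def open_zarr_rss():
    # same pattern as above: open with the default chunks='auto' and load
    ds = xr.open_zarr('test.zarr', chunks='auto').compute()
    del ds
    gc.collect()
    # report this process's resident set size in MiB; per the report above,
    # it climbs across requests with chunks='auto' but stays flat with chunks=None
    rss_mib = psutil.Process(os.getpid()).memory_info().rss / 2**20
    return f'rss={rss_mib:.1f} MiB'
```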

Anything else we need to know?:

Environment:

Output of `xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.19.5
scipy: 1.7.2
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.11.0
h5py: 3.1.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1.1
nc_time_axis: 1.4.0
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.9.1
iris: None
bottleneck: 1.3.2
dask: 2021.11.1
distributed: 2021.11.1
matplotlib: 3.4.3
cartopy: 0.20.1
seaborn: None
numbagg: None
fsspec: 2021.11.0
cupy: None
pint: 0.18
sparse: None
setuptools: 58.5.3
pip: 21.3.1
conda: None
pytest: None
IPython: 7.29.0
sphinx: None
```
