
issues: 479190812


  • id: 479190812
  • node_id: MDU6SXNzdWU0NzkxOTA4MTI=
  • number: 3200
  • title: open_mfdataset memory leak, very simple case. v0.12
  • user: 19933988
  • state: open
  • locked: 0
  • comments: 7
  • created_at: 2019-08-09T22:38:39Z
  • updated_at: 2023-02-03T22:58:32Z
  • author_association: NONE
  • reactions: 0
  • repo: 13221727
  • type: issue

MCVE Code Sample

```python
import glob
import os
import numpy as np
import xarray as xr
from memory_profiler import profile

def CreateTestFiles():
    # create a bunch of files
    xlen = int(1e2)
    ylen = int(1e2)
    xdim = np.arange(xlen)
    ydim = np.arange(ylen)
    os.makedirs('testfiles', exist_ok=True)  # make sure the output directory exists
    nfiles = 100
    for i in range(nfiles):
        data = np.random.rand(xlen, ylen, 1)
        datafile = xr.DataArray(data, coords=[xdim, ydim, [i]], dims=['x', 'y', 'time'])
        datafile.to_netcdf('testfiles/datafile_{}.nc'.format(i))

@profile
def ReadFiles():
    xr.open_mfdataset(glob.glob('testfiles/*'), concat_dim='time')

if __name__ == '__main__':
    # write out files for testing
    CreateTestFiles()
    # loop thru file read step
    for i in range(100):
        ReadFiles()
```

usage:

```
mprof run simplest_case.py
mprof plot
```

(mprof is a Python memory profiling library)
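
As a cross-check that does not rely on mprof, the resident set size can also be sampled directly with psutil around each call. This is only an illustrative sketch (psutil and the loop below are not part of the original report) and assumes the same `testfiles/` directory produced by the MCVE:

```python
# Illustrative cross-check (not from the original report): sample RSS with
# psutil around each open_mfdataset call; steadily growing RSS suggests a leak.
import glob
import os

import psutil
import xarray as xr

proc = psutil.Process(os.getpid())

for i in range(100):
    ds = xr.open_mfdataset(glob.glob('testfiles/*'), concat_dim='time')
    ds.close()
    print('iteration {}: RSS = {:.1f} MB'.format(i, proc.memory_info().rss / 1e6))
```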

Problem Description

dask version 1.1.4
xarray version 0.12
python 3.7.3

There appears to be a persistent memory leak in open_mfdataset. I'm creating a model calibration script that runs for ~1000 iterations, opening and closing the same set of files (dimensions are the same, but the data is different) with each iteration. I eventually run out of memory because of the leak. This simple case captures the same behavior. Closing the files with .close() does not fix the problem.

Is there a workaround for this? I've perused some of the existing issues but cannot tell whether this has been resolved.
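
One possible mitigation, offered here only as a sketch and not confirmed to fix the leak described above: open the multi-file dataset as a context manager so every underlying file is closed on exit, and shrink xarray's global file-handle cache with `xr.set_options(file_cache_maxsize=...)`. Whether either step actually releases the leaked memory is an assumption.

```python
# Mitigation sketch (assumption: the growth is tied to cached file handles).
import gc
import glob

import xarray as xr

xr.set_options(file_cache_maxsize=1)  # shrink the LRU cache of open file handles

for _ in range(100):
    # the context manager closes every file in the multi-file dataset on exit
    with xr.open_mfdataset(glob.glob('testfiles/*'), concat_dim='time') as ds:
        pass  # per-iteration work would go here
    gc.collect()  # prompt Python to collect reference cycles holding netCDF objects
```

If neither step helps, running each iteration in a separate process (for example via multiprocessing) guarantees that memory is returned to the operating system when the worker exits.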

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.17.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.0
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: 1.5.5
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.1.4
distributed: 1.26.0
matplotlib: 3.0.2
cartopy: 0.17.0
seaborn: None
setuptools: 41.0.1
pip: 19.1.1
conda: None
pytest: None
IPython: 7.3.0
sphinx: None
```

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 7 rows from issue in issue_comments