issue_comments


6 comments where issue = 1512460818, sorted by updated_at descending


Commenters: deepgabani8 (3), DocOtak (1), shoyer (1), keewis (1)

Author associations: NONE (3), MEMBER (2), CONTRIBUTOR (1)

Issue: Memory leak - xr.open_dataset() not releasing memory. (1512460818)

deepgabani8 (NONE) · 2023-01-03T11:22:21Z (edited 2023-01-03T11:23:55Z)
https://github.com/pydata/xarray/issues/7404#issuecomment-1369656073

Thanks @DocOtak for the observation.

This holds only when iterating over the same file; in that case I observe the same behavior. Here is the memory usage across iterations: [memory usage plot]

When I tried to validate this by iterating over different files, memory increased gradually. Here is the memory usage: [memory usage plot]
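
A minimal sketch of that different-files loop, assuming a placeholder list of NetCDF files and the same psutil-based RSS measurement used elsewhere in this thread:

```python
import os

import psutil
import xarray as xr


def rss_mib():
    # Resident set size of the current process, in MiB (as reported by psutil).
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2


# Placeholder: any list of distinct NetCDF files.
paths = [f"file_{i}.nc" for i in range(100)]

print(f"Start: {rss_mib()} MiB")
for path in paths:
    with xr.open_dataset(path) as ds:
        pass  # work with ds here
print(f"End: {rss_mib()} MiB")
```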

DocOtak (CONTRIBUTOR) · 2022-12-29T17:45:33Z
https://github.com/pydata/xarray/issues/7404#issuecomment-1367488859

I've personally seen a lot of what looks like memory reuse in numpy and related libraries. I don't think any of this happens explicitly, but I have never investigated. I would expect that if memory were not being released, opening and closing the dataset in a loop would increase memory usage; it didn't on the recent library versions I have.

```
Start: 89.71875 MiB
Before opening file: 90.203125 MiB
After opening file: 96.6875 MiB
Filename: test.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     6     90.2 MiB     90.2 MiB           1   @profile
     7                                         def main():
     8     90.2 MiB      0.0 MiB           1       path = 'ECMWF_ERA-40_subset.nc'
     9     90.2 MiB      0.0 MiB           1       print(f"Before opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    10     96.7 MiB     -0.1 MiB        1001       for i in range(1000):
    11     96.7 MiB      6.4 MiB        1000           with xr.open_dataset(path) as ds:
    12     96.7 MiB     -0.1 MiB        1000               ...
    13     96.7 MiB      0.0 MiB           1       print(f"After opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")

End: 96.6875 MiB
```
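
One way to check whether such growth comes from Python-level allocations, rather than from memory held below the Python layer, is the standard-library tracemalloc module. A minimal sketch, reusing the filename from the run above; note that tracemalloc only sees allocations made through Python's allocator, so memory held by the underlying C libraries (libnetcdf, HDF5) will not appear here:

```python
import tracemalloc

import xarray as xr

tracemalloc.start()

# Open and close the dataset repeatedly, as in the profiled loop above.
for _ in range(100):
    with xr.open_dataset('ECMWF_ERA-40_subset.nc') as ds:
        pass

# Report the top Python-level allocation sites still holding memory.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)
```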

Show Versions

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.13 (default, Jul 23 2022, 17:00:57) [Clang 13.1.6 (clang-1316.0.21.2.5)]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0
xarray: 2022.11.0
pandas: 1.4.3
numpy: 1.23.5
scipy: None
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 56.0.0
pip: 22.0.4
conda: None
pytest: 6.2.5
IPython: 8.4.0
sphinx: 5.1.1
```

Reactions: 👍 1

deepgabani8 (NONE) · 2022-12-29T16:20:41Z
https://github.com/pydata/xarray/issues/7404#issuecomment-1367443148

Thanks @shoyer, but closing the dataset explicitly also doesn't seem to release the memory.

```
Start: 185.5078125 MiB
Before opening file: 186.28515625 MiB
After opening file: 307.75390625 MiB
Filename: temp.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    186.1 MiB    186.1 MiB           1   @profile
     8                                         def main():
     9    186.1 MiB      0.0 MiB           1       path = 'ECMWF_ERA-40_subset.nc'
    10    186.1 MiB      0.0 MiB           1       gc.collect()
    11    186.3 MiB      0.2 MiB           1       print(f"Before opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    12    307.8 MiB    121.5 MiB           1       ds = xr.open_dataset(path)
    13    307.8 MiB      0.0 MiB           1       ds.close()
    14    307.8 MiB      0.0 MiB           1       del ds
    15    307.8 MiB      0.0 MiB           1       gc.collect()
    16    307.8 MiB      0.0 MiB           1       print(f"After opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")

End: 307.75390625 MiB
```

I also tried a context manager, but memory consumption was the same.

```
Start: 185.5625 MiB
Before opening file: 186.36328125 MiB
After opening file: 307.265625 MiB
Filename: temp.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    186.2 MiB    186.2 MiB           1   @profile
     8                                         def main():
     9    186.2 MiB      0.0 MiB           1       path = 'ECMWF_ERA-40_subset.nc'
    10    186.2 MiB      0.0 MiB           1       gc.collect()
    11    186.4 MiB      0.2 MiB           1       print(f"Before opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    12    307.3 MiB    120.9 MiB           1       with xr.open_dataset(path) as ds:
    13    307.3 MiB      0.0 MiB           1           ds.close()
    14    307.3 MiB      0.0 MiB           1           del ds
    15    307.3 MiB      0.0 MiB           1       gc.collect()
    16    307.3 MiB      0.0 MiB           1       print(f"After opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")

End: 307.265625 MiB
```

shoyer (MEMBER) · 2022-12-28T19:46:07Z
https://github.com/pydata/xarray/issues/7404#issuecomment-1366880017

If you care about memory usage, you should explicitly close files after you use them, e.g., by calling ds.close() or by using a context manager. Does that work for you?
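
For reference, the two patterns look like this ('example.nc' is a placeholder filename):

```python
import xarray as xr

# Explicit close:
ds = xr.open_dataset('example.nc')
# ... work with ds ...
ds.close()

# Equivalent, with the file closed automatically on exiting the block:
with xr.open_dataset('example.nc') as ds:
    ...  # work with ds
```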

deepgabani8 (NONE) · 2022-12-28T17:21:28Z
https://github.com/pydata/xarray/issues/7404#issuecomment-1366807012

It still shows similar memory consumption.

```
Start: 185.6015625 MiB
Before opening file: 186.24609375 MiB
After opening file: 307.1328125 MiB
Filename: temp.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    186.0 MiB    186.0 MiB           1   @profile
     8                                         def main():
     9    186.0 MiB      0.0 MiB           1       path = 'ECMWF_ERA-40_subset.nc'
    10    186.0 MiB      0.0 MiB           1       gc.collect()
    11    186.2 MiB      0.2 MiB           1       print(f"Before opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    12    307.1 MiB    120.9 MiB           1       ds = xr.open_dataset(path)
    13    307.1 MiB      0.0 MiB           1       del ds
    14    307.1 MiB      0.0 MiB           1       gc.collect()
    15    307.1 MiB      0.0 MiB           1       print(f"After opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")

End: 307.1328125 MiB
```

keewis (MEMBER) · 2022-12-28T10:28:22Z
https://github.com/pydata/xarray/issues/7404#issuecomment-1366545996

I'm not sure how memory_profiler calculates the memory usage, but I suspect this happens because Python's garbage collector does not have to run immediately after the del.

Can you try manually triggering the garbage collector?

```python
import gc
import os

import psutil
import xarray as xr
from memory_profiler import profile


@profile
def main():
    path = 'ECMWF_ERA-40_subset.nc'
    gc.collect()
    print(f"Before opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    ds = xr.open_dataset(path)
    del ds
    gc.collect()
    print(f"After opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")


if __name__ == '__main__':
    print(f"Start: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    main()
    print(f"End: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
```

