issue_comments
7 rows where issue = 479190812 (open_mfdataset memory leak, very simple case. v0.12), sorted by updated_at descending
1416446874 | deeplycloudy (CONTRIBUTOR) | 2023-02-03T21:52:57Z
https://github.com/pydata/xarray/issues/3200#issuecomment-1416446874

I was iterating today over a large dataset loaded with I can confirm that
530800751 | floschl (NONE) | 2019-09-12T12:24:12Z (edited 12:36:02Z)
https://github.com/pydata/xarray/issues/3200#issuecomment-530800751

I have observed a similar memory leak (config below). It occurs with both engine=netcdf4 and engine=h5netcdf. Example for loading a 1.2 GB netCDF file:

In contrast, the memory is just released with a

Output of `xr.show_versions()`:
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-62-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2
xarray: 0.12.3
pandas: 0.25.1
numpy: 1.16.4
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.0.1
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 7.7.0
sphinx: None
520571376 | bsu-wrudisill (NONE) | 2019-08-12T19:56:09Z
https://github.com/pydata/xarray/issues/3200#issuecomment-520571376

Awesome, thanks @shoyer and @crusaderky for looking into this. I've tested it with the h5netcdf engine and the leak is mostly mitigated... for the simple case at least. Unfortunately, the actual model files that I'm working with do not appear to be compatible with h5py (I believe related to this issue: https://github.com/h5py/h5py/issues/719). But that's another problem entirely! @crusaderky, I will hopefully get to trying your suggestions 3) and 4). As for your last point, I haven't tested explicitly, but yes, I believe that it does continue to grow linearly with more iterations.
520182257 | shoyer (MEMBER) | 2019-08-10T21:53:39Z
https://github.com/pydata/xarray/issues/3200#issuecomment-520182257

Also, if you're having memory issues I would definitely recommend upgrading to a newer version of xarray. There was a recent fix that helps ensure that files get automatically closed when they are garbage collected, even if you don't call
520182139 | shoyer (MEMBER) | 2019-08-10T21:51:25Z (edited 21:52:24Z)
https://github.com/pydata/xarray/issues/3200#issuecomment-520182139

Thanks for the profiling script. I ran a few permutations of this:

Here are some plots:
So in conclusion, it looks like there are memory leaks:

1. when using netCDF4-Python (I was also able to confirm these without using xarray at all, just using

(1) looks like by far the bigger issue, which you can work around by switching to scipy or h5netcdf to read your files. (2) is an issue for xarray. We do do some caching, specifically with our backend file manager, but given that issues only seem to appear when using

Note: I modified your script to set xarray's file cache size to 1, which helps smooth out the memory usage:

```python
import glob

import numpy as np
import xarray as xr

# shrink xarray's internal file cache, as described above; this helps
# smooth out the memory usage
xr.set_options(file_cache_maxsize=1)


def CreateTestFiles():
    # create a bunch of files
    xlen = int(1e2)
    ylen = int(1e2)
    xdim = np.arange(xlen)
    ydim = np.arange(ylen)
    # ... (remainder of the original script not preserved in this export)


@profile  # memory_profiler decorator; run via `python -m memory_profiler`
def ReadFiles():
    # for i in range(100):
    #     ds = xr.open_dataset('testfile_{}.nc'.format(i), engine='netcdf4')
    #     ds.close()
    ds = xr.open_mfdataset(glob.glob('testfile_*'), engine='h5netcdf', concat_dim='time')
    ds.close()


if __name__ == '__main__':
    # write out files for testing
    CreateTestFiles()
```
Reactions: +1 ×1, hooray ×1
520136799 | crusaderky (MEMBER) | 2019-08-10T10:10:11Z (edited 10:11:18Z)
https://github.com/pydata/xarray/issues/3200#issuecomment-520136799

Oh, but first and foremost: CPython's memory management is designed so that, when PyMem_Free() is invoked, CPython will hold on to the memory and not invoke the underlying free() library call, hoping to reuse it on the next PyMem_Alloc(). An increase in RAM usage from 160 to 200 MB could very well be explained by this. Try increasing the number of loops in your test 100-fold and see if you get a 100-fold increase in memory usage too (from 160 MB to 1.2 GB). If yes, it's a real leak; if it remains much more contained, it's normal CPython behaviour.
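The loop-scaling test suggested here can be sketched with the standard-library tracemalloc module. The helper and the two workloads below are illustrative names, and note the caveat that tracemalloc only sees Python-level allocations, not leaks inside C libraries such as netCDF4:

```python
import gc
import tracemalloc


def allocation_growth(work, n_iterations):
    """Run `work` n_iterations times and return the net growth in
    Python-level allocations, in bytes. A real leak grows roughly
    linearly with n_iterations; allocator retention flattens out."""
    tracemalloc.start()
    gc.collect()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(n_iterations):
        work()
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


_retained = []  # simulates a leak: references are never dropped


def leaky():
    _retained.append(bytearray(10_000))


def well_behaved():
    bytearray(10_000)  # dropped immediately, so no net growth
```

With 100 iterations, `leaky` grows by roughly 100 × 10 kB while `well_behaved` stays near zero; comparing runs at 100× the loop count is exactly the test proposed above for telling a genuine leak apart from CPython holding on to freed blocks.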
520136482 | crusaderky (MEMBER) | 2019-08-10T10:06:07Z
https://github.com/pydata/xarray/issues/3200#issuecomment-520136482

Hi, xarray doesn't have any global objects that I know of that could cause the leak; I'm willing to bet on the underlying libraries.