issues
1 row where repo = 13221727 and user = 35295509 sorted by updated_at descending
column | value
---|---
id | 938340119
node_id | MDU6SXNzdWU5MzgzNDAxMTk=
number | 5585
title | Memory Leak open_mfdataset
user | N4321D 35295509
state | closed
locked | 0
assignee |
milestone |
comments | 2
created_at | 2021-07-06T23:37:54Z
updated_at | 2023-09-12T16:15:37Z
closed_at | 2023-09-12T15:40:58Z
author_association | NONE
active_lock_reason |
draft |
pull_request |
reactions | { "url": "https://api.github.com/repos/pydata/xarray/issues/5585/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app |
state_reason | completed
repo | xarray 13221727
type | issue

body:

I use xarray to combine a couple of numpy arrays saved with h5py. This worked until I recently updated to version 0.18.2, after which my code stopped working: the kernel kept dying from lack of memory. Whenever I open multiple HDF5 files with open_mfdataset and try to save them, memory (including swap) completely fills up before any writing happens. If all the files fit in memory the script works, but if they do not, it crashes (with swap disabled it crashed at around 30 files in the example script; with swap enabled, at around 50). I forgot to note which version of xarray I was coming from, where my script still worked (I think it was 0.16?).

Python 3.8.2 (default, Mar 26 2020, 15:53:00), IPython 7.22.0, xarray 0.18.2 (in anaconda).

To reproduce, create the data files (warning: this takes a couple of GB):

```python
import numpy as np
import xarray as xr
import h5py


def makefile(data, n, nfiles):
    # write nfiles HDF5 files, each holding n copies of `data` under /data/<par>
    for i in range(nfiles):
        with h5py.File(f"{i}.h5", 'w') as file:
            for par in range(n):
                file.create_dataset(
                    f'data/{par}',
                    data=data,
                    dtype='f4',
                    maxshape=(None,),
                    chunks=(32000,),  # (dlength,),
                    compression='gzip',
                    compression_opts=5,
                    fletcher32=True,
                    shuffle=True,
                )


data = np.random.randint(0, 0xFFFF, int(2e7))
makefile(data, 10, 50)  # ~50 files is enough to create an error on my 16 GB RAM / 24 GB swap; increase if you have more RAM?
```

Load the files and save them as a netCDF dataset:

```python
from dask.diagnostics import ProgressBar

ProgressBar().register()  # see something happening

# load files
ds = xr.open_mfdataset(
    "*.h5",
    parallel=True,
    combine='nested',
    concat_dim='phony_dim_0',
    group='/data',
)

# save files
save_opts = {
    key: {
        'zlib': True,  # change to blosc whenever available in xarray
        'complevel': 5,
        'shuffle': True,
        'fletcher32': True,
    }
    for key in ds
}

ds.to_netcdf(
    'delme.h5',
    encoding=save_opts,
    mode="w",
    # engine="h5netcdf",  # "netcdf4", "scipy", "h5netcdf"
    engine='netcdf4',
)

# wait for the kernel to die because of memory overload
```

output:
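Not part of the original report: since the reproduction above fills memory before any writing happens, the sketch below shows the lazy-write pattern the same xarray API supports, i.e. explicit dask `chunks` in `open_mfdataset` plus `compute=False` in `to_netcdf`, with `dask.compute` triggering the deferred write. The 32000-element chunk size is an assumption chosen to match the HDF5 chunking above, the encoding options are omitted for brevity, and whether this actually avoids the blow-up reported against 0.18.2 is not verified here.

```python
import dask
import xarray as xr

# Keep the combined dataset lazy: explicit chunks make every variable a dask array
# split along the concatenation dimension (the chunk size is an assumption, see above).
ds = xr.open_mfdataset(
    "*.h5",
    parallel=True,
    combine="nested",
    concat_dim="phony_dim_0",
    group="/data",
    chunks={"phony_dim_0": 32000},
)

# compute=False returns a dask delayed object instead of writing eagerly,
# so the write only runs when dask.compute is called.
delayed_write = ds.to_netcdf("delme.h5", mode="w", engine="netcdf4", compute=False)
dask.compute(delayed_write)
```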
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
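For reference, a minimal sketch of how the single row above could be pulled from a local copy of this SQLite database with Python's sqlite3 module. The filename github.db is hypothetical; the column names come from the schema above, and the filter matches the page description (repo = 13221727, user = 35295509, newest update first).

```python
import sqlite3

# "github.db" is a hypothetical local copy of the database behind this page.
conn = sqlite3.connect("github.db")
conn.row_factory = sqlite3.Row  # access columns by name

# Same filter as the page header: issues for repo 13221727 (pydata/xarray)
# opened by user 35295509, ordered by updated_at descending.
row = conn.execute(
    """
    SELECT id, number, title, state, state_reason, created_at, updated_at, closed_at
    FROM issues
    WHERE repo = ? AND [user] = ?
    ORDER BY updated_at DESC
    """,
    (13221727, 35295509),
).fetchone()

if row is not None:
    print(row["number"], row["title"], row["state"], row["updated_at"])

conn.close()
```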