issues

1 row where repo = 13221727 and user = 35295509 sorted by updated_at descending

Issue #5585: Memory Leak open_mfdataset

  • id: 938340119 (node_id: MDU6SXNzdWU5MzgzNDAxMTk=)
  • user: N4321D (35295509)
  • state: closed
  • locked: no
  • comments: 2
  • created_at: 2021-07-06T23:37:54Z
  • updated_at: 2023-09-12T16:15:37Z
  • closed_at: 2023-09-12T15:40:58Z
  • author_association: NONE

I used xarray to combine a couple of h5py-saved numpy arrays. This worked until I recently updated to version 0.18.2, after which my code stopped working: the kernel kept dying from lack of memory.

Whenever I open multiple HDF5 files with open_mfdataset and try to save them, memory (including swap) completely fills up before any writing happens. If all the files fit in memory, the script works; if they are larger than would fit, it crashes. (With swap disabled it crashes at around 30 files in the example script; with swap, at around 50.)
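To see where the memory goes when reproducing a report like this, a minimal stdlib-only sketch (my illustration, not part of the original report) can measure the peak Python-heap allocation around a workload with `tracemalloc`. Note that `tracemalloc` only sees allocations made through Python's allocator, not memory used internally by C libraries such as HDF5 or netCDF, so it is a rough first check rather than a full accounting:

```python
# Sketch: report the peak Python-heap allocation of a workload, in MB.
# tracemalloc is stdlib; it tracks Python-level allocations only.
import tracemalloc


def peak_memory_mb(workload):
    """Run workload() and return its peak traced allocation in MB."""
    tracemalloc.start()
    try:
        workload()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1e6


# Example: a 10 MB bytearray should show a peak of roughly 10 MB.
print(peak_memory_mb(lambda: bytearray(10_000_000)))
```

Wrapping the `to_netcdf` call in such a helper (or watching RSS with an external tool) helps confirm whether memory grows before any data is written.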

I forgot to note which version of xarray I was coming from, where my script still worked (I think it was 0.16?).

Environment:

  • Python 3.8.2 (default, Mar 26 2020, 15:53:00)
  • IPython 7.22.0
  • xarray 0.18.2 (in Anaconda)

To reproduce, first create the data files (warning: this takes a couple of GB):

```
import numpy as np
import xarray as xr
import h5py


def makefile(data, n, nfiles):
    for i in range(nfiles):
        with h5py.File(f"{i}.h5", 'w') as file:
            for par in range(n):
                file.create_dataset(f'data/{par}',
                                    data=data,
                                    dtype='f4',
                                    maxshape=(None,),
                                    chunks=(32000,),  # (dlength,),
                                    compression='gzip',
                                    compression_opts=5,
                                    fletcher32=True,
                                    shuffle=True,
                                    )


data = np.random.randint(0, 0xFFFF, int(2e7))
makefile(data, 10, 50)  # ~50 files is enough to create an error on my 16 GB RAM / 24 GB swap; increase if you have more RAM?
```
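For scale, the uncompressed footprint of the generated test data can be worked out directly (my arithmetic, not stated in the report): each dataset holds 2e7 float32 values (80 MB), each file holds 10 such datasets (800 MB), and 50 files come to roughly 40 GB before gzip compression shrinks them on disk:

```python
# Rough uncompressed size of the generated test data.
n_values = int(2e7)      # values per dataset
bytes_per_value = 4      # dtype 'f4'
datasets_per_file = 10   # n
n_files = 50             # nfiles

per_dataset = n_values * bytes_per_value    # 80 MB
per_file = per_dataset * datasets_per_file  # 800 MB
total = per_file * n_files                  # 40 GB

print(per_dataset / 1e6, per_file / 1e6, total / 1e9)  # → 80.0 800.0 40.0
```

This is far more than the 16 GB RAM + 24 GB swap mentioned above, which is the point of the reproducer: the combined dataset cannot fit in memory.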

Load the files and save them as an xarray Dataset netCDF:

```
import xarray as xr
from dask.diagnostics import ProgressBar
ProgressBar().register()  # see something happening

# load files:
ds = xr.open_mfdataset("*.h5",
                       parallel=True,
                       combine='nested',
                       concat_dim='phony_dim_0',
                       group='/data')

# save files:
save_opts = {key: {'zlib': True,  # change to blosc whenever available in xarray
                   'complevel': 5,
                   'shuffle': True,
                   'fletcher32': True,
                   } for key in ds}
ds.to_netcdf('delme.h5',
             encoding=save_opts,
             mode="w",
             # engine="h5netcdf",  # "netcdf4", "scipy", "h5netcdf"
             engine='netcdf4',
             )

# wait for kernel to die because of mem overload.
```

Output: the kernel restarted at around 8%; only 96 kB of data had been written to disk.

  • reactions: 0
  • state_reason: completed
  • repo: xarray (13221727)
  • type: issue

Powered by Datasette · Queries took 28.696ms · About: xarray-datasette