issues

1 row where repo = 13221727 and user = 35295509 sorted by updated_at descending

Issue #5585: Memory Leak open_mfdataset

  • id: 938340119 (node_id: MDU6SXNzdWU5MzgzNDAxMTk=)
  • user: N4321D (35295509)
  • state: closed
  • locked: no
  • comments: 2
  • created_at: 2021-07-06T23:37:54Z
  • updated_at: 2023-09-12T16:15:37Z
  • closed_at: 2023-09-12T15:40:58Z
  • author_association: NONE

I used xarray to combine a couple of h5py-saved numpy arrays. This worked until I recently updated to version 0.18.2, after which my code stopped working: the kernel kept dying from lack of memory.

Whenever I open multiple HDF5 files with open_mfdataset and try to save them, memory (including swap) completely fills up before any writing happens. If all the files fit in memory, the script works; if they are larger than would fit, it crashes. (With swap disabled it crashes at around 30 files in the example script; with swap, at around 50.)
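To see where the memory goes when reproducing a report like this, a minimal stdlib-only sketch (my illustration, not part of the original report) can measure the peak Python-heap allocation around a workload with `tracemalloc`. Note that `tracemalloc` only sees allocations made through Python's allocator, not memory used internally by C libraries such as HDF5 or netCDF, so it is a rough first check rather than a full accounting:

```python
# Sketch: report the peak Python-heap allocation of a workload, in MB.
# tracemalloc is stdlib; it tracks Python-level allocations only.
import tracemalloc


def peak_memory_mb(workload):
    """Run workload() and return its peak traced allocation in MB."""
    tracemalloc.start()
    try:
        workload()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1e6


# Example: a 10 MB bytearray should show a peak of roughly 10 MB.
print(peak_memory_mb(lambda: bytearray(10_000_000)))
```

Wrapping the `to_netcdf` call in such a helper (or watching RSS with an external tool) helps confirm whether memory grows before any data is written.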

I forgot to note which version of xarray I was coming from, where my script still worked (I think it was 0.16?).

Environment:

  • Python 3.8.2 (default, Mar 26 2020, 15:53:00)
  • IPython 7.22.0
  • xarray 0.18.2 (in Anaconda)

To reproduce, first create the data files (warning: this takes a couple of GB):

```
import numpy as np
import xarray as xr
import h5py


def makefile(data, n, nfiles):
    for i in range(nfiles):
        with h5py.File(f"{i}.h5", 'w') as file:
            for par in range(n):
                file.create_dataset(f'data/{par}',
                                    data=data,
                                    dtype='f4',
                                    maxshape=(None,),
                                    chunks=(32000,),  # (dlength,),
                                    compression='gzip',
                                    compression_opts=5,
                                    fletcher32=True,
                                    shuffle=True,
                                    )


data = np.random.randint(0, 0xFFFF, int(2e7))
makefile(data, 10, 50)  # ~50 files is enough to create an error on my 16 GB RAM / 24 GB swap; increase if you have more RAM?
```
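For scale, the uncompressed footprint of the generated test data can be worked out directly (my arithmetic, not stated in the report): each dataset holds 2e7 float32 values (80 MB), each file holds 10 such datasets (800 MB), and 50 files come to roughly 40 GB before gzip compression shrinks them on disk:

```python
# Rough uncompressed size of the generated test data.
n_values = int(2e7)      # values per dataset
bytes_per_value = 4      # dtype 'f4'
datasets_per_file = 10   # n
n_files = 50             # nfiles

per_dataset = n_values * bytes_per_value    # 80 MB
per_file = per_dataset * datasets_per_file  # 800 MB
total = per_file * n_files                  # 40 GB

print(per_dataset / 1e6, per_file / 1e6, total / 1e9)  # → 80.0 800.0 40.0
```

This is far more than the 16 GB RAM + 24 GB swap mentioned above, which is the point of the reproducer: the combined dataset cannot fit in memory.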

Load the files and save them as an xarray Dataset netCDF:

```
import xarray as xr
from dask.diagnostics import ProgressBar
ProgressBar().register()  # see something happening

# load files:
ds = xr.open_mfdataset("*.h5",
                       parallel=True,
                       combine='nested',
                       concat_dim='phony_dim_0',
                       group='/data')

# save files:
save_opts = {key: {'zlib': True,  # change to blosc whenever available in xarray
                   'complevel': 5,
                   'shuffle': True,
                   'fletcher32': True,
                   } for key in ds}
ds.to_netcdf('delme.h5',
             encoding=save_opts,
             mode="w",
             # engine="h5netcdf",  # "netcdf4", "scipy", "h5netcdf"
             engine='netcdf4',
             )

# wait for kernel to die because of mem overload.
```

Output: the kernel restarted at around 8%; only 96 kB of data had been written to disk.

  • reactions: 0
  • state_reason: completed
  • repo: xarray (13221727)
  • type: issue

Powered by Datasette · Queries took 28.696ms · About: xarray-datasette