issue_comments

2 rows where issue = 627735640 and user = 6815844 sorted by updated_at descending

id: 636619598 · user: fujiisoup (6815844) · author_association: MEMBER
created_at: 2020-06-01T05:24:35Z · updated_at: 2020-06-01T05:24:35Z
html_url: https://github.com/pydata/xarray/issues/4113#issuecomment-636619598
issue: xarray.DataArray.stack load data into memory (627735640)

> Reading with chunks loads more memory than reading without chunks, but not an amount equal to the size of the array (300 MB for an 800 MB array in the example below). And, by the way, stacking also loads a bit more memory.

I think it depends on the chunk size. If I use chunks=dict(x=128, y=128), the memory usage is

```
RAM: 118.14 MB
da: 800.0 MB
RAM: 119.14 MB
RAM: 125.59 MB
RAM: 943.79 MB
```
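The thread does not show how these figures were measured; the following is a minimal sketch that would produce output of this shape, assuming psutil is installed, the array lives in a file named da.nc, and the dims are z, x, and y:

```python
import os

import psutil
import xarray as xr

def report_ram(label="RAM"):
    # Resident set size of the current process, in MB.
    rss = psutil.Process(os.getpid()).memory_info().rss / 1e6
    print(f"{label}: {rss:.2f} MB")

report_ram()                                                # baseline
da = xr.open_dataarray("da.nc", chunks=dict(x=128, y=128))  # lazy, dask-backed
print(f"da: {da.nbytes / 1e6} MB")                          # size of the full array
report_ram()                                                # after lazy open
stacked = da.stack(px=("x", "y"))                           # still lazy
report_ram()                                                # after stacking
loaded = stacked.compute()                                  # forces the actual read
report_ram()                                                # after compute
```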

> When stacking a chunked array, only chunks along the first stacking dimension are conserved, and chunks along the second stacking dimension seem to be merged.

I am not sure where 512 comes from in your example (maybe dask is doing something). If I work with chunks=dict(x=128, y=128), the chunk size after stacking was (100, 16384), which is reasonable (z=100, px=(128, 128)).
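For reference, a quick way to inspect this (a sketch, assuming the same da.nc with dims z, x, and y):

```python
import xarray as xr

# Open with 128x128 chunks along x and y; z stays unchunked.
da = xr.open_dataarray("da.nc", chunks=dict(x=128, y=128))
stacked = da.stack(px=("x", "y"))

# Each px chunk covers one 128x128 block: 128 * 128 = 16384 elements,
# so the largest chunk of the stacked array is (100, 16384).
print(stacked.data.chunksize)
```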

> A workaround could have been to save the data already stacked, but "MultiIndex cannot yet be serialized to netCDF".

You can call reset_index before saving to netCDF, but it requires another computation to recreate the MultiIndex after loading.
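A minimal sketch of that round trip (file names are illustrative, and dims z, x, y are assumed):

```python
import xarray as xr

da = xr.open_dataarray("da.nc")            # dims z, x, y (assumed)
stacked = da.stack(px=("x", "y"))          # "px" carries a MultiIndex

# A MultiIndex cannot be serialized to netCDF, so flatten it first:
stacked.reset_index("px").to_netcdf("stacked.nc")

# After loading, rebuild the MultiIndex; this is the extra computation:
loaded = xr.open_dataarray("stacked.nc")
restacked = loaded.set_index(px=("x", "y"))
```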

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

id: 636418772 · user: fujiisoup (6815844) · author_association: MEMBER
created_at: 2020-05-31T04:21:29Z · updated_at: 2020-05-31T04:21:29Z
html_url: https://github.com/pydata/xarray/issues/4113#issuecomment-636418772
issue: xarray.DataArray.stack load data into memory (627735640)

Thank you for raising an issue. I confirmed that this problem is reproducible.

Since our lazy array does not support reshaping, the data is loaded automatically. This automatic loading happens in many other operations as well.

For example, multiplying your array by a scalar,

```python
da = da * 2
```

also loads the data into memory. Maybe we should improve the documentation.

FYI, using dask arrays may solve this problem. To open the file with dask, you can add the chunks keyword,

```python
da = xr.open_dataarray("da.nc", chunks={'x': 16, 'y': 16})
```

Then the reshape will be a lazy operation too.
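A sketch of the resulting lazy pipeline (variable names are illustrative); nothing is read from disk until the final compute:

```python
import xarray as xr

da = xr.open_dataarray("da.nc", chunks={'x': 16, 'y': 16})
stacked = da.stack(px=("x", "y"))   # builds a dask graph; still lazy
doubled = stacked * 2               # arithmetic also stays lazy on dask arrays
result = doubled.compute()          # the data is only loaded here
```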

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
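For reference, the row filter shown at the top of this page ("2 rows where issue = 627735640 and user = 6815844 sorted by updated_at descending") can be reproduced against a local copy of the database; the file name github.db is an assumption:

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy of this Datasette DB
rows = conn.execute(
    """
    SELECT [id], [created_at], [updated_at]
    FROM [issue_comments]
    WHERE [issue] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    (627735640, 6815844),
).fetchall()
print(len(rows))  # expect 2
```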