home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

2 rows where author_association = "NONE", issue = 479942077 and user = 40465719 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 1

  • Material-Scientist · 2 ✖

issue 1

  • How should xarray use/support sparse arrays? · 2 ✖

author_association 1

  • NONE · 2 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1014462537 https://github.com/pydata/xarray/issues/3213#issuecomment-1014462537 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X848d3hJ Material-Scientist 40465719 2022-01-17T12:20:18Z 2022-01-17T12:20:18Z NONE

I know. But having sparse data I can treat as if it were dense allows me to unstack without running out of memory, and then ffill & downsample the data in chunks:

It would be nice if xarray automatically converted the data from sparse back to dense for doing operations on the chunks just like pandas does.

The picture shows that I'm already using nbytes to determine the size.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
1013887301 https://github.com/pydata/xarray/issues/3213#issuecomment-1013887301 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X848brFF Material-Scientist 40465719 2022-01-16T14:35:29Z 2022-01-16T14:40:13Z NONE

I would prefer to retain the dense representation, but with tricks to keep the data of sparse type in memory.

Look at the following example with pandas multiindex & sparse dtype:

The dense data uses ~40 MB of memory, while the dense representation with sparse dtypes uses only ~0.5 kB of memory!

And while you can import dataframes with the sparse=True keyword, the size seems to be displayed inaccurately (both are the same size?), and we cannot examine the data like we can with pandas multiindex + sparse dtype:

Besides, a lot of operations are not available on sparse xarray data variables (i.e. if I wanted to group by price level for ffill & downsampling):

So, it would be nice if xarray adopted pandas’ approach of unstacking sparse data.

In the end, you could extract all the non-NaN values and write them to a sparse storage format, such as TileDB sparse arrays. cc: @stavrospapadopoulos

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 12.602ms · About: xarray-datasette