issue_comments

3 rows where user = 40465719 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1457965323 https://github.com/pydata/xarray/issues/1482#issuecomment-1457965323 https://api.github.com/repos/pydata/xarray/issues/1482 IC_kwDOAMm_X85W5skL Material-Scientist 40465719 2023-03-07T10:58:12Z 2023-03-07T10:58:12Z NONE

As I am not aware of the implementation details, I am not sure there is a useful link, but maybe the progress in #3213 on supporting sparse arrays can also solve the jagged array issue.

A long time ago I asked a question there about how xarray supports sparse arrays. What I actually meant were "jagged arrays". I was just not aware of that term and only came across it for the first time a few days ago.

I also recently came across awkward/jagged/ragged arrays, and that's exactly how I would like to operate on multi-dimensional (2D in referenced case) sparse data:

Instead of allocating memory for NaNs, empty slots are simply not materialized when the data uses the pd.SparseDtype("float", np.nan) dtype.

You basically create a dense duck array from sparse dtypes, as the Pandas sparse user guide shows:

So, all the shape, dtype, and ndim requirements are satisfied, and xarray could implement this as a duck array.
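As a minimal sketch of the point above (the array length and values are made up for illustration, not taken from the original screenshot), a pandas SparseArray with a NaN fill value reports a dense-looking shape, ndim, and dtype while storing only the non-NaN entries:

```python
import numpy as np
import pandas as pd

# Illustrative data: 1,000 slots, only two of which hold real values.
values = np.full(1_000, np.nan)
values[[3, 500]] = [1.5, 2.5]

# NaN is the fill value, so NaN slots are not stored explicitly.
arr = pd.arrays.SparseArray(values, dtype=pd.SparseDtype("float", np.nan))

# The array still exposes the attributes a duck array is expected to have.
print(arr.shape, arr.ndim, arr.dtype)

# Only the two non-NaN points are actually materialized.
print(arr.npoints, arr.density)
```

Whether xarray would accept this particular array type as a duck array is exactly the open question in the comment; the sketch only shows the pandas side.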

And while you can already wrap sparse duck arrays with xr.Variable, I'm not sure if the wrapper maintains the dtype:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
1014462537 https://github.com/pydata/xarray/issues/3213#issuecomment-1014462537 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X848d3hJ Material-Scientist 40465719 2022-01-17T12:20:18Z 2022-01-17T12:20:18Z NONE

I know. But having sparse data I can treat as if it were dense allows me to unstack without running out of memory, and then ffill & downsample the data in chunks:

It would be nice if xarray automatically converted the data from sparse back to dense for doing operations on the chunks just like pandas does.
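The chunked workflow described above can be sketched in pandas roughly as follows. The helper name, chunk size, and data are hypothetical; the sketch chunks over columns so that each block can be forward-filled independently, and it converts each block to dense explicitly before operating on it:

```python
import numpy as np
import pandas as pd


def ffill_downsample_by_column_chunks(sparse_df, chunk_cols, step):
    """Densify a few columns at a time, forward-fill, keep every `step`-th row."""
    pieces = []
    for start in range(0, sparse_df.shape[1], chunk_cols):
        block = sparse_df.iloc[:, start:start + chunk_cols]
        dense_block = block.sparse.to_dense()  # only this block is dense
        pieces.append(dense_block.ffill().iloc[::step])
    return pd.concat(pieces, axis=1)


# Illustrative frame: 6 rows, 4 columns, almost entirely NaN.
dense = pd.DataFrame(np.nan, index=range(6), columns=list("abcd"))
dense.loc[0, "a"] = 1.0
dense.loc[3, "a"] = 2.0
dense.loc[1, "c"] = 5.0
sparse_df = dense.astype(pd.SparseDtype("float", np.nan))

result = ffill_downsample_by_column_chunks(sparse_df, chunk_cols=2, step=2)
print(result)
```

Peak dense memory is bounded by the size of one column block rather than the whole unstacked table, which is the property the comment relies on.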

The picture shows that I'm already using nbytes to determine the size.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
1013887301 https://github.com/pydata/xarray/issues/3213#issuecomment-1013887301 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X848brFF Material-Scientist 40465719 2022-01-16T14:35:29Z 2022-01-16T14:40:13Z NONE

I would prefer to retain the dense representation, but with tricks that keep the data stored as a sparse type in memory.

Look at the following example with pandas multiindex & sparse dtype:

The dense data uses ~40 MB of memory, while the dense representation with sparse dtypes uses only ~0.5 kB of memory!
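The original figures come from a screenshot that is not preserved here. As a rough sketch of the same kind of comparison, with made-up dimensions and values (so the numbers will differ from the ~40 MB and ~0.5 kB quoted above), the memory of a mostly-NaN frame can be compared before and after converting it to a sparse dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical sizes: 1,000 time steps by 500 price levels, almost all NaN.
n_times, n_prices = 1_000, 500
dense = pd.DataFrame(
    np.full((n_times, n_prices), np.nan),
    index=pd.RangeIndex(n_times, name="time"),
    columns=pd.RangeIndex(n_prices, name="price"),
)
dense.iloc[0, 0] = 1.0
dense.iloc[10, 250] = 2.0
dense.iloc[999, 499] = 3.0

# Same logical table, but NaN slots are no longer stored.
sparse_df = dense.astype(pd.SparseDtype("float", np.nan))

dense_bytes = int(dense.memory_usage(index=False).sum())
sparse_bytes = int(sparse_df.memory_usage(index=False).sum())
print(dense_bytes, sparse_bytes, sparse_df.sparse.density)
```

The sparse frame keeps the dense shape and can still be indexed and displayed like a regular DataFrame, while its column storage scales with the number of non-NaN values.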

And while you can import dataframes into xarray with the sparse=True keyword, the size seems to be displayed inaccurately (both are the same size?), and we cannot examine the data the way we can with a pandas multiindex + sparse dtype:

Besides, a lot of operations are not available on sparse xarray data variables (e.g. if I wanted to group by price level for ffill & downsampling):

So, it would be nice if xarray adopted pandas’ approach of unstacking sparse data.

In the end, you could extract all the non-NaN values and write them to a sparse storage format, such as TileDB sparse arrays. cc: @stavrospapadopoulos

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.91ms · About: xarray-datasette