home / github / issue_comments

Menu
  • Search all tables
  • GraphQL API

issue_comments: 1013887301

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/3213#issuecomment-1013887301 https://api.github.com/repos/pydata/xarray/issues/3213 1013887301 IC_kwDOAMm_X848brFF 40465719 2022-01-16T14:35:29Z 2022-01-16T14:40:13Z NONE

I would prefer to retain the dense representation, but with tricks to keep the data of sparse type in memory.

Look at the following example with pandas multiindex & sparse dtype:

The dense data uses ~40 MB of memory, while the dense representation with sparse dtypes uses only ~0.5 kB of memory!

And while you can import dataframes with the sparse=True keyword, the size seems to be displayed inaccurately (both are the same size?), and we cannot examine the data like we can with pandas multiindex + sparse dtype:

Besides, a lot of operations are not available on sparse xarray data variables (i.e. if I wanted to group by price level for ffill & downsampling):

So, it would be nice if xarray adopted pandas’ approach of unstacking sparse data.

In the end, you could extract all the non-NaN values and write them to a sparse storage format, such as TileDB sparse arrays. cc: @stavrospapadopoulos

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  479942077
Powered by Datasette · Queries took 315.634ms · About: xarray-datasette