issue_comments: 1013887301
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/3213#issuecomment-1013887301 | https://api.github.com/repos/pydata/xarray/issues/3213 | 1013887301 | IC_kwDOAMm_X848brFF | 40465719 | 2022-01-16T14:35:29Z | 2022-01-16T14:40:13Z | NONE | I would prefer to retain the dense representation, but with tricks that keep the data stored as a sparse type in memory. Look at the following example with a pandas MultiIndex & sparse dtype:
The dense data uses ~40 MB of memory, while the dense representation with sparse dtypes uses only ~0.5 kB of memory! And while you can import dataframes into xarray with the sparse=True keyword, the reported size seems to be inaccurate (both appear to be the same size?), and we cannot examine the data the way we can with a pandas MultiIndex + sparse dtype:
Besides, many operations are not available on sparse xarray data variables (e.g. if I wanted to group by price level for ffill & downsampling):
So, it would be nice if xarray adopted pandas’ approach of unstacking sparse data. In the end, you could extract all the non-NaN values and write them to a sparse storage format, such as TileDB sparse arrays. cc: @stavrospapadopoulos |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
479942077 |
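
A minimal sketch of the pandas comparison the comment describes, not the commenter's original example. The data, the `time` and `price_level` index names, and the array sizes are all hypothetical; only the general technique (unstacking a MultiIndex series and storing the result with a sparse dtype) is taken from the comment. Exact byte counts depend on the pandas version and the data, so none are claimed here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 observations scattered over a
# (time, price_level) grid, so most grid cells have no value.
rng = np.random.default_rng(0)
n_obs = 200
times = rng.integers(0, 1_000, size=n_obs)
levels = rng.integers(0, 500, size=n_obs)

index = pd.MultiIndex.from_arrays([times, levels], names=["time", "price_level"])
long = pd.Series(rng.random(n_obs), index=index, name="size")

# unstack() requires unique index entries, so keep the first value
# for each (time, price_level) pair.
long = long[~long.index.duplicated()]

# Dense unstack: every missing (time, price_level) cell is stored as NaN.
dense = long.unstack("price_level")

# Same 2-D layout, but each column stores only its non-NaN values.
sparse = dense.astype(pd.SparseDtype("float64", np.nan))

dense_bytes = int(dense.memory_usage(deep=True).sum())
sparse_bytes = int(sparse.memory_usage(deep=True).sum())

print("shape:", dense.shape)
print("dense bytes:", dense_bytes)
print("sparse bytes:", sparse_bytes)
print("fraction of cells stored:", sparse.sparse.density)
```

The sparse frame keeps the row and column labels of the dense one, so it can still be inspected and sliced by label, which is the property the comment asks xarray to preserve.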