issue_comments: 585668294

html_url: https://github.com/pydata/xarray/issues/3213#issuecomment-585668294
issue_url: https://api.github.com/repos/pydata/xarray/issues/3213
id: 585668294
node_id: MDEyOklzc3VlQ29tbWVudDU4NTY2ODI5NA==
user: 18172466
created_at: 2020-02-13T10:55:15Z
updated_at: 2020-02-13T10:55:15Z
author_association: NONE
issue: 479942077

body:

Thank you all for making xarray and its tight integration with dask so great!

As @shoyer mentioned:

Yes, it would be useful (eventually) to have lazy loading of sparse arrays from disk, like we currently do for dense arrays. This would indeed require knowing that the indices are sorted.

I am wondering whether creating a lazy and sparse xarray Dataset/DataArray is already possible, especially when the sparse part is created at runtime and only the actual data is loaded. Assume two differently sampled, lazy dask-backed DataArrays are merged/combined along a coordinate axis into a Dataset. The less densely sampled DataVariable is then padded with NaNs. In my experience, the current behaviour is that every one of those NaN values requires memory.
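
To make the scenario concrete, here is a minimal sketch of what I mean (the names, sizes, and chunking are made up purely for illustration):

```python
import numpy as np
import dask.array as da
import xarray as xr

# Two lazily backed variables on the same "time" axis, but with very
# different sampling densities.
dense = xr.DataArray(
    da.ones(1_000_000, chunks=100_000),
    dims="time",
    coords={"time": np.arange(1_000_000)},
    name="dense_signal",
)
sparse_like = xr.DataArray(
    da.ones(10, chunks=10),
    dims="time",
    coords={"time": np.arange(0, 1_000_000, 100_000)},
    name="sparse_signal",
)

# Merging along "time" performs an outer join: the less densely sampled
# variable is reindexed onto the full axis and padded with NaN.
ds = xr.merge([dense, sparse_like])

# The padded variable is still a dask array, but its chunks are dense, so the
# NaN fill values cost the same memory as real data once computed.
print(ds["sparse_signal"].data)           # dask array with 1,000,000 elements
print(ds["sparse_signal"].nbytes / 1e6)   # ~8 MB for only ten actual samples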

That issue might be formulated this way: Dask integration lets xarray scale to big data only as long as the data has no sparse character. Do you agree with that formulation, or am I missing something fundamental?
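
One partial workaround I am aware of (only a sketch, and whether it behaves well depends on the xarray/dask/sparse versions in use; it also does not give lazy loading of sparse data from disk) is to back the dask chunks with sparse.COO blocks after the padding has happened, so the fill values stop costing dense memory:

```python
import numpy as np
import dask.array as da
import sparse

# Stand-in for the NaN-padded variable from the sketch above:
# mostly-NaN dense chunks produced by the outer join.
padded = da.where(da.arange(1_000_000, chunks=100_000) % 100_000 == 0, 1.0, np.nan)

# Convert each dense chunk to a sparse.COO block with NaN as the fill value,
# so the padding no longer occupies dense memory inside each chunk.
as_sparse = padded.map_blocks(sparse.COO.from_numpy, fill_value=np.nan)

print(as_sparse)                             # still a lazy dask array, now with COO chunks
print(as_sparse.blocks[0].compute().nbytes)  # far smaller than the ~800 kB dense chunk
```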

A code example reproducing that issue is described here: https://stackoverflow.com/q/60117268/9657367
