issues: 1524642393

id: 1524642393
node_id: I_kwDOAMm_X85a4DJZ
number: 7428
title: Avoid instantiating the data in prepare_variable
user: 90008
state: open
locked: 0
comments: 0
created_at: 2023-01-08T19:18:49Z
updated_at: 2023-01-09T06:25:52Z
author_association: CONTRIBUTOR

Is your feature request related to a problem?

I'm trying to extend xarray's features for a new backend that I'm developing internally. The main use case is opening a dataset of several hundred GB, slicing out a smaller dataset (tens of GB), and writing it.

However, functions like prepare_variable, as currently written, implicitly instantiate all of the data (potentially tens of GB), which incurs a large time cost at a point in the code that surprised me.

https://github.com/pydata/xarray/blob/6e77f5e8942206b3e0ab08c3621ade1499d8235b/xarray/backends/h5netcdf_.py#L338
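The mechanism can be modeled in a few lines. As we understand it, xarray's `Variable.data` returns duck arrays (NumPy, dask) unchanged but converts other lazily indexed wrappers to NumPy, which reads them in full. The sketch below is a toy model under that assumption, not xarray's code: `LazyBackendArray`, `ToyVariable`, and this `prepare_variable` are hypothetical stand-ins that count how often the full array is read.

```python
class LazyBackendArray:
    """Hypothetical lazily indexed array from a custom backend."""

    def __init__(self, size):
        self.size = size
        self.loads = 0  # how many times the full array has been read

    def load(self):
        # In a real backend this would read the whole array from disk,
        # potentially tens of GB.
        self.loads += 1
        return [0.0] * self.size


class ToyVariable:
    """Hypothetical, simplified variable whose .data forces a full load."""

    def __init__(self, lazy):
        self._data = lazy

    @property
    def data(self):
        # Models the non-duck-array case: the lazy wrapper is materialized.
        return self._data.load()


def prepare_variable(name, variable):
    target = {"name": name}  # placeholder for the on-disk variable handle
    # Returning variable.data as the second value is what triggers the read.
    return target, variable.data
```

Creating a `ToyVariable` costs nothing, but the first call to `prepare_variable` reads the whole array, even if the caller only wanted the target.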

Describe the solution you'd like

Would it be possible to just remove the second return value from prepare_variable? It isn't particularly useful, and it is easy to obtain from the inputs to the function.
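A minimal sketch of the proposed shape, in which the caller takes the source from the variable it already passed in. All names here (`LazyArray`, `ToyVariable`, `prepare_variable`, `write_in_chunks`) are hypothetical stand-ins, not xarray APIs: `prepare_variable` only creates the target, and the caller then streams the lazy source in slices, so at most one slice is in memory at a time.

```python
class LazyArray:
    """Hypothetical lazily indexed backend array; counts slice reads."""

    def __init__(self, values):
        self._values = values
        self.loads = 0

    def __getitem__(self, key):
        # Reading a slice is the only point where data leaves the backend.
        self.loads += 1
        return self._values[key]


class ToyVariable:
    """Hypothetical variable that keeps its lazy array in _data."""

    def __init__(self, lazy):
        self._data = lazy


def prepare_variable(name, variable):
    # Proposed shape: create and return only the target, with no data access.
    return {"name": name, "written": []}


def write_in_chunks(source, target, length, chunk):
    # The caller streams the source it already holds, one slice at a time.
    for start in range(0, length, chunk):
        target["written"].extend(source[start:start + chunk])
```

With ten values and a chunk size of 4, `prepare_variable` performs no reads, and the write loop performs three slice reads (`[0:4]`, `[4:8]`, `[8:12]`).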

Describe alternatives you've considered

I'm probably going to create a new method with a not-so-well-chosen name, like prepare_variable_no_data, that does the above, but only for my backend. The code path where I need this only uses our custom backend.

Additional context

I think this would be useful in general for other users who need more out-of-memory computation. I've found that you really have to "buy into" dask all the way to the end if you want to see any benefit. As such, if somebody used a dask array, this would create a serial choke point in:

https://github.com/pydata/xarray/blob/6e77f5e8942206b3e0ab08c3621ade1499d8235b/xarray/backends/common.py#L308
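The contrast between computing each source inside the per-variable loop and deferring all writes to one batched step can be sketched in plain Python. This is an illustration under stated assumptions, not xarray's code: `DeferredWriter` is hypothetical (loosely modeled on the add/sync pattern of xarray's `ArrayWriter`, as we read it), and zero-argument callables stand in for lazy arrays such as dask arrays. With real dask arrays, the batched step would typically be a single `dask.array.store(sources, targets)` call, letting dask schedule every chunk of every variable together.

```python
from concurrent.futures import ThreadPoolExecutor


class DeferredWriter:
    """Hypothetical writer that queues (source, target) pairs instead of
    materializing each source inside the loop that prepares variables."""

    def __init__(self):
        self.sources = []
        self.targets = []

    def add(self, source, target):
        # Only record the pair; nothing is computed yet.
        self.sources.append(source)
        self.targets.append(target)

    def sync(self, max_workers=4):
        # Compute every queued source and store its result, in parallel.
        def run(pair):
            compute, target = pair
            target.extend(compute())

        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # list() drains the iterator so worker exceptions are re-raised.
            list(pool.map(run, zip(self.sources, self.targets)))
```

Nothing is computed while `add` is called, so the preparation loop stays cheap; all of the work happens in `sync`, where it can run concurrently rather than one variable at a time.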

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7428/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
