

issues: 2276352251


id: 2276352251
node_id: I_kwDOAMm_X86HrmD7
number: 8994
title: Improving performance of open_datatree
user: 35968931
state: open
locked: 0
comments: 4
created_at: 2024-05-02T19:43:17Z
updated_at: 2024-05-03T15:25:33Z
author_association: MEMBER

What is your issue?

The implementation of open_datatree works, but is inefficient, because it calls open_dataset once for every group in the file. We should refactor this to improve the performance, which would fix issues like https://github.com/xarray-contrib/datatree/issues/330.
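To make the cost concrete, here is a minimal, self-contained toy model of the per-group pattern. Everything in it is hypothetical and not an xarray API: `FakeNetCDFFile` is an in-memory tree of groups whose `open()` stands in for the expensive `netCDF4.Dataset` call, and `open_dataset_for_group` / `naive_open_datatree` only mimic calling open_dataset once per group. The model just counts how often the file gets opened.

```python
class FakeNetCDFFile:
    """Hypothetical stand-in for a netCDF file: a tree of nested group dicts."""

    def __init__(self, tree):
        self.tree = tree
        self.open_count = 0  # number of times the expensive open ran

    def open(self):
        # In real code this would be the costly netCDF4.Dataset(...) call.
        self.open_count += 1
        return self.tree


def iter_group_paths(node, prefix="/"):
    """Yield every group path depth-first, starting with the root "/"."""
    yield prefix
    for name, child in node.items():
        yield from iter_group_paths(child, prefix.rstrip("/") + "/" + name)


def open_dataset_for_group(fake_file, path):
    """Mimics open_dataset(filename, group=path): reopens the file on every call."""
    node = fake_file.open()
    for part in path.strip("/").split("/"):
        if part:  # the root path "/" yields an empty part, which is skipped
            node = node[part]
    return {"group": path, "n_children": len(node)}


def naive_open_datatree(fake_file):
    """Per-group pattern in this model: one open to list groups, then one per group."""
    paths = list(iter_group_paths(fake_file.open()))
    return {path: open_dataset_for_group(fake_file, path) for path in paths}


f = FakeNetCDFFile({"a": {}, "b": {"c": {}}})
tree = naive_open_datatree(f)
print(sorted(tree), f.open_count)  # → ['/', '/a', '/b', '/b/c'] 5
```

In this model a file with N groups costs N + 1 opens, so the overhead grows linearly with the number of groups even when the groups hold no data.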

We discussed this in the datatree meeting, and my understanding is that concretely we need to:

  • [ ] Create an asv benchmark for open_datatree, probably by first writing a special netCDF file that has no data but many groups, then benchmarking how long it takes to open it.
  • [ ] Refactor the NetCDF4DataStore class to create only one CachingFileManager object per file, not one per group, see https://github.com/pydata/xarray/blob/748bb3a328a65416022ec44ced8d461f143081b5/xarray/backends/netCDF4_.py#L406.
  • [ ] Refactor NetCDF4BackendEntrypoint.open_datatree to use an implementation that goes through NetCDF4DataStore without calling the top-level xr.open_dataset again.
  • [ ] Check that the performance of calling xr.open_datatree on a netCDF file has actually improved.
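The refactor and benchmark steps above can be sketched with the same kind of toy model, assuming an in-memory file instead of netCDF. All names here are hypothetical: `SharedFileManager` is only an analogue of "one CachingFileManager per file", `open_datatree_single_handle` is not the actual xarray implementation, and `OpenDataTreeSuite` follows the asv convention of a `setup()` method plus timed methods whose names start with `time_`. A real benchmark would write a netCDF file with many empty groups in `setup()` and call `xr.open_datatree` on it.

```python
class FakeNetCDFFile:
    """Hypothetical stand-in for a netCDF file: a tree of nested group dicts."""

    def __init__(self, tree):
        self.tree = tree
        self.open_count = 0  # number of times the expensive open ran

    def open(self):
        # In real code this would be the costly netCDF4.Dataset(...) call.
        self.open_count += 1
        return self.tree


class SharedFileManager:
    """Hypothetical analogue of one CachingFileManager per file, not per group.

    The file is opened on first use and the same handle is reused afterwards.
    """

    def __init__(self, fake_file):
        self._file = fake_file
        self._handle = None

    def acquire(self):
        if self._handle is None:
            self._handle = self._file.open()
        return self._handle


def get_group(handle, path):
    """Walk from the root handle to the group at `path` (e.g. "/b/c")."""
    node = handle
    for part in path.strip("/").split("/"):
        if part:  # the root path "/" yields an empty part, which is skipped
            node = node[part]
    return node


def iter_group_paths(node, prefix="/"):
    """Yield every group path depth-first, starting with the root "/"."""
    yield prefix
    for name, child in node.items():
        yield from iter_group_paths(child, prefix.rstrip("/") + "/" + name)


def open_datatree_single_handle(fake_file):
    """Hypothetical refactor: every group is read through one shared manager."""
    manager = SharedFileManager(fake_file)
    root = manager.acquire()
    return {
        path: {"group": path, "n_children": len(get_group(manager.acquire(), path))}
        for path in iter_group_paths(root)
    }


class OpenDataTreeSuite:
    """asv-style benchmark: asv runs setup() before timing each time_* method."""

    def setup(self):
        # Many empty groups and no data: the slow case described in this issue.
        self.file = FakeNetCDFFile({f"group_{i}": {} for i in range(1000)})

    def time_open_datatree(self):
        open_datatree_single_handle(self.file)


f = FakeNetCDFFile({"a": {}, "b": {"c": {}}})
tree = open_datatree_single_handle(f)
print(sorted(tree), f.open_count)  # → ['/', '/a', '/b', '/b/c'] 1
```

In this model the number of opens stays at one regardless of how many groups the file has, which is the property the last checklist item would confirm against the real netCDF backend.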

It would be great to get this done soon as part of the datatree integration project. @kmuehlbauer, I know you were interested: are you willing, and do you have time, to take this task on?

reactions: 0 (https://api.github.com/repos/pydata/xarray/issues/8994/reactions)
repo: 13221727
type: issue
