issue_comments: 1018257806

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/6174#issuecomment-1018257806 https://api.github.com/repos/pydata/xarray/issues/6174 1018257806 IC_kwDOAMm_X848sWGO 57705593 2022-01-21T07:40:55Z 2022-01-21T07:46:06Z CONTRIBUTOR

When I first posted this issue, I thought the best solution was to just implement my proposed helper functions as part of the official xarray API. I don't think our project would add DataTree as a new dependency just for this, as long as we have a very easy and viable solution of our own.

But now I have a new idea. First, I noticed that open_dataset doesn't actually close the file handle, but reuses it later if needed. So at least there is no performance problem with the current read setup.

For writing, there should be an option in to_netcdf that ensures that xarray does not close the file handle. xarray already uses a CachingFileManager to open NetCDF4 files: https://github.com/pydata/xarray/blob/0ffb0f42282a1b67c4950e90e1e4ecd146307aa8/xarray/backends/netCDF4_.py#L379-L381

That means the manager already ensures that the same file handle is reused in subsequent to_netcdf calls on the same file, unless it's closed in the meantime. Closing is managed here: https://github.com/pydata/xarray/blob/0ffb0f42282a1b67c4950e90e1e4ecd146307aa8/xarray/backends/api.py#L1072-L1094

It's not very transparent when closing is actually triggered in practice, especially if you only look at the current docstrings. I found that, in fact, setting compute=False in to_netcdf prevents the closing until you explicitly call compute on the returned object:

    for name, ds in zip(ds_names, ds_list):
        delayed = ds.to_netcdf(path, group=name, compute=False)
        delayed.compute()

If this were communicated more transparently in the docstrings, it would bring us a big step closer to the solution of this issue :slightly_smiling_face:

Apart from that, there is only one problem left: getting a full list of all groups contained in a NetCDF4 file so that we can read them all in. In DataTree, you fall back to using the NetCDF4 (or h5netcdf) API directly for that purpose: _get_nc_dataset_class and _iter_nc_groups. That's not the worst solution. However, I would insist that xarray should be able to do this itself. Maybe we need an open_datasets_from_groups function for that, or rather a list_datasets function. But it should somehow be solvable within the xarray API without requiring a two-year debate about the management and representation of hierarchical data structures.
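
For illustration, here is a minimal sketch of what such a group listing could look like when done outside xarray. The names iter_group_paths and open_all_groups are hypothetical and are not part of xarray or DataTree. The sketch assumes only that the file object exposes a .groups mapping from name to child group, as netCDF4.Dataset and h5netcdf.File do. The stand-in tree at the end exists only so the traversal can be tried without a NetCDF file.

```python
from types import SimpleNamespace


def iter_group_paths(node, parent=""):
    """Yield the path of every group below `node`, depth first.

    Only relies on a `.groups` mapping of name -> child group.
    """
    for name, child in node.groups.items():
        path = f"{parent}/{name}"
        yield path
        yield from iter_group_paths(child, path)


def open_all_groups(filename):
    """Hypothetical helper: map each group path to an xarray.Dataset."""
    # Imported lazily so the traversal above stays dependency-free.
    import netCDF4
    import xarray as xr

    with netCDF4.Dataset(filename, mode="r") as nc:
        paths = ["/"] + list(iter_group_paths(nc))
    # One open_dataset call per group, using the existing `group` argument.
    return {p: xr.open_dataset(filename, group=p) for p in paths}


# Stand-in object tree (not a real NetCDF file) to exercise the traversal.
leaf = SimpleNamespace(groups={})
fake_root = SimpleNamespace(
    groups={"a": SimpleNamespace(groups={"b": leaf}), "c": leaf}
)
group_paths = list(iter_group_paths(fake_root))
```

Something along these lines is what a list_datasets function would have to do internally. The open question in this issue is whether it should live in xarray itself rather than in user code.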

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1108138101