
issue_comments: 904817641


html_url: https://github.com/pydata/xarray/issues/4118#issuecomment-904817641
issue_url: https://api.github.com/repos/pydata/xarray/issues/4118
id: 904817641
node_id: IC_kwDOAMm_X8417mvp
user: 35968931
created_at: 2021-08-24T17:00:24Z
updated_at: 2022-05-19T16:33:26Z
author_association: MEMBER

So I had a crack at making a full DataTree class - you can find it in this repo.

It's based on @benbovy's DatasetNode example - the basic idea is that each tree node wraps a single Dataset. The differences are that this effort:

- Uses a NodeMixin from anytree for the tree structure,
- Implements path-like and tag-like getting and setting,
- Has functions for mapping user-supplied functions over every node in the tree,
- Automatically dispatches xarray.Dataset's API over every node in the tree (such as .isel or __add__),
- Has a bunch of tests,
- Has a printable representation that currently looks like this:

Some limitations of the approach I used are:

- Each dataset in the tree is entirely separate, so doing something like dt.sel(time=50) would require each Dataset in that subtree to have its own coordinate called 'time'. (That's normally useful though, because then 'time' can have a different resolution on each ds),
- While you can access nodes via tags, the underlying implementation is in terms of paths, so ('folder1', 'folder2') points to a different node than ('folder2', 'folder1'),
- There's no support for symbolic nodes yet, and I'm unsure whether this design can allow for loops or not.
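To make the first two limitations concrete, here is a toy sketch. It is not the code from the repo: `ToyNode`, `get`, and `map_over_subtree` are hypothetical names invented for illustration, and a plain dict stands in for an xarray.Dataset. It only aims to show that paths are order-sensitive and that a per-node operation needs every node to carry the key it selects on.

```python
from __future__ import annotations

from typing import Callable, Iterator, Optional


class ToyNode:
    """Hypothetical tree node wrapping one dataset (a plain dict stands in for it)."""

    def __init__(self, name: str, data: Optional[dict] = None) -> None:
        self.name = name
        self.data = dict(data or {})
        self.parent: Optional[ToyNode] = None
        self.children: dict[str, ToyNode] = {}

    def add_child(self, child: ToyNode) -> ToyNode:
        child.parent = self
        self.children[child.name] = child
        return child

    def get(self, path: tuple[str, ...]) -> ToyNode:
        """Walk down one name at a time, so the order of the parts matters."""
        node = self
        for part in path:
            node = node.children[part]  # KeyError if the path does not exist
        return node

    def subtree(self) -> Iterator[ToyNode]:
        """Yield this node, then every descendant, depth first."""
        yield self
        for child in self.children.values():
            yield from child.subtree()

    def map_over_subtree(self, func: Callable[[dict], dict]) -> ToyNode:
        """Return a new tree with func applied to each node's data independently."""
        new = ToyNode(self.name, func(self.data))
        for child in self.children.values():
            new.add_child(child.map_over_subtree(func))
        return new


def select_time_50(ds: dict) -> dict:
    """Toy analogue of a selection on 'time'; fails if the node has no 'time'."""
    return {"time": [t for t in ds["time"] if t == 50]}


root = ToyNode("root")  # the root holds no data, so it has no 'time'
f1 = root.add_child(ToyNode("folder1", {"time": [0, 50]}))
f1.add_child(ToyNode("folder2", {"time": [0, 25, 50]}))
f2 = root.add_child(ToyNode("folder2", {"time": [50]}))
f2.add_child(ToyNode("folder1", {"time": [0, 10, 50]}))

# The same two names in a different order reach two different nodes.
a = root.get(("folder1", "folder2"))
b = root.get(("folder2", "folder1"))

# Works: every node under folder1 carries its own 'time'.
picked = f1.map_over_subtree(select_time_50)

# Fails: the root node has no 'time' of its own.
try:
    root.map_over_subtree(select_time_50)
    root_failed = False
except KeyError:
    root_failed = True
```

In this sketch each node is processed in isolation, which is why 'time' can have a different length at every node, and also why a node without 'time' breaks the whole mapping.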

You can create a DataTree object in 3 ways:

1) Load from a netCDF file that has groups via open_datatree(),
2) Using the init method of DataTree, which accepts a nested dictionary of Datasets,
3) Manually create individual nodes with DataNode() and specify their relationships to each other, either by setting .parent and .children attributes, or through __get/setitem__ access, e.g. dt['path/to/node'] = DataNode('node_name', data=xr.Dataset()).
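As a rough picture of options 2 and 3, the sketch below builds a toy tree from a nested dictionary and then attaches a node by path. It does not use the real DataTree or DataNode classes: `ToyNode` and `tree_from_dict` are hypothetical, strings stand in for Datasets, and creating missing intermediate nodes on assignment is an assumption of this sketch rather than confirmed behaviour of the repo.

```python
from __future__ import annotations

from typing import Any, Optional


class ToyNode:
    """Hypothetical node: a name, one dataset-like payload, and child nodes."""

    def __init__(self, name: str, data: Any = None) -> None:
        self.name = name
        self.data = data
        self.parent: Optional[ToyNode] = None
        self.children: dict[str, ToyNode] = {}

    def _attach(self, child: ToyNode) -> ToyNode:
        child.parent = self
        self.children[child.name] = child
        return child

    def __getitem__(self, path: str) -> ToyNode:
        node = self
        for part in path.split("/"):
            node = node.children[part]  # KeyError if the path does not exist
        return node

    def __setitem__(self, path: str, new: ToyNode) -> None:
        *parents, leaf = path.split("/")
        node = self
        for part in parents:
            # Assumption of this sketch: missing intermediate nodes are created empty.
            if part not in node.children:
                node._attach(ToyNode(part))
            node = node.children[part]
        # Assumption of this sketch: the last path component becomes the node's name.
        new.name = leaf
        node._attach(new)


def tree_from_dict(name: str, nested: dict) -> ToyNode:
    """Build a tree from a nested dict.

    Dict values become subtrees; any other value becomes that node's data.
    """
    root = ToyNode(name)
    for key, value in nested.items():
        if isinstance(value, dict):
            root._attach(tree_from_dict(key, value))
        else:
            root._attach(ToyNode(key, data=value))
    return root


# Option 2 analogue: a nested dictionary (strings stand in for Datasets).
dt = tree_from_dict("root", {"folder1": {"results": "dataset A"}, "folder2": "dataset B"})

# Option 3 analogue: attach a manually created node through path-like setitem.
dt["path/to/node"] = ToyNode("node", data="dataset C")
```

One consequence of this simple encoding is that a node cannot hold both data and children when built from the dictionary; the manual route does not have that restriction.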

It's about 70% working, but some things I could do with some help with are:

1) ~Fundamental design questions about the class structure, such as whether DataTree should be a subclass of Dataset?~
2) ~Getting arithmetic and ufuncs to act properly on the whole tree~,
3) ~Saving a tree to a single netCDF file~, (thanks Joe!)
4) ~Setting up CI and all that jazz~, (thanks Joe again!)
5) ~Setting up basic docs.~

There will definitely be many bugs, but any thoughts or input appreciated!

{
    "total_count": 8,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 8,
    "rocket": 0,
    "eyes": 0
}
issue: 628719058