html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4118#issuecomment-637382925,https://api.github.com/repos/pydata/xarray/issues/4118,637382925,MDEyOklzc3VlQ29tbWVudDYzNzM4MjkyNQ==,39640592,2020-06-02T08:33:42Z,2020-06-02T08:33:42Z,NONE,"Thanks @jhamman for sharing the link. Here are my thoughts on the same:

For use cases similar to the one I have mentioned, I think it would be more meaningful to let the tree structure (called `Datatree` from here on) exist as a separate data structure instead of residing within the Dataset. From what I understand, an xarray Dataset requires all of its variables to share the same coordinates for a given dimension name. This would again waste memory on `nan` padding wherever a variable has no value for a coordinate.
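
To make the padding concern concrete, here is a minimal sketch (the variable names `a` and `b` and their values are made up for illustration) of what happens when two arrays with the same dimension name but different coordinates end up in one Dataset:

```python
import xarray as xr

# Two variables share the dimension name 'time' but cover different coordinates.
a = xr.DataArray([1.0, 2.0], dims='time', coords={'time': [0, 1]}, name='a')
b = xr.DataArray([3.0, 4.0], dims='time', coords={'time': [2, 3]}, name='b')

# An outer join aligns both variables onto the union of the 'time' values.
ds = xr.merge([a, b], join='outer')

n_time = ds.sizes['time']                  # length of the shared 'time' axis
n_missing_a = int(ds['a'].isnull().sum())  # padded entries in 'a'
n_missing_b = int(ds['b'].isnull().sum())  # padded entries in 'b'
```

In this toy case the shared axis has four entries and each variable is half `nan`.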

Besides, xarray only allows attribute access for getting (and not setting) values, but a separate data structure can allow attribute access for setting values as well. For example, the data structure that I have implemented would allow something like `dt.weather = dt.weather.mean('time')` to alter all the data arrays under the `weather` node.
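
As a rough illustration of the idea (this is a hypothetical toy `Node` class written for this comment, not my actual implementation), attribute-based getting and setting on a tree could look like this:

```python
import xarray as xr


class Node:
    # Hypothetical tree node: children are stored in a dict and exposed as attributes.
    def __init__(self, **children):
        # Bypass our own __setattr__ so that '_children' becomes a real attribute.
        object.__setattr__(self, '_children', dict(children))

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for child names.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment replaces (or adds) a child node or data array.
        self._children[name] = value

    def mean(self, dim):
        # Apply the reduction to every child and return a new node.
        return Node(**{k: v.mean(dim) for k, v in self._children.items()})


dt = Node(weather=Node(
    temperature=xr.DataArray([280.0, 282.0], dims='time'),
    pressure=xr.DataArray([1010.0, 1008.0], dims='time'),
))

# Replaces every data array under the 'weather' node with its time mean.
dt.weather = dt.weather.mean('time')
```

A real implementation would need to handle name clashes with methods such as `mean`, which this sketch ignores.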

I am currently using attribute-based access for accessing child nodes/data arrays in the `Datatree` as it appears to reflect the tree structure better, but as @shoyer has pointed out, tuple-based access might be easier to use programmatically.

Instead of using netCDF4 groups for encoding the `Datatree`, I am currently following a simple 3-step process:
- Combine all the data arrays at the leaves of a `Datatree` object into a dataset.
- Add one more data array to the dataset holding an ancestor matrix (or any other array-like representation) that encodes the hierarchy, with the names of the tree nodes as its coordinates.
- Use the `xarray.Dataset.to_netcdf` method to store it in a netCDF file. 
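
The three steps can be sketched roughly as follows. The node names, the variable name `tree_ancestors`, and the helper functions are all assumptions made for this example and are not taken from my actual code:

```python
import numpy as np
import xarray as xr

# Hypothetical tree, given as child -> parent (the root has no parent).
parents = {'root': None, 'weather': 'root', 'temperature': 'weather',
           'pressure': 'weather', 'elevation': 'root'}
nodes = list(parents)


def ancestor_matrix(parents, nodes):
    # mat[i, j] == 1 when nodes[j] is an ancestor of nodes[i].
    index = {name: i for i, name in enumerate(nodes)}
    mat = np.zeros((len(nodes), len(nodes)), dtype='int8')
    for name in nodes:
        p = parents[name]
        while p is not None:
            mat[index[name], index[p]] = 1
            p = parents[p]
    return mat


def parents_from_matrix(arr):
    # Invert the encoding: the direct parent is the deepest ancestor.
    mat = arr.values
    names = [str(n) for n in arr['node'].values]
    depth = mat.sum(axis=1)  # number of ancestors of each node
    out = {}
    for i, name in enumerate(names):
        anc = np.flatnonzero(mat[i])
        out[name] = names[anc[np.argmax(depth[anc])]] if anc.size else None
    return out


# Step 1: combine the leaf data arrays into one dataset.
leaves = xr.Dataset(
    {
        'temperature': ('time', [280.0, 281.5, 279.0]),
        'pressure': ('time', [1010.0, 1008.0, 1012.0]),
        'elevation': ('station', [120.0, 95.0]),
    },
    coords={'time': [0, 1, 2]},
)

# Step 2: add the ancestor matrix, with node names as coordinates.
ds = leaves.assign(
    tree_ancestors=xr.DataArray(
        ancestor_matrix(parents, nodes),
        dims=('node', 'ancestor'),
        coords={'node': nodes, 'ancestor': nodes},
    )
)

# Step 3 would be ds.to_netcdf(path); it is omitted here to avoid file I/O.

# On reading, the hierarchy can be rebuilt from the extra array.
recovered = parents_from_matrix(ds['tree_ancestors'])
```

Here `recovered` matches the original `parents` mapping, so the hierarchy survives the round trip through a flat Dataset.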

Within the netCDF file, it would therefore exist as an ordinary Dataset. A specially implemented `Datatree.open_datatree` method can open the dataset, detect this additional array, and recreate the tree structure to instantiate the object. I would like to know whether using netCDF4 groups instead would provide any advantages over this approach.


","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,628719058