issue_comments: 901594249

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/4118#issuecomment-901594249	https://api.github.com/repos/pydata/xarray/issues/4118	901594249	IC_kwDOAMm_X841vTyJ	35968931	2021-08-19T04:10:30Z	2021-08-19T04:10:30Z	MEMBER	I think that xarray's current use of both dict-like access and attribute-like access for variables makes representing a general netCDF file in a single `DataTree` incompatible with the nice syntax that @emilbiju originally suggested. Consider a tree with a node structure for a hypothetical `DataTree` object `dt` that looks something like

```python
DataTree("root")
|-- DatasetNode("weather")
|   |-- DatasetNode("temperature")
|   |   |-- DataArrayNode("sea_surface_temperature")
|   |   |-- DataArrayNode("dew_point_temperature")
|   |-- DataArrayNode("wind_speed")
|-- DataArrayNode("population")
```

We ideally want to be able to seamlessly access both subtrees and individual variables via chains of keys, e.g. `weather_subtree = dt['weather']`, and `wind_speed_da = dt['weather']['wind_speed']`. (We want that so that each subtree behaves as much like an `xarray.Dataset` as possible, with respect to mapping functions over all its child nodes and so on.) This particular example is fine, and would correspond to a netCDF file with groups "root", "root/weather", and "root/weather/temperature", plus the four stored DataArray variables. However, if one of the variables has the same name as one of the groups (which I think is permitted in the netCDF format), then there is no easy way to access all the elements whilst retaining the nice syntax. For example consider

```python
DataTree("root")
|-- DatasetNode("A")
|   |-- DatasetNode("B")
|   |   |-- DataArrayNode("foo")
|   |   |-- DataArrayNode("bar")
|   |-- DataArrayNode("B")
|-- DataArrayNode("C")
```

Now we have a key collision between the group named "B" and the DataArray named "B", i.e. `dt['A']['B']` is ambiguous. 
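To make the collision concrete, here is a minimal sketch using a toy node class. `ToyNode`, its `children` and `variables` attributes, and the list values standing in for DataArrays are all hypothetical names invented for this illustration; none of this is xarray's or `DataTree`'s actual API. It only shows what happens when a single `__getitem__` has to serve both child groups and data variables.

```python
class ToyNode:
    """Hypothetical stand-in for a tree node; not part of xarray."""

    def __init__(self, name, children=None, variables=None):
        self.name = name
        # Child groups and data variables live in separate mappings...
        self.children = {child.name: child for child in (children or [])}
        self.variables = dict(variables or {})

    def __getitem__(self, key):
        # ...but a single dict-like accessor has to pick one of them.
        in_children = key in self.children
        in_variables = key in self.variables
        if in_children and in_variables:
            raise KeyError(
                f"{key!r} is both a group and a variable under {self.name!r}"
            )
        if in_children:
            return self.children[key]
        return self.variables[key]


# Rebuild the second tree above; plain lists stand in for DataArrays.
b_group = ToyNode("B", variables={"foo": [1, 2], "bar": [3, 4]})
a = ToyNode("A", children=[b_group], variables={"B": [5, 6]})
dt = ToyNode("root", children=[a], variables={"C": [7, 8]})

print(dt["C"])  # unambiguous: only a variable is named "C"
try:
    dt["A"]["B"]
except KeyError as err:
    print("collision:", err)
```

Raising an error is just one arbitrary choice for the sketch; silently preferring the group or the variable would make the other one unreachable through `[]`, which is the problem described above.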
We can't just forbid this type of tree because then there would be netCDF files that we couldn't represent as a `DataTree`, so we would not have the property `netCDF -> xarray.DataTree -> netCDF` in general. We can't use different types of access (e.g. `subtree = dt.A.B` for the subtree and `da = dt.A['B']` for the variable), because we've already given up the `.B` namespace to also point to the variable (i.e. same location as `['B']`). If we break that convention it's going to be very confusing for users who are expecting the root of the `DataTree` to behave like `xarray.Dataset` currently does. (We could divide access through `__call__` like `dt['A']('B')` but that wouldn't be very pythonic.) The only way I can see around this is to hide a node's data variables behind a `.ds` property (i.e. `da = dt['A'].ds['B']`), or get groups via a dedicated method (i.e. `subtree = dt.get_child('A')`), but those are so much uglier and less intuitive that it feels like a shame to have to do that. It sounds like @emilbiju avoided this by not satisfying `netCDF -> xarray.DataTree -> netCDF`: (Instead of using netCDF4 groups for encoding the Datatree ... within the netCDF file, it would exist just as a Dataset) so I'm wondering if anyone else has other suggestions or thoughts?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		628719058