html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4118#issuecomment-1042753800,https://api.github.com/repos/pydata/xarray/issues/4118,1042753800,IC_kwDOAMm_X84-JykI,226037,2022-02-17T09:41:29Z,2022-02-17T09:53:55Z,MEMBER,"@kmuehlbauer in the representation I use the fully qualified name for the dimension / coordinate, but the corresponding `DataArray` will use the basename, e.g. both arrays will have `lat` as a coordinate. Sorry for the confusion; I need to add more context to the README.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,628719058
https://github.com/pydata/xarray/issues/4118#issuecomment-1042656377,https://api.github.com/repos/pydata/xarray/issues/4118,1042656377,IC_kwDOAMm_X84-Jax5,226037,2022-02-17T07:39:15Z,2022-02-17T08:17:51Z,MEMBER,"@TomNicholas (cc @mraspaud)

> Do you have use cases which one of these designs could handle but the other couldn't?

The two main classes of on-disk formats I know of that cannot always be represented in the ""group is a Dataset"" approach are:
- in [netCDF following the CF conventions for groups](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#groups), it is legal for an array to refer to a dimension or a coordinate in a different group, so arrays in the same group may have dimensions with the same name but different sizes / coordinate values (this was the original motivation to explore the DataGroup approach)
- the current spec for the [Next-generation file formats (NGFF)](https://ngff.openmicroscopy.org) for bio-imaging has all scales of the same 5D data in the same group. (cc @joshmoore)
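
To make the constraint concrete, here is a toy sketch in plain Python of the consistency rule that the group-is-a-Dataset model implies: within one group, a dimension name may map to only one size. This is not xarray's actual implementation; the function name `dimension_conflicts`, the dict layout, and the array names and shapes are all made up for illustration.

```python
# Toy model (an assumption, not xarray's implementation): a group is a dict
# mapping array name -> (dimension names, shape).
def dimension_conflicts(group):
    '''Return {dim: sorted sizes} for every dimension used with more than one size.'''
    sizes = {}
    for dims, shape in group.values():
        for dim, size in zip(dims, shape):
            sizes.setdefault(dim, set()).add(size)
    return {dim: sorted(found) for dim, found in sizes.items() if len(found) > 1}


# Hypothetical multi-scale group: two resolutions of the same image side by side.
multiscale = {
    'scale0': (('y', 'x'), (1024, 1024)),
    'scale1': (('y', 'x'), (512, 512)),
}

# Hypothetical group whose arrays all agree on the size of each dimension.
consistent = {
    'temperature': (('lat', 'lon'), (180, 360)),
    'pressure': (('lat', 'lon'), (180, 360)),
}

print(dimension_conflicts(multiscale))  # non-empty: cannot be a single Dataset
print(dimension_conflicts(consistent))  # empty: fits the Dataset model
```

A group with a non-empty result is the kind that would need either one subgroup per array or a container that does not enforce shared dimensions.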

I don't have an example at hand, but my impression is that satellite products that use the HDF5 file format also place arrays with inconsistent dimensions / coordinates in the same group.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,628719058
https://github.com/pydata/xarray/issues/4118#issuecomment-1042664227,https://api.github.com/repos/pydata/xarray/issues/4118,1042664227,IC_kwDOAMm_X84-Jcsj,226037,2022-02-17T07:52:17Z,2022-02-17T07:53:13Z,MEMBER,"@TomNicholas I also have a few comments on the comparison:

> * **Option (1) - Each group is a Dataset**
>   
>   * Model maps more directly onto netCDF (though still not exactly, because netCDF has dimensions as separate objects)

This is only true for flat netCDF files. Once you introduce groups in a netCDF file AND accept the CF conventions, the DataGroup approach can map 100% of the files, while the DataTree approach fails on an (admittedly small) class of them.

>   * Enforcing consistency between variables guarantees certain operations are always well-defined (in particular selection via an integer index like in `.isel`).
>   * Guarantees that all valid operations on a Dataset are also valid operations on a single group of a DataTree - so API can be essentially identical to Dataset.

Both points are only true for the DataArrays in a single group. Once you broadcast any operation to subgroups, the two implementations share the same limitations (dimensions in subgroups can be inconsistent in both cases).

In my opinion the advantage for the DataTree is minimal.

>   * Metadata (i.e. `.attrs`) are arguably most useful when set at this level

The two approaches are identical in this respect: group attributes are mapped in the same way to DataTree and DataGroup.

I share your views on all other points.","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,628719058