issue_comments: 1039572760

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/4118#issuecomment-1039572760	https://api.github.com/repos/pydata/xarray/issues/4118	1039572760	IC_kwDOAMm_X8499p8Y	35968931	2022-02-14T21:19:56Z	2022-02-14T21:40:21Z	MEMBER	We would like some opinions from the community on two different possible models for a tree-like structure in xarray.

A tree contains many groups, but the question is what constraints should be imposed on the contents of those groups.

Option (1) - Each group is a Dataset

Means that within each group the same restrictions apply as currently do within a single dataset, i.e. each dimension name is only associated with a single length, so there is effectively a common set of dimensions which variables can depend on.

- Can't represent all files, in particular can't represent a filetype where groups are allowed to have variables with inconsistent length dimensions (e.g. Zarr stores allow this as all arrays are independent.)
- Model maps more directly onto netCDF (though still not exactly, because netCDF has dimensions as separate objects)
- This means that sometimes you might need to put variables in adjacent groups in the same level of the tree, when you might rather want them together in the same group.
- Enforcing consistency between variables guarantees certain operations are always well-defined (in particular selection via an integer index like in `.isel`).
- Guarantees that all valid operations on a Dataset are also valid operations on a single group of a DataTree - so API can be essentially identical to Dataset.
- Metadata (i.e. `.attrs`) are arguably most useful when set at this level
- Mental model is a (nested) dict of Datasets
- Prototype is DataTree

Option (2) - Variables within groups are unconstrained

Means that within a single group each Variable can have any dimensions, of any length. There is no requirement that two variables which both depend on a dimension called "x" have to have the same length; one variable can have `.sizes['x']=10` and the other can have `.sizes['x']=20`.

- The main advantage of this is that it can represent a wider set of files (including all Zarr stores and a wider set of GRIB files)
- Model maps more directly onto HDF5
- Doesn't enforce the (arguably fairly arbitrary) constraint that if variables have a dimension of the same name, that dimension must also be the same length
- Without consistency, selection becomes ill-defined, but many other operations are fine (e.g. taking `.mean()`)
- Mental model is a (nested) dict of dicts of DataArrays
- Prototype is xarray-DataGroups

This is by no means the only question, and we have various choices to make within these options.

The questions for the potential users here are:

- Do you have use cases which one of these designs could handle but the other couldn't?
- How important to you is being able to support all valid files of these certain formats?
- Which of these designs is clearer/more intuitive/more appealing to you?

(@alexamici, @shoyer, @jhamman, @aurghs please edit this comment to add anything I've missed)	{ "total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 2, "rocket": 0, "eyes": 0 }		628719058
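
The difference between the two options can be sketched in plain Python. This is only an illustration of the constraint being discussed, not the xarray, DataTree, or xarray-DataGroups API: the `VarSpec` alias and the `common_sizes` and `is_valid_option1` helpers are hypothetical names, and a variable is reduced to its dimension names and shape.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for a variable: (dimension names, shape).
VarSpec = Tuple[Tuple[str, ...], Tuple[int, ...]]


def common_sizes(group: Dict[str, VarSpec]) -> Dict[str, int]:
    """Return the shared dimension sizes of a group, as Option (1) would require.

    Raises ValueError if two variables disagree on the length of a dimension.
    """
    sizes: Dict[str, int] = {}
    for name, (dims, shape) in group.items():
        for dim, length in zip(dims, shape):
            # Record the first length seen for each dimension, then compare.
            if sizes.setdefault(dim, length) != length:
                raise ValueError(
                    f"variable {name!r} has {dim}={length}, "
                    f"but the group already has {dim}={sizes[dim]}"
                )
    return sizes


def is_valid_option1(group: Dict[str, VarSpec]) -> bool:
    """True if the group satisfies the one-length-per-dimension constraint."""
    try:
        common_sizes(group)
    except ValueError:
        return False
    return True


# Both variables agree that x has length 10: acceptable under either option.
consistent = {"a": (("x",), (10,)), "b": (("x", "y"), (10, 3))}

# x is 10 in one variable and 20 in the other: only Option (2) could hold
# these in a single group; Option (1) would need two adjacent groups.
inconsistent = {"a": (("x",), (10,)), "b": (("x",), (20,))}

print(common_sizes(consistent))
print(is_valid_option1(inconsistent))
```

Under Option (2) no such check would run, so a group like `inconsistent` is representable, at the cost that an integer selection along `x` no longer has a single well-defined meaning for the whole group.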