issue_comments: 868324949


html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1092#issuecomment-868324949	https://api.github.com/repos/pydata/xarray/issues/1092	868324949	MDEyOklzc3VlQ29tbWVudDg2ODMyNDk0OQ==	7611856	2021-06-25T08:36:03Z	2021-06-25T08:45:23Z	NONE	Hey Folks,

I stumbled over this discussion because I have a use case similar to those described in some comments above. It is a `Dataset` with a bunch of arrays called `count_a`, `test_count_a`, `train_count_a`, `count_b`, ..., `controlled_test_mean`, `controlled_train_mean`, ..., `controlled_test_sigma`, ...

Obviously, a hierarchical structure would help to organize this. However, there is one point I didn't see in the discussion: hierarchical structures often force the user to pick an arbitrary order of hierarchy levels. The classic example is document filing. Do you put your health insurance documents under `/insurance/health/2021`, `/2021/health/insurance`, or somewhere else?

One solution is to tag documents instead of putting them into a hierarchy. This would give full flexibility: you could retrieve any flat `Dataset` from a `TaggedDataSet` by specifying the set of tags under which the individual `DataArray`s must be listed. Back to the example above, one could imagine something like:

```python
# get a flat view (a Dataset or Dataset-like object) of all arrays in `tagged` that have the "count" tag
ds = tagged.tag_select("count")
bar1 = ds.mean(dim="foo")

# get a flat view of all arrays in `tagged` that have both the "train" and "controlled" tags;
# the order of the arguments to `tag_select` is irrelevant!
bar2 = tagged.tag_select("train", "controlled").mean(dim="foo")
```

I hope it is clear what I mean. I know there are, e.g., some awesome file system plugins that use such a data model (their author has incredibly nice high-level documentation on the topic). I just wanted to add this aspect to the discussion, even if it might conflict with the hierarchical approach!
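Here is a minimal sketch of how `tag_select` could behave, to make the idea more concrete. It rests on one assumption: `TaggedDataSet` is a hypothetical wrapper, not an existing xarray class. The wrapper holds an ordinary `xr.Dataset` plus a separate dict that maps each variable name to its set of tags. `tag_select` keeps every variable whose tag set contains all of the requested tags. Because this is a subset test, the order of the arguments does not matter.

```python
from __future__ import annotations

from dataclasses import dataclass, field

import numpy as np
import xarray as xr


@dataclass
class TaggedDataSet:
    """Hypothetical tagged container (illustrative only, not an xarray API).

    `data` is an ordinary flat Dataset; `tags` maps each data variable name
    to the set of tags it is filed under.
    """

    data: xr.Dataset
    tags: dict[str, frozenset[str]] = field(default_factory=dict)

    def tag_select(self, *wanted: str) -> xr.Dataset:
        """Return a flat Dataset with every variable carrying *all* requested tags."""
        required = frozenset(wanted)
        # Subset test: a variable qualifies if its tags include every requested tag,
        # so tag_select("a", "b") and tag_select("b", "a") give the same result.
        names = [name for name in self.data.data_vars
                 if required <= self.tags.get(name, frozenset())]
        # Indexing a Dataset with a list of names returns a Dataset of just those variables.
        return self.data[names]


# Example data: four 1-D arrays along a dimension called "foo".
rng = np.random.default_rng(0)
raw = xr.Dataset(
    {name: ("foo", rng.random(4))
     for name in ["count_a", "train_count_a", "controlled_train_mean", "controlled_test_mean"]}
)
tagged = TaggedDataSet(raw, tags={
    "count_a": frozenset({"count"}),
    "train_count_a": frozenset({"count", "train"}),
    "controlled_train_mean": frozenset({"controlled", "train", "mean"}),
    "controlled_test_mean": frozenset({"controlled", "test", "mean"}),
})

bar1 = tagged.tag_select("count").mean(dim="foo")  # count_a and train_count_a
bar2 = tagged.tag_select("train", "controlled").mean(dim="foo")  # controlled_train_mean only
```

In this sketch, the tags live outside the `Dataset` so that the data itself stays flat. Storing them in each variable's `attrs` instead would be another plausible design, which the sketch does not explore.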
One side note: suppose every array in the tagged container has exactly one tag and no two arrays share a tag. Then the whole thing should be semantically identical to a `Dataset`, because every single-tag `tag_select` will select exactly one `DataArray`. That is, it might be possible to integrate such functionality directly into `Dataset`!?

Regards,
Martin	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		187859705