html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1092#issuecomment-868324949,https://api.github.com/repos/pydata/xarray/issues/1092,868324949,MDEyOklzc3VlQ29tbWVudDg2ODMyNDk0OQ==,7611856,2021-06-25T08:36:03Z,2021-06-25T08:45:23Z,NONE,"Hey Folks,
I stumbled upon this discussion because I have a use case similar to those described in some comments above: a `Dataset` with a bunch of arrays named `count_a, test_count_a, train_count_a, count_b, ... , controlled_test_mean, controlled_train_mean, ... controlled_test_sigma, ...`. Obviously, a hierarchical structure would help organize this.
However, one point I didn't see in the discussion is the following:
Hierarchical structures often force users to choose an arbitrary order of hierarchy levels. The classic example is document filing: do you put your health insurance documents under `/insurance/health/2021`, `2021/health/insurance`, ...?
One solution is to tag documents instead of putting them into a hierarchy. This would give full flexibility to retrieve any flat `Dataset` out of a `TaggedDataset` by specifying the set of tags that the individual `DataArray`s must carry.
Going back to the example above, one could imagine something like:
```python
# get a flat view (Dataset-like object) on all arrays in `tagged` that have the 'count' tag
ds = tagged.tag_select(""count"")
bar1 = ds.mean(dim=""foo"")
# get a flat view (Dataset-like object) on all arrays in `tagged` that have both the 'train' and 'controlled' tags
bar2 = tagged.tag_select(""train"", ""controlled"").mean(dim=""foo"")  # order of arguments to `tag_select` is irrelevant!
```
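Roughly, `tag_select` could behave like the minimal sketch below. It assumes a hypothetical `TaggedDataset` wrapper (not an existing xarray class) that keeps an ordinary `xarray.Dataset` plus a set of tags per variable; a selection returns the sub-`Dataset` of all variables whose tags include every requested tag. The variable names and tags in the usage part are made up for illustration.

```python
from __future__ import annotations

import numpy as np
import xarray as xr


class TaggedDataset:
    # Hypothetical container (not part of xarray): a flat xr.Dataset plus a tag set per variable.
    def __init__(self, ds: xr.Dataset, tags: dict[str, set[str]]):
        missing = set(ds.data_vars) - set(tags)
        if missing:
            raise ValueError(f'variables without tags: {sorted(missing)}')
        self.ds = ds
        self.tags = {name: frozenset(t) for name, t in tags.items()}

    def tag_select(self, *tags: str) -> xr.Dataset:
        # Keep every variable whose tag set contains ALL requested tags.
        # Tags are compared as sets, so the argument order does not matter.
        wanted = frozenset(tags)
        names = [name for name in self.ds.data_vars if wanted <= self.tags[name]]
        # Indexing a Dataset with a list of names returns a (flat) sub-Dataset.
        return self.ds[names]


# Usage with made-up data along a dimension called 'foo'.
foo = np.arange(4.0)
ds = xr.Dataset(
    {
        'count_a': ('foo', foo),
        'train_count_a': ('foo', foo * 2),
        'controlled_train_mean': ('foo', foo + 1),
        'controlled_test_mean': ('foo', foo - 1),
    }
)
tagged = TaggedDataset(
    ds,
    {
        'count_a': {'count'},
        'train_count_a': {'count', 'train'},
        'controlled_train_mean': {'controlled', 'train', 'mean'},
        'controlled_test_mean': {'controlled', 'test', 'mean'},
    },
)

bar1 = tagged.tag_select('count').mean(dim='foo')
bar2 = tagged.tag_select('train', 'controlled').mean(dim='foo')
```

Here `bar1` contains `count_a` and `train_count_a`, while `bar2` contains only `controlled_train_mean`. Storing tags as sets rather than as path components is what removes the arbitrary ordering problem of a hierarchy.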
I hope it is clear what I mean. There are, e.g., some awesome [file system plugins](https://amoffat.github.io/supertag/index.html) that use such a data model (the author has incredibly nice high-level documentation on the topic).
I just wanted to add this aspect to the discussion, even if it might conflict with the hierarchical approach!
One side note: if every array in the tagged container has exactly one tag, and no two arrays share a tag, then the whole thing should be semantically identical to a `Dataset`, because every single-tag `tag_select` call will yield a single `DataArray`. I.e., it might be possible to integrate such functionality directly into `Dataset`?
Regards,
Martin
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187859705