
issue_comments: 868324949


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1092#issuecomment-868324949 https://api.github.com/repos/pydata/xarray/issues/1092 868324949 MDEyOklzc3VlQ29tbWVudDg2ODMyNDk0OQ== 7611856 2021-06-25T08:36:03Z 2021-06-25T08:45:23Z NONE

Hey folks, I stumbled over this discussion with a use case similar to those described in some comments above: a DataSet with a bunch of arrays called count_a, test_count_a, train_count_a, count_b, ... , controlled_test_mean, controlled_train_mean, ... controlled_test_sigma, ... Obviously, a hierarchical structure would help to arrange this.

However, one point I didn't see in the discussion is the following:

Hierarchical structures often force a user to come up with some arbitrary order of hierarchy levels. The classic example is document filing: do you put your health insurance documents under /insurance/health/2021, 2021/health/insurance, ...?

One solution is to tag documents instead of putting them into a hierarchy. Applied here, this would give full flexibility to retrieve any flat DataSet out of a TaggedDataSet by specifying the set of tags that the individual DataArrays must carry.

Back to the above example, one could think of stuff like:

```python
# get a flat view (DataSet-like object) on all arrays of tagged that have the "count" tag
ds = tagged.tag_select("count")  # ds would be a DataSet (view)
bar1 = ds.mean(dim="foo")

# get a flat view (DataSet-like object) on all arrays of tagged that have both the "train" and "controlled" tags
# (the order of arguments to tag_select is irrelevant!)
bar2 = tagged.tag_select("train", "controlled").mean(dim="foo")
```

I hope it is clear what I mean. I know that there are, e.g., some awesome file system plugins that use such a data model (the author has incredibly nice high-level documentation on the topic).
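To make the idea a bit more concrete, here is a minimal sketch of how such a tag lookup could behave. Everything in it is an assumption for illustration: `TaggedContainer`, `add`, and the way the example names are split into tags are hypothetical and not part of xarray, and plain Python lists stand in for DataArrays.

```python
from typing import Any, Dict, FrozenSet, Iterable


class TaggedContainer:
    """Hypothetical container that stores each array together with a set of tags."""

    def __init__(self) -> None:
        self._arrays: Dict[str, Any] = {}
        self._tags: Dict[str, FrozenSet[str]] = {}

    def add(self, name: str, array: Any, tags: Iterable[str]) -> None:
        self._arrays[name] = array
        self._tags[name] = frozenset(tags)

    def tag_select(self, *tags: str) -> Dict[str, Any]:
        """Return a flat name -> array mapping of all arrays carrying every given tag."""
        wanted = frozenset(tags)
        return {
            name: array
            for name, array in self._arrays.items()
            if wanted <= self._tags[name]  # subset test, so argument order cannot matter
        }


tagged = TaggedContainer()
# Plain lists stand in for DataArrays; the tag assignments are made up for this example.
tagged.add("count_a", [1, 2, 3], tags={"count", "a"})
tagged.add("test_count_a", [1, 2], tags={"count", "test", "a"})
tagged.add("train_count_a", [3], tags={"count", "train", "a"})
tagged.add("controlled_train_mean", [0.5], tags={"controlled", "train", "mean"})
tagged.add("controlled_test_mean", [0.4], tags={"controlled", "test", "mean"})

counts = tagged.tag_select("count")
train_controlled = tagged.tag_select("train", "controlled")
```

Because the selection is a subset test on tag sets, no hierarchy order has to be chosen up front: `tag_select("train", "controlled")` and `tag_select("controlled", "train")` give the same flat result.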

Just wanted to add that aspect to the discussion even if it might collide with the hierarchical approach!

One side note: if every array in the tagged container has exactly one tag, and tags do not repeat, then the whole thing should be semantically identical to a DataSet, because every tag_select with a single tag will yield a single DataArray. I.e., it might be possible to integrate such functionality directly into DataSet!?
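A tiny sketch of that degenerate case, again with hypothetical names (`tags_by_name`, `arrays`, and this free-standing `tag_select` are assumptions, and lists stand in for DataArrays): with exactly one unique tag per array, selecting by a tag behaves like a plain key lookup.

```python
# Each array carries exactly one tag, and no tag is used twice (assumed setup).
tags_by_name = {"temperature": {"temperature"}, "pressure": {"pressure"}}
arrays = {"temperature": [280.0, 281.5], "pressure": [1013.0, 1009.2]}


def tag_select(*tags):
    """Return all arrays whose tag set contains every requested tag."""
    wanted = set(tags)
    return {name: arrays[name] for name, t in tags_by_name.items() if wanted <= t}


# Selecting by a single tag returns exactly one array, like indexing by name.
for name in arrays:
    assert tag_select(name) == {name: arrays[name]}
```

In this sketch, selecting on two different tags at once returns an empty mapping, since no array carries both.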

Regards,

Martin

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  187859705