html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/5598#issuecomment-879685814,https://api.github.com/repos/pydata/xarray/issues/5598,879685814,MDEyOklzc3VlQ29tbWVudDg3OTY4NTgxNA==,7611856,2021-07-14T08:05:37Z,2021-07-14T08:05:37Z,NONE,"I'm not a pandas expert, but maybe one can create a dummy index that enforces the size=1 constraint. E.g. an index which only supports one value (e.g. None or zero). That could potentially be used to fix the round-trip.

Also potentially related: #5202 (also contains discussions about the multiindex/dataset handling)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,943112510
https://github.com/pydata/xarray/issues/4118#issuecomment-876397215,https://api.github.com/repos/pydata/xarray/issues/4118,876397215,MDEyOklzc3VlQ29tbWVudDg3NjM5NzIxNQ==,7611856,2021-07-08T12:27:58Z,2021-07-08T12:27:58Z,NONE,"As a user who (so far) does not use any netCDF or HDF5 features of xarray, I obviously would not like to have an otherwise potentially useful feature blocked by restrictions imposed by netCDF or HDF5 ;-).

That said - I think @tacaswell's comment about round trips is very reasonable and such invariants should be maintained! It would be extremely confusing for users if netcdf -> xarray -> netcdf were not a ""no-op"". The same obviously holds true for any other storage format. As a user I would generally expect something like the following:
```python
# `load` / `save` are placeholders for any format-specific reader and writer
a1 = xarray.load(""foo.myformat"")
xarray.save(a1, ""bar.myformat"")
a2 = xarray.load(""bar.myformat"")
assert a1 == a2, ""Why should they not be exactly equal?!?""
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,628719058
https://github.com/pydata/xarray/issues/1092#issuecomment-868324949,https://api.github.com/repos/pydata/xarray/issues/1092,868324949,MDEyOklzc3VlQ29tbWVudDg2ODMyNDk0OQ==,7611856,2021-06-25T08:36:03Z,2021-06-25T08:45:23Z,NONE,"Hey Folks,
I stumbled upon this discussion with a use case similar to those described in some comments above: a `DataSet` with a bunch of arrays called `count_a, test_count_a, train_count_a, count_b, ... , controlled_test_mean, controlled_train_mean, ... controlled_test_sigma, ...`. Obviously, a hierarchical structure would help to organize these.

However, one point I didn't see in the discussion is the following:

Hierarchical structures often force a user to come up with some arbitrary order of hierarchy levels. The classic example is document filing: do you put your health insurance documents under `/insurance/health/2021`, `2021/health/insurance`, ...?

One solution is to tag documents instead of putting them into a hierarchy. This would give full flexibility to retrieve any flat `DataSet` out of a `TaggedDataSet` by specifying the set of tags that the individual `DataArrays` must carry.

Back to the above example, one could think of stuff like:

```python
# get a flat view (DataSet-like object) on all arrays of tagged that have the 'count' tag
ds: DataSet(View) = tagged.tag_select(""count"")
bar1 = ds.mean(dim=""foo"")
# get a flat view (DataSet-like object) on all arrays of tagged that have both the ""train"" and ""controlled"" tags
bar2 = tagged.tag_select(""train"", ""controlled"").mean(dim=""foo"") # order of arguments to `tag_select` is irrelevant!
```
I hope it is clear what I mean. I know that there are, e.g., some awesome [file system plugins](https://amoffat.github.io/supertag/index.html) that use such a data model (the author has incredibly nice high-level documentation on the topic).

Just wanted to add that aspect to the discussion even if it might collide with the hierarchical approach!

One side note: If every array in the tagged container has exactly one tag, and tags do not repeat, then the whole thing should be semantically identical to a `DataSet`, because every `tag_select` will yield a single `DataArray`. I.e. it might be possible to integrate such functionality directly into `DataSet`!?
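
To make the tagging idea a bit more concrete, here is a minimal plain-Python sketch. Everything in it is an assumption for illustration only: `TaggedContainer`, `add` and `tag_select` are hypothetical names and not part of xarray, and plain lists stand in for the `DataArrays`.

```python
# Hypothetical sketch: `TaggedContainer` and `tag_select` are illustrative
# names, not xarray API. Plain lists stand in for DataArrays.
class TaggedContainer:
    def __init__(self):
        self._arrays = {}  # name -> array-like
        self._tags = {}    # name -> frozenset of tags

    def add(self, name, array, *tags):
        self._arrays[name] = array
        self._tags[name] = frozenset(tags)

    def tag_select(self, *tags):
        # Flat dict of all arrays that carry every requested tag.
        wanted = frozenset(tags)
        return {
            name: array
            for name, array in self._arrays.items()
            if wanted <= self._tags[name]
        }


tagged = TaggedContainer()
tagged.add('train_count_a', [1, 2], 'train', 'count', 'a')
tagged.add('test_count_a', [3, 4], 'test', 'count', 'a')
tagged.add('controlled_train_mean', [0.5], 'controlled', 'train', 'mean')

counts = tagged.tag_select('count')
# The order of the tags passed to `tag_select` does not matter.
same = tagged.tag_select('train', 'controlled') == tagged.tag_select('controlled', 'train')
```

Because tags are stored as sets, selection is a subset test, so no hierarchy order has to be chosen up front.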

Regards,

Martin

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187859705
https://github.com/pydata/xarray/issues/5202#issuecomment-855822204,https://api.github.com/repos/pydata/xarray/issues/5202,855822204,MDEyOklzc3VlQ29tbWVudDg1NTgyMjIwNA==,7611856,2021-06-07T10:49:49Z,2021-06-07T10:49:49Z,NONE,"Besides the CPU requirements, IMHO, the memory consumption is even worse.

Imagine you want to hold a 1000x1000x1000 int64 array. That is 8e9 bytes, or roughly 7.5 GiB, which still fits into RAM on most machines.
Let's assume float coordinates for all three axes. Their memory consumption of 3000*8 bytes is negligible.

Now if you stack that, you end up with three additional 7.5 GiB arrays, one per coordinate level. With higher dimensions the situation gets even worse.
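
A rough sketch of that arithmetic, assuming each stacked coordinate level is materialised as a full-length array with 8 bytes per element (the variable names are only illustrative):

```python
# Back-of-the-envelope memory estimate for stacking a dense 3-D grid.
# Assumption: each stacked coordinate level becomes a full-length
# array with 8 bytes per element.
n = 1000        # points per axis
ndim = 3        # number of axes
itemsize = 8    # bytes per int64 / float64 element

data_bytes = n**ndim * itemsize                  # the data array itself
coord_bytes = ndim * n * itemsize                # three 1-D coordinate axes
stacked_coord_bytes = ndim * n**ndim * itemsize  # one full array per level

gib = 1024**3
print(f'data:               {data_bytes / gib:.2f} GiB')
print(f'1-D coordinates:    {coord_bytes} bytes')
print(f'stacked coordinates: {stacked_coord_bytes / gib:.2f} GiB')
```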

That said, while it generally should be possible to create the coordinates of the stacked array on the fly, I don't have a solution for it. 

Side note:
_I stumbled over that when combining xarray with pytorch, where I want to evaluate a model on a large cartesian grid. For that I stacked the array and batched the stacked coordinates to feed them to pytorch, which makes the iteration over the cartesian space really nice and smooth in code._ ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,864249974
https://github.com/pydata/xarray/issues/4892#issuecomment-810959744,https://api.github.com/repos/pydata/xarray/issues/4892,810959744,MDEyOklzc3VlQ29tbWVudDgxMDk1OTc0NA==,7611856,2021-03-31T10:30:49Z,2021-03-31T10:30:49Z,NONE,"I don't know the internals of delegation between `.sel` and `.isel`. But from the user side I would expect that boolean indexing requires me to use `.isel` naturally. I mean, I have to provide a boolean mask that fits the shape of the array, i.e. it is naturally index based and should only be used with `.isel` irrespective of the coordinate types.

While that would probably be a breaking change for some people, I think it makes a quite complicated topic slightly easier to document and makes the intentions in written code easier to figure out.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,806218687