html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1887#issuecomment-825176507,https://api.github.com/repos/pydata/xarray/issues/1887,825176507,MDEyOklzc3VlQ29tbWVudDgyNTE3NjUwNw==,5635139,2021-04-22T20:50:29Z,2021-04-22T21:06:47Z,MEMBER,"> `stack(new_dim=[""a"", ""b""], dropna=True)`

This could be useful (potentially we can open a different issue). While someone can call `.dropna`, that coerces to floats (or some type that supports missing) and can allocate more than is needed. Potentially this can be considered along with issues around sparse, e.g. https://github.com/pydata/xarray/issues/3245, https://github.com/pydata/xarray/issues/4143","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824503658,https://api.github.com/repos/pydata/xarray/issues/1887,824503658,MDEyOklzc3VlQ29tbWVudDgyNDUwMzY1OA==,5635139,2021-04-22T03:04:41Z,2021-04-22T03:04:51Z,MEMBER,"I'm still working through this. Using this to jot down my notes, no need to respond.

One property that seems to be lacking is that if `key` changes from `n-1` to `n` dimensions, the behavior changes (also outlined [here](url)):

```python
In [171]: a
Out[171]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [172]: mask
Out[172]: array([ True, False,  True])

In [173]: a[mask]
Out[173]:
array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])
```

...as expected, but now let's make a 2D mask...

```python
In [174]: full_mask = np.broadcast_to(mask[:, np.newaxis], (3,4))

In [175]: full_mask
Out[175]:
array([[ True,  True,  True,  True],
       [False, False, False, False],
       [ True,  True,  True,  True]])

In [176]: a[full_mask]
Out[176]: array([ 0,  1,  2,  3,  8,  9, 10, 11])  # flattened!
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824461333,https://api.github.com/repos/pydata/xarray/issues/1887,824461333,MDEyOklzc3VlQ29tbWVudDgyNDQ2MTMzMw==,1217238,2021-04-22T01:02:32Z,2021-04-22T01:02:32Z,MEMBER,"> Current proposal (""`stack`""), of `da[key]` and with a dimension of `key`'s name (and probably no multiindex):
>
> ```python
> In [86]: da.values[key.values]
> Out[86]: array([0, 3, 6, 9])  # But the xarray version
> ```

The part about this new proposal that is most annoying is that the `key` needs a `name`, which we can use to name the new dimension. That's not too hard to do, but it is a little annoying -- in practice you would have to write something like `da[key.rename('key_name')]` much of the time to make this work.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824460304,https://api.github.com/repos/pydata/xarray/issues/1887,824460304,MDEyOklzc3VlQ29tbWVudDgyNDQ2MDMwNA==,1217238,2021-04-22T00:59:25Z,2021-04-22T00:59:25Z,MEMBER,"> OK great. To confirm, this is what it would look like:

Yes, this looks right to me.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824454992,https://api.github.com/repos/pydata/xarray/issues/1887,824454992,MDEyOklzc3VlQ29tbWVudDgyNDQ1NDk5Mg==,5635139,2021-04-22T00:40:49Z,2021-04-22T00:40:49Z,MEMBER,"> I'm not quite sure this is true -- it's the difference between needing to call `stack()` vs `unstack()`.

This was a tiny point so it's fine to discard. I had meant that producing the `where` result via the `stack` result requires a `stack` and `unstack`.
But producing the `stack` result via a `where` result requires only one `stack`, since the `where` result is very cheap.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824452843,https://api.github.com/repos/pydata/xarray/issues/1887,824452843,MDEyOklzc3VlQ29tbWVudDgyNDQ1Mjg0Mw==,5635139,2021-04-22T00:33:29Z,2021-04-22T00:35:28Z,MEMBER,"OK great. To confirm, this is what it would look like:

Context:

```python
In [81]: da = xr.DataArray(np.arange(12).reshape(3,4), dims=list('ab'))

In [82]: da
Out[82]:
<xarray.DataArray (a: 3, b: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Dimensions without coordinates: a, b

In [84]: key = da % 3 == 0

In [83]: key
Out[83]:
<xarray.DataArray (a: 3, b: 4)>
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])
Dimensions without coordinates: a, b
```

Currently

```python
In [85]: da[key]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
in
----> 1 da[key]

...

~/.asdf/installs/python/3.8.8/lib/python3.8/site-packages/xarray/core/variable.py in _validate_indexers(self, key)
    697                     )
    698                 if k.ndim > 1:
--> 699                     raise IndexError(
    700                         ""{}-dimensional boolean indexing is ""
    701                         ""not supported. "".format(k.ndim)

IndexError: 2-dimensional boolean indexing is not supported.
```

Current proposal (""`stack`""), of `da[key]` and with a dimension of `key`'s name (and probably no multiindex):

```python
In [86]: da.values[key.values]
Out[86]: array([0, 3, 6, 9])  # But the xarray version
```

Previous suggestion (""`where`""), for the result of `da[key]`:

```python
In [87]: da.where(key)
Out[87]:
<xarray.DataArray (a: 3, b: 4)>
array([[ 0., nan, nan,  3.],
       [nan, nan,  6., nan],
       [nan,  9., nan, nan]])
Dimensions without coordinates: a, b
```

(small follow up I'll put in another message, for clarity)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824329772,https://api.github.com/repos/pydata/xarray/issues/1887,824329772,MDEyOklzc3VlQ29tbWVudDgyNDMyOTc3Mg==,1217238,2021-04-21T20:16:10Z,2021-04-21T20:16:10Z,MEMBER,"> I've been trying to conceptualize why I think the `where` equivalence (the original proposal) is better than the `stack` proposal (the latter).

Here are two reasons why I like the `stack` version:
1. It's more NumPy-like -- boolean indexing in NumPy returns a flat array in the same way.
2. It doesn't need dtype promotion to handle possibly missing values, so it will have more predictable semantics.

As a side note: one nice feature of using `isel()` for stacking is that it _does not_ create a MultiIndex, which can be expensive. But there's no reason why we necessarily need to do that for `stack()`. I'll open a new issue to discuss adding an optional parameter.

> * I'm not sure how the setitem would work; `da[key] = value`?

To match the semantics of NumPy, `value` would need to have matching dims/coords to those of `da[key]`. In other words, it would also need to be stacked.

> * If someone wants the `stack` result, it's less work to do original -> `where` result -> `stack` result relative to original -> `stack` result -> `where` result; which suggests they're more composable?

I'm not quite sure this is true -- it's the difference between needing to call `stack()` vs `unstack()`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824299104,https://api.github.com/repos/pydata/xarray/issues/1887,824299104,MDEyOklzc3VlQ29tbWVudDgyNDI5OTEwNA==,5635139,2021-04-21T19:21:46Z,2021-04-21T19:21:46Z,MEMBER,"I've been trying to conceptualize why I think the `where` equivalence (the original proposal) is better than the `stack` proposal (the latter). I think it's mostly:
- It's simpler
- I'm not sure how the setitem would work; `da[key] = value`?
- If someone wants the `stack` result, it's less work to do original -> `where` result -> `stack` result relative to original -> `stack` result -> `where` result; which suggests they're more composable?

But I don't do much pointwise indexing, and so maybe we do want to prioritize that.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-823673654,https://api.github.com/repos/pydata/xarray/issues/1887,823673654,MDEyOklzc3VlQ29tbWVudDgyMzY3MzY1NA==,1217238,2021-04-20T23:50:34Z,2021-04-20T23:50:34Z,MEMBER,"It's worth noting that there is at least one other way boolean indexing could work:
- `ds[key]` could work like `ds.stack({key.name: key.dims}).isel({key.name: np.flatnonzero(key.data)})`, except without creating a MultiIndex. Arguably this might be more useful and also more consistent with NumPy itself. It's also more similar to the operation @Hoeze wants in https://github.com/pydata/xarray/issues/5179.

We can't support both with the same syntax, so we have to make a choice here :).
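
A minimal sketch contrasting the two candidate semantics on a small array, assuming a recent xarray and NumPy; the 3x4 `da` and the name `key` are illustrative only. Note that the `stack()` call here still builds a MultiIndex, which the bullet above says the proposal would avoid:

```python
import numpy as np
import xarray as xr

# Illustrative 3x4 array and boolean key; the name 'key' is an assumption
# and becomes the name of the new stacked dimension.
da = xr.DataArray(np.arange(12).reshape(3, 4), dims=['a', 'b'])
key = (da % 3 == 0).rename('key')

# Stack-based semantics: collapse key's dims into one dimension named after
# key, then keep only the flat positions where key is True, mirroring
# NumPy's a[mask] with an N-d mask. (This still creates a MultiIndex.)
stacked = da.stack({key.name: key.dims}).isel({key.name: np.flatnonzero(key.data)})

# The same values via where -> stack -> dropna: where fills the masked-out
# positions with NaN, which promotes the integer data to float first.
via_where = da.where(key).stack(key=('a', 'b')).dropna('key')

print(stacked.values)    # -> [0 3 6 9]
print(via_where.values)  # -> [0. 3. 6. 9.]
```

The stacked result keeps the original integer dtype, while the `where` route goes through floats, which is the dtype-promotion difference discussed in this thread.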

See also the discussion about what `drop_duplicates`/`unique` should do over in https://github.com/pydata/xarray/pull/5089.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-803491524,https://api.github.com/repos/pydata/xarray/issues/1887,803491524,MDEyOklzc3VlQ29tbWVudDgwMzQ5MTUyNA==,5635139,2021-03-21T00:38:23Z,2021-03-21T00:38:23Z,MEMBER,"I've added the ""good first issue"" label; at least the first two bullets of the proposal would be relatively simple to implement, given they're mostly syntactic sugar.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734