html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1887#issuecomment-825176507,https://api.github.com/repos/pydata/xarray/issues/1887,825176507,MDEyOklzc3VlQ29tbWVudDgyNTE3NjUwNw==,5635139,2021-04-22T20:50:29Z,2021-04-22T21:06:47Z,MEMBER,"> `stack(new_dim=[""a"", ""b""], dropna=True)`
This could be useful (potentially we can open a different issue). While someone can call `.dropna`, that coerces to floats (or some other dtype that supports missing values) and can allocate more memory than is needed. Potentially this can be considered along with issues around sparse, e.g. https://github.com/pydata/xarray/issues/3245, https://github.com/pydata/xarray/issues/4143","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824503658,https://api.github.com/repos/pydata/xarray/issues/1887,824503658,MDEyOklzc3VlQ29tbWVudDgyNDUwMzY1OA==,5635139,2021-04-22T03:04:41Z,2021-04-22T03:04:51Z,MEMBER,"I'm still working through this. Using this to jot down my notes, no need to respond.
One property that seems to be lacking is consistency across mask dimensionality: if `key` changes from `n-1` to `n` dimensions, the behavior changes (also outlined [here](url)):
```python
In [171]: a
Out[171]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [172]: mask
Out[172]: array([ True, False, True])
In [173]: a[mask]
Out[173]:
array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])
```
...as expected, but now let's make a 2D mask...
```python
In [174]: full_mask = np.broadcast_to(mask[:, np.newaxis], (3,4))
In [175]: full_mask
Out[175]:
array([[ True,  True,  True,  True],
       [False, False, False, False],
       [ True,  True,  True,  True]])
In [176]: a[full_mask]
Out[176]: array([ 0, 1, 2, 3, 8, 9, 10, 11]) # flattened!
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824461333,https://api.github.com/repos/pydata/xarray/issues/1887,824461333,MDEyOklzc3VlQ29tbWVudDgyNDQ2MTMzMw==,1217238,2021-04-22T01:02:32Z,2021-04-22T01:02:32Z,MEMBER,"> Current proposal (""`stack`""), of `da[key]` and with a dimension of `key`'s name (and probably no multiindex):
>
> ```python
> In [86]: da.values[key.values]
> Out[86]: array([0, 3, 6, 9]) # But the xarray version
> ```
The part about this new proposal that is most annoying is that the `key` needs a `name`, which we can use to name the new dimension. That's not too hard to do, but it is a little annoying -- in practice you would have to write something like `da[key.rename('key_name')]` much of the time to make this work.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824460304,https://api.github.com/repos/pydata/xarray/issues/1887,824460304,MDEyOklzc3VlQ29tbWVudDgyNDQ2MDMwNA==,1217238,2021-04-22T00:59:25Z,2021-04-22T00:59:25Z,MEMBER,"> OK great. To confirm, this is what it would look like:
Yes, this looks right to me.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824454992,https://api.github.com/repos/pydata/xarray/issues/1887,824454992,MDEyOklzc3VlQ29tbWVudDgyNDQ1NDk5Mg==,5635139,2021-04-22T00:40:49Z,2021-04-22T00:40:49Z,MEMBER,"> I'm not quite sure this is true -- it's the difference between needing to call `stack()` vs `unstack()`.
This was a tiny point so it's fine to discard. I had meant that producing the `where` result via the `stack` result requires a `stack` and `unstack`. But producing the `stack` result via a `where` result requires only one `stack` — the `where` result is very cheap.
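
To make the comparison concrete, here is a rough sketch of both directions using existing xarray methods. The dimension name `points` is only an illustrative choice, and this is not the proposed `da[key]` implementation:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=['a', 'b'])
key = da % 3 == 0

# `where` result -> `stack` result: one stack, then drop the NaN padding.
where_result = da.where(key)
stack_from_where = where_result.stack(points=['a', 'b']).dropna('points')

# `stack` result -> `where` result: the stacked selection has to be
# unstacked again to recover the original 2D layout.
stack_result = da.stack(points=['a', 'b']).isel(points=np.flatnonzero(key.values))
where_from_stack = stack_result.unstack('points')
```

Note that going through `where` first leaves the stacked values as floats, since the intermediate result holds NaN.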
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824452843,https://api.github.com/repos/pydata/xarray/issues/1887,824452843,MDEyOklzc3VlQ29tbWVudDgyNDQ1Mjg0Mw==,5635139,2021-04-22T00:33:29Z,2021-04-22T00:35:28Z,MEMBER,"OK great. To confirm, this is what it would look like:
Context:
```python
In [81]: da = xr.DataArray(np.arange(12).reshape(3,4), dims=list('ab'))
In [82]: da
Out[82]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Dimensions without coordinates: a, b
In [84]: key = da % 3 == 0
In [83]: key
Out[83]:
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])
Dimensions without coordinates: a, b
```
Currently
```python
In [85]: da[key]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in
----> 1 da[key]
...
~/.asdf/installs/python/3.8.8/lib/python3.8/site-packages/xarray/core/variable.py in _validate_indexers(self, key)
697 )
698 if k.ndim > 1:
--> 699 raise IndexError(
700 ""{}-dimensional boolean indexing is ""
701 ""not supported. "".format(k.ndim)
IndexError: 2-dimensional boolean indexing is not supported.
```
Current proposal (""`stack`""), of `da[key]` and with a dimension of `key`'s name (and probably no multiindex):
```python
In [86]: da.values[key.values]
Out[86]: array([0, 3, 6, 9]) # But the xarray version
```
Previous suggestion (""`where`""), for the result of `da[key]`:
```python
In [87]: da.where(key)
Out[87]:
array([[ 0., nan, nan,  3.],
       [nan, nan,  6., nan],
       [nan,  9., nan, nan]])
Dimensions without coordinates: a, b
```
(small follow up I'll put in another message, for clarity)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824329772,https://api.github.com/repos/pydata/xarray/issues/1887,824329772,MDEyOklzc3VlQ29tbWVudDgyNDMyOTc3Mg==,1217238,2021-04-21T20:16:10Z,2021-04-21T20:16:10Z,MEMBER,"> I've been trying to conceptualize why I think the `where` equivalence (the original proposal) is better than the `stack` proposal (the latter).
Here are two reasons why I like the `stack` version:
1. It's more NumPy-like -- boolean indexing in NumPy returns a flat array in the same way
2. It doesn't need dtype promotion to handle possibly missing values, so it will have more predictable semantics.
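
A small sketch of the dtype point, reusing the `da` and `key` from the example earlier in the thread. It compares the existing `where` method against plain NumPy boolean indexing, which stands in here for the proposed flat result:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=['a', 'b'])
key = da % 3 == 0

# `where` keeps the 2D shape, so it must fill masked cells with NaN,
# which promotes the integer data to float.
masked = da.where(key)

# NumPy boolean indexing returns only the selected elements, flattened,
# so no fill value is needed and the integer dtype is preserved.
flat = da.values[key.values]
```

Here `masked` has a floating-point dtype while `flat` keeps the original integer dtype.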
As a side note: one nice feature of using `isel()` for stacking is that it _does not_ create a MultiIndex, which can be expensive. But there's no reason why we necessarily need to do that for `stack()`. I'll open a new issue to discuss adding an optional parameter.
> * I'm not sure how the setitem would work; `da[key] = value`?
To match the semantics of NumPy, `value` would need to have matching dims/coords to those of `da[key]`. In other words, it would also need to be stacked.
> * If someone wants the `stack` result, it's less work to do original -> `where` result -> `stack` result relative to original -> `stack` result -> `where` result; which suggests they're more composable?
I'm not quite sure this is true -- it's the difference between needing to call `stack()` vs `unstack()`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824299104,https://api.github.com/repos/pydata/xarray/issues/1887,824299104,MDEyOklzc3VlQ29tbWVudDgyNDI5OTEwNA==,5635139,2021-04-21T19:21:46Z,2021-04-21T19:21:46Z,MEMBER,"I've been trying to conceptualize why I think the `where` equivalence (the original proposal) is better than the `stack` proposal (the latter). I think it's mostly:
- It's simpler
- I'm not sure how the setitem would work; `da[key] = value`?
- If someone wants the `stack` result, it's less work to do original -> `where` result -> `stack` result relative to original -> `stack` result -> `where` result; which suggests they're more composable?
But I don't do much pointwise indexing — and so maybe we do want to prioritize that","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-823673654,https://api.github.com/repos/pydata/xarray/issues/1887,823673654,MDEyOklzc3VlQ29tbWVudDgyMzY3MzY1NA==,1217238,2021-04-20T23:50:34Z,2021-04-20T23:50:34Z,MEMBER,"It's worth noting that there is at least one other way boolean indexing could work:
- `ds[key]` could work like `ds.stack({key.name: key.dims}).isel({key.name: np.flatnonzero(key.data)})`, except without creating a MultiIndex. Arguably this might be more useful and also more consistent with NumPy itself. It's also more similar to the operation @Hoeze wants in https://github.com/pydata/xarray/issues/5179.
We can't support both with the same syntax, so we have to make a choice here :).
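
As a minimal sketch, the expression above can be run today on a small `DataArray`. The name `points` given to `key` is a hypothetical choice for illustration, and unlike the proposal this version does create a MultiIndex:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=['a', 'b'])

# The key needs a name, which becomes the name of the new dimension.
key = (da % 3 == 0).rename('points')

# Stack the key's dims into one dimension, then keep the True positions.
stacked = da.stack({key.name: key.dims})
result = stacked.isel({key.name: np.flatnonzero(key.data)})
```

The result is one-dimensional along `points`, holds the same values as `da.values[key.values]`, and keeps the integer dtype.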
See also the discussion about what `drop_duplicates`/`unique` should do over in https://github.com/pydata/xarray/pull/5089.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-803491524,https://api.github.com/repos/pydata/xarray/issues/1887,803491524,MDEyOklzc3VlQ29tbWVudDgwMzQ5MTUyNA==,5635139,2021-03-21T00:38:23Z,2021-03-21T00:38:23Z,MEMBER,"I've added the ""good first issue"" label — at least the first two bullets of the proposal would be relatively simple to implement, given they're mostly syntactic sugar.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734