html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1887#issuecomment-825176507,https://api.github.com/repos/pydata/xarray/issues/1887,825176507,MDEyOklzc3VlQ29tbWVudDgyNTE3NjUwNw==,5635139,2021-04-22T20:50:29Z,2021-04-22T21:06:47Z,MEMBER,"> `stack(new_dim=[""a"", ""b""], dropna=True)`

This could be useful (potentially we can open a different issue). While someone can call `.dropna`, that coerces to floats (or some other dtype that supports missing values) and can allocate more memory than is needed. Potentially this can be considered along with the issues around sparse arrays, e.g. https://github.com/pydata/xarray/issues/3245, https://github.com/pydata/xarray/issues/4143","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824782830,https://api.github.com/repos/pydata/xarray/issues/1887,824782830,MDEyOklzc3VlQ29tbWVudDgyNDc4MjgzMA==,1200058,2021-04-22T12:08:45Z,2021-04-22T12:11:55Z,NONE,"> > Current proposal (""`stack`""), of `da[key]` and with a dimension of `key`'s name (and probably no multiindex):
> > ```python
> > In [86]: da.values[key.values]
> > Out[86]: array([0, 3, 6, 9])   # but as the xarray equivalent
> > ```
> 
> The part about this new proposal that is most annoying is that the `key` needs a `name`, which we can use to name the new dimension. That's not too hard to do, but it is a little annoying -- in practice you would have to write something like `da[key.rename('key_name')]` much of the time to make this work.

IMO, the perfect solution would be masking support.
I.e. `da[key]` would return the same array with an additional variable `da.mask == key`:
```python
In [87]: da[key]
Out[87]:
<xarray.DataArray (a: 3, b: 4)>
array([[   0, <NA>, <NA>,    3],
       [<NA>, <NA>,    6, <NA>],
       [<NA>,    9, <NA>, <NA>]])
dtype: int
Dimensions without coordinates: a, b
```
Then we could have something like `da[key].stack(new_dim=[""a"", ""b""], dropna=True)`:
```python
In [87]: da[key].stack(new_dim=[""a"", ""b""], dropna=True)
Out[87]:
<xarray.DataArray (new_dim: 4)>
array([0, 3, 6, 9])
coords{
   ""a"" (new_dim): [0, 0, 1, 2],
   ""b"" (new_dim): [0, 3, 2, 1],
}
Dimensions without coordinates: new_dim
```
Here, `dropna=True` would avoid creating the full cross-product of `a` and `b`.
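
As a rough analogy only (this is plain NumPy, not an existing or proposed xarray API), `numpy.ma` already behaves like the masking idea above: the integer dtype is kept, and the unmasked values and their positions can be pulled out without building the full cross-product. The variable names are made up for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
key = a % 3 == 0

# Mask out everything that is NOT selected; the dtype stays integer.
masked = np.ma.masked_array(a, mask=~key)
assert masked.dtype == a.dtype

# Only the selected values, in row-major order.
values = masked.compressed()
print(values)  # [0 3 6 9]

# Positions of the selected values along each original axis.
rows, cols = np.nonzero(key)
print(rows)  # [0 0 1 2]
print(cols)  # [0 3 2 1]
```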

Also, that would avoid all those unnecessary `float` casts for free.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824503658,https://api.github.com/repos/pydata/xarray/issues/1887,824503658,MDEyOklzc3VlQ29tbWVudDgyNDUwMzY1OA==,5635139,2021-04-22T03:04:41Z,2021-04-22T03:04:51Z,MEMBER,"I'm still working through this. Using this to jot down my notes, no need to respond.

One property that seems to be lacking is that if `key` changes from `n-1` to `n` dimensions, the behavior changes (also outlined [here](url)):

```python
In [171]: a
Out[171]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [172]: mask
Out[172]: array([ True, False,  True])

In [173]: a[mask]
Out[173]:
array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])
```

...as expected, but now let's make a 2D mask...

```python
In [174]: full_mask = np.broadcast_to(mask[:, np.newaxis], (3,4))

In [175]: full_mask
Out[175]:
array([[ True,  True,  True,  True],
       [False, False, False, False],
       [ True,  True,  True,  True]])

In [176]: a[full_mask]
Out[176]: array([ 0,  1,  2,  3,  8,  9, 10, 11])    # flattened!
```
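
A small NumPy sketch of how the two results relate. It only works here because every selected row is selected in full, so the flattened result can be reshaped back into rows:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = np.array([True, False, True])
full_mask = np.broadcast_to(mask[:, np.newaxis], a.shape)

by_rows = a[mask]            # shape (2, 4): whole rows are kept
flattened = a[full_mask]     # shape (8,): same elements, flattened

# Reshaping recovers the row-wise result in this special case.
restored = flattened.reshape(-1, a.shape[1])
print(np.array_equal(restored, by_rows))  # True
```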
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824461333,https://api.github.com/repos/pydata/xarray/issues/1887,824461333,MDEyOklzc3VlQ29tbWVudDgyNDQ2MTMzMw==,1217238,2021-04-22T01:02:32Z,2021-04-22T01:02:32Z,MEMBER,"> Current proposal (""`stack`""), of `da[key]` and with a dimension of `key`'s name (and probably no multiindex):
> 
> ```python
> In [86]: da.values[key.values]
> Out[86]: array([0, 3, 6, 9])   # but as the xarray equivalent
> ```

The part about this new proposal that is most annoying is that the `key` needs a `name`, which we can use to name the new dimension. That's not too hard to do, but it is a little annoying -- in practice you would have to write something like `da[key.rename('key_name')]` much of the time to make this work.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824460304,https://api.github.com/repos/pydata/xarray/issues/1887,824460304,MDEyOklzc3VlQ29tbWVudDgyNDQ2MDMwNA==,1217238,2021-04-22T00:59:25Z,2021-04-22T00:59:25Z,MEMBER,"> OK great. To confirm, this is what it would look like:

Yes, this looks right to me.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824454992,https://api.github.com/repos/pydata/xarray/issues/1887,824454992,MDEyOklzc3VlQ29tbWVudDgyNDQ1NDk5Mg==,5635139,2021-04-22T00:40:49Z,2021-04-22T00:40:49Z,MEMBER,"> I'm not quite sure this is true -- it's the difference between needing to call `stack()` vs `unstack()`.

This was a tiny point so it's fine to discard. I had meant that producing the `where` result via the `stack` result requires a `stack` and `unstack`. But producing the `stack` result via a `where` result requires only one `stack` — the `where` result is very cheap. 
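
A minimal sketch of the `where` -> `stack` direction using existing xarray methods. The dimension name `points` is made up for illustration, and this route still goes through a float cast and a MultiIndex:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=['a', 'b'])
key = da % 3 == 0

# The where result keeps the original shape, with NaN where key is False.
where_result = da.where(key)

# A single stack (plus dropping the NaNs) gives the flat selection.
stack_result = where_result.stack(points=['a', 'b']).dropna('points')
print(stack_result.values)  # [0. 3. 6. 9.]
```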
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824452843,https://api.github.com/repos/pydata/xarray/issues/1887,824452843,MDEyOklzc3VlQ29tbWVudDgyNDQ1Mjg0Mw==,5635139,2021-04-22T00:33:29Z,2021-04-22T00:35:28Z,MEMBER,"OK great. To confirm, this is what it would look like:


Context:

```python
In [81]: da = xr.DataArray(np.arange(12).reshape(3,4), dims=list('ab'))

In [82]: da
Out[82]:
<xarray.DataArray (a: 3, b: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Dimensions without coordinates: a, b

In [83]: key = da % 3 == 0

In [84]: key
Out[84]:
<xarray.DataArray (a: 3, b: 4)>
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])
Dimensions without coordinates: a, b
```

Currently:
```python
In [85]: da[key]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-85-7fd83c907cb6> in <module>
----> 1 da[key]
...
~/.asdf/installs/python/3.8.8/lib/python3.8/site-packages/xarray/core/variable.py in _validate_indexers(self, key)
    697                         )
    698                     if k.ndim > 1:
--> 699                         raise IndexError(
    700                             ""{}-dimensional boolean indexing is ""
    701                             ""not supported. "".format(k.ndim)

IndexError: 2-dimensional boolean indexing is not supported.
```

Current proposal (""`stack`""), of `da[key]` and with a dimension of `key`'s name (and probably no multiindex):
```python
In [86]: da.values[key.values]
Out[86]: array([0, 3, 6, 9])   # but as the xarray equivalent
```

Previous suggestion (""`where`""), for the result of `da[key]`:
```python
In [87]: da.where(key)
Out[87]:
<xarray.DataArray (a: 3, b: 4)>
array([[ 0., nan, nan,  3.],
       [nan, nan,  6., nan],
       [nan,  9., nan, nan]])
Dimensions without coordinates: a, b
```

(small follow up I'll put in another message, for clarity)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824329772,https://api.github.com/repos/pydata/xarray/issues/1887,824329772,MDEyOklzc3VlQ29tbWVudDgyNDMyOTc3Mg==,1217238,2021-04-21T20:16:10Z,2021-04-21T20:16:10Z,MEMBER,"> I've been trying to conceptualize why I think the `where` equivalence (the original proposal) is better than the `stack` proposal (the latter).

Here are two reasons why I like the `stack` version:

1. It's more NumPy-like -- boolean indexing in NumPy returns a flat array in the same way
2. It doesn't need dtype promotion to handle possibly missing values, so it will have more predictable semantics.
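
A short sketch of the dtype point, using the same example array as elsewhere in this thread: `where` has to promote integers to floats to hold NaN, while flat boolean selection keeps the original dtype.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=['a', 'b'])
key = da % 3 == 0

# where: same shape, but integers are promoted to float to represent NaN.
masked = da.where(key)
print(masked.dtype)  # float64

# Flat selection on the underlying NumPy arrays: dtype is unchanged.
flat = da.values[key.values]
print(flat.dtype == da.dtype)  # True
```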

As a side note: one nice feature of using `isel()` for stacking is that it _does not_ create a MultiIndex, which can be expensive. But there's no reason why we necessarily need to do that for `stack()`. I'll open a new issue to discuss adding an optional parameter.

> * I'm not sure how the setitem would work; `da[key] = value`?

To match the semantics of NumPy, `value` would need to have matching dims/coords to those of `da[key]`. In other words, it would also need to be stacked.
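
For reference, a sketch of the NumPy behaviour being matched: the assigned `value` is flat, with one entry per `True` in the mask, in row-major order. The replacement numbers are arbitrary.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = a % 3 == 0

# One value per selected element, in the same order as a[mask].
value = np.array([100, 103, 106, 109])
a[mask] = value
print(a)
# [[100   1   2 103]
#  [  4   5 106   7]
#  [  8 109  10  11]]
```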

> * If someone wants the `stack` result, it's less work to do original -> `where` result -> `stack` result relative to original -> `stack` result -> `where` result; which suggests they're more composable?

I'm not quite sure this is true -- it's the difference between needing to call `stack()` vs `unstack()`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-824299104,https://api.github.com/repos/pydata/xarray/issues/1887,824299104,MDEyOklzc3VlQ29tbWVudDgyNDI5OTEwNA==,5635139,2021-04-21T19:21:46Z,2021-04-21T19:21:46Z,MEMBER,"I've been trying to conceptualize why I think the `where` equivalence (the original proposal) is better than the `stack` proposal (the latter). I think it's mostly:
- It's simpler
- I'm not sure how the setitem would work; `da[key] = value`?
- If someone wants the `stack` result, it's less work to do original -> `where` result -> `stack` result relative to original -> `stack` result -> `where` result; which suggests they're more composable?

But I don't do much pointwise indexing, so maybe we do want to prioritize that.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-823673654,https://api.github.com/repos/pydata/xarray/issues/1887,823673654,MDEyOklzc3VlQ29tbWVudDgyMzY3MzY1NA==,1217238,2021-04-20T23:50:34Z,2021-04-20T23:50:34Z,MEMBER,"It's worth noting that there is at least one other way boolean indexing could work:

- `ds[key]` could work like `ds.stack({key.name: key.dims}).isel({key.name: np.flatnonzero(key.data)})`, except without creating a MultiIndex. Arguably this might be more useful and also more consistent with NumPy itself. It's also more similar to the operation @Hoeze wants in https://github.com/pydata/xarray/issues/5179.

We can't support both with the same syntax, so we have to make a choice here :).
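
A runnable sketch of the expression above with today's API. Note that, unlike the proposal, this version does create a MultiIndex, and the key name `points` is made up for illustration:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=['a', 'b'])
# The key needs a name, which becomes the name of the new dimension.
key = (da % 3 == 0).rename('points')

# Stack the key's dimensions into one, then pick the True positions.
stacked = da.stack({key.name: key.dims})
result = stacked.isel({key.name: np.flatnonzero(key.data)})
print(result.values)  # [0 3 6 9]
```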

See also the discussion about what `drop_duplicates`/`unique` should do over in https://github.com/pydata/xarray/pull/5089.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-803491524,https://api.github.com/repos/pydata/xarray/issues/1887,803491524,MDEyOklzc3VlQ29tbWVudDgwMzQ5MTUyNA==,5635139,2021-03-21T00:38:23Z,2021-03-21T00:38:23Z,MEMBER,"I've added the ""good first issue"" label — at least the first two bullets of the proposal would be relatively simple to implement, given they're mostly syntactic sugar.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-744463486,https://api.github.com/repos/pydata/xarray/issues/1887,744463486,MDEyOklzc3VlQ29tbWVudDc0NDQ2MzQ4Ng==,43274047,2020-12-14T14:07:32Z,2020-12-14T15:47:18Z,NONE,"Just wanted to confirm that boolean indexing is indeed highly relevant, especially for assigning values rather than just selecting them. **Here is a use case** which I encounter very often:

I'm working with very sparse data (e.g. a satellite image of some islands surrounded by water), and I want to modify it using `some_vectorized_function()`. Of course I could use `some_vectorized_function()` to process the whole image, but boolean masking lets me skip most of that computation.

Here is how I would achieve this in numpy:

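```python
import numpy as np


def some_vectorized_function(pixels):
    # Placeholder (an assumption, not part of the original snippet):
    # stands in for any vectorized per-pixel operation.
    return pixels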

image = np.array(                                          # image.shape == (3, 7, 7)
    [[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 454, 454, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 565, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 343, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
    
     [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 454, 565, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 667, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 878, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
    
     [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 565, 676, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 323, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 545, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]]
)
image = np.moveaxis(image, 0, -1)                          # image.shape == (7, 7, 3)


# ""image"" is a standard RGB image
# with shape == (height, width, channel)
# but only 4 pixels contain relevant data!


mask = np.all(image > 0, axis=-1)                          # mask.shape == (7, 7)
                                                           # mask.dtype == bool
                                                           # mask.sum() == 4

image[mask] = some_vectorized_function(image[mask])        # len(image[mask]) == 4
                                                           # image[mask].shape == (4, 3)
```

The most important fact here is that `image[mask]` is just a list of 4 pixels, which I can process and then **assign back** into their original place. And as you can see, this boolean masking also plays very nicely with broadcasting, which allows me to mask a 3D array with a 2D mask.
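
For comparison, a rough workaround sketch that drops down to the underlying NumPy array and wraps the result again, so the masking itself is not done by xarray indexing. The dimension names, the smaller two-pixel image and the halving function are all made up for illustration:

```python
import numpy as np
import xarray as xr


def some_vectorized_function(pixels):
    # Hypothetical stand-in for an expensive per-pixel operation.
    return pixels / 2


data = np.zeros((7, 7, 3))
data[2, 1] = [454, 454, 565]
data[5, 5] = [343, 878, 545]
image = xr.DataArray(data, dims=['y', 'x', 'channel'])

# 2D mask of pixels where every channel is positive.
mask = (image > 0).all('channel')

# Work on a NumPy copy, then wrap it back into a DataArray.
values = image.values.copy()
values[mask.values] = some_vectorized_function(values[mask.values])
result = image.copy(data=values)
print(result.values[2, 1])  # [227.  227.  282.5]
```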

Unfortunately, nothing like this is currently possible with xarray. If implemented, it would enable large speedups for operations like spatial interpolation, where we don't want to interpolate the whole image, but only the pixels that we care about.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-544693024,https://api.github.com/repos/pydata/xarray/issues/1887,544693024,MDEyOklzc3VlQ29tbWVudDU0NDY5MzAyNA==,1200058,2019-10-21T20:27:14Z,2019-10-21T20:27:14Z,NONE,"Since https://github.com/pydata/xarray/issues/3206 has been implemented now:
Maybe fancy boolean indexing (`da[boolean_mask]`) could return a sparse array as well.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734