html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1887#issuecomment-824782830,https://api.github.com/repos/pydata/xarray/issues/1887,824782830,MDEyOklzc3VlQ29tbWVudDgyNDc4MjgzMA==,1200058,2021-04-22T12:08:45Z,2021-04-22T12:11:55Z,NONE,"> > Current proposal (""`stack`""), of `da[key]` and with a dimension of `key`'s name (and probably no multiindex):
> > ```python
> > In [86]: da.values[key.values]
> > Out[86]: array([0, 3, 6, 9]) # But the xarray version
> > ```
> > The part about this new proposal that is most annoying is that the `key` needs a `name`, which we can use to name the new dimension. That's not too hard to do, but it is little annoying -- in practice you would have to write something like `da[key.rename('key_name')]` much of the time to make this work.

IMO, the perfect solution would be masking support. That is, `da[key]` would return the same array with an additional variable `da.mask == key`:

```python
In [87]: da[key]
Out[87]:
array([[ 0, , , 3],
       [, , 6, ],
       [, 9, , ]])
dtype: int
Dimensions without coordinates: a, b
```

Then we could have something like `da[key].stack(new_dim=[""a"", ""b""], dropna=True)`:

```python
In [87]: da[key].stack(new_dim=[""a"", ""b""], dropna=True)
Out[87]:
array([0, 3, 6, 9])
coords{
    ""a"" (newdim): [0, 0, 1, 2],
    ""b"" (newdim): [0, 3, 2, 1],
}
Dimensions without coordinates: newdim
```

Here, `dropna=True` would avoid creating the cross-product of `a, b`.
Also, that would avoid all those unnecessary `float` casts for free.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-744463486,https://api.github.com/repos/pydata/xarray/issues/1887,744463486,MDEyOklzc3VlQ29tbWVudDc0NDQ2MzQ4Ng==,43274047,2020-12-14T14:07:32Z,2020-12-14T15:47:18Z,NONE,"Just wanted to confirm that boolean indexing is indeed highly relevant, especially for assigning values instead of just selecting them.

**Here is a use case** which I encounter very often:

I'm working with very sparse data (e.g., a satellite image of some islands surrounded by water), and I want to modify it using `some_vectorized_function()`. Of course I could use `some_vectorized_function()` to process the whole image, but boolean masking allows me to save a lot of computation.

Here is how I would achieve this in NumPy:

```
import numpy as np
# hypothetical placeholder module; stands in for any vectorized routine
from some_module import some_vectorized_function

image = np.array(  # image.shape == (3, 7, 7)
    [[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 454, 454, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 565, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 343, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],

     [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 454, 565, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 667, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 878, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],

     [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 565, 676, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 323, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 545, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]]
)
image = np.moveaxis(image, 0, -1)
# image.shape == (7, 7, 3)

# ""image"" is a standard RGB image
# with shape == (height, width, channel)
# but
# only 4 pixels contain relevant data!

mask = np.all(image > 0, axis=-1)
# mask.shape == (7, 7)
# mask.dtype == bool
# mask.sum() == 4

image[mask] = some_vectorized_function(image[mask])
# len(image[mask]) == 4
# image[mask].shape == (4, 3)
```

The most important fact here is that `image[mask]` is just a list of 4 pixels, which I can process and then **assign them back** into their original places. And as you see, this boolean masking also plays very nicely with broadcasting, which allows me to mask a 3D array with a 2D mask.

Unfortunately, nothing like this is currently possible with xarray.

If implemented, it would enable some crazy speedups for operations like spatial interpolation, where we don't want to interpolate the whole image, but only the pixels that we care about.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734
https://github.com/pydata/xarray/issues/1887#issuecomment-544693024,https://api.github.com/repos/pydata/xarray/issues/1887,544693024,MDEyOklzc3VlQ29tbWVudDU0NDY5MzAyNA==,1200058,2019-10-21T20:27:14Z,2019-10-21T20:27:14Z,NONE,"Now that https://github.com/pydata/xarray/issues/3206 has been implemented, maybe fancy boolean indexing (`da[boolean_mask]`) could return a sparse array as well.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,294241734