issue_comments: 744463486

This data as json

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1887#issuecomment-744463486	https://api.github.com/repos/pydata/xarray/issues/1887	744463486	MDEyOklzc3VlQ29tbWVudDc0NDQ2MzQ4Ng==	43274047	2020-12-14T14:07:32Z	2020-12-14T15:47:18Z	NONE	Just wanted to confirm, that boolean indexing is indeed highly relevant, especially for assigning values instead of just selecting them. Here is a use case which I encounter very often: I'm working with very sparse data (e.g a satellite image of some islands surrounded by water), and I want to modify it using `some_vectorized_function()`. Of course I could use `some_vectorized_function()` to process the whole image, but boolean masking allows me to save a lot of computations. Here is how I would achieve this in numpy: ``` import numpy as np import some_vectorized_function image = np.array( # image.shape == (3, 7, 7) [[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 454, 454, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 565, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 343, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 454, 565, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 667, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 878, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 565, 676, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 323, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 545, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]] ) image = np.moveaxis(image, 0, -1) # image.shape == (7, 7, 3) "image" is a standard RGB image with shape == (height, width, channel) but only 4 pixels contain relevant data! mask = np.all(image > 0, axis=-1) # mask.shape == (7, 7) # mask.dtype == bool # mask.sum() == 4 image[mask] = some_vectorized_function(image[mask]) # len(image[mask]) == 4 # image[mask].shape == (4, 3) ``` The most important fact here is that `image[mask]` is just a list of 4 pixels, which I can process and then assign them back into their original place. And as you see, this boolean masking also plays very nice with broadcasting, which allows me to mask a 3D array with a 2D mask. Unfortunately, nothing like this is currently possible with XArray. If implemented, it would enable some crazy speedups for operations like spatial interpolation, where we don't want to interpolate the whole image, but only some pixels that we care about.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		294241734