
issue: 171828347 (#974) · Indexing with alignment and broadcasting

  node_id: MDU6SXNzdWUxNzE4MjgzNDc=
  user: 1217238 · author_association: MEMBER
  state: closed · state_reason: completed · locked: 0 · assignee: (none) · milestone: 741199 · comments: 6
  created_at: 2016-08-18T06:39:27Z · updated_at: 2018-02-04T23:30:12Z · closed_at: 2018-02-04T23:30:11Z
  reactions: 0 · repo: 13221727 · type: issue

I think we can bring all of NumPy's advanced indexing to xarray in a very consistent way, with only very minor breaks in backwards compatibility.

For boolean indexing:

- da[key] where key is a boolean labelled array (with any number of dimensions) is made equivalent to da.where(key.reindex_like(da), drop=True). This matches the existing behavior if key is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that da[key].mean() gives the same result as in NumPy (see the sketch after this list).
- da[key] = value where key is a boolean labelled array can be made equivalent to da = da.where(*align(key.reindex_like(da), value.reindex_like(da))) (that is, the three argument form of where).
- da[key_0, ..., key_n] where all of key_i are boolean arrays gets handled in the usual way. It is an IndexingError to supply multiple labelled keys if any of them are not already aligned with the corresponding index coordinates (and share the same dimension name). If users want alignment, we suggest they simply write da[key_0 & ... & key_n].
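As a rough sketch of the proposed equivalence on a toy array (the data and names below are made up for illustration; the where-based form is the spelled-out equivalent):

```python
import numpy as np
import xarray as xr

# Toy data; names and values are illustrative only.
da = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=("x", "y"),
                  coords={"x": [10, 20, 30], "y": [1, 2, 3, 4]})

# 1D boolean key: the proposed da[key] matches the where() form
# (and the existing behavior for 1D keys).
key1d = da.x > 10
assert np.array_equal(da[key1d].values, da.where(key1d, drop=True).values)

# Multi-dimensional boolean key: the proposal maps da[key] to
# da.where(key, drop=True); with NaN-skipping reductions, the mean
# matches what NumPy's boolean indexing would give.
key2d = da > 5
masked = da.where(key2d, drop=True)
assert masked.mean().item() == da.values[da.values > 5].mean()
```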

For vectorized indexing (by integer or index value):

- da[key_0, ..., key_n] where all of key_i are integer labelled arrays with any number of dimensions gets handled like NumPy, except instead of broadcasting numpy-style we do broadcasting xarray-style:
  - If any of key_i are unlabelled 1D arrays (e.g., numpy arrays), we convert them into an xarray.Variable along the respective dimension. 0D arrays remain scalars. This ensures that the result of broadcasting them (in the next step) will be consistent with our current "outer indexing" behavior. Unlabelled higher dimensional arrays trigger an IndexingError.
  - We ensure all keys have the same dimensions/coordinates by mapping them to da[*broadcast(key_0, ..., key_n)] (note that broadcast now includes automatic alignment).
  - The result's dimensions and coordinates are copied from the broadcast keys.
  - The result's values are taken by mapping each set of integer locations specified by the broadcast version of key_i to the integer position on the corresponding ith axis of da.
- Labeled indexing like da.loc[key_0, ..., key_n] works exactly as above, except instead of doing integer lookup, we look up label values in the corresponding index.
- Indexing with .isel and .sel/.reindex works like the two previous cases, except we look up axes by dimension name instead of axis position (see the sketch after this list).
- I haven't fully thought through the implications for assignment (da[key] = value or da.loc[key] = value), but I think it works in a similarly straightforward fashion.
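A rough sketch of how the proposed vectorized indexing would read, using made-up data; the two keys share a new points dimension, so they broadcast against each other instead of producing an outer product:

```python
import numpy as np
import xarray as xr

# Toy data; names are illustrative only.
da = xr.DataArray(np.arange(16).reshape(4, 4), dims=("x", "y"),
                  coords={"x": [0, 10, 20, 30], "y": [0, 1, 2, 3]})

# Labelled integer keys sharing a new "points" dimension broadcast against
# each other, selecting one element per (x, y) pair instead of a 2D block.
kx = xr.DataArray([0, 1, 2], dims="points")
ky = xr.DataArray([1, 2, 3], dims="points")

by_position = da.isel(x=kx, y=ky)   # integer positions, by dimension name
print(by_position.dims)             # ('points',)
print(by_position.values)           # [ 1  6 11]

# Label-based lookup works the same way: labels are looked up in the
# corresponding index, then the same pointwise selection applies.
lx = xr.DataArray([0, 10, 20], dims="points")
ly = xr.DataArray([1, 2, 3], dims="points")
print(da.sel(x=lx, y=ly).values)    # [ 1  6 11]
```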

All of these methods should also work for indexing on Dataset by looping over Dataset variables in the usual way.

This framework neatly removes most of the major limitations of xarray's existing indexing:

- Boolean indexing on multi-dimensional arrays works in an intuitive way, for both selection and assignment.
- No more need for specialized methods (sel_points/isel_points) for pointwise indexing. If you want to select along the diagonal of an array, you simply need to supply indexers that use a new dimension. Instead of arr.sel_points(lat=stations.lat, lon=stations.lon, dim='station'), you would simply write arr.sel(lat=stations.lat, lon=stations.lon) -- the station dimension is taken automatically from the indexer.
- Other use cases for NumPy's advanced indexing that are currently impossible in xarray also automatically work. For example, nearest neighbor interpolation to a completely different grid is now as simple as ds.reindex(lon=grid.lon, lat=grid.lat, method='nearest', tolerance=0.5) or ds.reindex_like(grid, method='nearest', tolerance=0.5) (both use cases are sketched after this list).
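For example, the pointwise-selection and regridding cases would read roughly like this (the station locations, grid values, and tolerances below are made up):

```python
import numpy as np
import xarray as xr

# Made-up gridded field and station coordinates.
arr = xr.DataArray(np.random.rand(4, 4), dims=("lat", "lon"),
                   coords={"lat": [10.0, 20.0, 30.0, 40.0],
                           "lon": [100.0, 110.0, 120.0, 130.0]})
station_lat = xr.DataArray([10.0, 30.0], dims="station")
station_lon = xr.DataArray([110.0, 130.0], dims="station")

# Pointwise selection: the indexers' shared "station" dimension becomes the
# dimension of the result -- no sel_points needed.
at_stations = arr.sel(lat=station_lat, lon=station_lon)
print(at_stations.dims)   # ('station',)

# Nearest-neighbor lookup onto a different grid via reindex.
coarse = arr.reindex(lat=[12.0, 33.0], lon=[101.0, 129.0],
                     method="nearest", tolerance=5.0)
print(coarse.shape)       # (2, 2)
```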

Questions to consider:

- How does this interact with @benbovy's enhancements for MultiIndex indexing? (#802 and #947)
- How do we handle mixed slice and array indexing? In NumPy, this is a major source of confusion, because slicing is done before broadcasting and the placement of sliced dimensions in the result is handled separately from the broadcast indices. I think we may be able to resolve this by mapping slices in this case to 1D arrays along their respective axes, and using our normal broadcasting rules (see the sketch after this list).
- Should we deprecate non-boolean indexing with [] and .loc[] using non-labelled arrays when some but not all dimensions are provided? Instead, we would require explicit indexing like [key, ...] (yes, writing ...), which indicates "all trailing axes" as in NumPy. This behavior has been suggested for new indexers in NumPy because it precludes a class of bugs where the array has an unexpected number of dimensions. On the other hand, it's not so necessary for us when we have explicit indexing by dimension name with .sel.
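One possible reading of the slice-mapping idea, purely as a sketch with made-up names: the slice is expanded into an explicit 1D key along its own dimension, which then participates in ordinary xarray broadcasting with the other keys:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=("x", "y"))

# Hypothetical mapping of a mixed key like da[key_x, 1:3]: the slice on "y"
# becomes an explicit integer Variable along "y", so it broadcasts against
# the "points" key instead of getting NumPy's special slice-ordering rules.
key_x = xr.DataArray([0, 2], dims="points")
y_slice = slice(1, 3)
key_y = xr.Variable("y", np.arange(da.sizes["y"])[y_slice])

result = da.isel(x=key_x, y=key_y)
print(result.shape)   # (2, 2): a 2D result over 'points' and 'y', not pointwise
```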

xref these comments from @MaximilianR and myself

Note: I would certainly welcome help making this happen from a contributor other than myself, though you should probably wait until I finish #964, which lays important groundwork.

