html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1553#issuecomment-748491929,https://api.github.com/repos/pydata/xarray/issues/1553,748491929,MDEyOklzc3VlQ29tbWVudDc0ODQ5MTkyOQ==,18488,2020-12-19T16:00:00Z,2020-12-19T16:00:00Z,NONE,"For the case of a simple vectorized `reindex` you can work around the lack of a multi-dimensional `DataArray.reindex` by falling back on `isel` as follows:
```
import functools

import numpy as np
import xarray as xr


def reindex_vectorized(da, indexers, method=None, tolerance=None, dim=None, fill_value=None):
    # Reindex does not presently support vectorized lookups: https://github.com/pydata/xarray/issues/1553
    # Sel does (e.g. https://github.com/pydata/xarray/issues/4630) but can't handle missing keys
    if dim is None:
        dim = 'dim_0'
    if fill_value is None:
        fill_value = {'i': np.nan, 'f': np.nan}[da.dtype.kind]
    dtype = np.result_type(fill_value, da.dtype)
    # Allow a single method/tolerance to be applied to every dimension
    if method is None:
        method = {}
    elif not isinstance(method, dict):
        method = {d: method for d in da.dims}
    if tolerance is None:
        tolerance = {}
    elif not isinstance(tolerance, dict):
        tolerance = {d: tolerance for d in da.dims}
    ixs = {}
    masks = []
    any_empty = False
    for index_dim, index in indexers.items():
        # get_indexer returns -1 for labels that cannot be matched
        ix = da.indexes[index_dim].get_indexer(
            index, method=method.get(index_dim), tolerance=tolerance.get(index_dim)
        )
        ixs[index_dim] = xr.DataArray(np.fmax(0, ix), dims=[dim])
        masks.append(ix >= 0)
        any_empty = any_empty or (len(da.indexes[index_dim]) == 0)
    mask = functools.reduce(lambda x, y: x & y, masks)
    if any_empty and len(mask):
        # Unfortunately can't just isel with `ixs` in this special case, because we'll go out of bounds accessing index 0
        new_coords = {
            name: coord
            for name, coord in da.coords.items()
            # XXX: to match the other case we should really include coords with name in ixs too, but it's fiddly
            if name not in ixs
        }
        new_dims = [name for name in da.dims if name not in ixs] + [dim]
        result = xr.DataArray(
            data=np.broadcast_to(
                fill_value,
                tuple(n for name, n in da.sizes.items() if name not in ixs) + (len(mask),)
            ),
            coords=new_coords, dims=new_dims,
            name=da.name, attrs=da.attrs
        )
    else:
        result = da[ixs]
        if not mask.all():
            # Promote the dtype so that the fill value can be stored, then blank out the misses
            result = result.astype(dtype, copy=False)
            result[{dim: ~mask}] = fill_value
    return result
```
Example:
```
sensor_data = xr.DataArray(np.arange(6).reshape((3, 2)), coords=[
    ('time', [0, 2, 3]),
    ('sensor', ['A', 'C']),
])
reindex_vectorized(sensor_data, {
    'sensor': ['A', 'A', 'A', 'B', 'C'],
    'time': [0, 1, 2, 0, 0],
}, method={'time': 'ffill'})
# [0, 0, 2, nan, 1]
reindex_vectorized(xr.DataArray(coords=[
    ('sensor', []),
    ('time', [0, 2])
]), {
    'sensor': ['A', 'A', 'A', 'B', 'C'],
    'time': [0, 1, 2, 0, 0],
}, method={'time': 'ffill'})
# [nan, nan, nan, nan, nan]
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,254927382
https://github.com/pydata/xarray/issues/4714#issuecomment-748486801,https://api.github.com/repos/pydata/xarray/issues/4714,748486801,MDEyOklzc3VlQ29tbWVudDc0ODQ4NjgwMQ==,18488,2020-12-19T15:13:36Z,2020-12-19T15:14:59Z,NONE,"Thanks for the response. I think `reindex` would need to be changed as well because this code:
```python
sensor_data.reindex({
    'time': [1],
    'sensor': ['A', 'B']
}, method='ffill')
```
is not equivalent to this code:
```python
sensor_data.reindex({
    'time': [1],
    'sensor': ['A', 'B']
}).ffill(dim='time').ffill(dim='sensor')
```
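To make the difference concrete, here is a small sketch. It assumes `sensor_data` has times `[0, 2, 3]` and sensors `['A', 'C']` (my assumption, redefined here so the snippet runs on its own). With `method='ffill'` the lookup itself is padded, so both requested cells resolve to existing labels. Without it, the requested time `1` has no exact match, so the whole result is NaN and a later `ffill` along either dimension has nothing to fill from:
```python
import numpy as np
import xarray as xr

# Assumed example data: 3 times x 2 sensors
sensor_data = xr.DataArray(
    np.arange(6).reshape((3, 2)),
    coords=[('time', [0, 2, 3]), ('sensor', ['A', 'C'])],
)
target = {'time': [1], 'sensor': ['A', 'B']}

# Padded lookup: time 1 falls back to time 0, sensor 'B' falls back to 'A'
filled = sensor_data.reindex(target, method='ffill')
print(filled.values)

# Exact lookup: time 1 does not exist, so every requested cell is missing
plain = sensor_data.reindex(target)
print(plain.values)
```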
So if I understand your `to_dataset` idea correctly, you are proposing:
```python
ds = sensor_data.to_dataset(dim='sensor')
xr.concat([
    ds[sensor].sel({'time': time}, method='ffill', drop=True)
    for sensor, time in zip(['A', 'A', 'A', 'B', 'C'], [0, 1, 2, 0, 0])
], dim='sample')
```
I guess this works, but it's a bit cumbersome and unlikely to be fast. I think there must be something I'm not understanding here, as I'm not familiar with all the nuances of the `xarray` API.
Your idea of `reindex` followed by `sel` is an interesting one, but it does do something slightly different than what I was asking for: it does not fail if one of the sensors in the query list is missing, but rather inserts a NaN. I suppose you could fix this by doing an extra check afterwards, assuming that your original pre-reindex data contained no NaNs.
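One way to do such a check without relying on the data being NaN-free might be to compare the query labels against the index up front, before the `reindex`. The sketch below is only an illustration: `missing_labels` is a hypothetical helper name, not part of `xarray`, and the example data is assumed.
```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumed example data: 3 times x 2 sensors
sensor_data = xr.DataArray(
    np.arange(6).reshape((3, 2)),
    coords=[('time', [0, 2, 3]), ('sensor', ['A', 'C'])],
)


def missing_labels(da, dim, labels):
    # Hypothetical helper: return the distinct query labels absent from da's index on `dim`
    labels = pd.Index(labels)
    return labels[~labels.isin(da.indexes[dim])].unique().tolist()


missing = missing_labels(sensor_data, 'sensor', ['A', 'A', 'A', 'B', 'C'])
print(missing)
```
If the returned list is non-empty you could raise a `KeyError` instead of letting the NaNs through.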
In general `min(S*N,T*N)` could be much larger than `S*T`, so for big queries it's quite possible that you wouldn't have enough space to allocate the intermediate even if you could fit 100s of copies of the original `S*T` matrix. Using a dask cluster would make this situation less likely of course, but it seems like it would be better to avoid all this copying (even on a beefy cluster) even if just for performance reasons.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,771382653
https://github.com/pydata/xarray/issues/4714#issuecomment-748479287,https://api.github.com/repos/pydata/xarray/issues/4714,748479287,MDEyOklzc3VlQ29tbWVudDc0ODQ3OTI4Nw==,18488,2020-12-19T14:06:36Z,2020-12-19T14:06:36Z,NONE,"Thanks for the suggestion. One issue with this alternative is that it creates a potentially large intermediate object.
If you have T times and S sensors, and want to sample them at N (time, sensor) pairs, then the intermediate object with your approach has size `T*N` (if you index sensors first) or `S*N` (if you index time first). If you can index both dimensions in one `sel` call then we should only need to allocate memory for the result of size `N`, which is considerably better.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,771382653
https://github.com/pydata/xarray/issues/4714#issuecomment-748477889,https://api.github.com/repos/pydata/xarray/issues/4714,748477889,MDEyOklzc3VlQ29tbWVudDc0ODQ3Nzg4OQ==,18488,2020-12-19T13:53:53Z,2020-12-19T13:53:53Z,NONE,I guess it would also make sense to have this in `reindex` if you did decide to add it.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,771382653