html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2511#issuecomment-944328081,https://api.github.com/repos/pydata/xarray/issues/2511,944328081,IC_kwDOAMm_X844SU2R,16700639,2021-10-15T14:03:21Z,2021-10-15T14:03:21Z,CONTRIBUTOR,"I'll drop a PR; it might be easier to try and play with this than with a piece of code lost in an issue.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/2511#issuecomment-931430066,https://api.github.com/repos/pydata/xarray/issues/2511,931430066,IC_kwDOAMm_X843hH6y,16700639,2021-09-30T15:30:02Z,2021-10-06T09:48:19Z,CONTRIBUTOR,"Okay, I was able to redo my test.
If I manually call `compute()` before doing `isel(......)` my whole computation takes about **5.65 seconds**.
However, with my naive patch it takes **32.34 seconds**.
I'm sorry, I cannot share my code as is; the relevant portion is buried in the middle of many other things.
I'll try to put together a minimal version of it to share with you.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/2511#issuecomment-930153816,https://api.github.com/repos/pydata/xarray/issues/2511,930153816,IC_kwDOAMm_X843cQVY,16700639,2021-09-29T13:02:15Z,2021-10-06T09:46:10Z,CONTRIBUTOR,"@pl-marasco Ok that's strange.
I should have saved my use case :/
I will try to reproduce it and will provide a gist of it soon.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/2511#issuecomment-932229595,https://api.github.com/repos/pydata/xarray/issues/2511,932229595,IC_kwDOAMm_X843kLHb,16700639,2021-10-01T13:29:32Z,2021-10-01T13:29:32Z,CONTRIBUTOR,"@pl-marasco Thanks for the example !
With it I get the same result as you: it takes the same time with the patch as with compute.
However, I could construct an example giving very different results. It is quite close to my original code:
```
import time
import numpy as np
import pandas as pd
import xarray as xr
time_start = time.perf_counter()
COORDS = dict(
    time=pd.date_range(""2042-01-01"", periods=200,
                       freq=pd.DateOffset(days=1)),
)
da = xr.DataArray(
    np.random.rand(200 * 3500 * 350).reshape((200, 3500, 350)),
    dims=('time', 'x', 'y'),
    coords=COORDS
).chunk(dict(time=-1, x=100, y=100))
resampled = da.resample(time=""MS"")
for label, sample in resampled:
    # sample = sample.compute()
    idx = sample.argmax('time')
    sample.isel(time=idx)
time_elapsed = time.perf_counter() - time_start
print(time_elapsed, "" secs"")
```
(Basically, for each month I want the first event occurring in it.)
Without the patch, and with `sample = sample.compute()` uncommented, it takes 5.7 seconds.
With the patch it takes 53.9 seconds.
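For anyone who wants to try the `compute()` workaround without waiting on the full-size array, here is a much smaller sketch of the same loop. The shape, the chunk sizes and the names `small` and `picked` are made up for illustration; it only shows the workaround, not the patch, and it says nothing about timings.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in for the large array above (sizes are illustrative only).
times = pd.date_range('2042-01-01', periods=59, freq='D')  # January + February
small = xr.DataArray(
    np.random.rand(59, 8, 6),
    dims=('time', 'x', 'y'),
    coords={'time': times},
).chunk({'time': -1, 'x': 4, 'y': 3})

picked = []  # hypothetical container: one selected slice per month
for label, sample in small.resample(time='MS'):
    sample = sample.compute()    # load this month into memory first
    idx = sample.argmax('time')  # in-memory integer indexer with dims (x, y)
    picked.append(sample.isel(time=idx))
```

Each entry of `picked` has dims `('x', 'y')` and holds, per pixel, the value at the position returned by `argmax`.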
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/2511#issuecomment-922942743,https://api.github.com/repos/pydata/xarray/issues/2511,922942743,IC_kwDOAMm_X843Av0X,16700639,2021-09-20T13:45:56Z,2021-09-20T13:45:56Z,CONTRIBUTOR,"I wrote a very naive fix. It works but seems to perform **really** slowly; I would appreciate some feedback (I'm a beginner with Dask).
Basically, I added `k = dask.array.asarray(k, dtype=np.int64)` to do the exact same thing as with numpy.
_I can create a PR if that makes this easier to review._
The patch:
```
class VectorizedIndexer(ExplicitIndexer):
    """"""Tuple for vectorized indexing.
    All elements should be slice or N-dimensional np.ndarray objects with an
    integer dtype and the same number of dimensions. Indexing follows proposed
    rules for np.ndarray.vindex, which matches NumPy's advanced indexing rules
    (including broadcasting) except sliced axes are always moved to the end:
    https://github.com/numpy/numpy/pull/6256
    """"""
    __slots__ = ()
    def __init__(self, key):
        if not isinstance(key, tuple):
            raise TypeError(f""key must be a tuple: {key!r}"")
        new_key = []
        ndim = None
        for k in key:
            if isinstance(k, slice):
                k = as_integer_slice(k)
            elif isinstance(k, (np.ndarray, dask.array.Array)):
                if not np.issubdtype(k.dtype, np.integer):
                    raise TypeError(
                        f""invalid indexer array, does not have integer dtype: {k!r}""
                    )
                if ndim is None:
                    ndim = k.ndim
                elif ndim != k.ndim:
                    # Count dask indexers too, so the error lists every array key.
                    ndims = [k.ndim for k in key if isinstance(k, (np.ndarray, dask.array.Array))]
                    raise ValueError(
                        ""invalid indexer key: ndarray arguments ""
                        f""have different numbers of dimensions: {ndims}""
                    )
                if isinstance(k, dask.array.Array):
                    k = dask.array.asarray(k, dtype=np.int64)
                else:
                    k = np.asarray(k, dtype=np.int64)
            else:
                raise TypeError(
                    f""unexpected indexer type for {type(self).__name__}: {k!r}""
                )
            new_key.append(k)
        super().__init__(new_key)
```
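The validation and casting could also be pulled into a small helper, which might be easier to review and test on its own. This is only a sketch assuming numpy and dask: `as_int64_indexer` is a hypothetical name, not an existing xarray function, and it uses `astype` to keep the dask case lazy instead of the `dask.array.asarray` call from the patch.

```python
import dask.array
import numpy as np


def as_int64_indexer(k):
    # Hypothetical helper, not part of xarray: validate an array indexer
    # and cast it to int64 without computing dask arrays.
    if not isinstance(k, (np.ndarray, dask.array.Array)):
        raise TypeError(f'unexpected indexer type: {k!r}')
    if not np.issubdtype(k.dtype, np.integer):
        raise TypeError(
            f'invalid indexer array, does not have integer dtype: {k!r}'
        )
    if isinstance(k, dask.array.Array):
        return k.astype(np.int64)  # stays lazy, nothing is computed here
    return np.asarray(k, dtype=np.int64)


# Illustrative inputs (made up for this sketch).
lazy = as_int64_indexer(
    dask.array.from_array(np.array([[0, 2], [1, 1]], dtype=np.int32), chunks=1)
)
eager = as_int64_indexer(np.array([0, 1], dtype=np.int8))
```

Both branches return an int64 array of the same kind that went in, so the caller would not need a second `isinstance` check for the cast.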
","{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 2, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/2511#issuecomment-563330352,https://api.github.com/repos/pydata/xarray/issues/2511,563330352,MDEyOklzc3VlQ29tbWVudDU2MzMzMDM1Mg==,7799184,2019-12-09T16:53:38Z,2019-12-09T16:53:38Z,CONTRIBUTOR,"I'm having a similar issue; here is an example:
```
import numpy as np
import dask.array as da
import xarray as xr
darr = xr.DataArray(data=[0.2, 0.4, 0.6], coords={""z"": range(3)}, dims=(""z"",))
good_indexer = xr.DataArray(
    data=np.random.randint(0, 3, 8).reshape(4, 2).astype(int),
    coords={""y"": range(4), ""x"": range(2)},
    dims=(""y"", ""x"")
)
bad_indexer = xr.DataArray(
    data=da.random.randint(0, 3, 8).reshape(4, 2).astype(int),
    coords={""y"": range(4), ""x"": range(2)},
    dims=(""y"", ""x"")
)
In [5]: darr
Out[5]:
array([0.2, 0.4, 0.6])
Coordinates:
* z (z) int64 0 1 2
In [6]: good_indexer
Out[6]:
array([[0, 1],
[2, 2],
[1, 2],
[1, 0]])
Coordinates:
* y (y) int64 0 1 2 3
* x (x) int64 0 1
In [7]: bad_indexer
Out[7]:
dask.array
Coordinates:
* y (y) int64 0 1 2 3
* x (x) int64 0 1
In [8]: darr[good_indexer]
Out[8]:
array([[0.2, 0.4],
[0.6, 0.6],
[0.4, 0.6],
[0.4, 0.2]])
Coordinates:
z (y, x) int64 0 1 2 2 1 2 1 0
* y (y) int64 0 1 2 3
* x (x) int64 0 1
In [9]: darr[bad_indexer]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 darr[bad_indexer]
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
638 else:
639 # xarray-style array indexing
--> 640 return self.isel(indexers=self._item_key_to_dict(key))
641
642 def __setitem__(self, key: Any, value: Any) -> None:
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataarray.py in isel(self, indexers, drop, **indexers_kwargs)
1012 """"""
1013 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, ""isel"")
-> 1014 ds = self._to_temp_dataset().isel(drop=drop, indexers=indexers)
1015 return self._from_temp_dataset(ds)
1016
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataset.py in isel(self, indexers, drop, **indexers_kwargs)
1920 if name in self.indexes:
1921 new_var, new_index = isel_variable_and_index(
-> 1922 name, var, self.indexes[name], var_indexers
1923 )
1924 if new_index is not None:
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/indexes.py in isel_variable_and_index(name, variable, index, indexers)
79 )
80
---> 81 new_variable = variable.isel(indexers)
82
83 if new_variable.dims != (name,):
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in isel(self, indexers, **indexers_kwargs)
1052
1053 key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
-> 1054 return self[key]
1055
1056 def squeeze(self, dim=None):
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in __getitem__(self, key)
700 array `x.values` directly.
701 """"""
--> 702 dims, indexer, new_order = self._broadcast_indexes(key)
703 data = as_indexable(self._data)[indexer]
704 if new_order:
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes(self, key)
557 if isinstance(k, Variable):
558 if len(k.dims) > 1:
--> 559 return self._broadcast_indexes_vectorized(key)
560 dims.append(k.dims[0])
561 elif not isinstance(k, integer_types):
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes_vectorized(self, key)
685 new_order = None
686
--> 687 return out_dims, VectorizedIndexer(tuple(out_key)), new_order
688
689 def __getitem__(self: VariableType, key) -> VariableType:
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/indexing.py in __init__(self, key)
447 else:
448 raise TypeError(
--> 449 f""unexpected indexer type for {type(self).__name__}: {k!r}""
450 )
451 new_key.append(k)
TypeError: unexpected indexer type for VectorizedIndexer: dask.array
In [10]: xr.__version__
Out[10]: '0.14.1'
In [11]: import dask; dask.__version__
Out[11]: '2.9.0'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/2511#issuecomment-525634152,https://api.github.com/repos/pydata/xarray/issues/2511,525634152,MDEyOklzc3VlQ29tbWVudDUyNTYzNDE1Mg==,13190237,2019-08-28T08:12:13Z,2019-08-28T08:12:13Z,CONTRIBUTOR,"I think the problem is somewhere here:
https://github.com/pydata/xarray/blob/aaeea6250b89e3605ee1d1a160ad50d6ed657c7e/xarray/core/utils.py#L85-L103
I don't think `pandas.Index` can hold lazy arrays. Could there be a workaround that exploits `dask.dataframe` indexing methods?","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/2511#issuecomment-522986699,https://api.github.com/repos/pydata/xarray/issues/2511,522986699,MDEyOklzc3VlQ29tbWVudDUyMjk4NjY5OQ==,13190237,2019-08-20T12:15:18Z,2019-08-20T18:52:49Z,CONTRIBUTOR,"Even though the example from above does work, sadly, the following does not:
``` python
import xarray as xr
import dask.array as da
import numpy as np
da = xr.DataArray(np.random.rand(3*4*5).reshape((3,4,5))).chunk(dict(dim_0=1))
idcs = da.argmax('dim_2')
da[dict(dim_2=idcs)]
```
results in
``` python
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 da[dict(dim_2=idcs)]
~/src/xarray/xarray/core/dataarray.py in __getitem__(self, key)
604 else:
605 # xarray-style array indexing
--> 606 return self.isel(indexers=self._item_key_to_dict(key))
607
608 def __setitem__(self, key: Any, value: Any) -> None:
~/src/xarray/xarray/core/dataarray.py in isel(self, indexers, drop, **indexers_kwargs)
986 """"""
987 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, ""isel"")
--> 988 ds = self._to_temp_dataset().isel(drop=drop, indexers=indexers)
989 return self._from_temp_dataset(ds)
990
~/src/xarray/xarray/core/dataset.py in isel(self, indexers, drop, **indexers_kwargs)
1901 indexes[name] = new_index
1902 else:
-> 1903 new_var = var.isel(indexers=var_indexers)
1904
1905 variables[name] = new_var
~/src/xarray/xarray/core/variable.py in isel(self, indexers, drop, **indexers_kwargs)
984 if dim in indexers:
985 key[i] = indexers[dim]
--> 986 return self[tuple(key)]
987
988 def squeeze(self, dim=None):
~/src/xarray/xarray/core/variable.py in __getitem__(self, key)
675 array `x.values` directly.
676 """"""
--> 677 dims, indexer, new_order = self._broadcast_indexes(key)
678 data = as_indexable(self._data)[indexer]
679 if new_order:
~/src/xarray/xarray/core/variable.py in _broadcast_indexes(self, key)
532 if isinstance(k, Variable):
533 if len(k.dims) > 1:
--> 534 return self._broadcast_indexes_vectorized(key)
535 dims.append(k.dims[0])
536 elif not isinstance(k, integer_types):
~/src/xarray/xarray/core/variable.py in _broadcast_indexes_vectorized(self, key)
660 new_order = None
661
--> 662 return out_dims, VectorizedIndexer(tuple(out_key)), new_order
663
664 def __getitem__(self, key):
~/src/xarray/xarray/core/indexing.py in __init__(self, key)
460 raise TypeError(
461 ""unexpected indexer type for {}: {!r}"".format(
--> 462 type(self).__name__, k
463 )
464 )
TypeError: unexpected indexer type for VectorizedIndexer: dask.array
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/2511#issuecomment-498178025,https://api.github.com/repos/pydata/xarray/issues/2511,498178025,MDEyOklzc3VlQ29tbWVudDQ5ODE3ODAyNQ==,13190237,2019-06-03T09:13:49Z,2019-06-03T09:13:49Z,CONTRIBUTOR,As of version 0.12 indexing with dask arrays works out of the box... I think this can be closed now.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/2511#issuecomment-433304954,https://api.github.com/repos/pydata/xarray/issues/2511,433304954,MDEyOklzc3VlQ29tbWVudDQzMzMwNDk1NA==,13190237,2018-10-26T06:48:54Z,2018-10-26T06:48:54Z,CONTRIBUTOR,"It seems to work fine with the following change, but it has a lot of duplicated code...
```
diff --git a/xarray/core/indexing.py b/xarray/core/indexing.py
index d51da471..9fe93581 100644
--- a/xarray/core/indexing.py
+++ b/xarray/core/indexing.py
@@ -7,6 +7,7 @@ from datetime import timedelta
import numpy as np
import pandas as pd
+import dask.array as da
from . import duck_array_ops, nputils, utils
from .pycompat import (
@@ -420,6 +421,19 @@ class VectorizedIndexer(ExplicitIndexer):
                                      'have different numbers of dimensions: {}'
                                      .format(ndims))
                 k = np.asarray(k, dtype=np.int64)
+            elif isinstance(k, dask_array_type):
+                if not np.issubdtype(k.dtype, np.integer):
+                    raise TypeError('invalid indexer array, does not have '
+                                    'integer dtype: {!r}'.format(k))
+                if ndim is None:
+                    ndim = k.ndim
+                elif ndim != k.ndim:
+                    ndims = [k.ndim for k in key
+                             if isinstance(k, (np.ndarray,) + dask_array_type)]
+                    raise ValueError('invalid indexer key: ndarray arguments '
+                                     'have different numbers of dimensions: {}'
+                                     .format(ndims))
+                k = da.array(k, dtype=np.int64)
             else:
                 raise TypeError('unexpected indexer type for {}: {!r}'
                                 .format(type(self).__name__, k))
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325