
issue_comments


20 rows where issue = 374025325 sorted by updated_at descending


user 8

  • bzah 5
  • ulijh 4
  • pl-marasco 4
  • shoyer 2
  • dcherian 2
  • rafa-guedes 1
  • roxyboy 1
  • cerodell 1

author_association 3

  • CONTRIBUTOR 10
  • NONE 6
  • MEMBER 4

issue 1

  • Array indexing with dask arrays · 20
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at ▲, author_association, body, reactions, performed_via_github_app, issue
992699334 https://github.com/pydata/xarray/issues/2511#issuecomment-992699334 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X847K2PG dcherian 2448579 2021-12-13T17:21:20Z 2021-12-13T17:21:20Z MEMBER

IIUC this cannot work lazily in most cases if you have dimension coordinate variables. When xarray constructs the output after indexing, it will try to index those coordinate variables so that it can associate the right timestamp (for example) with the output.

The example from @ulijh should work, though (it has no dimension coordinates or indexed variables):

```python
import xarray as xr
import dask.array as da
import numpy as np

da = xr.DataArray(np.random.rand(3 * 4 * 5).reshape((3, 4, 5))).chunk(dict(dim_0=1))
idcs = da.argmax('dim_2')
da[dict(dim_2=idcs)]
```

The example by @rafa-guedes (thanks for that one!) could be made to work I think.

```python
import numpy as np
import dask.array as da
import xarray as xr

darr = xr.DataArray(data=[0.2, 0.4, 0.6], coords={"z": range(3)}, dims=("z",))
good_indexer = xr.DataArray(
    data=np.random.randint(0, 3, 8).reshape(4, 2).astype(int),
    coords={"y": range(4), "x": range(2)},
    dims=("y", "x"),
)
bad_indexer = xr.DataArray(
    data=da.random.randint(0, 3, 8).reshape(4, 2).astype(int),
    coords={"y": range(4), "x": range(2)},
    dims=("y", "x"),
)

In [5]: darr
Out[5]:
<xarray.DataArray (z: 3)>
array([0.2, 0.4, 0.6])
Coordinates:
  * z        (z) int64 0 1 2

In [6]: good_indexer
Out[6]:
<xarray.DataArray (y: 4, x: 2)>
array([[0, 1],
       [2, 2],
       [1, 2],
       [1, 0]])
Coordinates:
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1

In [7]: bad_indexer
Out[7]:
<xarray.DataArray 'reshape-417766b2035dcb1227ddde8505297039' (y: 4, x: 2)>
dask.array<reshape, shape=(4, 2), dtype=int64, chunksize=(4, 2), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1

In [8]: darr[good_indexer]
Out[8]:
<xarray.DataArray (y: 4, x: 2)>
array([[0.2, 0.4],
       [0.6, 0.6],
       [0.4, 0.6],
       [0.4, 0.2]])
Coordinates:
    z        (y, x) int64 0 1 2 2 1 2 1 0
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1
```

We can copy the dimension coordinates of the output (x, y) directly from the indexer, and the dimension coordinate of the input (z) should become a dask array in the output (since z is not a dimension coordinate in the output, this should be fine).
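
A rough sketch of that idea (not how xarray actually implements it): index the underlying dask arrays directly, copy the output dimension coordinates (x, y) from the indexer, and attach z as a lazy non-dimension coordinate. The variable names are illustrative, and it assumes dask supports indexing one dask array with another integer dask array along an axis.

```python
import numpy as np
import dask.array as da
import xarray as xr

darr = xr.DataArray([0.2, 0.4, 0.6], coords={"z": range(3)}, dims="z").chunk()
indexer = xr.DataArray(
    da.random.randint(0, 3, size=(4, 2)),
    coords={"y": range(4), "x": range(2)},
    dims=("y", "x"),
)

flat = indexer.data.ravel()                    # 1-D dask integer indexer
data = darr.data[flat].reshape(indexer.shape)  # lazy dask-on-dask indexing

# z is no longer a dimension coordinate in the output, so it can stay lazy too.
z_lazy = da.from_array(darr["z"].values)[flat].reshape(indexer.shape)

out = xr.DataArray(
    data,
    dims=indexer.dims,
    coords={"y": indexer["y"], "x": indexer["x"], "z": (indexer.dims, z_lazy)},
)
print(out)  # still dask-backed; nothing has been computed yet
```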

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
944328081 https://github.com/pydata/xarray/issues/2511#issuecomment-944328081 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X844SU2R bzah 16700639 2021-10-15T14:03:21Z 2021-10-15T14:03:21Z CONTRIBUTOR

I'll open a PR; it might be easier to try and play with this than with a piece of code lost in an issue.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
931430066 https://github.com/pydata/xarray/issues/2511#issuecomment-931430066 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843hH6y bzah 16700639 2021-09-30T15:30:02Z 2021-10-06T09:48:19Z CONTRIBUTOR

Okay, I could redo my test. If I manually call compute() before doing isel(......), my whole computation takes about 5.65 seconds. However, if I try with my naive patch, it takes 32.34 seconds.

I'm sorry, I cannot share my code as is; the relevant portion is really in the middle of many other things. I'll try to get a minimal version of it to share with you.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
930153816 https://github.com/pydata/xarray/issues/2511#issuecomment-930153816 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843cQVY bzah 16700639 2021-09-29T13:02:15Z 2021-10-06T09:46:10Z CONTRIBUTOR

@pl-marasco Ok that's strange. I should have saved my use case :/ I will try to reproduce it and will provide a gist of it soon.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
935769790 https://github.com/pydata/xarray/issues/2511#issuecomment-935769790 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843xra- pl-marasco 22492773 2021-10-06T08:47:24Z 2021-10-06T08:47:24Z NONE

@bzah I've been testing your code and I can confirm the increase in timing once `.compute()` isn't used. I've noticed that, with your modification, the dask array seems to be computed more than once per sample. I've made some tests using a modified version of the code from #3237, and here are my observations:

Assuming that we have only one sample object after the resample, the expected result should be 1 compute, and that's what we obtain if we call the computation before `.argmax()`. If `.compute()` is removed, I get 3 computations in total. As a confirmation, if you increase the number of samples you will get a multiple of 3 computes.

I still don't know the reason, or whether this is correct or not, but it sounds weird to me; it could explain the time increase, though.

@dcherian @shoyer do you know if all this makes any sense? Should `.isel()` automatically trigger the computation, or should it give back a lazy array?

Here is the code I've been using (it only works with the modification proposed by @bzah):

```python
import numpy as np
import pandas as pd  # needed for pd.date_range below
import dask
import xarray as xr


class Scheduler:
    """From: https://stackoverflow.com/questions/53289286/"""

    def __init__(self, max_computes=20):
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self, dsk, keys, **kwargs):
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                "Too many dask computations were scheduled: {}".format(
                    self.total_computes
                )
            )
        return dask.get(dsk, keys, **kwargs)


scheduler = Scheduler()

with dask.config.set(scheduler=scheduler):

    COORDS = dict(dim_0=pd.date_range("2042-01-01", periods=31, freq='D'),
                  dim_1=range(0, 500),
                  dim_2=range(0, 500))

    da = xr.DataArray(np.random.rand(31 * 500 * 500).reshape((31, 500, 500)),
                      coords=COORDS).chunk(dict(dim_0=-1, dim_1=100, dim_2=100))

    print(da)

    resampled = da.resample(dim_0="MS")

    for label, sample in resampled:
        # sample = sample.compute()
        idx = sample.argmax('dim_0')
        sampled = sample.isel(dim_0=idx)

print("Total number of computes: %d" % scheduler.total_computes)
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
932582053 https://github.com/pydata/xarray/issues/2511#issuecomment-932582053 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843lhKl cerodell 38116316 2021-10-01T21:18:53Z 2021-10-01T21:20:49Z NONE

Hello! First off, thank you for all the hard work on xarray! I use it every day and love it :)

I am also having issues indexing with dask arrays and get the following error.

```
Traceback (most recent call last):
  File "~/phd-comps/scripts/sfire-pbl.py", line 64, in <module>
    PBLH = height.isel(gradT2.argmax(dim=['interp_level']))
  File "~/miniconda3/envs/cr/lib/python3.7/site-packages/xarray/core/dataarray.py", line 1184, in isel
    indexers, drop=drop, missing_dims=missing_dims
  File "~/miniconda3/envs/cr/lib/python3.7/site-packages/xarray/core/dataset.py", line 2389, in _isel_fancy
    new_var = var.isel(indexers=var_indexers)
  File "~/miniconda3/envs/cr/lib/python3.7/site-packages/xarray/core/variable.py", line 1156, in isel
    return self[key]
  File "~/miniconda3/envs/cr/lib/python3.7/site-packages/xarray/core/variable.py", line 776, in __getitem__
    dims, indexer, new_order = self._broadcast_indexes(key)
  File "~/miniconda3/envs/cr/lib/python3.7/site-packages/xarray/core/variable.py", line 632, in _broadcast_indexes
    return self._broadcast_indexes_vectorized(key)
  File "~/miniconda3/envs/cr/lib/python3.7/site-packages/xarray/core/variable.py", line 761, in _broadcast_indexes_vectorized
    return out_dims, VectorizedIndexer(tuple(out_key)), new_order
  File "~/miniconda3/envs/cr/lib/python3.7/site-packages/xarray/core/indexing.py", line 323, in __init__
    f"unexpected indexer type for {type(self).__name__}: {k!r}"
TypeError: unexpected indexer type for VectorizedIndexer: dask.array<getitem, shape=(240, 399, 159), dtype=int64, chunksize=(60, 133, 53), chunktype=numpy.ndarray>

dask      2021.9.1    pyhd8ed1ab_0    conda-forge
xarray    0.19.0      pyhd8ed1ab_0    conda-forge
```

In order to get it to work, I first need to manually call compute to load the data into a NumPy array before using argmax with isel. I'm not sure what info I can provide to help solve the issue; please let me know and I'll send whatever I can.
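
A minimal sketch of that workaround, with illustrative stand-ins for the real `height` and `gradT2` fields (the names and shapes below are not from the original script): compute the indexer first, then do the lazy selection.

```python
import numpy as np
import xarray as xr

# Illustrative stand-ins for the real fields in the script above.
height = xr.DataArray(
    np.random.rand(5, 4, 3), dims=("interp_level", "y", "x")
).chunk({"interp_level": 1})
gradT2 = xr.DataArray(
    np.random.rand(5, 4, 3), dims=("interp_level", "y", "x")
).chunk({"interp_level": 1})

# This raises the TypeError above, because the indexer is a lazy dask array:
# PBLH = height.isel(gradT2.argmax(dim=["interp_level"]))

# Workaround: materialize the indexer first, then index the (still lazy) data.
idx = gradT2.argmax(dim="interp_level").compute()
PBLH = height.isel(interp_level=idx)
```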

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
932229595 https://github.com/pydata/xarray/issues/2511#issuecomment-932229595 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843kLHb bzah 16700639 2021-10-01T13:29:32Z 2021-10-01T13:29:32Z CONTRIBUTOR

@pl-marasco Thanks for the example! With it I get the same result as you: it takes the same time with the patch as with compute.

However, I could construct an example giving very different results. It is quite close to my original code:

```python
import time

import numpy as np
import pandas as pd
import xarray as xr

time_start = time.perf_counter()

COORDS = dict(
    time=pd.date_range("2042-01-01", periods=200, freq=pd.DateOffset(days=1)),
)
da = xr.DataArray(
    np.random.rand(200 * 3500 * 350).reshape((200, 3500, 350)),
    dims=('time', 'x', 'y'),
    coords=COORDS
).chunk(dict(time=-1, x=100, y=100))

resampled = da.resample(time="MS")

for label, sample in resampled:
    # sample = sample.compute()
    idx = sample.argmax('time')
    sample.isel(time=idx)

time_elapsed = time.perf_counter() - time_start
print(time_elapsed, " secs")

```

(Basically, I want the first event occurring in each month.)

Without the patch and with `sample = sample.compute()` uncommented, it takes 5.7 seconds. With the patch, it takes 53.9 seconds.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
932169790 https://github.com/pydata/xarray/issues/2511#issuecomment-932169790 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843j8g- pl-marasco 22492773 2021-10-01T12:04:55Z 2021-10-01T12:04:55Z NONE

@bzah I tested your patch with the following code:

```python
import numpy as np
import xarray as xr
from distributed import Client

client = Client()

da = xr.DataArray(
    np.random.rand(20 * 3500 * 3500).reshape((20, 3500, 3500)),
    dims=('time', 'x', 'y'),
).chunk(dict(time=-1, x=100, y=100))

idx = da.argmax('time').compute()
da.isel(time=idx)
```

In my case it seems to take the same time with or without the patch, but I would like to know if it is the same for you.

L.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
930309991 https://github.com/pydata/xarray/issues/2511#issuecomment-930309991 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843c2dn pl-marasco 22492773 2021-09-29T15:56:33Z 2021-09-29T15:56:33Z NONE

> @pl-marasco Ok that's strange. I should have saved my use case :/ I will try to reproduce it and will provide a gist of it soon.

What I noticed in my use case is that it triggers a computation. Is that the reason you consider it slow? Could it be related to #3237?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
930124657 https://github.com/pydata/xarray/issues/2511#issuecomment-930124657 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843cJNx pl-marasco 22492773 2021-09-29T12:22:06Z 2021-09-29T12:22:06Z NONE

@bzah I've been testing your solution and it doesn't seem as slow as you are mentioning. Do you have a specific test we could run so that we can make a more robust comparison?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
922942743 https://github.com/pydata/xarray/issues/2511#issuecomment-922942743 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843Av0X bzah 16700639 2021-09-20T13:45:56Z 2021-09-20T13:45:56Z CONTRIBUTOR

I wrote a very naive fix; it works but seems to perform really slowly, and I would appreciate some feedback (I'm a beginner with Dask). Basically, I added `k = dask.array.asarray(k, dtype=np.int64)` to do the exact same thing as with numpy. I can create a PR if that's easier to review.

The patch:

```python
class VectorizedIndexer(ExplicitIndexer):
    """Tuple for vectorized indexing.

    All elements should be slice or N-dimensional np.ndarray objects with an
    integer dtype and the same number of dimensions. Indexing follows proposed
    rules for np.ndarray.vindex, which matches NumPy's advanced indexing rules
    (including broadcasting) except sliced axes are always moved to the end:
    https://github.com/numpy/numpy/pull/6256
    """

    __slots__ = ()

    def __init__(self, key):
        if not isinstance(key, tuple):
            raise TypeError(f"key must be a tuple: {key!r}")

        new_key = []
        ndim = None
        for k in key:
            if isinstance(k, slice):
                k = as_integer_slice(k)
            elif isinstance(k, np.ndarray) or isinstance(k, dask.array.Array):
                if not np.issubdtype(k.dtype, np.integer):
                    raise TypeError(
                        f"invalid indexer array, does not have integer dtype: {k!r}"
                    )
                if ndim is None:
                    ndim = k.ndim
                elif ndim != k.ndim:
                    ndims = [k.ndim for k in key if isinstance(k, np.ndarray)]
                    raise ValueError(
                        "invalid indexer key: ndarray arguments "
                        f"have different numbers of dimensions: {ndims}"
                    )
                if isinstance(k, dask.array.Array):
                    k = dask.array.asarray(k, dtype=np.int64)
                else:
                    k = np.asarray(k, dtype=np.int64)
            else:
                raise TypeError(
                    f"unexpected indexer type for {type(self).__name__}: {k!r}"
                )
            new_key.append(k)

        super().__init__(new_key)
```

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 2,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
568107398 https://github.com/pydata/xarray/issues/2511#issuecomment-568107398 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDU2ODEwNzM5OA== dcherian 2448579 2019-12-20T22:14:34Z 2019-12-20T22:14:34Z MEMBER

I don't think anyone is working on it. We would appreciate it if you could try to fix it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
567966648 https://github.com/pydata/xarray/issues/2511#issuecomment-567966648 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDU2Nzk2NjY0OA== roxyboy 8934026 2019-12-20T15:37:09Z 2019-12-20T15:39:10Z NONE

I'm just curious if there's been any progress on this issue. I'm also getting the same error (`TypeError: unexpected indexer type for VectorizedIndexer`), and I would greatly benefit from lazy vectorized indexing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
563330352 https://github.com/pydata/xarray/issues/2511#issuecomment-563330352 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDU2MzMzMDM1Mg== rafa-guedes 7799184 2019-12-09T16:53:38Z 2019-12-09T16:53:38Z CONTRIBUTOR

I'm having a similar issue; here is an example:

```python
import numpy as np
import dask.array as da
import xarray as xr

darr = xr.DataArray(data=[0.2, 0.4, 0.6], coords={"z": range(3)}, dims=("z",))
good_indexer = xr.DataArray(
    data=np.random.randint(0, 3, 8).reshape(4, 2).astype(int),
    coords={"y": range(4), "x": range(2)},
    dims=("y", "x"),
)
bad_indexer = xr.DataArray(
    data=da.random.randint(0, 3, 8).reshape(4, 2).astype(int),
    coords={"y": range(4), "x": range(2)},
    dims=("y", "x"),
)

In [5]: darr
Out[5]:
<xarray.DataArray (z: 3)>
array([0.2, 0.4, 0.6])
Coordinates:
  * z        (z) int64 0 1 2

In [6]: good_indexer
Out[6]:
<xarray.DataArray (y: 4, x: 2)>
array([[0, 1],
       [2, 2],
       [1, 2],
       [1, 0]])
Coordinates:
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1

In [7]: bad_indexer
Out[7]:
<xarray.DataArray 'reshape-417766b2035dcb1227ddde8505297039' (y: 4, x: 2)>
dask.array<reshape, shape=(4, 2), dtype=int64, chunksize=(4, 2), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1

In [8]: darr[good_indexer]
Out[8]:
<xarray.DataArray (y: 4, x: 2)>
array([[0.2, 0.4],
       [0.6, 0.6],
       [0.4, 0.6],
       [0.4, 0.2]])
Coordinates:
    z        (y, x) int64 0 1 2 2 1 2 1 0
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1

In [9]: darr[bad_indexer]

TypeError                                 Traceback (most recent call last)
<ipython-input-8-2a57c1a2eade> in <module>
----> 1 darr[bad_indexer]

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
    638         else:
    639             # xarray-style array indexing
--> 640             return self.isel(indexers=self._item_key_to_dict(key))
    641
    642     def __setitem__(self, key: Any, value: Any) -> None:

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataarray.py in isel(self, indexers, drop, **indexers_kwargs)
   1012         """
   1013         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
--> 1014         ds = self._to_temp_dataset().isel(drop=drop, indexers=indexers)
   1015         return self._from_temp_dataset(ds)
   1016

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataset.py in isel(self, indexers, drop, **indexers_kwargs)
   1920             if name in self.indexes:
   1921                 new_var, new_index = isel_variable_and_index(
--> 1922                     name, var, self.indexes[name], var_indexers
   1923                 )
   1924                 if new_index is not None:

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/indexes.py in isel_variable_and_index(name, variable, index, indexers)
     79         )
     80
---> 81     new_variable = variable.isel(indexers)
     82
     83     if new_variable.dims != (name,):

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in isel(self, indexers, **indexers_kwargs)
   1052
   1053         key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
--> 1054         return self[key]
   1055
   1056     def squeeze(self, dim=None):

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in __getitem__(self, key)
    700         array x.values directly.
    701         """
--> 702         dims, indexer, new_order = self._broadcast_indexes(key)
    703         data = as_indexable(self._data)[indexer]
    704         if new_order:

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes(self, key)
    557             if isinstance(k, Variable):
    558                 if len(k.dims) > 1:
--> 559                     return self._broadcast_indexes_vectorized(key)
    560                 dims.append(k.dims[0])
    561             elif not isinstance(k, integer_types):

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes_vectorized(self, key)
    685             new_order = None
    686
--> 687         return out_dims, VectorizedIndexer(tuple(out_key)), new_order
    688
    689     def __getitem__(self: VariableType, key) -> VariableType:

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/indexing.py in __init__(self, key)
    447             else:
    448                 raise TypeError(
--> 449                     f"unexpected indexer type for {type(self).__name__}: {k!r}"
    450                 )
    451             new_key.append(k)

TypeError: unexpected indexer type for VectorizedIndexer: dask.array<reshape, shape=(4, 2), dtype=int64, chunksize=(4, 2), chunktype=numpy.ndarray>

In [10]: xr.__version__
Out[10]: '0.14.1'

In [11]: import dask; dask.__version__
Out[11]: '2.9.0'
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
525634152 https://github.com/pydata/xarray/issues/2511#issuecomment-525634152 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDUyNTYzNDE1Mg== ulijh 13190237 2019-08-28T08:12:13Z 2019-08-28T08:12:13Z CONTRIBUTOR

I think the problem is somewhere here:

https://github.com/pydata/xarray/blob/aaeea6250b89e3605ee1d1a160ad50d6ed657c7e/xarray/core/utils.py#L85-L103

I don't think pandas.Index can hold lazy arrays. Could there be a way around this by exploiting dask.dataframe indexing methods?
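
A minimal sketch of why: anything that ends up in a pandas.Index goes through np.asarray, which materializes the dask array.

```python
import numpy as np
import pandas as pd
import dask.array as da

lazy = da.arange(5, chunks=2)

# Building a pandas.Index (e.g. for a dimension coordinate) coerces the data
# with np.asarray, which triggers computation of the dask graph:
values = np.asarray(lazy)
print(type(values))   # <class 'numpy.ndarray'> -- no longer lazy

idx = pd.Index(values)
print(idx)            # e.g. Index([0, 1, 2, 3, 4], dtype='int64'); exact repr depends on the pandas version
```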

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
523149751 https://github.com/pydata/xarray/issues/2511#issuecomment-523149751 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDUyMzE0OTc1MQ== shoyer 1217238 2019-08-20T18:56:18Z 2019-08-20T18:56:18Z MEMBER

Yes, something seems to be going wrong here...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
522986699 https://github.com/pydata/xarray/issues/2511#issuecomment-522986699 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDUyMjk4NjY5OQ== ulijh 13190237 2019-08-20T12:15:18Z 2019-08-20T18:52:49Z CONTRIBUTOR

Even though the example from above does work, sadly, the following does not:

```python
import xarray as xr
import dask.array as da
import numpy as np

da = xr.DataArray(np.random.rand(3 * 4 * 5).reshape((3, 4, 5))).chunk(dict(dim_0=1))
idcs = da.argmax('dim_2')
da[dict(dim_2=idcs)]
```

results in

```python
TypeError                                 Traceback (most recent call last)
<ipython-input-4-3542cdd6d61c> in <module>
----> 1 da[dict(dim_2=idcs)]

~/src/xarray/xarray/core/dataarray.py in __getitem__(self, key)
    604         else:
    605             # xarray-style array indexing
--> 606             return self.isel(indexers=self._item_key_to_dict(key))
    607
    608     def __setitem__(self, key: Any, value: Any) -> None:

~/src/xarray/xarray/core/dataarray.py in isel(self, indexers, drop, **indexers_kwargs)
    986         """
    987         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
--> 988         ds = self._to_temp_dataset().isel(drop=drop, indexers=indexers)
    989         return self._from_temp_dataset(ds)
    990

~/src/xarray/xarray/core/dataset.py in isel(self, indexers, drop, **indexers_kwargs)
   1901                 indexes[name] = new_index
   1902             else:
--> 1903                 new_var = var.isel(indexers=var_indexers)
   1904
   1905             variables[name] = new_var

~/src/xarray/xarray/core/variable.py in isel(self, indexers, drop, **indexers_kwargs)
    984             if dim in indexers:
    985                 key[i] = indexers[dim]
--> 986         return self[tuple(key)]
    987
    988     def squeeze(self, dim=None):

~/src/xarray/xarray/core/variable.py in __getitem__(self, key)
    675         array x.values directly.
    676         """
--> 677         dims, indexer, new_order = self._broadcast_indexes(key)
    678         data = as_indexable(self._data)[indexer]
    679         if new_order:

~/src/xarray/xarray/core/variable.py in _broadcast_indexes(self, key)
    532             if isinstance(k, Variable):
    533                 if len(k.dims) > 1:
--> 534                     return self._broadcast_indexes_vectorized(key)
    535                 dims.append(k.dims[0])
    536             elif not isinstance(k, integer_types):

~/src/xarray/xarray/core/variable.py in _broadcast_indexes_vectorized(self, key)
    660             new_order = None
    661
--> 662         return out_dims, VectorizedIndexer(tuple(out_key)), new_order
    663
    664     def __getitem__(self, key):

~/src/xarray/xarray/core/indexing.py in __init__(self, key)
    460                 raise TypeError(
    461                     "unexpected indexer type for {}: {!r}".format(
--> 462                         type(self).__name__, k
    463                     )
    464                 )

TypeError: unexpected indexer type for VectorizedIndexer: dask.array<arg_agg-aggregate, shape=(3, 4), dtype=int64, chunksize=(1, 4)>
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
498178025 https://github.com/pydata/xarray/issues/2511#issuecomment-498178025 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDQ5ODE3ODAyNQ== ulijh 13190237 2019-06-03T09:13:49Z 2019-06-03T09:13:49Z CONTRIBUTOR

As of version 0.12, indexing with dask arrays works out of the box... I think this can be closed now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
433304954 https://github.com/pydata/xarray/issues/2511#issuecomment-433304954 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDQzMzMwNDk1NA== ulijh 13190237 2018-10-26T06:48:54Z 2018-10-26T06:48:54Z CONTRIBUTOR

It seems to be working fine with the following change, but it has a lot of duplicated code...

```diff
diff --git a/xarray/core/indexing.py b/xarray/core/indexing.py
index d51da471..9fe93581 100644
--- a/xarray/core/indexing.py
+++ b/xarray/core/indexing.py
@@ -7,6 +7,7 @@ from datetime import timedelta
 
 import numpy as np
 import pandas as pd
+import dask.array as da
 
 from . import duck_array_ops, nputils, utils
 from .pycompat import (
@@ -420,6 +421,19 @@ class VectorizedIndexer(ExplicitIndexer):
                                      'have different numbers of dimensions: {}'
                                      .format(ndims))
                 k = np.asarray(k, dtype=np.int64)
+            elif isinstance(k, dask_array_type):
+                if not np.issubdtype(k.dtype, np.integer):
+                    raise TypeError('invalid indexer array, does not have '
+                                    'integer dtype: {!r}'.format(k))
+                if ndim is None:
+                    ndim = k.ndim
+                elif ndim != k.ndim:
+                    ndims = [k.ndim for k in key
+                             if isinstance(k, (np.ndarray,) + dask_array_type)]
+                    raise ValueError('invalid indexer key: ndarray arguments '
+                                     'have different numbers of dimensions: {}'
+                                     .format(ndims))
+                k = da.array(k, dtype=np.int64)
             else:
                 raise TypeError('unexpected indexer type for {}: {!r}'
                                 .format(type(self).__name__, k))
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
433128556 https://github.com/pydata/xarray/issues/2511#issuecomment-433128556 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDQzMzEyODU1Ng== shoyer 1217238 2018-10-25T16:59:28Z 2018-10-25T16:59:28Z MEMBER

For reference, here's the current stacktrace/error message:

```python-traceback
TypeError                                 Traceback (most recent call last)
<ipython-input-7-74fe4ba70f9d> in <module>()
----> 1 da[{'dim_1' : indc}]

/usr/local/lib/python3.6/dist-packages/xarray/core/dataarray.py in __getitem__(self, key)
    472         else:
    473             # xarray-style array indexing
--> 474             return self.isel(indexers=self._item_key_to_dict(key))
    475
    476     def __setitem__(self, key, value):

/usr/local/lib/python3.6/dist-packages/xarray/core/dataarray.py in isel(self, indexers, drop, **indexers_kwargs)
    817         """
    818         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, 'isel')
--> 819         ds = self._to_temp_dataset().isel(drop=drop, indexers=indexers)
    820         return self._from_temp_dataset(ds)
    821

/usr/local/lib/python3.6/dist-packages/xarray/core/dataset.py in isel(self, indexers, drop, **indexers_kwargs)
   1537         for name, var in iteritems(self._variables):
   1538             var_indexers = {k: v for k, v in indexers_list if k in var.dims}
--> 1539             new_var = var.isel(indexers=var_indexers)
   1540             if not (drop and name in var_indexers):
   1541                 variables[name] = new_var

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in isel(self, indexers, drop, **indexers_kwargs)
    905             if dim in indexers:
    906                 key[i] = indexers[dim]
--> 907         return self[tuple(key)]
    908
    909     def squeeze(self, dim=None):

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in __getitem__(self, key)
    614         array x.values directly.
    615         """
--> 616         dims, indexer, new_order = self._broadcast_indexes(key)
    617         data = as_indexable(self._data)[indexer]
    618         if new_order:

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in _broadcast_indexes(self, key)
    487             return self._broadcast_indexes_outer(key)
    488
--> 489         return self._broadcast_indexes_vectorized(key)
    490
    491     def _broadcast_indexes_basic(self, key):

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in _broadcast_indexes_vectorized(self, key)
    599             new_order = None
    600
--> 601         return out_dims, VectorizedIndexer(tuple(out_key)), new_order
    602
    603     def __getitem__(self, key):

/usr/local/lib/python3.6/dist-packages/xarray/core/indexing.py in __init__(self, key)
    423         else:
    424             raise TypeError('unexpected indexer type for {}: {!r}'
--> 425                             .format(type(self).__name__, k))
    426         new_key.append(k)
    427

TypeError: unexpected indexer type for VectorizedIndexer: dask.array<xarray-<this-array>, shape=(10,), dtype=int64, chunksize=(2,)>
```

It looks like we could support this relatively easily since dask.array supports indexing with dask arrays now. This would be a welcome enhancement!
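
A minimal sketch of the dask capability referred to here, assuming indexing a dask array along one axis with another dask array of integers (which stays lazy until computed):

```python
import dask.array as da

x = da.arange(100, chunks=10)
idx = da.from_array([3, 50, 99], chunks=3)

y = x[idx]           # dask indexing a dask array with a dask integer array
print(y)             # still a lazy dask array; nothing computed yet
print(y.compute())   # [ 3 50 99]
```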

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);