issue_comments
6 rows where author_association = "MEMBER" and issue = 759709924 sorted by updated_at descending
| id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue | 
|---|---|---|---|---|---|---|---|---|---|---|---|
| 802953719 | https://github.com/pydata/xarray/issues/4663#issuecomment-802953719 | https://api.github.com/repos/pydata/xarray/issues/4663 | MDEyOklzc3VlQ29tbWVudDgwMjk1MzcxOQ== | dcherian 2448579 | 2021-03-19T16:23:32Z | 2021-03-19T16:23:32Z | MEMBER | 
 ouch. thanks for raising that issue. 
 I think we'd be open to adding a  | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | Fancy indexing a Dataset with dask DataArray triggers multiple computes 759709924 | |
| 802058819 | https://github.com/pydata/xarray/issues/4663#issuecomment-802058819 | https://api.github.com/repos/pydata/xarray/issues/4663 | MDEyOklzc3VlQ29tbWVudDgwMjA1ODgxOQ== | dcherian 2448579 | 2021-03-18T16:15:19Z | 2021-03-18T16:15:19Z | MEMBER | I would start by trying to fix
``` python
import dask.array as da
import numpy as np
import xarray as xr

from xarray.tests import raise_if_dask_computes

# max_computes=0: any dask compute triggered inside this block raises RuntimeError
with raise_if_dask_computes(max_computes=0):
    ds = xr.Dataset(
        dict(
            a=("x", da.from_array(np.random.randint(0, 100, 100))),
            b=(("x", "y"), da.random.random((100, 10))),
        )
    )
    # Index with the underlying dask array (ds.a.data), not the DataArray
    ds.b.sel(x=ds.a.data)
```
 specifically this 
 Then the next issue is the multiple computes that happen when we pass a  | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | Fancy indexing a Dataset with dask DataArray triggers multiple computes 759709924 | |
| 802050621 | https://github.com/pydata/xarray/issues/4663#issuecomment-802050621 | https://api.github.com/repos/pydata/xarray/issues/4663 | MDEyOklzc3VlQ29tbWVudDgwMjA1MDYyMQ== | dcherian 2448579 | 2021-03-18T16:05:21Z | 2021-03-18T16:05:36Z | MEMBER | From @alimanfoo in #5054
 I have a dataset comprising several variables. All variables are dask arrays (e.g., backed by zarr). I would like to use one of these variables, which is a 1d boolean array, to index the other variables along a large single dimension. The boolean indexing array is about ~40 million items long, with ~20 million true values.
 If I do this all via dask (i.e., not using xarray) then I can index one dask array with another dask array via fancy indexing. The indexing array is not loaded into memory or computed. If I need to know the shape and chunks of the resulting arrays I can call compute_chunk_sizes(), but still very little memory is required.
 If I do this via xarray.Dataset.isel() then a substantial amount of memory (several GB) is allocated during isel() and retained. This is problematic as in a real-world use case there are many arrays to be indexed and memory runs out on standard systems.
 There is a follow-on issue which is if I then want to run a computation over one of the indexed arrays, if the indexing was done via xarray then that leads to a further blow-up of multiple GB of memory usage, if using dask distributed cluster.
 I think the underlying issue here is that the indexing array is loaded into memory, and then gets copied multiple times when the dask graph is constructed. If using a distributed scheduler, further copies get made during scheduling of any subsequent computation.
 I made a notebook which illustrates the increased memory usage during Dataset.isel() here: colab.research.google.com/drive/1bn7Sj0An7TehwltWizU8j_l2OvPeoJyo?usp=sharing
 This is possibly the same underlying issue (and use case) as raised by @eric-czech in #4663, so feel free to close this if you think it's a duplicate. | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | Fancy indexing a Dataset with dask DataArray triggers multiple computes 759709924 | |
| 740958441 | https://github.com/pydata/xarray/issues/4663#issuecomment-740958441 | https://api.github.com/repos/pydata/xarray/issues/4663 | MDEyOklzc3VlQ29tbWVudDc0MDk1ODQ0MQ== | dcherian 2448579 | 2020-12-08T20:09:27Z | 2020-12-08T20:09:27Z | MEMBER | I think the solution is to handle this case (dask-backed DataArray) in  | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | Fancy indexing a Dataset with dask DataArray triggers multiple computes 759709924 | |
| 740941441 | https://github.com/pydata/xarray/issues/4663#issuecomment-740941441 | https://api.github.com/repos/pydata/xarray/issues/4663 | MDEyOklzc3VlQ29tbWVudDc0MDk0MTQ0MQ== | dcherian 2448579 | 2020-12-08T19:58:51Z | 2020-12-08T20:02:55Z | MEMBER | Thanks for the great example! This looks like a duplicate of https://github.com/pydata/xarray/issues/2801. If you agree, can we move the conversation there? I like using our  
``` python
import dask.array as da
import numpy as np
import xarray as xr

from xarray.tests import raise_if_dask_computes

# Use a custom array type to know when data is being evaluated
class Array():

with raise_if_dask_computes(max_computes=1):
    ds = xr.Dataset(dict(
        a=('x', da.from_array(Array(np.random.rand(100)))),
        b=(('x', 'y'), da.random.random((100, 10))),
        c=(('x', 'y'), da.random.random((100, 10))),
        d=(('x', 'y'), da.random.random((100, 10))),
    ))
    ds.sel(x=ds.a)
```
```python
RuntimeError                              Traceback (most recent call last)
<ipython-input-76-8efd3a1c3fe5> in <module>
     26         d=(('x', 'y'), da.random.random((100, 10))),
     27     ))
---> 28     ds.sel(x=ds.a)

/project/mrgoodbar/dcherian/python/xarray/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2211             self, indexers=indexers, method=method, tolerance=tolerance
   2212         )
-> 2213         result = self.isel(indexers=pos_indexers, drop=drop)
   2214         return result._overwrite_indexes(new_indexes)
   2215

/project/mrgoodbar/dcherian/python/xarray/xarray/core/dataset.py in isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   2058         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
   2059         if any(is_fancy_indexer(idx) for idx in indexers.values()):
-> 2060             return self._isel_fancy(indexers, drop=drop, missing_dims=missing_dims)
   2061
   2062         # Much faster algorithm for when all indexers are ints, slices, one-dimensional

/project/mrgoodbar/dcherian/python/xarray/xarray/core/dataset.py in _isel_fancy(self, indexers, drop, missing_dims)
   2122                 indexes[name] = new_index
   2123             elif var_indexers:
-> 2124                 new_var = var.isel(indexers=var_indexers)
   2125             else:
   2126                 new_var = var.copy(deep=False)

/project/mrgoodbar/dcherian/python/xarray/xarray/core/variable.py in isel(self, indexers, missing_dims, **indexers_kwargs)
   1118
   1119         key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
-> 1120         return self[key]
   1121
   1122     def squeeze(self, dim=None):

/project/mrgoodbar/dcherian/python/xarray/xarray/core/variable.py in __getitem__(self, key)
    766         array

/project/mrgoodbar/dcherian/python/xarray/xarray/core/variable.py in _broadcast_indexes(self, key)
    625                 dims.append(d)
    626         if len(set(dims)) == len(dims):
--> 627             return self._broadcast_indexes_outer(key)
    628
    629         return self._broadcast_indexes_vectorized(key)

/project/mrgoodbar/dcherian/python/xarray/xarray/core/variable.py in _broadcast_indexes_outer(self, key)
    680                 k = k.data
    681             if not isinstance(k, BASIC_INDEXING_TYPES):
--> 682                 k = np.asarray(k)
    683                 if k.size == 0:
    684                     # Slice by empty list; numpy could not infer the dtype

~/miniconda3/envs/dcpy_old_dask/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86
     87

~/miniconda3/envs/dcpy_old_dask/lib/python3.7/site-packages/dask/array/core.py in __array__(self, dtype, **kwargs)
   1374
   1375     def __array__(self, dtype=None, **kwargs):
-> 1376         x = self.compute()
   1377         if dtype and x.dtype != dtype:
   1378             x = x.astype(dtype)

~/miniconda3/envs/dcpy_old_dask/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
    165         dask.base.compute
    166         """
--> 167         (result,) = compute(self, traverse=False, **kwargs)
    168         return result
    169

~/miniconda3/envs/dcpy_old_dask/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
    450         postcomputes.append(x.__dask_postcompute__())
    451
--> 452     results = schedule(dsk, keys, **kwargs)
    453     return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)])
    454

/project/mrgoodbar/dcherian/python/xarray/xarray/tests/__init__.py in __call__(self, dsk, keys, **kwargs)
    112             raise RuntimeError(
    113                 "Too many computes. Total: %d > max: %d."
--> 114                 % (self.total_computes, self.max_computes)
    115             )
    116         return dask.get(dsk, keys, **kwargs)

RuntimeError: Too many computes. Total: 2 > max: 1.
```
 So here it looks like we don't support indexing by dask arrays, so as we loop through the dataset the  | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | Fancy indexing a Dataset with dask DataArray triggers multiple computes 759709924 | |
| 740945474 | https://github.com/pydata/xarray/issues/4663#issuecomment-740945474 | https://api.github.com/repos/pydata/xarray/issues/4663 | MDEyOklzc3VlQ29tbWVudDc0MDk0NTQ3NA== | dcherian 2448579 | 2020-12-08T20:01:38Z | 2020-12-08T20:01:38Z | MEMBER | I commented too soon. 
 only computes once (!) so there's something else going on possibly | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | Fancy indexing a Dataset with dask DataArray triggers multiple computes 759709924 | 
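
The traceback quoted in the comments ends in a scheduler wrapper that counts computes and raises `RuntimeError` once the total exceeds `max_computes`. The sketch below imitates that counting idea in plain Python, without dask or xarray, to show how such a counter can surface an indexer being evaluated once per variable. Every name in it (`CountingScheduler`, `LazyIndexer`, `index_each_variable`, `index_with_shared_indexer`) is hypothetical and invented for illustration. It is not xarray's implementation and not a diagnosis of the issue, which one of the comments above notes may have a different cause.

```python
class CountingScheduler:
    """Hypothetical stand-in for a scheduler that counts compute calls."""

    def __init__(self, max_computes=0):
        self.total_computes = 0
        self.max_computes = max_computes

    def __call__(self, task):
        # Count every evaluation and fail loudly once the budget is exceeded.
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                "Too many computes. Total: %d > max: %d."
                % (self.total_computes, self.max_computes)
            )
        return task()


class LazyIndexer:
    """Hypothetical lazy indexer: positions exist only after compute()."""

    def __init__(self, make_positions, scheduler):
        self._make_positions = make_positions
        self._scheduler = scheduler

    def compute(self):
        # Every call goes through the scheduler, so every call is counted.
        return self._scheduler(self._make_positions)


def index_each_variable(variables, indexer):
    """Naive loop: evaluates the lazy indexer once per variable."""
    result = {}
    for name, values in variables.items():
        positions = indexer.compute()
        result[name] = [values[i] for i in positions]
    return result


def index_with_shared_indexer(variables, indexer):
    """Evaluates the lazy indexer once and reuses the positions."""
    positions = indexer.compute()
    return {
        name: [values[i] for i in positions]
        for name, values in variables.items()
    }
```

With a budget of one compute and two variables, `index_each_variable` raises on the second variable, while `index_with_shared_indexer` stays within the budget because the positions are materialised a single time.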
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);