html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/pull/7019#issuecomment-1553453937,https://api.github.com/repos/pydata/xarray/issues/7019,1553453937,IC_kwDOAMm_X85cl9Nx,2443309,2023-05-18T18:27:52Z,2023-05-18T18:27:52Z,MEMBER,👏 Congrats @TomNicholas on getting this in! Such an important contribution. 👏 ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1553395594,https://api.github.com/repos/pydata/xarray/issues/7019,1553395594,IC_kwDOAMm_X85clu-K,35968931,2023-05-18T17:37:22Z,2023-05-18T17:37:22Z,MEMBER,Woooo thanks @dcherian ! ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1553390072,https://api.github.com/repos/pydata/xarray/issues/7019,1553390072,IC_kwDOAMm_X85cltn4,2448579,2023-05-18T17:34:01Z,2023-05-18T17:34:01Z,MEMBER,Thanks @TomNicholas Big change!,"{""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 3, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1550564976,https://api.github.com/repos/pydata/xarray/issues/7019,1550564976,IC_kwDOAMm_X85ca75w,35968931,2023-05-17T01:39:08Z,2023-05-17T01:39:08Z,MEMBER,"@Illviljan thanks for all your comments! Would you (or @keewis?) be willing to approve this PR now? I would really like to merge this so that I can release a version of xarray that I can use as a dependency for [cubed-xarray](https://github.com/xarray-contrib/cubed-xarray).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1534087275,https://api.github.com/repos/pydata/xarray/issues/7019,1534087275,IC_kwDOAMm_X85bcFBr,35968931,2023-05-04T04:41:22Z,2023-05-04T04:41:22Z,MEMBER,"(Okay now the failures are from https://github.com/pydata/xarray/pull/7815 which I've separated out, and from https://github.com/pydata/xarray/pull/7561 being recently merged into main which is definitely not my fault :sweat_smile: https://github.com/pydata/xarray/pull/7019/commits/316c63d55f4e2c317b028842f752a40596f16c6d shows that this PR passes the tests by itself.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1531992793,https://api.github.com/repos/pydata/xarray/issues/7019,1531992793,IC_kwDOAMm_X85bUFrZ,35968931,2023-05-02T18:58:23Z,2023-05-02T19:01:20Z,MEMBER,"I would like to merge this now please! It works, it passes the tests, including mypy. The main feature not in this PR is using `parallel=True` with `open_mfdataset`, which is still coupled to `dask.delayed` - I made #7811 to track that so I could get this PR merged. If we merge this I can start properly testing cubed with xarray (in [cubed-xarray](https://github.com/xarray-contrib/cubed-xarray)). @shoyer @dcherian if one of you could merge this or otherwise tell me anything else you think is still required!","{""total_count"": 4, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 2, ""rocket"": 2, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1502378655,https://api.github.com/repos/pydata/xarray/issues/7019,1502378655,IC_kwDOAMm_X85ZjHqf,2448579,2023-04-10T21:57:04Z,2023-04-10T21:57:04Z,MEMBER,"> We could still achieve the goal of running cubed without dask by making normalize_chunks the responsibility of the chunkmanager Seems OK to me. The other option is to xfail the broken tests on old dask versions","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1499791533,https://api.github.com/repos/pydata/xarray/issues/7019,1499791533,IC_kwDOAMm_X85ZZQCt,35968931,2023-04-07T00:32:47Z,2023-04-07T00:59:03Z,MEMBER,"> I'm having problems with ensuring the behaviour of the chunks='auto' option is consistent between `.chunk` and `open_dataset` Update on this rabbit hole: [This commit to dask](https://github.com/dask/dask/pull/9507/commits/196c2a43fb0992ff288929c4f994e765394fffb7) changed the behaviour of dask's auto-chunking logic, such that if I run my little test script `test_old_get_chunk.py` on dask releases before and after that commit I get different chunking patterns:
```python from xarray.core.variable import IndexVariable from dask.array.core import normalize_chunks # import the import itertools from numbers import Number import dask import dask.array as da import xarray as xr import numpy as np # This function is copied from xarray, but calls dask.array.core.normalize_chunks # It is used in open_dataset, but not in Dataset.chunk def _get_chunk(var, chunks): """""" Return map from each dim to chunk sizes, accounting for backend's preferred chunks. """""" if isinstance(var, IndexVariable): return {} dims = var.dims shape = var.shape # Determine the explicit requested chunks. preferred_chunks = var.encoding.get(""preferred_chunks"", {}) preferred_chunk_shape = tuple( preferred_chunks.get(dim, size) for dim, size in zip(dims, shape) ) if isinstance(chunks, Number) or (chunks == ""auto""): chunks = dict.fromkeys(dims, chunks) chunk_shape = tuple( chunks.get(dim, None) or preferred_chunk_sizes for dim, preferred_chunk_sizes in zip(dims, preferred_chunk_shape) ) chunk_shape = normalize_chunks( chunk_shape, shape=shape, dtype=var.dtype, previous_chunks=preferred_chunk_shape ) # Warn where requested chunks break preferred chunks, provided that the variable # contains data. if var.size: for dim, size, chunk_sizes in zip(dims, shape, chunk_shape): try: preferred_chunk_sizes = preferred_chunks[dim] except KeyError: continue # Determine the stop indices of the preferred chunks, but omit the last stop # (equal to the dim size). In particular, assume that when a sequence # expresses the preferred chunks, the sequence sums to the size. preferred_stops = ( range(preferred_chunk_sizes, size, preferred_chunk_sizes) if isinstance(preferred_chunk_sizes, Number) else itertools.accumulate(preferred_chunk_sizes[:-1]) ) # Gather any stop indices of the specified chunks that are not a stop index # of a preferred chunk. Again, omit the last stop, assuming that it equals # the dim size. breaks = set(itertools.accumulate(chunk_sizes[:-1])).difference( preferred_stops ) if breaks: warnings.warn( ""The specified Dask chunks separate the stored chunks along "" f'dimension ""{dim}"" starting at index {min(breaks)}. This could ' ""degrade performance. Instead, consider rechunking after loading."" ) return dict(zip(dims, chunk_shape)) chunks = 'auto' encoded_chunks = 100 dask_arr = da.from_array( np.ones((500, 500), dtype=""float64""), chunks=encoded_chunks ) var = xr.core.variable.Variable(data=dask_arr, dims=['x', 'y']) with dask.config.set({""array.chunk-size"": ""1MiB""}): chunks_suggested = _get_chunk(var, chunks) print(chunks_suggested) ```
```python (cubed) tom@tom-XPS-9315:~/Documents/Work/Code/dask$ git checkout 2022.9.2 Previous HEAD position was 7fe622b44 Add docs on running Dask in a standalone Python script (#9513) HEAD is now at 3ef47422b bump version to 2022.9.2 (cubed) tom@tom-XPS-9315:~/Documents/Work/Code/dask$ python ../experimentation/bugs/auto_chunking/test_old_get_chunk.py {'x': (362, 138), 'y': (362, 138)} (cubed) tom@tom-XPS-9315:~/Documents/Work/Code/dask$ git checkout 2022.9.1 Previous HEAD position was 3ef47422b bump version to 2022.9.2 HEAD is now at b944abf68 bump version to 2022.9.1 (cubed) tom@tom-XPS-9315:~/Documents/Work/Code/dask$ python ../experimentation/bugs/auto_chunking/test_old_get_chunk.py {'x': (250, 250), 'y': (250, 250)} ``` (I was absolutely tearing my hair out trying to find this bug, because after the change `normalize_chunks` became a pure function, but before the change it actually wasn't, so I was trying calling `normalize_chunks` with the exact same set of input arguments and was still not able to reproduce the bug :angry: ) Anyway what this means is as this PR vendors `dask.array.core.normalize_chunks`, but the behaviour of `dask.array.core.normalize_chunks` changed between the version in CI job `min-all-deps` and the other CI jobs, the single vendored function cannot possibly match both behaviours. I think one simple way to fix this failure without should be to upgrade the minimum version of dask to >=2022.9.2 (from 2022.1.1 where it currently is). EDIT: I tried changing the minimum version of dask-core in `min-all-deps.yml` but the conda solve failed. But also would updating to `2022.9.2` now violate [xarray's minimum dependency versions policy](https://docs.xarray.dev/en/stable/getting-started-guide/installing.html#minimum-dependency-versions)? EDIT2: Another way to fix this should be to un-vendor `dask.array.core.normalize_chunks` within xarray. We could still achieve the goal of running cubed without dask by making `normalize_chunks` the responsibility of the `chunkmanager` instead, as cubed's vendored version of that function is not subject to xarray's minimum dependencies requirement.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1499432372,https://api.github.com/repos/pydata/xarray/issues/7019,1499432372,IC_kwDOAMm_X85ZX4W0,35968931,2023-04-06T18:03:48Z,2023-04-06T18:07:24Z,MEMBER,"I'm having problems with ensuring the behaviour of the `chunks='auto'` option is consistent between `.chunk` and `open_dataset`. These problems appeared since vendoring `dask.array.core.normalize_chunks`. Right now the only failing tests use `chunks='auto'` (e.g `xarray/tests/test_backends.py::test_chunking_consintency[auto]` - yes there's a typo in that test's name), and they fail because xarray decides on different sizes for the automatically-chosen chunks. What's weird is that all tests pass for me locally but these failures occur on just some of the CI jobs (and which CI jobs is not even consistent apparently???). I have no idea why this would behave differently on only some of the CI jobs, especially after double-checking that `array-chunk-size` is being correctly determined from the dask config variable [within `normalize_chunks`](https://github.com/pydata/xarray/pull/7019/files#r1160051774).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1483155153,https://api.github.com/repos/pydata/xarray/issues/7019,1483155153,IC_kwDOAMm_X85YZybR,35968931,2023-03-24T17:19:44Z,2023-03-24T17:21:32Z,MEMBER,"I've made a bare-bones [cubed-xarray package](https://github.com/xarray-contrib/cubed-xarray) to store the [`CubedManager` class](https://github.com/xarray-contrib/cubed-xarray/blob/main/cubed_xarray/cubedmanager.py), as well as any integration tests (yet to be written). @tomwhite you should have an invitation to be an owner of that repo. It uses the entrypoint exposed in this PR to hook in, and seems to work for me locally :grin: ","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 1, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1481626515,https://api.github.com/repos/pydata/xarray/issues/7019,1481626515,IC_kwDOAMm_X85YT9OT,35968931,2023-03-23T17:47:35Z,2023-03-23T17:47:51Z,MEMBER,"Thanks for the review @dcherian! I agree with basically everything you wrote. The main difficulty I have at this point is [non-reproducible failures as described here](https://github.com/pydata/xarray/pull/7019#discussion_r1146569098)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1481504146,https://api.github.com/repos/pydata/xarray/issues/7019,1481504146,IC_kwDOAMm_X85YTfWS,35968931,2023-03-23T16:26:53Z,2023-03-23T17:36:41Z,MEMBER,"> > I would like to get to the point where you can use xarray with a chunked array without ever importing dask. I think this PR gets very close, but that would be tricky to test because cubed depends on dask (so I can't just run the test suite without dask in the environment > I just released Cubed 0.6.0 which doesn't have a dependency on Dask, so this should be possible now. Actually testing cubed with xarray in an environment without dask is currently blocked by rechunker's explicitly dependency on dask, see https://github.com/pangeo-data/rechunker/issues/139 EDIT: We can hack around this by pip installing cubed, then pip uninstalling dask [as mentioned here](https://github.com/tomwhite/cubed/issues/158#issuecomment-1481523207)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1478328995,https://api.github.com/repos/pydata/xarray/issues/7019,1478328995,IC_kwDOAMm_X85YHYKj,35968931,2023-03-21T17:40:36Z,2023-03-21T17:40:36Z,MEMBER,"> Does this mean my comment https://github.com/pydata/xarray/pull/7019#discussion_r970713341 is valid again? Yes I think it does @headtr1ck - thanks for the reminder about that. I now want to finish this PR by exposing the ""chunk manager"" interface as a new entrypoint, copying the pattern used for xarray's backends. That would allow me to move the cubed-specific `CubedManager` code into a separate repository, have the choice of chunkmanager default to whatever is installed, but ask explicitly what to do if multiple chunkmanagers are installed. That should address your comment @headtr1ck.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1472766481,https://api.github.com/repos/pydata/xarray/issues/7019,1472766481,IC_kwDOAMm_X85XyKIR,35968931,2023-03-16T21:26:36Z,2023-03-16T21:26:36Z,MEMBER,"Thanks @dcherian ! Once I copied that explicit indexer business I was able to get serialization to and from zarr working with cubed! ```python In [1]: import xarray as xr In [2]: from cubed import Spec In [3]: ds = xr.open_dataset( ...: 'airtemps.zarr', ...: chunks={}, ...: from_array_kwargs={ ...: 'manager': 'cubed', ...: 'spec': Spec(work_dir=""tmp"", max_mem=20e6), ...: } ...: ) /home/tom/Documents/Work/Code/xarray/xarray/backends/plugins.py:139: RuntimeWarning: 'netcdf4' fails while guessing warnings.warn(f""{engine!r} fails while guessing"", RuntimeWarning) /home/tom/Documents/Work/Code/xarray/xarray/backends/plugins.py:139: RuntimeWarning: 'scipy' fails while guessing warnings.warn(f""{engine!r} fails while guessing"", RuntimeWarning) In [4]: ds['air'] Out[4]: cubed.Array Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 Attributes: GRIB_id: 11 ... In [5]: ds.isel(time=slice(100, 300)).to_zarr(""cubed_subset.zarr"") /home/tom/Documents/Work/Code/xarray/xarray/core/dataset.py:2118: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs return to_zarr( # type: ignore Out[5]: ```","{""total_count"": 8, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 6, ""rocket"": 0, ""eyes"": 2}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1469955680,https://api.github.com/repos/pydata/xarray/issues/7019,1469955680,IC_kwDOAMm_X85Xnb5g,2448579,2023-03-15T12:54:09Z,2023-03-15T12:54:09Z,MEMBER,"> dask_array_ops.rolling - uses functions from dask.array.overlap This is unused, we use `sliding_window_view` now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1469050607,https://api.github.com/repos/pydata/xarray/issues/7019,1469050607,IC_kwDOAMm_X85Xj-7v,35968931,2023-03-15T00:30:08Z,2023-03-15T10:03:11Z,MEMBER,"I tried opening a zarr store into xarray with chunking via cubed, but I got an error inside the indexing adapter classes. Somehow the type is completely wrong - would be good to type hint this part of the code, because this happens despite mypy passing now. ```python # create example zarr store orig = xr.tutorial.open_dataset(""air_temperature"") orig.to_zarr('air2.zarr') # open it as a cubed array ds = xr.open_dataset('air2.zarr', engine='zarr', chunks={}, from_array_kwargs={'manager': 'cubed'}) # fails at this point ds.load() ```
```python --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/tenacity/__init__.py:382, in Retrying.__call__(self, fn, *args, **kwargs) 381 try: --> 382 result = fn(*args, **kwargs) 383 except BaseException: # noqa: B902 File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/runtime/executors/python.py:10, in exec_stage_func(func, *args, **kwargs) 8 @retry(stop=stop_after_attempt(3)) 9 def exec_stage_func(func, *args, **kwargs): ---> 10 return func(*args, **kwargs) File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/primitive/blockwise.py:66, in apply_blockwise(out_key, config) 64 args.append(arg) ---> 66 result = config.function(*args) 67 if isinstance(result, dict): # structured array with named fields File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/ops.py:439, in map_blocks..func_with_block_id..wrap(*a, **kw) 438 block_id = offset_to_block_id(a[-1].item()) --> 439 return func(*a[:-1], block_id=block_id, **kw) File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/ops.py:572, in map_direct..new_func..wrap(block_id, *a, **kw) 571 args = a + arrays --> 572 return func(*args, block_id=block_id, **kw) File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/ops.py:76, in _from_array(e, x, outchunks, asarray, block_id) 75 def _from_array(e, x, outchunks=None, asarray=None, block_id=None): ---> 76 out = x[get_item(outchunks, block_id)] 77 if asarray: File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:627, in CopyOnWriteArray.__getitem__(self, key) 626 def __getitem__(self, key): --> 627 return type(self)(_wrap_numpy_scalars(self.array[key])) File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:534, in LazilyIndexedArray.__getitem__(self, indexer) 533 return array[indexer] --> 534 return type(self)(self.array, self._updated_key(indexer)) File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:500, in LazilyIndexedArray._updated_key(self, new_key) 499 def _updated_key(self, new_key): --> 500 iter_new_key = iter(expanded_indexer(new_key.tuple, self.ndim)) 501 full_key = [] AttributeError: 'tuple' object has no attribute 'tuple' The above exception was the direct cause of the following exception: RetryError Traceback (most recent call last) Cell In[69], line 1 ----> 1 ds.load() File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:761, in Dataset.load(self, **kwargs) 758 chunkmanager = get_chunked_array_type(*lazy_data.values()) 760 # evaluate all the chunked arrays simultaneously --> 761 evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs) 763 for k, data in zip(lazy_data, evaluated_data): 764 self.variables[k].data = data File ~/Documents/Work/Code/xarray/xarray/core/parallelcompat.py:451, in CubedManager.compute(self, *data, **kwargs) 448 def compute(self, *data: ""CubedArray"", **kwargs) -> np.ndarray: 449 from cubed import compute --> 451 return compute(*data, **kwargs) File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/array.py:300, in compute(executor, callbacks, optimize_graph, *arrays, **kwargs) 297 executor = PythonDagExecutor() 299 _return_in_memory_array = kwargs.pop(""_return_in_memory_array"", True) --> 300 plan.execute( 301 executor=executor, 302 callbacks=callbacks, 303 optimize_graph=optimize_graph, 304 array_names=[a.name for a in arrays], 305 **kwargs, 306 ) 308 if _return_in_memory_array: 309 return tuple(a._read_stored() for a in arrays) File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/plan.py:154, in Plan.execute(self, executor, callbacks, optimize_graph, array_names, **kwargs) 152 if callbacks is not None: 153 [callback.on_compute_start(dag) for callback in callbacks] --> 154 executor.execute_dag( 155 dag, callbacks=callbacks, array_names=array_names, **kwargs 156 ) 157 if callbacks is not None: 158 [callback.on_compute_end(dag) for callback in callbacks] File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/runtime/executors/python.py:22, in PythonDagExecutor.execute_dag(self, dag, callbacks, array_names, **kwargs) 20 if stage.mappable is not None: 21 for m in stage.mappable: ---> 22 exec_stage_func(stage.function, m, config=pipeline.config) 23 if callbacks is not None: 24 event = TaskEndEvent(array_name=name) File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/tenacity/__init__.py:289, in BaseRetrying.wraps..wrapped_f(*args, **kw) 287 @functools.wraps(f) 288 def wrapped_f(*args: t.Any, **kw: t.Any) -> t.Any: --> 289 return self(f, *args, **kw) File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/tenacity/__init__.py:379, in Retrying.__call__(self, fn, *args, **kwargs) 377 retry_state = RetryCallState(retry_object=self, fn=fn, args=args, kwargs=kwargs) 378 while True: --> 379 do = self.iter(retry_state=retry_state) 380 if isinstance(do, DoAttempt): 381 try: File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/tenacity/__init__.py:326, in BaseRetrying.iter(self, retry_state) 324 if self.reraise: 325 raise retry_exc.reraise() --> 326 raise retry_exc from fut.exception() 328 if self.wait: 329 sleep = self.wait(retry_state) RetryError: RetryError[ ```
This still works fine for dask.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1469232494,https://api.github.com/repos/pydata/xarray/issues/7019,1469232494,IC_kwDOAMm_X85XkrVu,2448579,2023-03-15T02:54:59Z,2023-03-15T02:54:59Z,MEMBER,"> AttributeError: 'tuple' object has no attribute 'tuple' This means you're indexing a `LazilyIndexedArray` with a `tuple` but you need to wrap that tuple in an `Indexer` object llke `BasicIndexer`, `VectorizedIndexer` etc..","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1468732253,https://api.github.com/repos/pydata/xarray/issues/7019,1468732253,IC_kwDOAMm_X85XixNd,35968931,2023-03-14T19:52:38Z,2023-03-14T19:52:38Z,MEMBER,"Thanks @tomwhite - I think it might make sense for me to remove the `CubedManager` class from this PR and instead put that & cubed+xarray tests into another repo. That keeps xarray's changes minimal, doesn't require putting cubed in any xarray CI envs, and hopefully allows us to merge the `ChunkManager` changes here earlier. --- **Places `dask` is still explicitly imported in xarray** There are a few remaining places where I haven't generalised to remove specific `import dask` calls either because it won't be imported at runtime unless you ask for it, cubed doesn't implement the equivalent function, that function isn't in the array API standard, or because I'm not sure if the dask concept used generalises to other parallel frameworks. - [ ] `open_mfdataset(..., parallel=True)` - there is no `cubed.delayed` to wrap the `open_dataset` calls in, - [ ] `Dataset.__dask_graph__` and all the other similar dask magic methods - [ ] `dask_array_ops.rolling` - uses functions from `dask.array.overlap`, - [ ] `dask_array_ops.least_squares` - uses `dask.array.apply_along_axis` and `dask.array.linalg.lstsq`, - [ ] `dask_array_ops.push` - uses `dask.array.reductions.cumreduction` I would like to get to the point where you can use xarray with a chunked array **without ever importing dask**. I think this PR gets very close, but that would be tricky to test because cubed depends on dask (so I can't just run the test suite without dask in the environment), and there are not yet any other parallel _chunk-aware_ frameworks I know of (ramba and arkouda don't have a chunks attribute so wouldn't require this PR).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1460980509,https://api.github.com/repos/pydata/xarray/issues/7019,1460980509,IC_kwDOAMm_X85XFMsd,35968931,2023-03-08T22:37:21Z,2023-03-08T22:39:46Z,MEMBER,"I'm making progress with this PR, and now that @tomwhite implemented `cubed.apply_gufunc` I've re-routed `xarray.apply_ufunc` to use whatever version of `apply_gufunc` is defined by the chosen `ChunkManager`. This means many basic operations should now just work: ```python In [1]: import xarray as xr In [2]: da = xr.DataArray([1, 2, 3], dims='x') In [3]: da_chunked = da.chunk(from_array_kwargs={'manager': 'cubed'}) In [4]: da_chunked Out[4]: cubed.Array Dimensions without coordinates: x In [5]: da_chunked.mean() Out[5]: cubed.Array In [6]: da_chunked.mean().compute() [cubed.Array] Out[6]: array(2) ``` (You need to install both `cubed>0.5.0` and the `main` branch of `rechunker` for this to work.) I still have a fair bit more to do on this PR (see checklist at top), but for testing should I: 1. Start making a `test_cubed.py` file in xarray as part of this PR with bespoke tests, 2. Put bespoke tests for xarray wrapping cubed somewhere else (e.g. the cubed repo or a new `cubed-xarray` repo), 3. Merge this PR without cubed-specific tests and concentrate on finishing the general duck-array testing framework in #6908 so we can implement (b) in the way we actually eventually want things to work for 3rd-party duck array libraries? I would prefer not to have this PR grow to be thousands of lines by including tests in it, but also waiting for #6908 might take a while because that's also a fairly ambitious PR. The fact that the tests are currently green for this PR (ignoring some mypy stuff) is evidence that **the decoupling of dask from xarray is working so far**. (I have already added some tests for the ability to register custom `ChunkManagers` though.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1400867666,https://api.github.com/repos/pydata/xarray/issues/7019,1400867666,IC_kwDOAMm_X85Tf4tS,2448579,2023-01-23T19:29:06Z,2023-01-23T19:29:06Z,MEMBER,">ramba doesn't actually require explicit chunks to work I suspect Arkouda is similar in that this is not a detail the user is expected to worry about. (cc @sdbachman)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1400848135,https://api.github.com/repos/pydata/xarray/issues/7019,1400848135,IC_kwDOAMm_X85Tfz8H,35968931,2023-01-23T19:14:26Z,2023-01-23T19:14:26Z,MEMBER,"@drtodd13 mentioned today that ramba doesn't actually require explicit chunks to work, which I hadn't realised. So forcing wrapped libraries to implement an explicit chunks method might be too restrictive. Ramba could possibly work entirely through the numpy array API standard.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1400846131,https://api.github.com/repos/pydata/xarray/issues/7019,1400846131,IC_kwDOAMm_X85Tfzcz,35968931,2023-01-23T19:12:57Z,2023-01-23T19:12:57Z,MEMBER,@drtodd13 tagging you here and [linking my notes](https://gist.github.com/TomNicholas/4b31805c8a8810dbe7d2566bc38ce1be) from today's [distributed arrays working group meeting](https://discourse.pangeo.io/t/new-working-group-for-distributed-array-computing/2734/20) for the links and references to this PR.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629 https://github.com/pydata/xarray/pull/7019#issuecomment-1255310245,https://api.github.com/repos/pydata/xarray/issues/7019,1255310245,IC_kwDOAMm_X85K0oOl,35968931,2022-09-22T17:06:42Z,2022-09-22T17:06:42Z,MEMBER,"> This was more abstract than expected Yeah I was kind of asking whether this was unnecessarily abstract, and if there was a simpler design that achieved the same flexibility.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368740629