html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4372#issuecomment-683621673,https://api.github.com/repos/pydata/xarray/issues/4372,683621673,MDEyOklzc3VlQ29tbWVudDY4MzYyMTY3Mw==,5821660,2020-08-31T07:43:34Z,2020-08-31T07:43:34Z,MEMBER,"@dcherian @shoyer In #4392 I've tried to get around this bug. I found it easier to just catch the dask ValueErrors rather than add more code checks. I'll add more information in that PR.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,684930038
https://github.com/pydata/xarray/issues/4372#issuecomment-682337819,https://api.github.com/repos/pydata/xarray/issues/4372,682337819,MDEyOklzc3VlQ29tbWVudDY4MjMzNzgxOQ==,5821660,2020-08-28T05:45:25Z,2020-08-28T05:45:25Z,MEMBER,"> Other questions: why does this kwarg exist in dask, and why do they not rechunk by default?

Trying to answer this by looking at the dask code:

- `allow_rechunk=False`: catches chunking problems in core and non-core dimensions and raises an error. This helps prevent users from unknowingly loading huge dask arrays into memory.
- `allow_rechunk=True`: `blockwise` is called with `align_arrays=True` by default, which means automatic rechunking for all arrays (core and non-core dimensions). Users can enable this if they are sure the system can handle potentially large amounts of data.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,684930038
https://github.com/pydata/xarray/issues/4372#issuecomment-682335024,https://api.github.com/repos/pydata/xarray/issues/4372,682335024,MDEyOklzc3VlQ29tbWVudDY4MjMzNTAyNA==,5821660,2020-08-28T05:35:28Z,2020-08-28T05:35:28Z,MEMBER,"From the dask `apply_gufunc` docstring:

```python
""""""
allow_rechunk: Optional, bool, keyword only
    Allows rechunking, otherwise chunk sizes need to match and core
    dimensions are to consist only of one chunk.
    Warning: enabling this can increase memory usage significantly.
    Defaults to False
""""""
```

Current handling in dask: https://github.com/dask/dask/blob/42873f27ce11ce35652dda344dae5c47b742bef2/dask/array/gufunc.py#L398-L417

```python
if not allow_rechunk:
    chunksizes = chunksizess[dim]
    #### Check if core dimensions consist of only one chunk
    if (dim in core_shapes) and (chunksizes[0][0] < core_shapes[dim]):
        raise ValueError(
            ""Core dimension `'{}'` consists of multiple chunks. To fix, rechunk into a single \
chunk along this dimension or set `allow_rechunk=True`, but beware that this may increase memory usage \
significantly."".format(
                dim
            )
        )
    #### Check if loop dimensions consist of same chunksizes, when they have sizes > 1
    relevant_chunksizes = list(
        unique(c for s, c in zip(sizes, chunksizes) if s > 1)
    )
    if len(relevant_chunksizes) > 1:
        raise ValueError(
            ""Dimension `'{}'` with different chunksize present"".format(dim)
        )
```

If I understand this correctly, setting `allow_rechunk=True` not only rechunks non-core dimensions but also fixes core dimensions with more than one chunk. Would this be intended from the xarray side?
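To make the two checks concrete, here is a stand-alone sketch in plain Python. It only mimics the validation logic quoted from dask and is not dask's API: `check_chunks` and its parameters (`chunksizes`, `sizes`, `core_size`) are hypothetical names chosen for illustration, and the `allow_rechunk=True` branch simply skips validation instead of rechunking anything.

```python
def check_chunks(dim, chunksizes, sizes, core_size=None, allow_rechunk=False):
    '''Simplified stand-in for the chunk validation (assumed names, not dask API).

    chunksizes: one tuple of chunk lengths per input array along dim.
    sizes: total length of each input array along dim.
    core_size: length of dim if it is a core dimension, else None.
    '''
    if allow_rechunk:
        # No validation in this mode; the caller accepts automatic rechunking.
        return 'rechunk allowed'
    # A core dimension must consist of a single chunk.
    if core_size is not None and chunksizes[0][0] < core_size:
        raise ValueError(f'Core dimension {dim!r} consists of multiple chunks')
    # Loop dimensions with size > 1 must be chunked identically in all inputs.
    relevant = {c for s, c in zip(sizes, chunksizes) if s > 1}
    if len(relevant) > 1:
        raise ValueError(f'Dimension {dim!r} with different chunksize present')
    return 'chunks ok'


# Matching chunks along a loop dimension pass the check.
print(check_chunks('x', [(5, 5), (5, 5)], [10, 10]))

# Mismatched chunks raise unless allow_rechunk=True is passed.
try:
    check_chunks('x', [(5, 5), (2, 8)], [10, 10])
except ValueError as exc:
    print(exc)
print(check_chunks('x', [(5, 5), (2, 8)], [10, 10], allow_rechunk=True))

# A core dimension split into two chunks also raises.
try:
    check_chunks('t', [(5, 5)], [10], core_size=10)
except ValueError as exc:
    print(exc)
```

In this sketch both error paths disappear once `allow_rechunk=True` is passed, which is the concern raised here: the flag silences the core-dimension check as well as the loop-dimension check.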
Before #4060, a core dimension with more than one chunk was caught and raised an error:

```python
# core dimensions cannot span multiple chunks
for axis, dim in enumerate(core_dims, start=-len(core_dims)):
    if len(data.chunks[axis]) != 1:
        raise ValueError(
            ""dimension {!r} on {}th function argument to ""
            ""apply_ufunc with dask='parallelized' consists of ""
            ""multiple chunks, but is also a core dimension. To ""
            ""fix, rechunk into a single dask array chunk along ""
            ""this dimension, i.e., ``.chunk({})``, but beware ""
            ""that this may significantly increase memory usage."".format(
                dim, n, {dim: -1}
            )
        )
```

An explicit rechunk was recommended to the user, though. That means setting `allow_rechunk=True` by default will not, on its own, give us the same behaviour as before #4060. I'm unsure how to proceed.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,684930038
https://github.com/pydata/xarray/issues/4372#issuecomment-682327998,https://api.github.com/repos/pydata/xarray/issues/4372,682327998,MDEyOklzc3VlQ29tbWVudDY4MjMyNzk5OA==,5821660,2020-08-28T05:09:19Z,2020-08-28T05:09:19Z,MEMBER,"@shoyer In this case, should we warn the user that data might be loaded into memory?

Other questions: why does this kwarg exist in dask, and why do they not rechunk by default?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,684930038
https://github.com/pydata/xarray/issues/4372#issuecomment-681605223,https://api.github.com/repos/pydata/xarray/issues/4372,681605223,MDEyOklzc3VlQ29tbWVudDY4MTYwNTIyMw==,5821660,2020-08-27T06:13:03Z,2020-08-27T06:13:03Z,MEMBER,"One solution would be to catch this ValueError, issue a FutureWarning and add `allow_rechunk=True` to `dask_gufunc_kwargs` here: https://github.com/pydata/xarray/blob/9c85dd5f792805bea319f01f08ee51b83bde0f3b/xarray/core/computation.py#L646-L657

```python
def func(*arrays):
    import dask.array as da

    gufunc = functools.partial(
        da.apply_gufunc,
        numpy_func,
        signature.to_gufunc_string(exclude_dims),
        *arrays,
        vectorize=vectorize,
        output_dtypes=output_dtypes,
    )

    try:
        res = gufunc(**dask_gufunc_kwargs)
    except ValueError as exc:
        if ""with different chunksize present"" in str(exc):
            warnings.warn(
                f""``allow_rechunk=True`` needs to be explicitly set in the ""
                f""``dask_gufunc_kwargs`` parameter. Not setting will raise dask ""
                f""ValueError ``{str(exc)}`` in a future version."",
                FutureWarning,
                stacklevel=2,
            )
            dask_gufunc_kwargs[""allow_rechunk""] = True
            res = gufunc(**dask_gufunc_kwargs)
        else:
            raise
```

I could make a PR out of this. The message wording can surely be improved. What do you think, @dcherian and @shoyer?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,684930038
https://github.com/pydata/xarray/issues/4372#issuecomment-679691483,https://api.github.com/repos/pydata/xarray/issues/4372,679691483,MDEyOklzc3VlQ29tbWVudDY3OTY5MTQ4Mw==,5821660,2020-08-25T05:20:32Z,2020-08-25T05:20:32Z,MEMBER,The behaviour changed in #4060 (commit https://github.com/pydata/xarray/commit/a7fb5a9fa1a2b829181ea9e4986b959f315350dd). Please see the discussion of `allow_rechunk` over there.
The reason for not handling/setting `allow_rechunk=True` was @shoyer's [comment](https://github.com/pydata/xarray/pull/4060#issuecomment-634776667).,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,684930038