html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4554#issuecomment-822566735,https://api.github.com/repos/pydata/xarray/issues/4554,822566735,MDEyOklzc3VlQ29tbWVudDgyMjU2NjczNQ==,20629530,2021-04-19T15:37:30Z,2021-04-19T15:37:30Z,CONTRIBUTOR,"Took a look and it seems to originate from the stacking step and something in `dask`. In `polyfit`, we rearrange the DataArrays into 2D arrays so we can run the least squares fit with `np.apply_along_axis`/`dsa.apply_along_axis`. But I checked, and the chunking problem appears before any call of that sort.

MWE:
```python3
import xarray as xr
import dask.array as dsa

nz, ny, nx = (10, 20, 30)
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))
da = xr.DataArray(data, dims=['z', 'y', 'x'])
da.chunks
# ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))

stk = da.stack(zy=['z', 'y'])
print(stk.dims, stk.chunks)
# ('x', 'zy') ((30,), (20, 20, 20, 20, 20, 20, 20, 20, 20, 20))  # Merged chunks!
```
And then I went down the rabbit hole (ok, it's not that deep) and it all comes down to this line:

https://github.com/pydata/xarray/blob/e0358e586079c12525ce60c4a51b591dc280713b/xarray/core/variable.py#L1507

In `Variable._stack_once` the stacking is performed and `Variable.data.reshape` is called. Dask itself is rechunking the output, merging the chunks. There is a `merge_chunks` kwarg for `reshape`, but I think it has a bug:

```python
# Let's stack as xarray does: x, z, y -> x, zy
data_t = data.transpose(2, 0, 1)  # Dask array with shape (30, 10, 20), the same as `reordered` in `Variable._stack_once`.

new_data = data_t.reshape((30, -1), merge_chunks=True)  # True is the default; this is the same call as in xarray.
new_data.chunks
# ((30,), (20, 20, 20, 20, 20, 20, 20, 20, 20, 20))

new_data = data_t.reshape((30, -1), merge_chunks=False)
new_data.shape  # I'm printing the shape because `chunks` is too long, but see the bug:
# (30, 6000)  # instead of (30, 200)!!!

# This doesn't happen when we do not transpose. So let's reshape data as z, y, x -> zy, x
new_data = data.reshape((-1, 30), merge_chunks=True)
new_data.chunks
# ((5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), (30,))
# Chunks were not merged? But this is the output expected by paigem.

new_data = data.reshape((-1, 30), merge_chunks=False)
new_data.chunks
# ((5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), (30,))
# That's what I expected with merge_chunks=False.
```

For `polyfit` itself, the `apply_along_axis` call could be changed to an `apply_ufunc` call with `vectorize=True`; I think this would avoid the problem and behave the same from the user's side (see the sketch below). It would need some refactoring.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,732910109
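
For context on that last suggestion, here is a minimal sketch of what an `apply_ufunc(..., vectorize=True)` based fit could look like. This is not xarray's actual `polyfit` internals; the helper names (`_fit_1d`, `fit_da`) and the demo data are made up for illustration, and it assumes the fitting dimension sits in a single chunk, which `dask="parallelized"` requires for a core dimension. Note that no stacking or reshaping happens, so the chunking of the non-fit dimensions is left untouched.

```python
import numpy as np
import xarray as xr
import dask.array as dsa


def _fit_1d(y, x, deg):
    # Fit a single 1-D series; vectorize=True broadcasts this over all
    # non-core dims, so no manual stacking/reshaping (and no rechunking).
    return np.polyfit(x, y, deg)


def fit_da(da, dim, deg=1):
    # Hypothetical wrapper, not the real polyfit implementation.
    x = np.arange(da.sizes[dim], dtype=float)
    return xr.apply_ufunc(
        _fit_1d,
        da,
        kwargs={"x": x, "deg": deg},
        input_core_dims=[[dim]],          # the fit dimension is consumed...
        output_core_dims=[["degree"]],    # ...and replaced by the coefficients
        dask_gufunc_kwargs={"output_sizes": {"degree": deg + 1}},
        vectorize=True,
        dask="parallelized",
        output_dtypes=[float],
    )


# Same demo data as the MWE above: 'x' is a single chunk, so it can be a core dim.
nz, ny, nx = 10, 20, 30
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))
da = xr.DataArray(data, dims=["z", "y", "x"])

coeffs = fit_da(da, dim="x", deg=1)
print(coeffs.dims)    # ('z', 'y', 'degree')
print(coeffs.chunks)  # input chunking along z and y is preserved
```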