html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4554#issuecomment-822566735,https://api.github.com/repos/pydata/xarray/issues/4554,822566735,MDEyOklzc3VlQ29tbWVudDgyMjU2NjczNQ==,20629530,2021-04-19T15:37:30Z,2021-04-19T15:37:30Z,CONTRIBUTOR,"Took a look and it seems to originate from the stacking part and something in `dask`.
In `polyfit`, we rearrange the DataArrays to 2D arrays so we can run the least squares with `np/dsa.apply_along_axis`. But I checked and the chunking problem appears before any such call. MWE:
```python
import xarray as xr
import dask.array as dsa
nz, ny, nx = (10, 20, 30)
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))
da = xr.DataArray(data, dims=['z', 'y', 'x'])
da.chunks
# ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))
stk = da.stack(zy=['z', 'y'])
print(stk.dims, stk.chunks)
# ('x', 'zy') ((30,), (20, 20, 20, 20, 20, 20, 20, 20, 20, 20))
# Merged chunks!
```
And then I went down the rabbit hole (ok, it's not that deep) and it all comes down to here:
https://github.com/pydata/xarray/blob/e0358e586079c12525ce60c4a51b591dc280713b/xarray/core/variable.py#L1507
In `Variable._stack_once` the stacking is performed and `Variable.data.reshape` is called. Dask itself rechunks the output, merging the chunks. There is a `merge_chunks` kwarg for `reshape`, but I think it has a bug:
```python
# Let's stack as xarray does: x, z, y -> x, zy
data_t = data.transpose(2, 0, 1) # Dask array with shape (30, 10, 20), the same as `reordered` in `Variable._stack_once`.
new_data = data_t.reshape((30, -1), merge_chunks=True) # True is the default, this is the same call as in xarray
new_data.chunks
# ((30,), (20, 20, 20, 20, 20, 20, 20, 20, 20, 20))
new_data = data_t.reshape((30, -1), merge_chunks=False)
new_data.shape # I'm printing shape because the chunks tuple is too long to show, but see the bug:
# (30, 6000) # instead of (30, 200)!!!
# Doesn't happen when we do not transpose. So let's reshape data as z, y, x -> zy, x
new_data = data.reshape((-1, 30), merge_chunks=True)
new_data.chunks
# ((5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), (30,))
# Chunks were not merged? But this is the output expected by paigem.
new_data = data.reshape((-1, 30), merge_chunks=False)
new_data.chunks
# ((5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), (30,))
# That's what I expected with merge_chunks=False.
```
For `polyfit` itself, the `apply_along_axis` call could be changed to an `apply_ufunc` with `vectorize=True`. I think this would avoid the problem and behave the same on the user's side, though it would need some refactoring. A rough sketch of what I mean is below.
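This is only a sketch, not the actual `polyfit` internals: the data mirrors the MWE above, the fit is along the unchunked `x` dimension, and `_fit_1d` is just a placeholder for the real least-squares step:

```python
import numpy as np
import xarray as xr
import dask.array as dsa

nz, ny, nx = (10, 20, 30)
da = xr.DataArray(dsa.ones((nz, ny, nx), chunks=(1, 5, nx)), dims=['z', 'y', 'x'])

x = np.arange(nx)
deg = 1

def _fit_1d(y):
    # plain numpy least squares on a single 1-D slice along 'x'
    return np.polyfit(x, y, deg)

coeffs = xr.apply_ufunc(
    _fit_1d,
    da,
    input_core_dims=[['x']],
    output_core_dims=[['degree']],
    vectorize=True,
    dask='parallelized',
    output_dtypes=[da.dtype],
    dask_gufunc_kwargs={'output_sizes': {'degree': deg + 1}},
)
print(coeffs.dims, coeffs.chunks)
# Should be ('z', 'y', 'degree') with the original 'z'/'y' chunking preserved,
# i.e. ((1, ..., 1), (5, 5, 5, 5), (2,)) -- no stacking, so no merged chunks.
```

`vectorize=True` is slower than a vectorized `lstsq` over the stacked axis, so the real refactor would probably want to keep the computation vectorized, but it shows the chunk merging can be avoided.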
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,732910109