
issue_comments


1 row where author_association = "CONTRIBUTOR" and issue = 732910109 sorted by updated_at descending


id: 822566735
html_url: https://github.com/pydata/xarray/issues/4554#issuecomment-822566735
issue_url: https://api.github.com/repos/pydata/xarray/issues/4554
node_id: MDEyOklzc3VlQ29tbWVudDgyMjU2NjczNQ==
user: aulemahal (20629530)
created_at: 2021-04-19T15:37:30Z
updated_at: 2021-04-19T15:37:30Z
author_association: CONTRIBUTOR

Took a look, and it seems to originate from the stacking part and something in dask.

In polyfit, we rearrange the DataArrays into 2D arrays so we can run the least squares with np/dsa.apply_along_axis. But I checked, and the chunking problem appears before any call of that sort. MWE:

```python
import xarray as xr
import dask.array as dsa

nz, ny, nx = (10, 20, 30)
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))
da = xr.DataArray(data, dims=['z', 'y', 'x'])
da.chunks
# ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))

stk = da.stack(zy=['z', 'y'])
print(stk.dims, stk.chunks)
# ('x', 'zy') ((30,), (20, 20, 20, 20, 20, 20, 20, 20, 20, 20))
# Merged chunks!
```
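For reference, the "2D + apply_along_axis" pattern described above can be sketched like this on the same toy array (a simplified illustration of the pattern, not xarray's actual polyfit code):

```python
import numpy as np
import dask.array as dsa

nz, ny, nx = (10, 20, 30)
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))

# Move the fit dimension ('x') first and flatten the rest, as the
# stacking step effectively does, then fit each column separately.
rhs = data.transpose(2, 0, 1).reshape((nx, -1))  # shape (30, 200)
xcoord = np.arange(nx, dtype=float)

coeffs = dsa.apply_along_axis(
    lambda col: np.polyfit(xcoord, col, 1),  # degree-1 fit per 1D series
    0, rhs, shape=(2,), dtype=float,
)
coeffs.shape  # (2, 200): [slope, intercept] for each of the 200 series
```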

And then I went down the rabbit hole (ok, it's not that deep) and it all comes down to here: https://github.com/pydata/xarray/blob/e0358e586079c12525ce60c4a51b591dc280713b/xarray/core/variable.py#L1507

In Variable._stack_once the stacking is performed and Variable.data.reshape is called. Dask itself rechunks the output, merging the chunks. There is a merge_chunks kwarg for reshape, but I think it has a bug:

```python
# Let's stack as xarray does: x, z, y -> x, zy
data_t = data.transpose(2, 0, 1)  # Dask array with shape (30, 10, 20), the same reordering as in Variable._stack_once.

new_data = data_t.reshape((30, -1), merge_chunks=True)  # True is the default; this is the same call as in xarray.
new_data.chunks
# ((30,), (20, 20, 20, 20, 20, 20, 20, 20, 20, 20))

new_data = data_t.reshape((30, -1), merge_chunks=False)
new_data.shape  # I'm printing shape because chunks is too large, but see the bug:
# (30, 6000)  # instead of (30, 200)!!!

# Doesn't happen when we do not transpose. So let's reshape data as z, y, x -> zy, x:
new_data = data.reshape((-1, 30), merge_chunks=True)
new_data.chunks
# ((5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), (30,))
# Chunks were not merged? But this is the output expected by paigem.

new_data = data.reshape((-1, 30), merge_chunks=False)
new_data.chunks
# ((5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), (30,))
# That's what I expected with merge_chunks=False.
```
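As a user-side stopgap (my suggestion here, not something from the discussion above), one can rechunk the stacked dimension right after stack to undo the merge; the chunk size 5 below just mirrors the per-(z, y) block size of the example:

```python
import dask.array as dsa
import xarray as xr

nz, ny, nx = (10, 20, 30)
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))
da = xr.DataArray(data, dims=['z', 'y', 'x'])

# stack() merges the zy chunks to size 20; chunk() splits them back.
stk = da.stack(zy=['z', 'y']).chunk({'zy': 5})
stk.chunks  # ((30,), (5,) * 40)
```

This costs extra graph tasks but restores small chunks for whatever runs along zy afterwards.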

For polyfit itself, the apply_along_axis call could be changed to an apply_ufunc with vectorize=True; I think this would avoid the problem and behave the same on the user's side. It would need some refactoring.
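A rough sketch of what that apply_ufunc variant could look like (hypothetical names `_fit_1d` / `polyfit_vectorized`, assuming a recent xarray with `dask_gufunc_kwargs`; this is an illustration of the idea, not a drop-in patch):

```python
import numpy as np
import xarray as xr

def _fit_1d(y, x, deg):
    # Fit one 1D series; np.polyfit returns highest-degree coefficient first.
    return np.polyfit(x, y, deg)

def polyfit_vectorized(da, dim, deg=1):
    # Hypothetical replacement for the 2D + apply_along_axis path:
    # vectorize the 1D fit over all non-core dims, so the chunking of
    # the other dimensions is preserved instead of being stacked away.
    x = np.arange(da.sizes[dim], dtype=float)
    return xr.apply_ufunc(
        _fit_1d,
        da,
        input_core_dims=[[dim]],
        output_core_dims=[['degree']],
        vectorize=True,
        dask='parallelized',
        output_dtypes=[float],
        dask_gufunc_kwargs={'output_sizes': {'degree': deg + 1}},
        kwargs={'x': x, 'deg': deg},
    )
```

On a linear signal, `polyfit_vectorized(da, 't', 1)` returns slope and intercept along a new `degree` dimension, one fit per remaining grid point.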

issue: Unexpected chunking of 3d DataArray in `polyfit()` (732910109)
