home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

1 row where issue = 732910109 and user = 20629530 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 1

  • aulemahal · 1 ✖

issue 1

  • Unexpected chunking of 3d DataArray in `polyfit()` · 1 ✖

author_association 1

  • CONTRIBUTOR 1
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
822566735 https://github.com/pydata/xarray/issues/4554#issuecomment-822566735 https://api.github.com/repos/pydata/xarray/issues/4554 MDEyOklzc3VlQ29tbWVudDgyMjU2NjczNQ== aulemahal 20629530 2021-04-19T15:37:30Z 2021-04-19T15:37:30Z CONTRIBUTOR

Took a look and it seems to originate from the stacking part and someting in dask.

In polyfit, we rearrange the DataArrays to 2D arrays, so we can run the least squares with np/dsa.apply_along_axis. But I checked and the chunking problem seems to appear before any call of the sort. MWE:

```python3 import xarray as xr import dask.array as dsa nz, ny, nx = (10, 20, 30) data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx)) da = xr.DataArray(data, dims=['z', 'y', 'x']) da.chunks

((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))

stk = da.stack(zy=['z', 'y']) print(stk.dims, stk.chunks)

('x', 'zy') ((30,), (20, 20, 20, 20, 20, 20, 20, 20, 20, 20))

Merged chunks!

```

And then I went down the rabbit hole (ok it's not that deep) and is all goes down here: https://github.com/pydata/xarray/blob/e0358e586079c12525ce60c4a51b591dc280713b/xarray/core/variable.py#L1507

In Variable._stack_one the stacking is performed and Variable.data.reshape is called. Dask itself is rechunking the output, merging the chunks. There is a merge_chunks kwarg for reshape, but I think it has a bug:

```python

Let's stack as xarray does: x, z, y -> x, zy

data_t = data.transpose(2, 0, 1) # Dask array with shape (30, 10, 20), the same as reordered in Variable._stack_once.

new_data = data_t.reshape((30, -1), merge_chunks=True) # True is the default, this is the same call as in xarray new_data.chunks

((30,), (20, 20, 20, 20, 20, 20, 20, 20, 20, 20))

new_data = data_t.reshape((30, -1), merge_chunks=False) new_data.shape # I'm printing shape because chunks is too large, but see the bug:

(30, 6000) # instead of (30, 200)!!!

Doesn't happen when we do not transpose. So let's reshape data as z, y, x -> zy, x

new_data = data.reshape((-1, 30), merge_chunks=True) new_data.chunks

((5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), (30,))

Chunks were not merged? But this is the output expected by paigem.

new_data = data.reshape((-1, 30), merge_chunks=False) new_data.chunks

((5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), (30,))

That's what I expected with merge_chunks=False.

```

For polyfit itself, the apply_along_axis call could be changed to a apply_ufunc with vectorize=True, I think this would avoid the problem and behave the same on the user's side. Would need some refactoring.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unexpected chunking of 3d DataArray in `polyfit()` 732910109

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.792ms · About: xarray-datasette