issue_comments
6 rows where issue = 627600168 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
704530619 | https://github.com/pydata/xarray/issues/4112#issuecomment-704530619 | https://api.github.com/repos/pydata/xarray/issues/4112 | MDEyOklzc3VlQ29tbWVudDcwNDUzMDYxOQ== | jbusecke 14314623 | 2020-10-06T20:20:34Z | 2020-10-06T20:20:34Z | CONTRIBUTOR | Just tried this with the newest dask version and can confirm that I do not get huge chunks anymore IF I specify

``` python
import numpy as np
import xarray as xr

short_time = xr.cftime_range('2000', periods=12)
long_time = xr.cftime_range('2000', periods=120)
data_short = np.random.rand(len(short_time))
data_long = np.random.rand(len(long_time))

n = 1000
a = xr.DataArray(data_short, dims=['time'], coords={'time': short_time}).expand_dims(a=n, b=n).chunk({'time': 3})
b = xr.DataArray(data_long, dims=['time'], coords={'time': long_time}).expand_dims(a=n, b=n).chunk({'time': 3})
a, b = xr.align(a, b, join='outer')
```

With the defaults, I still get one giant chunk. I'll try this soon in the real-world scenario described above. Just wanted to report back here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Unexpected chunking behavior when using `xr.align` with `join='outer'` 627600168 | |
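(The comment above does not name the dask option it refers to. The fix proposed in dask/dask#6270 shipped as the `array.slicing.split_large_chunks` config flag, so a sketch of opting in, assuming a dask version that supports that flag, presumably looks like this:)

``` python
# Sketch only: enable dask's large-chunk splitting during slicing.
# Reuses `a` and `b` from the reproduction above.
import dask

with dask.config.set(**{"array.slicing.split_large_chunks": True}):
    a2, b2 = xr.align(a, b, join="outer")

print(a2.chunks)  # padded axis is split into small chunks, not one giant block
```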
643513541 | https://github.com/pydata/xarray/issues/4112#issuecomment-643513541 | https://api.github.com/repos/pydata/xarray/issues/4112 | MDEyOklzc3VlQ29tbWVudDY0MzUxMzU0MQ== | dcherian 2448579 | 2020-06-12T22:55:12Z | 2020-06-12T22:55:12Z | MEMBER | This is Tom's proposed solution in https://github.com/dask/dask/issues/6270 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Unexpected chunking behavior when using `xr.align` with `join='outer'` 627600168 | |
643512625 | https://github.com/pydata/xarray/issues/4112#issuecomment-643512625 | https://api.github.com/repos/pydata/xarray/issues/4112 | MDEyOklzc3VlQ29tbWVudDY0MzUxMjYyNQ== | shoyer 1217238 | 2020-06-12T22:50:57Z | 2020-06-12T22:50:57Z | MEMBER | The problem with chunking indexers is that then dask doesn't have any visibility into the indexing values, which means the graph now grows like the square of the number of chunks along an axis, instead of proportional to the number of chunks.

The real operation that xarray needs here is a `pad`. The padded portion of the array is used in indexing, but only so the result is aligned for `join='outer'`.

I don't know the best way to handle this. One option might be to rewrite dask's indexing functionality to "split" chunks that are much larger than their inputs into smaller pieces, even if they all come from the same input chunk? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Unexpected chunking behavior when using `xr.align` with `join='outer'` 627600168 | |
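(shoyer's square-versus-linear point can be checked directly. A minimal sketch, assuming only dask and numpy, with arbitrary sizes: compare the task-graph size when indexing with a plain integer array, whose values dask can inspect, against an opaque chunked dask-array indexer.)

``` python
import dask.array as da
import numpy as np

arr = da.ones(1000, chunks=10)  # 100 source chunks
indexer = np.arange(1000)

plain = arr[indexer]                            # dask sees the index values
lazy = arr[da.from_array(indexer, chunks=10)]   # opaque, chunked indexer

# With a visible indexer the graph grows roughly with the number of chunks;
# with an opaque chunked indexer every indexer chunk may touch every source
# chunk, so the graph grows roughly with the square of the chunk count.
print(len(plain.__dask_graph__()))
print(len(lazy.__dask_graph__()))
```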
643346497 | https://github.com/pydata/xarray/issues/4112#issuecomment-643346497 | https://api.github.com/repos/pydata/xarray/issues/4112 | MDEyOklzc3VlQ29tbWVudDY0MzM0NjQ5Nw== | dcherian 2448579 | 2020-06-12T15:51:31Z | 2020-06-12T15:52:58Z | MEMBER | Thanks @TomAugspurger. I think an upstream dask solution would be useful.

xarray automatically aligns objects everywhere, and this alignment is what is blowing things up. For this reason I think xarray should explicitly chunk the indexer when aligning. We could use a reasonable chunk size, like the median chunk size of the DataArray along that axis — this would respect the user's chunk-size choices.

@shoyer What do you think? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Unexpected chunking behavior when using `xr.align` with `join='outer'` 627600168 | |
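(A minimal sketch of dcherian's median-chunk-size idea. The helper below is hypothetical, not xarray API; it only illustrates chunking the reindexing indexer to match the typical existing chunk size along the axis.)

``` python
import dask.array as da
import numpy as np

def chunked_indexer(indexer, source_chunks):
    """Hypothetical helper: wrap an integer indexer in a dask array
    chunked like the source axis, respecting the user's chunk sizes."""
    typical = int(np.median(source_chunks))
    return da.from_array(np.asarray(indexer), chunks=typical)

arr = da.from_array(np.arange(4), chunks=1)
idx = np.concatenate([np.arange(4), np.full(111, -1)])  # reindex-style indexer
print(arr[chunked_indexer(idx, arr.chunks[0])].chunks)  # 115 chunks of size 1
```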
636808986 | https://github.com/pydata/xarray/issues/4112#issuecomment-636808986 | https://api.github.com/repos/pydata/xarray/issues/4112 | MDEyOklzc3VlQ29tbWVudDYzNjgwODk4Ng== | TomAugspurger 1312546 | 2020-06-01T11:44:23Z | 2020-06-01T11:44:23Z | MEMBER | Rechunking the |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Unexpected chunking behavior when using `xr.align` with `join='outer'` 627600168 | |
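(Tom's comment is cut off in this export, but the surviving fragment points at rechunking. As an illustration only, an assumed user-side workaround rather than necessarily what the comment proposed: explicitly rechunk the aligned result to break up the giant chunk.)

``` python
# Assumed workaround sketch, reusing `a` and `b` from the reproduction above:
# after the outer join, `a` is padded from 12 to 120 time steps and the
# padded span arrives as one giant chunk; rechunking splits it back up.
a2, b2 = xr.align(a, b, join='outer')
a2 = a2.chunk({'time': 3})  # reimpose the intended chunking along "time"
print(a2.chunks)
```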
636334010 | https://github.com/pydata/xarray/issues/4112#issuecomment-636334010 | https://api.github.com/repos/pydata/xarray/issues/4112 | MDEyOklzc3VlQ29tbWVudDYzNjMzNDAxMA== | dcherian 2448579 | 2020-05-30T13:52:33Z | 2020-05-30T13:53:31Z | MEMBER | Great diagnosis @jbusecke. Ultimately this comes down to dask indexing:

``` python
import dask.array

arr = dask.array.from_array([0, 1, 2, 3], chunks=(1,))
print(arr.chunks)  # ((1, 1, 1, 1),)

# align calls reindex, which indexes with something like this
indexer = [0, 1, 2, 3] + [-1] * 111
print(arr[indexer].chunks)  # ((1, 1, 1, 112),)

# maybe something like this is a solution
lazy_indexer = dask.array.from_array(indexer, chunks=arr.chunks[0][0], name="idx")
print(arr[lazy_indexer].chunks)  # ((1, 1, 1, ..., 1),) -> 115 chunks of size 1
```

cc @TomAugspurger, the issue here is that big |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Unexpected chunking behavior when using `xr.align` with `join='outer'` 627600168 |
``` sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```