pull_requests: 925889142
id | node_id | number | state | locked | title | user | body | created_at | updated_at | closed_at | merged_at | merge_commit_sha | assignee | milestone | draft | head | base | author_association | auto_merge | repo | url | merged_by |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
925889142 | PR_kwDOAMm_X843L_J2 | 6566 | closed | 0 | New inline_array kwarg for open_dataset | 35968931 | Exposes the `inline_array` kwarg of [`dask.array.from_array`](https://docs.dask.org/en/stable/generated/dask.array.from_array.html) in `xr.open_dataset` and `ds/da/variable.chunk`. Setting this to True inlines the array into the opening/chunking task, which avoids an extra array object at the start of the task graph. That's useful because the presence of that single common task connecting otherwise independent parts of the graph can confuse the graph optimizer. With `open_dataset(..., inline_array=False)`: <img src="https://user-images.githubusercontent.com/35968931/166312998-611dc79e-610c-44ab-b8ab-1f6f55e145d1.png" width="400"> With `open_dataset(..., inline_array=True)`: <img src="https://user-images.githubusercontent.com/35968931/166313051-1ef7ffc4-eab3-4e69-b58f-5f7befca7fd2.png" width="400"> In our case (xGCM) this is important because once inlined the optimizer understands that all the remaining parts of the graph are embarrassingly parallel, and realizes that it can fuse all our chunk-wise padding tasks into one padding task per chunk. I think this option could help in any case where someone is opening data from a Zarr store (the reason we had this opener task) or a netCDF file. The value of the kwarg should be kept optional because in theory [inlining is a tradeoff](https://docs.dask.org/en/stable/generated/dask.array.from_array.html) between fewer tasks and more memory use, but I think there might be a case for setting the default to be True? Questions: 1) How should I test this? 2) Should it default to `False` or `True`? 3) `inline_array` or `inline`? 
(`inline_array` doesn't really make sense for `open_dataset`, which creates multiple arrays) - [x] Closes #1895 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` @rabernat @jbusecke | 2022-05-02T19:39:07Z | 2022-05-11T22:12:24Z | 2022-05-11T20:26:43Z | 2022-05-11T20:26:42Z | 0512da117388a451653484b4f45927ac337b596f | 0 | 102b503584da56f2c5faa59d0d508feae96fff34 | 6fbeb13105b419cb0a6646909df358d535e09faf | MEMBER | { "enabled_by": { "login": "TomNicholas", "id": 35968931, "node_id": "MDQ6VXNlcjM1OTY4OTMx", "avatar_url": "https://avatars.githubusercontent.com/u/35968931?v=4", "gravatar_id": "", "url": "https://api.github.com/users/TomNicholas", "html_url": "https://github.com/TomNicholas", "followers_url": "https://api.github.com/users/TomNicholas/followers", "following_url": "https://api.github.com/users/TomNicholas/following{/other_user}", "gists_url": "https://api.github.com/users/TomNicholas/gists{/gist_id}", "starred_url": "https://api.github.com/users/TomNicholas/starred{/owner}{/repo}", "subscriptions_url": "https://api.github.com/users/TomNicholas/subscriptions", "organizations_url": "https://api.github.com/users/TomNicholas/orgs", "repos_url": "https://api.github.com/users/TomNicholas/repos", "events_url": "https://api.github.com/users/TomNicholas/events{/privacy}", "received_events_url": "https://api.github.com/users/TomNicholas/received_events", "type": "User", "site_admin": false }, "merge_method": "squash", "commit_title": "New inline_array kwarg for open_dataset (#6566)", "commit_message": "* added inline_array kwarg\r\n\r\n* remove cheeky print statements\r\n\r\n* Remove another rogue print statement\r\n\r\n* bump dask dependency\r\n\r\n* update multiple dependencies based on min-deps-check.py\r\n\r\n* update environment to match #6559\r\n\r\n* Update h5py in ci/requirements/min-all-deps.yml\r\n\r\n* Update ci/requirements/min-all-deps.yml\r\n\r\n* remove pynio from test env\r\n\r\n* Update 
ci/requirements/min-all-deps.yml\r\n\r\n* promote inline_array kwarg to be top-level kwarg\r\n\r\n* whatsnew\r\n\r\n* add test\r\n\r\n* Remove repeated docstring entry\r\n\r\nCo-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>\r\n\r\n* Remove repeated docstring entry\r\n\r\nCo-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>\r\n\r\n* hyperlink to dask functions\r\n\r\nCo-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>" } |
13221727 | https://github.com/pydata/xarray/pull/6566 |
Links from other tables
- 3 rows from pull_requests_id in labels_pull_requests
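
The PR description above contrasts a task graph in which every chunk task depends on one shared array node with a graph in which the array is inlined into each task. The following is a minimal toy sketch of that difference using only the Python standard library. It is not dask's or xarray's implementation: `get_chunk`, `build_graph`, `dependencies`, and the `"original-array"` key are hypothetical names chosen for illustration, and the graph is just a plain dict loosely modelled on the key-to-task mapping that dask uses.

```python
# Toy model only: a graph is a dict mapping keys to either plain data or
# (callable, *args) tuples. None of this is dask's or xarray's real code.

def get_chunk(array, start, stop):
    """Slice one chunk out of the backing array."""
    return array[start:stop]


def build_graph(array, chunk_size, inline_array=False):
    """Build a toy graph with one slicing task per chunk.

    inline_array=False: the array is stored once under its own key and every
    chunk task refers to that key, so all tasks share one dependency.
    inline_array=True: the array object is embedded in each task directly,
    so the chunk tasks have no dependency in common.
    """
    graph = {}
    source = array
    if not inline_array:
        graph["original-array"] = array  # hypothetical key for the shared node
        source = "original-array"
    for i, start in enumerate(range(0, len(array), chunk_size)):
        graph[("chunk", i)] = (get_chunk, source, start, start + chunk_size)
    return graph


def dependencies(graph, key):
    """Return the graph keys that the task stored under `key` refers to."""
    task = graph[key]
    if not isinstance(task, tuple):
        return set()  # plain data has no dependencies
    return {
        arg for arg in task[1:]
        if isinstance(arg, (str, tuple)) and arg in graph
    }


data = list(range(8))
shared = build_graph(data, chunk_size=4, inline_array=False)
inlined = build_graph(data, chunk_size=4, inline_array=True)

print(len(shared), dependencies(shared, ("chunk", 0)))
print(len(inlined), dependencies(inlined, ("chunk", 0)))
```

In this toy model the non-inlined graph carries one extra node that every chunk task points at, while the inlined graph consists only of independent chunk tasks. That mirrors the tradeoff mentioned in the description: fewer tasks and no common dependency, at the cost of each task carrying its own reference to the array.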