issues: 1223270563
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1223270563 | PR_kwDOAMm_X843L_J2 | 6566 | New inline_array kwarg for open_dataset | 35968931 | closed | 0 | 11 | 2022-05-02T19:39:07Z | 2022-05-11T22:12:24Z | 2022-05-11T20:26:43Z | MEMBER | 0 | pydata/xarray/pulls/6566 | Exposes the What setting this to True does is inline the array into the opening/chunking task, which avoids an extra array object at the start of the task graph. That's useful because the presence of that single common task connecting otherwise independent parts of the graph can confuse the graph optimizer. With With In our case (xGCM) this is important because once inlined the optimizer understands that all the remaining parts of the graph are embarrassingly parallel, and realizes that it can fuse all our chunk-wise padding tasks into one padding task per chunk. I think this option could help in any case where someone is opening data from a Zarr store (the reason we had this opener task) or a netCDF file. The value of the kwarg should be kept optional because in theory inlining is a tradeoff between fewer tasks and more memory use, but I think there might be a case for setting the default to be True? Questions:
1) How should I test this?
2) Should it default to
@rabernat @jbusecke |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6566/reactions", "total_count": 3, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 2, "eyes": 0 } |
13221727 | pull |
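
The effect the description attributes to inlining can be illustrated with a toy model. The sketch below is not dask's or xarray's implementation: it uses plain dictionaries as a stand-in for a task graph, and every name in it (`build_graph`, `count_independent_subgraphs`, the `open-chunk-*` and `pad-chunk-*` task labels) is hypothetical. It only shows why a single shared array node ties otherwise independent per-chunk work into one connected graph, and why removing that node leaves one independent subgraph per chunk.

```python
# Toy model only: plain dicts standing in for a task graph, not dask internals.


def build_graph(n_chunks, inline):
    """Return {task_name: set of task names it depends on}."""
    graph = {}
    if not inline:
        # One shared node holds the array object; every chunk task reads from it.
        graph["array"] = set()
    for i in range(n_chunks):
        # When inlined, the array object lives inside each opening task,
        # so the opening task has no dependency on a shared node.
        graph[f"open-chunk-{i}"] = set() if inline else {"array"}
        graph[f"pad-chunk-{i}"] = {f"open-chunk-{i}"}
    return graph


def count_independent_subgraphs(graph):
    """Count connected components, treating dependencies as undirected edges."""
    neighbours = {name: set(deps) for name, deps in graph.items()}
    for name, deps in graph.items():
        for dep in deps:
            neighbours[dep].add(name)
    seen, components = set(), 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours[node] - seen)
    return components


not_inlined = build_graph(4, inline=False)
inlined = build_graph(4, inline=True)
print(len(not_inlined), count_independent_subgraphs(not_inlined))  # 9 tasks, 1 subgraph
print(len(inlined), count_independent_subgraphs(inlined))  # 8 tasks, 4 subgraphs
```

In this toy model, inlining removes one task and splits the graph into one open-then-pad chain per chunk, which is the shape that lets an optimizer fuse each chain. The memory side of the tradeoff mentioned in the description is not modelled here.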