pull_requests: 925889142

id node_id number state locked title user body created_at updated_at closed_at merged_at merge_commit_sha assignee milestone draft head base author_association auto_merge repo url merged_by
925889142 PR_kwDOAMm_X843L_J2 6566 closed 0 New inline_array kwarg for open_dataset 35968931 Exposes the `inline_array` kwarg of [`dask.array.from_array`](https://docs.dask.org/en/stable/generated/dask.array.from_array.html) in `xr.open_dataset` and `ds/da/variable.chunk`. Setting this to `True` inlines the array into the opening/chunking task, which avoids an extra array object at the start of the task graph. That's useful because the presence of that single common task connecting otherwise independent parts of the graph can confuse the graph optimizer. With `open_dataset(..., inline_array=False)`: <img src="https://user-images.githubusercontent.com/35968931/166312998-611dc79e-610c-44ab-b8ab-1f6f55e145d1.png" width="400"> With `open_dataset(..., inline_array=True)`: <img src="https://user-images.githubusercontent.com/35968931/166313051-1ef7ffc4-eab3-4e69-b58f-5f7befca7fd2.png" width="400"> In our case (xGCM) this is important because once inlined the optimizer understands that all the remaining parts of the graph are embarrassingly parallel, and realizes that it can fuse all our chunk-wise padding tasks into one padding task per chunk. I think this option could help in any case where someone is opening data from a Zarr store (the reason we had this opener task) or a netCDF file. The value of the kwarg should be kept optional because in theory [inlining is a tradeoff](https://docs.dask.org/en/stable/generated/dask.array.from_array.html) between fewer tasks and more memory use, but I think there might be a case for setting the default to be `True`? Questions: 1) How should I test this? 2) Should it default to `False` or `True`? 3) `inline_array` or `inline`? 
(`inline_array` doesn't really make sense for `open_dataset`, which creates multiple arrays) - [x] Closes #1895 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` @rabernat @jbusecke 2022-05-02T19:39:07Z 2022-05-11T22:12:24Z 2022-05-11T20:26:43Z 2022-05-11T20:26:42Z 0512da117388a451653484b4f45927ac337b596f     0 102b503584da56f2c5faa59d0d508feae96fff34 6fbeb13105b419cb0a6646909df358d535e09faf MEMBER
{
    "enabled_by": {
        "login": "TomNicholas",
        "id": 35968931,
        "node_id": "MDQ6VXNlcjM1OTY4OTMx",
        "avatar_url": "https://avatars.githubusercontent.com/u/35968931?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/TomNicholas",
        "html_url": "https://github.com/TomNicholas",
        "followers_url": "https://api.github.com/users/TomNicholas/followers",
        "following_url": "https://api.github.com/users/TomNicholas/following{/other_user}",
        "gists_url": "https://api.github.com/users/TomNicholas/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/TomNicholas/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/TomNicholas/subscriptions",
        "organizations_url": "https://api.github.com/users/TomNicholas/orgs",
        "repos_url": "https://api.github.com/users/TomNicholas/repos",
        "events_url": "https://api.github.com/users/TomNicholas/events{/privacy}",
        "received_events_url": "https://api.github.com/users/TomNicholas/received_events",
        "type": "User",
        "site_admin": false
    },
    "merge_method": "squash",
    "commit_title": "New inline_array kwarg for open_dataset (#6566)",
    "commit_message": "* added inline_array kwarg\r\n\r\n* remove cheeky print statements\r\n\r\n* Remove another rogue print statement\r\n\r\n* bump dask dependency\r\n\r\n* update multiple dependencies based on min-deps-check.py\r\n\r\n* update environment to match #6559\r\n\r\n* Update h5py in ci/requirements/min-all-deps.yml\r\n\r\n* Update ci/requirements/min-all-deps.yml\r\n\r\n* remove pynio from test env\r\n\r\n* Update ci/requirements/min-all-deps.yml\r\n\r\n* promote inline_array kwarg to be top-level kwarg\r\n\r\n* whatsnew\r\n\r\n* add test\r\n\r\n* Remove repeated docstring entry\r\n\r\nCo-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>\r\n\r\n* Remove repeated docstring entry\r\n\r\nCo-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>\r\n\r\n* hyperlink to dask functions\r\n\r\nCo-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>"
}
13221727 https://github.com/pydata/xarray/pull/6566  
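
The effect described in the PR body (one shared array task tying together otherwise independent chunks) can be illustrated with a toy model. The sketch below is only an illustration and makes several assumptions: it represents a task graph as a plain Python dict rather than using dask's real graph classes, and the names `build_graph`, `count_components`, `"original-array"`, `"chunk"`, and `"pad"` are hypothetical, chosen for this example. It does not call dask or xarray.

```python
# Toy model of a task graph: key -> (operation name, *dependency keys).
# This is NOT dask's real graph format; it only mimics the shape of the
# two graphs shown in the PR screenshots.


def build_graph(n_chunks, inline_array):
    """Return a toy graph that reads n_chunks chunks from one source array."""
    graph = {}
    if not inline_array:
        # One shared node holds the array object; every chunk task depends on it.
        graph["original-array"] = ("array-object",)
    for i in range(n_chunks):
        if inline_array:
            # The array object is embedded in the task itself: no dependencies.
            graph[("chunk", i)] = ("getitem-with-embedded-array",)
        else:
            graph[("chunk", i)] = ("getitem", "original-array")
        # A downstream chunk-wise operation, such as padding.
        graph[("pad", i)] = ("pad", ("chunk", i))
    return graph


def count_components(graph):
    """Count connected components, treating dependencies as undirected edges."""
    neighbours = {key: set() for key in graph}
    for key, (_, *deps) in graph.items():
        for dep in deps:
            neighbours[key].add(dep)
            neighbours[dep].add(key)
    seen, components = set(), 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours[node] - seen)
    return components


shared = build_graph(4, inline_array=False)
inlined = build_graph(4, inline_array=True)
print(len(shared), count_components(shared))    # 9 tasks in 1 connected component
print(len(inlined), count_components(inlined))  # 8 tasks in 4 independent components
```

In this toy model, inlining removes one task and splits the graph into one independent piece per chunk, which is the structure that lets an optimizer treat each chunk's pipeline separately. The memory side of the tradeoff mentioned in the PR body (each task carrying its own reference to the array) is not modelled here.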
