issue_comments: 1409284680

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/7489#issuecomment-1409284680 https://api.github.com/repos/pydata/xarray/issues/7489 1409284680 IC_kwDOAMm_X85T__pI 11656932 2023-01-30T20:25:22Z 2023-01-30T20:25:22Z CONTRIBUTOR

I've not looked super deeply, so please let me know if I'm missing something, but I think that these lines in get_scheduler are the relevant ones here

https://github.com/dask/dask/blob/db5b2178a79cacc1c882d60a82bf86e2e188eccb/dask/base.py#L1405-L1406

Previously, distributed would set the scheduler config option to point to the default distributed.Client (if one existed). We've since changed that logic, and distributed no longer uses the config option for saying "you've got a distributed.Client, you should use it".

I think the problem is this: under the previous config-based behavior, the scheduler option, which the test suite currently sets to the single-threaded scheduler, would be overwritten by distributed to point to the Client. Now that distributed no longer uses the scheduler config option, the test suite really is using the single-threaded scheduler, which is why get_scheduler() returns get_sync with the latest dask / distributed release.
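To make the before/after concrete, here is a toy model of the precedence described above. It is only a sketch, not dask's or distributed's actual code: `resolve_scheduler`, `client_writes_config`, and the string placeholders are hypothetical names invented for illustration, and the fallback to a default client when the option is unset is an assumption.

```python
# Toy model only: every name here is hypothetical, not a dask/distributed API.
def resolve_scheduler(config, default_client=None, client_writes_config=False):
    config = dict(config)  # work on a copy so the caller's config is untouched
    if client_writes_config and default_client is not None:
        # Old behavior as described above: creating a Client overwrote
        # the "scheduler" config option to point at that Client.
        config["scheduler"] = default_client
    if config.get("scheduler"):
        # An explicitly configured scheduler is used as-is.
        return config["scheduler"]
    if default_client is not None:
        # Assumed fallback: with no option set, use the default Client.
        return default_client
    return "collection-default"


# The test suite pins the single-threaded scheduler via config.
test_config = {"scheduler": "single-threaded"}

old = resolve_scheduler(test_config, "distributed-client", client_writes_config=True)
new = resolve_scheduler(test_config, "distributed-client", client_writes_config=False)

print(old)  # distributed-client
print(new)  # single-threaded
```

In this model the only thing that changes between the two calls is whether the Client writes to the config, which is enough to flip the result from the Client to the single-threaded scheduler.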

I'd argue the new behavior is actually what we want, but I see what you're saying about it being a change in behavior. In this case, though, I think it's just a tests-related issue. Does that sound right? Or was setting the scheduler config option important for real-life user code too?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1562840229