
issue_comments


5 rows where author_association = "NONE" and issue = 1307523148 sorted by updated_at descending


Comment 1280786780 · alessioarena (33886395) · 2022-10-17T12:33:18Z · author_association: NONE
https://github.com/pydata/xarray/issues/6803#issuecomment-1280786780

I will try that. I still find it weird that I need to wrap a numpy object in a dask/xarray object to be able to send it to the workers, when dask.scatter exists for exactly that purpose.

Thanks for opening that issue. I do feel there is a need to revisit the scatter functionality and its role, particularly around dynamic clusters.

Having a better look at your initial comment, that may still work if you call the Future.result() method inside the applied function. In theory that should retrieve the data associated with that Future, in this case "Hello World". However, in a dask_gateway setup that will fail.
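
Below is a minimal sketch of that suggestion, assuming a plain local distributed.Client (not dask_gateway) and a made-up helper add_offset; the scattered Future is resolved inside the applied function via Future.result(), which is exactly the step reported to fail behind a gateway:

import numpy as np
import xarray as xr
from dask.distributed import Client

client = Client()  # local cluster; the resolution step below is what fails with dask_gateway

# send the shared payload to the workers once, up front
payload_future = client.scatter(np.arange(300_000))

def add_offset(block, payload=None):
    data = payload.result()  # resolve the Future inside the applied function
    return block + data.mean()

da = xr.DataArray(np.random.rand(4, 4)).chunk(2)
result = xr.apply_ufunc(
    add_offset,
    da,
    kwargs={"payload": payload_future},
    dask="parallelized",
    output_dtypes=[float],
)
result.compute()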

Reactions: none · Issue: Passing a distributed.Future to the kwargs of apply_ufunc should resolve the future (1307523148)
Comment 1280759221 · alessioarena (33886395) · 2022-10-17T12:11:05Z · author_association: NONE
https://github.com/pydata/xarray/issues/6803#issuecomment-1280759221

I'm not sure I understand the code above.

In my case I have an array of approximately 300k elements that every function call needs to access. I can pass it as a kwarg in its numpy form, but once I scale up the calculation across a large dataset (many large chunks), that array gets replicated for every task, pushing the scheduler out of memory.

That is why I tried to send the dataset to the cluster beforehand using scatter, but I cannot resolve the Future at the workers.
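
A rough sketch of the two situations described here, with hypothetical names (lookup, apply_lookup): passing the raw numpy array embeds it in every task's kwargs, which is what the comment reports pushing the scheduler out of memory, while passing the scattered Future instead is what apply_ufunc does not currently resolve:

import numpy as np
import xarray as xr
from dask.distributed import Client

client = Client()
lookup = np.random.rand(300_000)  # the ~300k-element array every call needs

def apply_lookup(block, lookup=None):
    return block * lookup.sum()

da = xr.DataArray(np.random.rand(1_000, 1_000)).chunk(100)

# the numpy array is copied into the kwargs of every task in the graph
replicated = xr.apply_ufunc(
    apply_lookup, da,
    kwargs={"lookup": lookup},
    dask="parallelized", output_dtypes=[float],
)

# scattering it once avoids the copies, but the Future arrives at the
# workers unresolved, which is the subject of this issue
lookup_future = client.scatter(lookup, broadcast=True)
unresolved = xr.apply_ufunc(
    apply_lookup, da,
    kwargs={"lookup": lookup_future},
    dask="parallelized", output_dtypes=[float],
)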

Reactions: none · Issue: Passing a distributed.Future to the kwargs of apply_ufunc should resolve the future (1307523148)
Comment 1280743293 · alessioarena (33886395) · 2022-10-17T11:59:19Z · author_association: NONE
https://github.com/pydata/xarray/issues/6803#issuecomment-1280743293

I can add that this problem is exacerbated in a dask_gateway setup, where the task just fails.

With apply_ufunc I never received an error, but in a similar context I obtained something very similar to https://github.com/dask/dask-gateway/issues/404.

My interpretation is that the Future is resolved at the worker (or, in the case of apply_ufunc, a thread of that worker) and embeds a reference to the Client object. The latter, however, uses a gateway connection that is not understood by the worker, as it is generally the scheduler that deals with those.

Reactions: none · Issue: Passing a distributed.Future to the kwargs of apply_ufunc should resolve the future (1307523148)
Comment 1264523142 · alessioarena (33886395) · 2022-10-02T01:29:35Z · author_association: NONE
https://github.com/pydata/xarray/issues/6803#issuecomment-1264523142

I think I may have narrowed down the problem to a limitation of dask when using dask_gateway.

When passing a Future to a worker, the worker will try to unpickle that Future and, as part of that, unpickle the Client object that was used to create the Future.

Unfortunately, in a dask_gateway context the client is behind a gateway connection that is not understood by the worker, which normally does not have to deal with a gateway at all. In my case I do not get any error message, just the task failing and retrying over and over, but by fiddling around I managed to get the same error as this post (https://stackoverflow.com/questions/70775315/scattering-data-to-dask-cluster-workers-unknown-address-scheme-gateway).
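
A rough single-process illustration of that mechanism, using a plain local Client: a Future pickles down to little more than its key, and unpickling it has to reattach to a client connection, which is the step with no usable address on a worker behind a gateway:

import pickle
import numpy as np
from dask.distributed import Client

client = Client()
fut = client.scatter(np.arange(10))

blob = pickle.dumps(fut)       # serialises the key plus enough info to locate a client
restored = pickle.loads(blob)  # succeeds here because a local client is available
print(restored.result())       # on a dask_gateway worker, this reattachment is what fails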

Reactions: none · Issue: Passing a distributed.Future to the kwargs of apply_ufunc should resolve the future (1307523148)
Comment 1260319916 · alessioarena (33886395) · 2022-09-28T02:53:25Z · author_association: NONE
https://github.com/pydata/xarray/issues/6803#issuecomment-1260319916

This is still an issue. I noticed that the documentation of map_blocks states: "kwargs (mapping) – Passed verbatim to func after unpacking. xarray objects, if any, will not be subset to blocks. Passing dask collections in kwargs is not allowed."

Is this the case for apply_ufunc as well? If yes, then it is not documented. Is there another recommended way to pass data to the workers without clogging the scheduler for this application?
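
For reference, a hypothetical sketch of the alternative mentioned earlier in the thread (wrapping the numpy array in a dask-backed xarray object and passing it as a positional argument with its own core dimension, rather than through kwargs); the names lookup_dim and apply_lookup are made up for illustration:

import numpy as np
import xarray as xr
import dask.array as darray

lookup = np.random.rand(300_000)

# wrap the shared array as a single-chunk, dask-backed DataArray
lookup_da = xr.DataArray(darray.from_array(lookup, chunks=-1), dims=["lookup_dim"])

def apply_lookup(block, lookup):
    return block * lookup.sum()

da = xr.DataArray(np.random.rand(1_000, 1_000), dims=["x", "y"]).chunk(100)

out = xr.apply_ufunc(
    apply_lookup, da, lookup_da,
    input_core_dims=[[], ["lookup_dim"]],  # the whole lookup array reaches each call
    dask="parallelized", output_dtypes=[float],
)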

Reactions: none · Issue: Passing a distributed.Future to the kwargs of apply_ufunc should resolve the future (1307523148)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);