home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 1977661256

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1977661256 I_kwDOAMm_X8514LdI 8414 Is there any way of having `.map_blocks` be even more opaque to dask? 5635139 closed 0     23 2023-11-05T06:56:43Z 2023-12-12T18:14:57Z 2023-12-12T18:14:57Z MEMBER      

Is your feature request related to a problem?

Currently I have a workload which does something a bit like:

python ds = open_zarr(source) ( ds.assign( x=ds.foo * ds.bar y=ds.foo + ds.bar ).to_zarr(dest) )

(the actual calc is a bit more complicated! And while I don't have a MVCE of the full calc, I pasted a task graph below)

Dask — while very impressive in many ways — handles this extremely badly, because it attempts to load the whole of ds into memory before writing out any chunks. There are lots of issues on this in the dask repo; it seems like an intractable problem for dask.

Describe the solution you'd like

I was hoping to make the internals of this task opaque to dask, so it became a much dumber task runner — just map over the blocks, running the function and writing the result, block by block. I thought I had some success with .map_blocks last week — the internals of the calc are now opaque at least. But the dask cluster is falling over again, I think because the write is seen as a separate task.

Is there any way to make the write more opaque too?

Describe alternatives you've considered

I've built a homegrown thing which is really hacky which does this on a custom scheduler — just runs the functions and writes with region. I'd much prefer to use & contribute to the broader ecosystem...

Additional context

(It's also possible I'm making some basic error — and I do remember it working much better last week — so please feel free to direct me / ask me for more examples, if this doesn't ring true)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8414/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 3 rows from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 1.002ms · About: xarray-datasette