issue_comments: 656403217

Comment on pydata/xarray issue #4209: https://github.com/pydata/xarray/issues/4209#issuecomment-656403217
Posted 2020-07-09T23:43:17Z by user 2448579 (MEMBER)

Here's an alternative map_blocks solution:

```python
import xarray as xr


def write_block(ds, t0):
    if len(ds.time) > 0:
        # Name each file by the block's offset from t0, in whole hours
        fname = (ds.time[0] - t0).values.astype("timedelta64[h]").astype(int)
        ds.to_netcdf(f"temp/file-{fname:06d}.nc")

    # dummy return; map_blocks requires the function to return something
    return ds.time


ds = xr.tutorial.open_dataset("air_temperature", chunks={"time": 100})
ds.map_blocks(write_block, kwargs=dict(t0=ds.time[0])).compute(scheduler="processes")
```
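For intuition on the file-naming scheme, here is a stdlib-only sketch of the same hour-offset computation (the timestamps below are made up for illustration; the real code operates on numpy `timedelta64` values):

```python
from datetime import datetime

# Hypothetical timestamps standing in for ds.time values
t0 = datetime(2013, 1, 1)
block_start = datetime(2013, 1, 5, 6)  # first timestamp of some block

# Offset from t0 in whole hours, mirroring astype("timedelta64[h]").astype(int)
hours = int((block_start - t0).total_seconds() // 3600)
fname = f"temp/file-{hours:06d}.nc"
```

With 100-timestep chunks, each block gets a distinct, sortable filename.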

There are two workarounds needed here, though:

1. The user function always has to return something.
2. We can't provide `template=ds.time` because it has no chunk information, and `ds.time.chunk({"time": 100})` silently does nothing because it is an `IndexVariable`. So the user function still needs the `len(ds.time) > 0` guard.

I think a cleaner API might be `dask.compute([write_block(block) for block in ds.to_delayed()])`, where `ds.to_delayed()` yields a collection of delayed objects, each of which evaluates to a Dataset wrapping one block of the underlying array.
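To sketch what such a `to_delayed` might do, here is a stdlib-only helper that turns per-dimension chunk sizes into the slices each delayed block would cover (`chunk_slices` is a hypothetical name, not part of xarray or dask):

```python
def chunk_slices(chunks):
    """Yield one slice per block for consecutive chunk sizes.

    chunks=(100, 100, 92) describes a length-292 axis split into three
    blocks; a hypothetical ds.to_delayed() would wrap ds.isel(time=s)
    in dask.delayed for each such slice s and return the delayed tasks.
    """
    start = 0
    for size in chunks:
        yield slice(start, start + size)
        start += size


slices = list(chunk_slices((100, 100, 92)))
```

Each delayed task would then see a genuine Dataset, so the empty-block guard and dummy return in `write_block` would no longer be needed.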
