html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6916#issuecomment-1216491512,https://api.github.com/repos/pydata/xarray/issues/6916,1216491512,IC_kwDOAMm_X85Igi_4,1197350,2022-08-16T11:11:38Z,2022-08-16T11:11:38Z,MEMBER,"As a general principle, I think we should try to put enough information in `encoding` to enable one to re-open the dataset from scratch with the same parameters. So that would mean including the engine and other `open_dataset` options in `encoding`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1339129609
https://github.com/pydata/xarray/issues/6916#issuecomment-1215233142,https://api.github.com/repos/pydata/xarray/issues/6916,1215233142,IC_kwDOAMm_X85Ibvx2,3171991,2022-08-15T15:59:29Z,2022-08-15T16:03:41Z,NONE,"@dcherian sure thing!
### Use-case:
Sometimes, instead of `map_blocks` (a function I always struggle with, and which fits poorly when I write to zarr and return nothing), I map functions over the chunks by passing slices around: each function re-opens the dataset from zarr, selects its slice, and applies some operation. So I find myself passing store, group, and the dataset itself around (dask will complain if I try to pass the dataset and suggest scattering it--my guess is that the metadata is large enough to trigger that recommendation).
```
from itertools import product

import numpy as np
import xarray as xr

# `client` is a dask.distributed.Client assumed to exist in scope


def iter_dset_chunks(dset: xr.Dataset):
    # these correspond to the start/stop of the underlying zarr chunks
    x_starts = np.cumsum([0] + list(dset.chunks[""x""])[:-1])
    x_start_step = zip(x_starts, dset.chunksizes[""x""])
    y_starts = np.cumsum([0] + list(dset.chunks[""y""])[:-1])
    y_start_step = zip(y_starts, dset.chunksizes[""y""])
    chunk_slices = list(product(x_start_step, y_start_step))
    for (x_start, x_step), (y_start, y_step) in chunk_slices:
        x_slice = slice(x_start, x_start + x_step)
        y_slice = slice(y_start, y_start + y_step)
        yield x_slice, y_slice


def compute_write(store, group, x_slice, y_slice):
    # the slices are positional (built from chunk offsets), so use isel
    dset = xr.open_zarr(store=store, group=group).isel(x=x_slice, y=y_slice)
    # some longer running operation
    result = big_op(dset)
    result.to_zarr(...)


def map_compute_write_v1(dset, store, group):
    slices = iter_dset_chunks(dset)
    for x_slice, y_slice in slices:
        f = client.submit(compute_write, store, group, x_slice, y_slice)
    ...


def map_compute_write_v2(dset):
    slices = iter_dset_chunks(dset)
    store = dset.encoding['source']['store']
    group = dset.encoding['source']['group']
    for x_slice, y_slice in slices:
        f = client.submit(compute_write, store, group, x_slice, y_slice)
    ...
```
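For what it's worth, the chunk-slice bookkeeping above can be sketched without xarray at all: the same logic over plain lists of per-dimension chunk sizes (the `[2, 2]` / `[3]` chunk values are made up for illustration):

```python
from itertools import accumulate, product

def iter_chunk_slices(x_chunks, y_chunks):
    # chunk starts are cumulative sums of the preceding chunk sizes
    x_starts = [0] + list(accumulate(x_chunks))[:-1]
    y_starts = [0] + list(accumulate(y_chunks))[:-1]
    for (x0, xn), (y0, yn) in product(zip(x_starts, x_chunks), zip(y_starts, y_chunks)):
        yield slice(x0, x0 + xn), slice(y0, y0 + yn)

slices = list(iter_chunk_slices([2, 2], [3]))
# one (x, y) slice pair per chunk:
# [(slice(0, 2), slice(0, 3)), (slice(2, 4), slice(0, 3))]
```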
I would prefer `map_compute_write_v2` because it feels cleaner and there is less opportunity for store and group to deviate from dset. The caveat, as you all might notice, is that dset could be the result of a chain of delayed operations; we might think we are operating on that chain when we would actually be operating only on the original dataset at store and group. I reserve this pattern for zarr disk-to-disk ops, and it gives me some control over maximum dask memory usage to help prevent killed workers--I usually batch the slices and submit batches to accomplish that.
I also assume that all datasets are zarr-backed; if I couldn't assume that, I would need to know how to re-open the dataset from its attributes.
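As a stopgap until this lives in `encoding`, one workaround (also suggested elsewhere in the thread) is to record store and group as attributes at open time. A minimal sketch, where `open_fn` stands in for `xr.open_zarr`, the `_store`/`_group` attr names are arbitrary choices of mine, and the fake dataset and s3 path exist only for the demo:

```python
def with_source_attrs(open_fn, store, group=None, **kwargs):
    # open via the given function, then stash store/group on attrs so
    # downstream tasks can re-open the same dataset without extra args
    dset = open_fn(store=store, group=group, **kwargs)
    dset.attrs['_store'] = str(store)
    dset.attrs['_group'] = group or ''
    return dset

class FakeDataset:
    # minimal stand-in for xr.Dataset, just enough for the demo
    def __init__(self):
        self.attrs = {}

dset = with_source_attrs(lambda **kw: FakeDataset(), 's3://bucket/data.zarr', 'imagery')
# dset.attrs == {'_store': 's3://bucket/data.zarr', '_group': 'imagery'}
```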
https://discourse.pangeo.io/t/given-a-xarray-dataset-opened-from-zarr-how-to-determine-store-and-group/2482","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1339129609
https://github.com/pydata/xarray/issues/6916#issuecomment-1215146323,https://api.github.com/repos/pydata/xarray/issues/6916,1215146323,IC_kwDOAMm_X85IbalT,2448579,2022-08-15T15:21:04Z,2022-08-15T15:21:04Z,MEMBER,"@ljstrnadiii can you tell us a bit more about your use-case and why you need these to be specified on the Dataset?
Note you could add them manually as a Dataset attribute.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1339129609