issue_comments


1 row where author_association = "NONE" and issue = 1339129609 sorted by updated_at descending

id: 1215233142
html_url: https://github.com/pydata/xarray/issues/6916#issuecomment-1215233142
issue_url: https://api.github.com/repos/pydata/xarray/issues/6916
node_id: IC_kwDOAMm_X85Ibvx2
user: ljstrnadiii (3171991)
created_at: 2022-08-15T15:59:29Z
updated_at: 2022-08-15T16:03:41Z
author_association: NONE
reactions: none
issue: Given zarr-backed Xarray determine store and group (1339129609)

@dcherian sure thing!

Use-case:

Sometimes I map functions over the chunks by passing slices around: each task re-opens the dataset from zarr inside the function, selects its slice of the data, and applies some operation. I do this instead of `map_blocks` because I always struggle with that function, and my tasks often write to zarr without returning anything. As a result, I find myself passing store, group, and the dataset itself around (dask will complain and recommend scattering if I try to pass the dataset directly; my guess is that the metadata is large enough to trigger that recommendation).

```python
from itertools import product

import numpy as np
import xarray as xr

def iter_dset_chunks(dset: xr.Dataset):
    # These starts/steps correspond to the start/stop of the underlying zarr chunks.
    x_starts = np.cumsum([0] + list(dset.chunks["x"])[:-1])
    x_start_step = zip(x_starts, dset.chunksizes["x"])
    y_starts = np.cumsum([0] + list(dset.chunks["y"])[:-1])
    y_start_step = zip(y_starts, dset.chunksizes["y"])
    chunk_slices = list(product(x_start_step, y_start_step))

    for (x_start, x_step), (y_start, y_step) in chunk_slices:
        x_slice = slice(x_start, x_start + x_step)
        y_slice = slice(y_start, y_start + y_step)
        yield x_slice, y_slice

def compute_write(store, group, x_slice, y_slice):
    # The slices are positional chunk offsets, so select with isel.
    dset = xr.open_zarr(store=store, group=group).isel(x=x_slice, y=y_slice)
    # Some longer-running operation.
    result = big_op(dset)
    result.to_zarr(...)

def map_compute_write_v1(dset, store, group):
    slices = iter_dset_chunks(dset)
    for x_slice, y_slice in slices:
        # client: an existing dask.distributed Client
        f = client.submit(compute_write, store, group, x_slice, y_slice)
        ...

def map_compute_write_v2(dset):
    slices = iter_dset_chunks(dset)
    # Recover store/group from the dataset itself -- the lookup this issue asks about.
    store = dset.encoding['source']['store']
    group = dset.encoding['source']['group']
    for x_slice, y_slice in slices:
        f = client.submit(compute_write, store, group, x_slice, y_slice)
        ...
```

I would prefer to use `map_compute_write_v2` because it feels cleaner and there is less opportunity for store and group to deviate from `dset`. The issue you all might notice with this approach is that `dset` could be a chained operation of delayed tasks: we might think we are operating on that result, but we would actually be operating only on the original dataset at store and group. I reserve this type of operation for zarr disk-to-disk ops, and it gives me some control over the maximum amount of dask memory usage to help prevent killed workers. I usually batch the slices and submit the batches to accomplish that, as in the sketch below.
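For concreteness, here is a minimal sketch of that batching, assuming a `dask.distributed` client plus the `iter_dset_chunks` and `compute_write` functions above; `map_compute_write_batched` and `batch_size` are hypothetical names for this sketch, not anything from the issue.

```python
from itertools import islice

from distributed import wait

def map_compute_write_batched(client, dset, store, group, batch_size=16):
    # Hypothetical helper: submit the chunk slices in fixed-size groups and
    # wait for each group to finish before submitting the next, bounding the
    # number of in-flight tasks (and therefore dask memory use).
    slices = iter_dset_chunks(dset)
    while True:
        batch = list(islice(slices, batch_size))
        if not batch:
            break
        futures = [
            client.submit(compute_write, store, group, x_slice, y_slice)
            for x_slice, y_slice in batch
        ]
        wait(futures)  # block until this batch completes before submitting more
```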

I also assume here that all datasets are zarr-backed; if I didn't, I would need some way to determine how to re-open the dataset given only its attributes. A workaround sketch follows.
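One possible workaround, sketched here as an assumption rather than anything xarray provides: record the store and group on the dataset at open time so downstream code can recover them. `open_zarr_tracked` and the `_zarr_store`/`_zarr_group` attr names are made up for this sketch.

```python
import xarray as xr

def open_zarr_tracked(store, group=None):
    # Hypothetical helper: stash the store/group used to open the dataset in
    # its attrs so they can be recovered later without being passed around
    # separately. These attr names are invented for this sketch.
    dset = xr.open_zarr(store=store, group=group)
    dset.attrs["_zarr_store"] = str(store)
    if group is not None:
        dset.attrs["_zarr_group"] = group
    return dset
```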

https://discourse.pangeo.io/t/given-a-xarray-dataset-opened-from-zarr-how-to-determine-store-and-group/2482

