issue_comments


1 row where author_association = "NONE" and issue = 1339129609 sorted by updated_at descending

id: 1215233142
html_url: https://github.com/pydata/xarray/issues/6916#issuecomment-1215233142
issue_url: https://api.github.com/repos/pydata/xarray/issues/6916
node_id: IC_kwDOAMm_X85Ibvx2
user: ljstrnadiii (3171991)
created_at: 2022-08-15T15:59:29Z
updated_at: 2022-08-15T16:03:41Z
author_association: NONE
reactions: none
issue: Given zarr-backed Xarray determine store and group (1339129609)

@dcherian sure thing!

Use-case:

Sometimes I map functions over the chunks by passing slices around: each task re-opens the dataset from zarr inside the function, selects its slice of the data, and applies some operation. I do this instead of `map_blocks` because I always struggle with that function, and my tasks often write to zarr without returning anything. As a result, I find myself passing store, group, and the dataset itself around (dask will complain and recommend scattering if I try to pass the dataset directly; my guess is that the metadata is large enough to trigger that recommendation).

```python
from itertools import product

import numpy as np
import xarray as xr

def iter_dset_chunks(dset: xr.Dataset):
    # These starts/steps correspond to the start/stop of the underlying zarr chunks.
    x_starts = np.cumsum([0] + list(dset.chunks["x"])[:-1])
    x_start_step = zip(x_starts, dset.chunksizes["x"])
    y_starts = np.cumsum([0] + list(dset.chunks["y"])[:-1])
    y_start_step = zip(y_starts, dset.chunksizes["y"])
    chunk_slices = list(product(x_start_step, y_start_step))

    for (x_start, x_step), (y_start, y_step) in chunk_slices:
        x_slice = slice(x_start, x_start + x_step)
        y_slice = slice(y_start, y_start + y_step)
        yield x_slice, y_slice

def compute_write(store, group, x_slice, y_slice):
    # The slices are positional chunk offsets, so select with isel.
    dset = xr.open_zarr(store=store, group=group).isel(x=x_slice, y=y_slice)
    # Some longer-running operation.
    result = big_op(dset)
    result.to_zarr(...)

def map_compute_write_v1(dset, store, group):
    slices = iter_dset_chunks(dset)
    for x_slice, y_slice in slices:
        # client: an existing dask.distributed Client
        f = client.submit(compute_write, store, group, x_slice, y_slice)
        ...

def map_compute_write_v2(dset):
    slices = iter_dset_chunks(dset)
    # Recover store/group from the dataset itself -- the lookup this issue asks about.
    store = dset.encoding['source']['store']
    group = dset.encoding['source']['group']
    for x_slice, y_slice in slices:
        f = client.submit(compute_write, store, group, x_slice, y_slice)
        ...
```

I would prefer to use `map_compute_write_v2` because it feels cleaner and there is less opportunity for store and group to deviate from `dset`. The issue you all might notice with this approach is that `dset` could be a chained operation of delayed tasks: we might think we are operating on that result, but we would actually be operating only on the original dataset at store and group. I reserve this type of operation for zarr disk-to-disk ops, and it gives me some control over the maximum amount of dask memory usage to help prevent killed workers. I usually batch the slices and submit the batches to accomplish that, as in the sketch below.
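For concreteness, here is a minimal sketch of that batching, assuming a `dask.distributed` client plus the `iter_dset_chunks` and `compute_write` functions above; `map_compute_write_batched` and `batch_size` are hypothetical names for this sketch, not anything from the issue.

```python
from itertools import islice

from distributed import wait

def map_compute_write_batched(client, dset, store, group, batch_size=16):
    # Hypothetical helper: submit the chunk slices in fixed-size groups and
    # wait for each group to finish before submitting the next, bounding the
    # number of in-flight tasks (and therefore dask memory use).
    slices = iter_dset_chunks(dset)
    while True:
        batch = list(islice(slices, batch_size))
        if not batch:
            break
        futures = [
            client.submit(compute_write, store, group, x_slice, y_slice)
            for x_slice, y_slice in batch
        ]
        wait(futures)  # block until this batch completes before submitting more
```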

I also assume here that all datasets are zarr-backed; if I didn't, I would need some way to determine how to re-open the dataset given only its attributes. A workaround sketch follows.
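One possible workaround, sketched here as an assumption rather than anything xarray provides: record the store and group on the dataset at open time so downstream code can recover them. `open_zarr_tracked` and the `_zarr_store`/`_zarr_group` attr names are made up for this sketch.

```python
import xarray as xr

def open_zarr_tracked(store, group=None):
    # Hypothetical helper: stash the store/group used to open the dataset in
    # its attrs so they can be recovered later without being passed around
    # separately. These attr names are invented for this sketch.
    dset = xr.open_zarr(store=store, group=group)
    dset.attrs["_zarr_store"] = str(store)
    if group is not None:
        dset.attrs["_zarr_group"] = group
    return dset
```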

https://discourse.pangeo.io/t/given-a-xarray-dataset-opened-from-zarr-how-to-determine-store-and-group/2482

