issue_comments

3 rows where author_association = "NONE" and user = 3171991 sorted by updated_at descending

id: 1527606751
html_url: https://github.com/pydata/xarray/issues/7015#issuecomment-1527606751
issue_url: https://api.github.com/repos/pydata/xarray/issues/7015
node_id: IC_kwDOAMm_X85bDW3f
user: ljstrnadiii (3171991)
created_at: 2023-04-28T13:55:34Z
updated_at: 2023-04-28T13:55:34Z
author_association: NONE

@jdldeauna how did you resolve this issue? I am seeing similar errors, but I only encounter this when writing to zarr with a dask cluster.
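For reference, here is a minimal sketch of the write pattern being described; the toy dataset, chunking, and store path are illustrative, not taken from the issue:

```python
import numpy as np
import xarray as xr
from dask.distributed import Client

if __name__ == "__main__":
    # the failure reportedly only shows up with a dask cluster attached
    client = Client()
    ds = xr.Dataset({"a": (("x", "y"), np.ones((100, 100)))}).chunk(
        {"x": 50, "y": 50}
    )
    # distributed workers serialize and write each chunk; this is the
    # step where ArrayNotFoundError was reported
    ds.to_zarr("example.zarr", mode="w")
```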

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: ArrayNotFoundError when saving xarray Dataset as zarr (1368186791)
id: 1297218347
html_url: https://github.com/pydata/xarray/issues/7234#issuecomment-1297218347
issue_url: https://api.github.com/repos/pydata/xarray/issues/7234
node_id: IC_kwDOAMm_X85NUfsr
user: ljstrnadiii (3171991)
created_at: 2022-10-31T15:00:06Z
updated_at: 2022-10-31T15:09:42Z
author_association: NONE

Yeah, I was afraid of that lol. This must be a pandas issue too, since they recommend a similar way of extending pandas DataFrames. I couldn't find a similar pandas issue/PR, and was a bit surprised by that.

My temp solution is to use type narrowing:

```python
def use_domain_specific_dataset(dset: xr.Dataset):
    if not isinstance(dset.ma, MyAccessor):  # gets some type checking
        raise TypeError("expected a dataset with the MyAccessor accessor")
    dset.ma.my_special_func()
```

Seems to at least provide the basic sanity checks.
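For context, an accessor like the one referenced above would be registered with xarray's `register_dataset_accessor` decorator, roughly like this (a sketch; the names `ma`, `MyAccessor`, and `my_special_func` follow the snippet above):

```python
import xarray as xr

@xr.register_dataset_accessor("ma")
class MyAccessor:
    def __init__(self, dset: xr.Dataset):
        self._dset = dset

    def my_special_func(self) -> None:
        # domain-specific logic operating on self._dset goes here
        ...
```

Because the registered class is exactly what xarray instantiates for `dset.ma`, the `isinstance` check above holds at runtime and gives static checkers something to narrow on.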

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Inspecting arguments with accessors (1427457128)
id: 1215233142
html_url: https://github.com/pydata/xarray/issues/6916#issuecomment-1215233142
issue_url: https://api.github.com/repos/pydata/xarray/issues/6916
node_id: IC_kwDOAMm_X85Ibvx2
user: ljstrnadiii (3171991)
created_at: 2022-08-15T15:59:29Z
updated_at: 2022-08-15T16:03:41Z
author_association: NONE

@dcherian sure thing!

Use-case:

Sometimes, instead of `map_blocks` (which I always struggle with, and which is awkward because I often write to zarr and don't return anything), I map functions over the chunks by passing slices around: each task re-opens the dataset from zarr, selects its slice, and applies some function. So I find myself passing `store`, `group`, and the dataset itself (dask will complain and recommend `scatter` if I try to pass the dataset around; my guess is that the metadata is large enough to trigger that recommendation).

```python
from itertools import product

import numpy as np
import xarray as xr


def iter_dset_chunks(dset: xr.Dataset):
    # these correspond to the start/stop of the underlying zarr chunks
    x_starts = np.cumsum([0] + list(dset.chunks["x"])[:-1])
    x_start_step = zip(x_starts, dset.chunksizes["x"])
    y_starts = np.cumsum([0] + list(dset.chunks["y"])[:-1])
    y_start_step = zip(y_starts, dset.chunksizes["y"])
    chunk_slices = list(product(x_start_step, y_start_step))

    for (x_start, x_step), (y_start, y_step) in chunk_slices:
        x_slice = slice(x_start, x_start + x_step)
        y_slice = slice(y_start, y_start + y_step)
        yield x_slice, y_slice


def compute_write(store, group, x_slice, y_slice):
    dset = xr.open_zarr(store=store, group=group).sel(x=x_slice, y=y_slice)
    # some longer-running operation (big_op is a placeholder)
    result = big_op(dset)
    result.to_zarr(...)


def map_compute_write_v1(dset, store, group):
    # `client` is an existing dask.distributed Client
    slices = iter_dset_chunks(dset)
    for x_slice, y_slice in slices:
        f = client.submit(compute_write, store, group, x_slice, y_slice)
        ...


def map_compute_write_v2(dset):
    slices = iter_dset_chunks(dset)
    store = dset.encoding['source']['store']
    group = dset.encoding['source']['group']
    for x_slice, y_slice in slices:
        f = client.submit(compute_write, store, group, x_slice, y_slice)
        ...
```

I would prefer to use `map_compute_write_v2` because it feels cleaner and there is less opportunity for `store` and `group` to deviate from `dset`. The issue you all might notice with this approach is that `dset` could be a chained operation of delayed tasks, and we might think we are operating on that when we would actually be operating only on the original dataset at `store` and `group`. I reserve this type of operation for zarr disk-to-disk ops, and it gives me some control over the maximum amount of dask memory usage to help prevent killed workers--I usually batch the slices and submit the batches to accomplish that (see the sketch below).
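A minimal sketch of that batching, reusing the `iter_dset_chunks` and `compute_write` helpers above (`map_compute_write_batched` and `batch_size` are hypothetical names, not from the original comment):

```python
from dask.distributed import wait


def map_compute_write_batched(client, dset, store, group, batch_size=8):
    # bound how many compute_write tasks are in flight at once,
    # which caps worker memory and helps avoid killed workers
    slices = list(iter_dset_chunks(dset))
    for i in range(0, len(slices), batch_size):
        futures = [
            client.submit(compute_write, store, group, x_slice, y_slice)
            for x_slice, y_slice in slices[i : i + batch_size]
        ]
        wait(futures)  # let this batch finish before submitting the next
```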

I also assume that all datasets are zarr-backed, but if I didn't, I would need to know how to read the data again given the dataset's attributes.

https://discourse.pangeo.io/t/given-a-xarray-dataset-opened-from-zarr-how-to-determine-store-and-group/2482

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Given zarr-backed Xarray determine store and group (1339129609)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
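The 3-row view above corresponds to a query along these lines (a sketch of what this page runs, not the exact SQL Datasette generates):

```sql
select * from [issue_comments]
where [author_association] = 'NONE' and [user] = 3171991
order by [updated_at] desc;
```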