issue_comments

3 rows where author_association = "MEMBER", issue = 466994138 and user = 1197350 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
730446943 https://github.com/pydata/xarray/issues/3096#issuecomment-730446943 https://api.github.com/repos/pydata/xarray/issues/3096 MDEyOklzc3VlQ29tbWVudDczMDQ0Njk0Mw== rabernat 1197350 2020-11-19T15:22:41Z 2020-11-19T15:22:41Z MEMBER

Just a note that #4035 provides a new way to do parallel writing to zarr stores.

@VincentDehaye & @cdibble, would you be willing to test this out and see if it meets your needs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support parallel writes to zarr store 466994138
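If we read #4035 correctly, it added a `region=` argument to `Dataset.to_zarr`, so that independent writers can each fill a disjoint slice of a store that was created beforehand. For that to be safe, the slices should line up with the zarr chunk boundaries, since (as we understand it) two writers touching the same chunk can overwrite each other. The stdlib-only sketch below splits one dimension into such chunk-aligned, non-overlapping slices. `chunk_aligned_regions` and `regions_are_disjoint_and_cover` are hypothetical helpers, not xarray APIs; each returned `slice` is the kind of per-dimension value a `region` mapping expects, but check the `to_zarr` documentation for the exact requirements.

```python
from __future__ import annotations


def chunk_aligned_regions(length: int, chunk_size: int, chunks_per_region: int = 1) -> list[slice]:
    """Split range(length) into contiguous slices that start on chunk boundaries.

    Each slice spans `chunks_per_region` whole chunks (the last one may be
    shorter if `length` is not a multiple of `chunk_size`), so no two slices
    ever share a chunk.
    """
    if length < 0 or chunk_size <= 0 or chunks_per_region <= 0:
        raise ValueError("length must be >= 0; chunk_size and chunks_per_region must be > 0")
    step = chunk_size * chunks_per_region
    return [slice(start, min(start + step, length)) for start in range(0, length, step)]


def regions_are_disjoint_and_cover(regions: list[slice], length: int) -> bool:
    """Return True if the slices tile [0, length) exactly, with no gaps or overlaps."""
    expected_start = 0
    for r in regions:
        if r.start != expected_start or r.stop <= r.start:
            return False
        expected_start = r.stop
    return expected_start == length


# Hypothetical example: a "time" dimension of 10 steps stored in chunks of 4.
regions = chunk_aligned_regions(10, 4)
print(regions)  # [slice(0, 4, None), slice(4, 8, None), slice(8, 10, None)]
print(regions_are_disjoint_and_cover(regions, 10))  # True
```

Each worker would then be handed one slice and write only that part of the store, so no coordination between workers is needed at write time.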
516047812 https://github.com/pydata/xarray/issues/3096#issuecomment-516047812 https://api.github.com/repos/pydata/xarray/issues/3096 MDEyOklzc3VlQ29tbWVudDUxNjA0NzgxMg== rabernat 1197350 2019-07-29T15:47:13Z 2019-07-29T15:47:13Z MEMBER

@VincentDehaye - we are eager to help you. But it is difficult to hit a moving target.

I would like to politely suggest that we keep this issue on topic: making sure that parallel append to zarr store works as expected. Your latest post revealed that you did not try our suggested resolution (use open_mfdataset + dask parallelization) but instead introduced a new, possibly unrelated issue.

I recommend you open a new, separate issue related to "storing different variables being indexed by the same dimension".

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support parallel writes to zarr store 466994138
510659320 https://github.com/pydata/xarray/issues/3096#issuecomment-510659320 https://api.github.com/repos/pydata/xarray/issues/3096 MDEyOklzc3VlQ29tbWVudDUxMDY1OTMyMA== rabernat 1197350 2019-07-11T21:23:33Z 2019-07-11T21:23:33Z MEMBER

Hi @VincentDehaye. Thanks for being an early adopter! We really appreciate your feedback. I'm sorry it didn't work as expected. We are in really new territory with this feature.

I'm a bit confused about why you are using the multiprocessing module here. The recommended way of parallelizing xarray operations is via the built-in dask support. There is no guarantee that using multiprocessing the way you are will work correctly. When we talk about parallel append, we are always talking about dask.

Your MCVE is not especially helpful for debugging because the two key functions (make_xarray_dataset and upload_to_s3) are not shown. Could you try simplifying your example a bit? I know it is hard when the cloud is involved, but try to let us see more of what is happening under the hood.

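If you are creating a dataset for the first time, you probably don't want append. You want to do:

```python
import xarray as xr

ds = xr.open_mfdataset(all_the_source_files)  # placeholder: list or glob of input files
ds.to_zarr(s3fs_target)  # placeholder: e.g. an s3fs mapping for the target store
```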

If you are using a dask cluster, this will automatically parallelize everything.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support parallel writes to zarr store 466994138
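One reason ad hoc multiprocessing comes with no guarantees: a chunked store such as zarr stores whole chunks, so a writer that fills only part of a chunk has to read the chunk, modify it and write it back. The stdlib-only simulation below makes that hazard concrete. A dict stands in for the store, all names are hypothetical, and a fixed "everyone reads, then everyone writes" interleaving replaces real processes. It illustrates the general lost-update problem only and is not a model of how xarray, zarr or dask actually schedule writes.

```python
from __future__ import annotations

CHUNK = 4  # elements per chunk (hypothetical)


def make_store(length: int) -> dict[int, list[float]]:
    """A toy chunked store: chunk index -> list of values (NaN means unwritten)."""
    n_chunks = -(-length // CHUNK)  # ceiling division
    return {i: [float("nan")] * CHUNK for i in range(n_chunks)}


def read_chunk(store: dict[int, list[float]], idx: int) -> list[float]:
    return list(store[idx])  # a private copy, like fetching an object from S3


def write_chunk(store: dict[int, list[float]], idx: int, values: list[float]) -> None:
    store[idx] = list(values)  # whole-chunk overwrite, like a PUT to S3


def interleaved_partial_writes(store, writes):
    """Read-modify-write for several writers under a bad interleaving.

    `writes` is a list of (start, values) pairs, each assumed to fit inside a
    single chunk. Every writer reads its chunk first; only then does every
    writer patch its copy and write it back.
    """
    staged = []
    for start, values in writes:  # phase 1: all reads
        idx = start // CHUNK
        staged.append((idx, start, values, read_chunk(store, idx)))
    for idx, start, values, chunk in staged:  # phase 2: all writes
        offset = start - idx * CHUNK
        chunk[offset:offset + len(values)] = values
        write_chunk(store, idx, chunk)
    return store


# Two writers each fill half of chunk 0: the second write-back clobbers the first.
store = make_store(8)
interleaved_partial_writes(store, [(0, [1.0, 2.0]), (2, [3.0, 4.0])])
print(store[0])  # [nan, nan, 3.0, 4.0]  (writer A's values were lost)

# Chunk-aligned writers never share a chunk, so both writes survive.
store = make_store(8)
interleaved_partial_writes(store, [(0, [1.0, 2.0, 3.0, 4.0]), (4, [5.0, 6.0, 7.0, 8.0])])
print(store[0] + store[1])  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

The same workload is safe once each writer owns whole chunks, which is the property a coordinated scheduler such as dask can arrange up front and independent processes cannot assume.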

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
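The filter in the page heading (author_association = "MEMBER", issue = 466994138 and user = 1197350, sorted by updated_at descending) can be reproduced against this schema with Python's built-in `sqlite3` module. The sketch below loads only the id, user, timestamp, association and issue values from the three rows above into an in-memory database. The helper name `member_comments` is our own, and the query is an equivalent hand-written one, not necessarily the SQL Datasette generates.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
"""

# (id, user, created_at, updated_at, author_association, issue), deliberately
# not inserted in updated_at order so that ORDER BY has real work to do.
ROWS = [
    (510659320, 1197350, "2019-07-11T21:23:33Z", "2019-07-11T21:23:33Z", "MEMBER", 466994138),
    (730446943, 1197350, "2020-11-19T15:22:41Z", "2020-11-19T15:22:41Z", "MEMBER", 466994138),
    (516047812, 1197350, "2019-07-29T15:47:13Z", "2019-07-29T15:47:13Z", "MEMBER", 466994138),
]

# ISO-8601 timestamps of fixed width sort correctly as plain TEXT.
QUERY = """
SELECT [id] FROM [issue_comments]
WHERE [author_association] = ? AND [issue] = ? AND [user] = ?
ORDER BY [updated_at] DESC
"""


def member_comments(conn: sqlite3.Connection, issue: int, user: int) -> list:
    """Return comment ids by `user` on `issue` with MEMBER association, newest first."""
    return [row[0] for row in conn.execute(QUERY, ("MEMBER", issue, user))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)  # REFERENCES targets need not exist; FK checks are off by default
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [created_at], [updated_at],"
    " [author_association], [issue]) VALUES (?, ?, ?, ?, ?, ?)",
    ROWS,
)
print(member_comments(conn, 466994138, 1197350))  # [730446943, 516047812, 510659320]
```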