1 row where issue = 466994138 and user = 8380659 sorted by updated_at descending

Comment 672978363 on issue "Support parallel writes to zarr store" (pydata/xarray#3096)
  • html_url: https://github.com/pydata/xarray/issues/3096#issuecomment-672978363
  • issue_url: https://api.github.com/repos/pydata/xarray/issues/3096
  • node_id: MDEyOklzc3VlQ29tbWVudDY3Mjk3ODM2Mw==
  • user: cdibble (8380659) · author_association: NONE
  • created_at / updated_at: 2020-08-12T16:26:46Z

Hi All,

Thanks for all of your great work, support, and discussion on these and other pages. I very much appreciate it as I am working with Xarray and Zarr quite a lot for large geospatial data storage and manipulation.

I wanted to add a note to this discussion that I have had success using Zarr's built-in ProcessSynchronizer (which relies on the fasteners package). It provides a fairly easy and clean implementation of file locks, as long as you can provide a file system that is shared across all processes that might try to access the Zarr store. For me, that means using an AWS EFS mount, which gives me the flexibility to deploy this in a serverless context or on a more standard cloud cluster.

It does seem that providing explicit chunking rules as you have mentioned above (or using Zarr's encoding argument, which I haven't tried but believe is another option) is a great way to handle this, and it likely outperforms the locking approach (just a guess; I would love to hear from others about this). But the locks are easy to implement and have helped me avoid race conditions with Zarr.

For the sake of completeness, here is a simple example of how you might do this:

```python
import zarr

# File-based locks on a shared file system (EFS); every process writing
# to the store must be able to see this path.
synchronizer = zarr.ProcessSynchronizer(
    f"/mnt/efs_mnt/tmp/mur_regional_raw_sst/zarr_locks/{bounding_box['grid_loc']}_locker.sync"
)
compressor = zarr.Blosc(cname='zstd', clevel=3)
encoding = {vname: {'compressor': compressor} for vname in current_region.data_vars}
current_region.to_zarr(
    store=store,
    mode='w',
    encoding=encoding,
    consolidated=True,
    synchronizer=synchronizer,
)
```

I would be happy to discuss further and am very much open to critique, instruction, etc.


Powered by Datasette · About: xarray-datasette