issue_comments: 672978363

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/3096#issuecomment-672978363 https://api.github.com/repos/pydata/xarray/issues/3096 672978363 MDEyOklzc3VlQ29tbWVudDY3Mjk3ODM2Mw== 8380659 2020-08-12T16:26:46Z 2020-08-12T16:26:46Z NONE

Hi All,

Thanks for all of your great work, support, and discussion on these and other pages. I very much appreciate it as I am working with Xarray and Zarr quite a lot for large geospatial data storage and manipulation.

I wanted to add a note to this discussion that I have had success using Zarr's built-in ProcessSynchronizer (which relies on the fasteners package). This provides a pretty easy and clean implementation of file locks, as long as you can provide a file system that is shared across all processes that might try to access the Zarr file. For me, that means using an AWS EFS mount, which gives me the flexibility to deploy this in a serverless context or on a more standard cloud cluster.
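
To illustrate the general idea behind this kind of file locking, here is a minimal standard-library sketch. It is not Zarr's implementation: `chunk_lock` and `add_to_chunk` are hypothetical helpers, the "chunk" is just an integer in a text file, and `fcntl.flock` (POSIX only) stands in for the fasteners-based lock. The point is only that each writer takes a lock keyed by the chunk before doing a read-modify-write on it.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def chunk_lock(lock_dir, chunk_key):
    """Hold an exclusive lock on a per-chunk lock file (hypothetical helper)."""
    os.makedirs(lock_dir, exist_ok=True)
    lock_path = os.path.join(lock_dir, chunk_key + ".lock")
    with open(lock_path, "w") as lock_file:
        # Blocks until any other process holding this lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def add_to_chunk(data_dir, lock_dir, chunk_key, amount):
    """Read-modify-write one stand-in 'chunk' while holding its lock."""
    chunk_path = os.path.join(data_dir, chunk_key)
    with chunk_lock(lock_dir, chunk_key):
        current = 0
        if os.path.exists(chunk_path):
            with open(chunk_path) as f:
                current = int(f.read())
        updated = current + amount
        with open(chunk_path, "w") as f:
            f.write(str(updated))
        return updated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        locks = os.path.join(root, "locks")
        add_to_chunk(root, locks, "0.0", 2)
        print(add_to_chunk(root, locks, "0.0", 3))
```

Both the data directory and the lock directory have to be visible to every writer, which is why the shared file system matters.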

It does seem that providing explicit chunking rules, as mentioned above (or using the Zarr encoding argument, which I haven't tried but I think is another option), is a great way to handle this and likely outperforms the locking approach (just a guess; I would love to hear from others about this). But the locks are easy to implement and seem to have helped me avoid the race-condition problems with Zarr.
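
As a rough way to see why aligned chunking can remove the need for locks, the sketch below checks, along a single dimension, whether two writers' index ranges land in any common chunk. The helper names `chunks_touched` and `shared_chunks` are made up for illustration and are not part of Zarr or Xarray. If the regions share no chunk, the writers never update the same chunk.

```python
def chunks_touched(start, stop, chunk_size):
    """Return the chunk indices covered by the half-open index range [start, stop)."""
    if stop <= start:
        return set()
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    return set(range(first, last + 1))


def shared_chunks(region_a, region_b, chunk_size):
    """Return the chunk indices that both (start, stop) regions write into."""
    return chunks_touched(*region_a, chunk_size) & chunks_touched(*region_b, chunk_size)


if __name__ == "__main__":
    # Regions split at index 150 with chunk size 100: both write into chunk 1.
    print(shared_chunks((0, 150), (150, 300), 100))  # {1}
    # Regions split on a chunk boundary: no chunk is shared.
    print(shared_chunks((0, 200), (200, 300), 100))  # set()
```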

For the sake of completeness, here is a simple example of how you might do this:

import zarr

# bounding_box, current_region, and store are defined elsewhere.
# The lock files live on the shared file system (an EFS mount here) so every writer sees them.
synchronizer = zarr.ProcessSynchronizer(
    f"/mnt/efs_mnt/tmp/mur_regional_raw_sst/zarr_locks/{bounding_box['grid_loc']}_locker.sync"
)
compressor = zarr.Blosc(cname='zstd', clevel=3)
encoding = {vname: {'compressor': compressor} for vname in current_region.data_vars}
current_region.to_zarr(store=store, mode='w', encoding=encoding, consolidated=True, synchronizer=synchronizer)

I would be happy to discuss further and am very much open to critique, instruction, etc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  466994138