issue_comments: 964318162


Comment on pydata/xarray issue #5954 (issue row: 1047608434)
html_url: https://github.com/pydata/xarray/issues/5954#issuecomment-964318162
issue_url: https://api.github.com/repos/pydata/xarray/issues/5954
id: 964318162 · node_id: IC_kwDOAMm_X845elPS · user: 226037 · author_association: MEMBER
created_at: 2021-11-09T16:28:59Z · updated_at: 2021-11-09T16:28:59Z
> > 1. but most backends serialise writes anyway, so the advantage is limited.
>
> I'm not sure I understand this comment, specifically what is meant by "serialise writes". I often use Xarray to do distributed writes to Zarr stores using 100+ distributed dask workers. It works great. We would need the same thing from a TileDB backend.

I should have added "except Zarr" 😅.

All netCDF writers use `xr.backends.locks.get_write_lock` to get a scheduler-appropriate write lock. The code is intricate and I can't find the exact place to point you to, but as I recall the lock ensures that only one worker/process/thread can write to disk at a time.
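The effect of such a per-scheduler write lock can be illustrated with a plain `threading.Lock` (a stdlib sketch of the idea, not xarray's actual implementation):

```python
import threading

# Stand-in for the lock returned by xr.backends.locks.get_write_lock for a
# threaded scheduler: every writer must hold it, so writes to the file are
# serialised even though the compute work can happen in parallel.
write_lock = threading.Lock()
written = []

def write_chunk(chunk_id, data):
    # Only the actual "write to disk" step is serialised.
    with write_lock:
        written.append((chunk_id, data))

threads = [
    threading.Thread(target=write_chunk, args=(i, i * 10)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All four chunks land in the store, one writer at a time.
assert sorted(written) == [(0, 0), (1, 10), (2, 20), (3, 30)]
```

This is why the advantage of parallelism is limited for such backends: computation scales out, but the write step is a single-file bottleneck.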

Concurrent writes à la Zarr are awesome, and xarray supports them now. So my point was: we can add non-concurrent write support to the plugin architecture quite easily, and that will serve a lot of users. But supporting Zarr and other advanced backends via the plugin architecture is a lot more work.
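What makes Zarr-style writes lock-free is that each worker writes a disjoint chunk/region of the store, so no coordination is needed. A minimal stdlib sketch of that property (a hypothetical illustration, not the actual xarray/zarr code path):

```python
from concurrent.futures import ThreadPoolExecutor

# A shared "store" standing in for a Zarr array: 4 chunks of 3 elements each.
# Because every worker writes only its own disjoint region, no lock is
# required -- this is the property that makes concurrent writes safe.
CHUNK = 3
store = [None] * (4 * CHUNK)

def write_region(worker_id):
    start = worker_id * CHUNK
    # Each worker touches only its own slice of the store.
    store[start:start + CHUNK] = [worker_id] * CHUNK
    return worker_id

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(write_region, range(4)))

# Every region was written, with no lock and no interference.
assert store == [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

The hard part for the plugin architecture is exposing this region/chunk-alignment contract to arbitrary backends, not the write itself.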
