Comments from https://github.com/pydata/xarray/issues/5954 (most recent first).

---

**user 226037 (MEMBER), 2021-11-09T16:28:59Z**
https://github.com/pydata/xarray/issues/5954#issuecomment-964318162

> > 2. but most backends serialise writes anyway, so the advantage is limited.
>
> I'm not sure I understand this comment, specifically what is meant by "serialise writes". I often use Xarray to do distributed writes to Zarr stores using 100+ distributed dask workers. It works great. We would need the same thing from a TileDB backend.

I should have added "except Zarr" 😅. All netCDF writers use `xr.backends.locks.get_write_lock` to get a scheduler-appropriate write lock. The code is intricate and I can't point you to the exact spot, but as I recall the lock was used so that only one worker/process/thread could write to disk at a time. Concurrent writes à la Zarr are awesome and *xarray* supports them now, so my point was: we can add non-concurrent write support to the plugin architecture quite easily, and that will serve a lot of users. But supporting Zarr and other advanced backends via the plugin architecture is a lot more work.

---

**user 1197350 (MEMBER), 2021-11-09T11:56:30Z**
https://github.com/pydata/xarray/issues/5954#issuecomment-964084038

Thanks for the info @alexamici!

> 2\. but most backends serialise writes anyway, so the advantage is limited.

I'm not sure I understand this comment, specifically what is meant by "serialise writes". I often use Xarray to do distributed writes to Zarr stores using 100+ distributed dask workers. It works great. We would need the same thing from a TileDB backend.

We are focusing on the user-facing API, but in the end, whether we call it `.to`, `.to_dataset`, or `.store_dataset` is not really a difficult or important question. It's clear we need _some_ generic writing method. The much harder question is the **back-end API**. As Alessandro says:

> Adding support for a single save_dataset entry point to the backend API is trivial, but adding full support for possibly distributed writes looks like it is much more work.

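The concurrent-write workflow described in the two comments above can be illustrated with a minimal sketch of distributed Zarr writing via dask. The store path, data, chunking, and cluster setup below are illustrative only, not taken from the thread:

```python
# Minimal sketch of the concurrent-write pattern discussed above: dask workers
# write their chunks of a Dataset straight to a Zarr store, with no global
# write lock. Names, sizes, and chunking are purely illustrative.
import numpy as np
import xarray as xr
from dask.distributed import Client

client = Client()  # connect to a dask cluster (local here; 100+ workers in practice)

ds = xr.Dataset(
    {"temperature": (("time", "x"), np.random.rand(1000, 1000))}
).chunk({"time": 100})

# Each worker writes the chunks it holds; Zarr's per-chunk storage keeps the
# writes independent, unlike the lock-serialised netCDF writers.
ds.to_zarr("example_store.zarr", mode="w")
```
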
---

**user 226037 (MEMBER), 2021-11-09T07:53:00Z, edited 2021-11-09T08:02:20Z**
https://github.com/pydata/xarray/issues/5954#issuecomment-963895846

@rabernat and all, at the time of the read-only backend refactor @aurghs and I spent quite some time analysing write support and thinking of a unifying strategy. This is my interpretation of our findings:

1. one of the big advantages of the unified `xr.open_dataset` API is that you don't need to specify the `engine` of the input data and can rely on *xarray* guessing it. This is in general not true when you write your data, as you care about what format you are storing it in.
2. another advantage of `xr.open_dataset` is that *xarray* manages all the functionality related to *dask* and to in-memory caching, so backends only need to know how to lazily read from the storage. The current (rather complex) implementation has support for writing from *dask* and *distributed* workers, but most backends serialise writes anyway, so the advantage is limited. This is not to say that it is not worth doing, but the cost/benefit ratio of supporting potentially distributed writes is much lower than for read support.
3. that said, I'd really welcome a unified write API like `ds.save(engine=...)` or even `xr.save_dataset(ds, engine=...)`, with an `engine` keyword argument and possibly other common options. Adding support for a single `save_dataset` entry point to the backend API is trivial, but adding full support for possibly distributed writes looks like it is much more work.

Also note that ATM @aurghs and I are overloaded at work and would have very little time to spend on this :/

---

**user 14808389 (MEMBER), 2021-11-09T07:37:17Z** (5 👍)
https://github.com/pydata/xarray/issues/5954#issuecomment-963887149

If we do that, I'd call it `save_dataset` to be consistent with `{open,save}_mfdataset`.

---

**user 14371165 (MEMBER), 2021-11-09T05:28:52Z**
https://github.com/pydata/xarray/issues/5954#issuecomment-963830525

Another option is a store function named similarly to the read functions:

```python
xr.open_dataset(...)
xr.store_dataset(...)
```

---

**user 1296209 (NONE), 2021-11-08T20:23:43Z** (1 👍)
https://github.com/pydata/xarray/issues/5954#issuecomment-963542493

Is `ds.to(` the most discoverable method for users? What about making it so that backends can add methods to `ds.to`, so `ds.to.netcdf()` or `ds.to.tile_db()` based on what backends are installed? That way users might not have to guess as much about which engines and file types can be written.

---

**user 2443309 (MEMBER), 2021-11-08T18:59:14Z**
https://github.com/pydata/xarray/issues/5954#issuecomment-963476936

Thanks @rabernat for opening up this issue. Now that the refactor for read support is complete, I think this is a great time to discuss the opportunities for adding write support to the plugin interface. Pinging @aurghs and @alexamici, since I know they have developed some thoughts here.

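To make the shape of the `save_dataset` proposal above concrete, the following is a rough, hypothetical sketch. Nothing in it exists in xarray: the write-side `save_dataset` hook, the `TileDBBackendEntrypoint` class, the registry, and the top-level helper are all assumptions, and only the non-concurrent (non-dask-aware) case from the discussion is sketched:

```python
# Hypothetical sketch only: xarray has no write-side backend hook or top-level
# xr.save_dataset at the time of this thread. This shows one possible shape
# for the single "save_dataset entry point" idea discussed above.
import xarray as xr
from xarray.backends import BackendEntrypoint  # existing read-side plugin base class


class TileDBBackendEntrypoint(BackendEntrypoint):
    # Existing read-side hook (part of the current plugin API).
    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        ...

    # Hypothetical write-side hook: serialise an in-memory dataset to the
    # target storage, with no dask/distributed awareness ("non-concurrent
    # write support" in the terms used above).
    def save_dataset(self, dataset: xr.Dataset, target, **kwargs):
        ...


# Hypothetical engine registry; a real implementation would reuse xarray's
# entrypoint-based backend discovery rather than a hand-written dict.
_WRITE_ENGINES = {"tiledb": TileDBBackendEntrypoint()}


def save_dataset(dataset: xr.Dataset, target, *, engine: str, **kwargs):
    """Hypothetical xr.save_dataset(ds, target, engine=...) helper."""
    # Mirror xr.open_dataset(engine=...): look up the named backend and
    # delegate the (non-concurrent) write to it.
    backend = _WRITE_ENGINES[engine]
    return backend.save_dataset(dataset, target, **kwargs)
```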