issue_comments

7 rows where issue = 1047608434 sorted by updated_at descending

Issue: Writeable backends via entrypoints (pydata/xarray#5954)

alexamici (MEMBER) · 2021-11-09T16:28:59Z · https://github.com/pydata/xarray/issues/5954#issuecomment-964318162

> > 2. but most backends serialise writes anyway, so the advantage is limited.
>
> I'm not sure I understand this comment, specifically what is meant by "serialise writes". I often use Xarray to do distributed writes to Zarr stores using 100+ distributed dask workers. It works great. We would need the same thing from a TileDB backend.

I should have added "except Zarr" 😅.

All netCDF writers use xr.backends.locks.get_write_lock to get a scheduler-appropriate write lock. The code is intricate and I can't point you to the exact spot, but as I recall the lock ensured that only one worker/process/thread could write to disk at a time.

Concurrent writes a la Zarr are awesome and xarray supports them now, so my point was: we can add non-concurrent write support to the plugin architecture quite easily and that will serve a lot of users. But supporting Zarr and other advanced backends via the plugin architecture is a lot more work.
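
For illustration, a minimal sketch of the serialised-write pattern described here, assuming get_write_lock(key) returns a context-manager lock as its name suggests; the SerialisedWriter class and the _write_to_disk placeholder are hypothetical, not xarray's actual code:

```python
# Minimal sketch of the serialised-write pattern described above.
# `get_write_lock` is the xarray helper named in the comment; the
# writer class and `_write_to_disk` placeholder are hypothetical.
from xarray.backends.locks import get_write_lock


def _write_to_disk(path, data, region):
    """Placeholder for the format-specific serialisation logic."""


class SerialisedWriter:
    def __init__(self, path):
        self._path = path
        # One lock per target file, appropriate for the active scheduler
        # (threading, multiprocessing, or dask.distributed).
        self._lock = get_write_lock(path)

    def write_block(self, data, region):
        # Only one worker/process/thread holds the lock at a time,
        # so writes to the same file never overlap.
        with self._lock:
            _write_to_disk(self._path, data, region)
```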

Reactions: none

rabernat (MEMBER) · 2021-11-09T11:56:30Z · https://github.com/pydata/xarray/issues/5954#issuecomment-964084038

Thanks for the info @alexamici!

> 2. but most backends serialise writes anyway, so the advantage is limited.

I'm not sure I understand this comment, specifically what is meant by "serialise writes". I often use Xarray to do distributed writes to Zarr stores using 100+ distributed dask workers. It works great. We would need the same thing from a TileDB backend.
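
For concreteness, a minimal sketch of the concurrent-write workflow described above, assuming zarr and dask.distributed are installed; the cluster, dataset, chunking, and store path are all made up:

```python
# Minimal sketch of a concurrent Zarr write from many dask workers;
# everything concrete here (dataset, chunks, path) is illustrative.
import numpy as np
import xarray as xr
from dask.distributed import Client

client = Client()  # local cluster here; could be 100+ remote workers

ds = xr.Dataset(
    {"air": (("time", "x"), np.random.rand(1000, 100))}
).chunk({"time": 100})

# Each dask worker writes its own chunks to the store concurrently;
# unlike the netCDF writers, no global write lock is needed.
ds.to_zarr("air.zarr", mode="w")
```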

We are focusing on the user-facing API, but in the end, whether we call it .to, .to_dataset, or .store_dataset is not really a difficult or important question. It's clear we need some generic writing method. The much harder question is the backend API. As Alessandro says:

> Adding support for a single save_dataset entry point to the backend API is trivial, but adding full support for possibly distributed writes looks like it is much more work.

Reactions: none

alexamici (MEMBER) · 2021-11-09T07:53:00Z (edited 2021-11-09T08:02:20Z) · https://github.com/pydata/xarray/issues/5954#issuecomment-963895846

@rabernat and all, at the time of the read-only backend refactor @aurghs and I spent quite some time analysing write support and thinking of a unifying strategy. This is my interpretation of our findings:

  1. one of the big advantages of the unified xr.open_dataset API is that you don't need to specify the engine of the input data and can rely on xarray guessing it. This is in general not true when you write your data, since you care about the format you are storing it in.

  2. another advantage of xr.open_dataset is that xarray manages all the functionality related to dask and to in-memory caching, so backends only need to know how to lazily read from the storage. The current (rather complex) implementation supports writing from dask and distributed workers, but most backends serialise writes anyway, so the advantage is limited. This is not to say it isn't worth doing, but the cost/benefit ratio of supporting potentially distributed writes is much lower than for read support.

  3. that said, I'd really welcome a unified write API like ds.save(engine=...) or even xr.save_dataset(ds, engine=...) with an engine keyword argument and possibly other common options. Adding support for a single save_dataset entry point to the backend API is trivial (see the sketch below), but adding full support for possibly distributed writes looks like it is much more work.

Also note that ATM @aurghs and I are overloaded at work and have very little time to spend on this :/
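
For illustration, a hypothetical sketch of such a save_dataset entry point; BackendEntrypoint is xarray's existing plugin base class (import path may vary by version), but save_dataset itself is only the proposal under discussion, not an existing method:

```python
# Hypothetical sketch of the proposed entry point. BackendEntrypoint
# is xarray's existing plugin base class; `save_dataset` is only the
# proposal being discussed here, not an existing API.
from xarray.backends import BackendEntrypoint


class MyBackendEntrypoint(BackendEntrypoint):
    def open_dataset(self, filename_or_obj, *, drop_variables=None, **kwargs):
        ...  # existing read support

    def save_dataset(self, dataset, filename_or_obj, **kwargs):
        # A first, non-concurrent version could load the data into
        # memory and write it out sequentially; distributed writes
        # would need considerably more machinery.
        ...
```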

Reactions: none

keewis (MEMBER) · 2021-11-09T07:37:17Z · https://github.com/pydata/xarray/issues/5954#issuecomment-963887149

If we do that, I'd call it save_dataset to be consistent with {open,save}_mfdataset

Reactions: +1 × 5

Illviljan (MEMBER) · 2021-11-09T05:28:52Z · https://github.com/pydata/xarray/issues/5954#issuecomment-963830525

Another option is using a store function named similarly to the read functions:

```python
xr.open_dataset(...)
xr.store_dataset(...)
```

Reactions: none

abkfenris (NONE) · 2021-11-08T20:23:43Z · https://github.com/pydata/xarray/issues/5954#issuecomment-963542493

Is ds.to() the most discoverable method for users?

What about making it so that backends can add methods to ds.to, e.g. ds.to.netcdf() or ds.to.tile_db(), based on which backends are installed? That way users wouldn't have to guess as much about which engines and file types can be written.
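
For illustration, a hypothetical sketch of this accessor idea; xr.register_dataset_accessor is xarray's real extension mechanism, while the ToAccessor class, its methods, and the backend discovery are made up:

```python
# Hypothetical sketch of the accessor idea. `register_dataset_accessor`
# is xarray's real extension mechanism; the `to` namespace and its
# methods are made up for illustration.
import xarray as xr


@xr.register_dataset_accessor("to")
class ToAccessor:
    def __init__(self, ds):
        self._ds = ds

    def netcdf(self, path, **kwargs):
        # Delegates to the built-in writer.
        return self._ds.to_netcdf(path, **kwargs)

    def tile_db(self, uri, **kwargs):
        # Would dispatch to a TileDB backend's writer, if one is installed.
        raise NotImplementedError("no TileDB writer installed")
```

With something like this, tab-completing on ds.to would list exactly the writers that are actually available.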

Reactions: +1 × 1

jhamman (MEMBER) · 2021-11-08T18:59:14Z · https://github.com/pydata/xarray/issues/5954#issuecomment-963476936

Thanks @rabernat for opening this issue. Now that the refactor for read support is complete, it's a great time to discuss the opportunities for adding write support to the plugin interface.

Pinging @aurghs and @alexamici, since I know they have some thoughts developed here.

Reactions: none

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);