
issue_comments


6 rows where author_association = "MEMBER" and issue = 302806158 sorted by updated_at descending


shoyer (MEMBER) · 2020-10-06T06:15:56Z
https://github.com/pydata/xarray/issues/1970#issuecomment-704053953

I wrote up a proposal for grouping together decoding options into a single argument: https://github.com/pydata/xarray/issues/4490. Feedback would be very welcome!

shoyer (MEMBER) · created 2018-03-06T18:43:10Z, updated 2020-06-09T01:20:04Z
https://github.com/pydata/xarray/issues/1970#issuecomment-370884840

Backend needs that have changed since I drafted an API refactor in https://github.com/pydata/xarray/pull/1087:

- pickle support, required to support backends with dask distributed
- caching for managing open files, required for efficiently loading data from multiple files at once (see file_manager.py)
- locking, required for threadsafe operations. Note that xarray backends are only threadsafe after files are opened, not during open_dataset (https://github.com/pydata/xarray/issues/4100)
- support for array indexing modes, required for supporting various forms of indexing in xarray
- customized conventions for encoding/decoding data (e.g., required for zarr)

It would be nice to figure out how to abstract away these details from backend authors as much as possible. Most of the ingredients for these features exist in xarray.backends (e.g., see CachingFileManager), but the lack of a clean separation between internal and public APIs makes it hard to write backends outside of xarray.
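The pickle requirement in the list above can be illustrated with a minimal sketch of the lazy-open pattern: an object that remembers only *how* to open a file, so it survives a round trip to a dask worker, and reopens the handle on demand. This is a hypothetical illustration of the idea, not xarray's actual CachingFileManager API:

```python
import threading

class PicklableFileManager:
    """Sketch of a picklable, lazily-opening file wrapper (hypothetical).

    Only the recipe for opening the file (opener + path) is pickled;
    the open handle and the lock are recreated after unpickling.
    """

    def __init__(self, opener, path):
        self._opener = opener
        self._path = path
        self._lock = threading.Lock()
        self._file = None  # opened lazily on first acquire()

    def acquire(self):
        # Threadsafe lazy open: only one thread creates the handle.
        with self._lock:
            if self._file is None:
                self._file = self._opener(self._path)
            return self._file

    def __getstate__(self):
        # Drop the unpicklable handle and lock; keep how to reopen.
        return {"opener": self._opener, "path": self._path}

    def __setstate__(self, state):
        self.__init__(state["opener"], state["path"])
```

After `pickle.loads(pickle.dumps(manager))`, the copy holds no open handle; its first `acquire()` reopens the file on whichever worker it landed on.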

jhamman (MEMBER) · 2019-07-26T05:32:43Z
https://github.com/pydata/xarray/issues/1970#issuecomment-515315204

@danielballan wrote a blog post on how entry points can be used by 3rd party libraries to register plugins: https://blog.danallan.com/posts/2019-07-24-use-entrypoints-more/. I'm thinking this could be a particularly convenient way to allow libraries to register new backend engines to open_dataset in a standard way.

Of course, we still need to come up with a standard API for backends, but the entrypoint idea could be the solution to hooking them into xarray.

Reactions: 👍 3
jhamman (MEMBER) · 2018-03-13T00:46:53Z
https://github.com/pydata/xarray/issues/1970#issuecomment-372509209

@darothen - not sure exactly. We probably just need to put in the machinery for the to_rasterio method. I'm also not sure what considerations the cloud part would require -- presumably that would fall on rasterio.

shoyer (MEMBER) · created 2018-03-06T20:43:29Z, updated 2018-03-06T20:44:18Z
https://github.com/pydata/xarray/issues/1970#issuecomment-370921380

> What is the role of the netCDF API in the backend API?

A netCDF-like API is a good starting place for xarray backends, since our data model is strongly modeled on netCDF. But that alone isn't unambiguous enough for us: there are lots of details like indexing, dtypes, and locking that need awareness of both how xarray works and the specific backend. So I think we are unlikely to be able to eliminate the need for adapter classes.

> My understanding of the point of h5netcdf was to provide a netCDF-like interface for HDF5, thereby making it easier to interface with xarray.

Yes, this was a large part of the point of h5netcdf, although there are also users of h5netcdf without xarray. The main reason why it's a separate project is to facilitate separation of concerns: xarray backends should be about how to adapt storage systems to work with xarray, not focused on the details of another file format.

h5netcdf is now up to about 1500 lines of code (including tests), and that's definitely big enough that I'm happy I wrote it as a separate project. The full netCDF4 data model turns out to involve a fair amount of nuance.

Alternatively, if adaptation to the netCDF data model is easy (e.g., <100 lines of code), then it may not be worth the separate package. This is currently the case for zarr.
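To make the "adapter class" idea concrete, here is a minimal sketch of what a netCDF-like interface over plain in-memory arrays might look like. All names here are hypothetical illustrations, not xarray's backend API:

```python
import numpy as np

class Variable:
    """Minimal netCDF-like variable: named dimensions + attributes + data."""

    def __init__(self, dims, data, attrs=None):
        self.dimensions = tuple(dims)
        self.data = np.asarray(data)
        self.attrs = dict(attrs or {})

    def __getitem__(self, key):
        # The indexing hook a real backend would route to on-disk reads.
        return self.data[key]

class InMemoryStore:
    """Adapter exposing plain arrays through a netCDF-like interface."""

    def __init__(self):
        self.dimensions = {}  # dimension name -> size
        self.variables = {}   # variable name -> Variable
        self.attrs = {}       # global attributes

    def create_variable(self, name, dims, data, attrs=None):
        var = Variable(dims, data, attrs)
        for dim, size in zip(var.dimensions, var.data.shape):
            self.dimensions.setdefault(dim, size)
        self.variables[name] = var
        return var

store = InMemoryStore()
store.create_variable("temperature", ("time", "lat"),
                      np.zeros((4, 3)), {"units": "K"})
print(store.dimensions)  # {'time': 4, 'lat': 3}
```

A wrapper in this spirit stays small precisely when the underlying storage (like zarr's named arrays with attributes) already resembles the netCDF data model; the hard parts that adapter classes must still handle, per the discussion above, are indexing modes, dtypes, and locking.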

rabernat (MEMBER) · 2018-03-06T20:16:14Z
https://github.com/pydata/xarray/issues/1970#issuecomment-370913422

What is the role of the netCDF API in the backend API?

My understanding of the point of h5netcdf was to provide a netCDF-like interface for HDF5, thereby making it easier to interface with xarray. So one potential answer to the backend API question is simply: make a netCDF-like interface for your library and then xarray can use it.

However, we still need a separate h5netcdf backend within xarray, so this design is perhaps not as clean as we would like.


Table schema:
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette