issue_comments

4 rows where author_association = "MEMBER" and issue = 715374721 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
705594621 https://github.com/pydata/xarray/issues/4490#issuecomment-705594621 https://api.github.com/repos/pydata/xarray/issues/4490 MDEyOklzc3VlQ29tbWVudDcwNTU5NDYyMQ== alexamici 226037 2020-10-08T14:06:35Z 2020-10-08T14:06:35Z MEMBER

@shoyer I favour option 2 as a stable solution, possibly with all arguments moved to keyword-only ones:

* users don't need to import an additional class
* users get argument completion on open_dataset
* xarray does validation and mangling in the class and passes only the non-default values to the backends

I'm for using the words decode/decoding but I'm against the use of CF.

What a backend does is map the on-disk representation of the data (typically optimised for space) to the in-memory representation preferred by xarray (typically optimised for ease of use). This mapping is especially tricky for time-like variables.

CF is one possible way to specify the map, but it is not the only one. Both the GRIB format and all the spatial formats supported by GDAL/rasterio can encode rich data and decoding has (typically) nothing to do with the CF conventions.

My preferred meaning for the decode_ options is:

* True: the backend attempts to map the data to the xarray natural data types (np.datetime64, np.float with mask and scale)
* False: the backend attempts to return a representation of the data as close as possible to the on-disk one
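The True/False meaning described above can be sketched for the "float with mask and scale" case. This is only an illustration: the helper name `decode_values` and its parameters are hypothetical and are not part of xarray or of any backend.

```python
import numpy as np


def decode_values(raw, scale=1.0, offset=0.0, fill_value=None, decode=True):
    """Hypothetical helper, not an xarray or backend API."""
    raw = np.asarray(raw)
    if not decode:
        # decode=False: hand back the values as they are stored on disk.
        return raw
    # decode=True: unpack to floats and use NaN as the mask.
    values = raw.astype(np.float64) * scale + offset
    if fill_value is not None:
        values[raw == fill_value] = np.nan
    return values


packed = np.array([0, 10, -999], dtype=np.int16)
decoded = decode_values(packed, scale=0.5, offset=20.0, fill_value=-999)
untouched = decode_values(packed, decode=False)
```

Here `decoded` holds floats with the fill value replaced by NaN, while `untouched` keeps the original int16 values.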

Typically, when a user asks the backend not to decode, they intend to inspect the content of the data file to understand why something is not mapping in the expected way.

As an example: in the case of GRIB, time-like values are represented as integers like 20190101, but cfgrib is at the moment forced to convert them into a fake CF representation before passing them to xarray, so a GRIB user who sets decode_times=False is presented with something that has nothing to do with the on-disk representation.
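To make the GRIB example concrete, here is a minimal sketch of what decoding a YYYYMMDD integer could look like under the proposed meaning. The function `decode_grib_date` is hypothetical and is not how cfgrib works today; it only shows the two behaviours of the flag.

```python
import numpy as np


def decode_grib_date(value, decode_times=True):
    """Hypothetical helper: map a YYYYMMDD integer to np.datetime64."""
    if not decode_times:
        # Return the integer exactly as it is represented on disk.
        return value
    year, rest = divmod(int(value), 10000)
    month, day = divmod(rest, 100)
    return np.datetime64(f"{year:04d}-{month:02d}-{day:02d}")


decoded = decode_grib_date(20190101)
raw = decode_grib_date(20190101, decode_times=False)
```

With decoding on, the result is a `np.datetime64` date; with decoding off, the user sees the integer 20190101 unchanged.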

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Group together decoding options into a single argument 715374721
704631854 https://github.com/pydata/xarray/issues/4490#issuecomment-704631854 https://api.github.com/repos/pydata/xarray/issues/4490 MDEyOklzc3VlQ29tbWVudDcwNDYzMTg1NA== shoyer 1217238 2020-10-07T01:02:41Z 2020-10-07T01:02:41Z MEMBER

I see this as complementary to @alexamici's proposal in https://github.com/pydata/xarray/issues/4309#issuecomment-697952300, which I also like. I guess the main difference is moving decode_cf into the explicit signature of open_dataset, rather than leaving it in **kwargs.

Auto-completion is one reason to prefer a class, but enforced error checking and consistency between backends in the data model are also good reasons. In particular, it is important that users get an error if they mis-spell an argument name, e.g., open_dataset(path, decode_times=False) vs open_dataset(path, decode_time=False). We can definitely achieve this by putting decoding options into a dict, too, but we would need to be careful to always validate the set of dict keys.
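A rough sketch of the two approaches, assuming a hypothetical `DecodingOptions` dataclass with two illustrative fields (the real set of options and the final class name were still under discussion). A class rejects a mis-spelled name on its own, whereas a dict needs an explicit key check such as the hypothetical `validate_decoder_dict` below.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DecodingOptions:
    """Hypothetical options class; the field list is illustrative only."""

    decode_times: bool = True
    mask_and_scale: bool = True


VALID_KEYS = {f.name for f in fields(DecodingOptions)}


def validate_decoder_dict(options):
    """Reject unknown keys before building the options object."""
    unknown = set(options) - VALID_KEYS
    if unknown:
        raise TypeError(f"unknown decoding options: {sorted(unknown)}")
    return DecodingOptions(**options)


ok = validate_decoder_dict({"decode_times": False})

try:
    validate_decoder_dict({"decode_time": False})  # mis-spelled key
except TypeError as err:
    message = str(err)
```

The dataclass constructor raises `TypeError` for `DecodingOptions(decode_time=False)` without extra code, which is the built-in checking a class provides; the dict path only gets the same guarantee if every entry point calls the validator.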

I guess cfgrib is an example of a backend with its own CF decoding options? This is indeed a tricky design question. I don't know if it's possible to make xarray.open_dataset() directly extensible in this way -- this could still be a reason for a user to use a backend-specific cfgrib.open_dataset() function.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Group together decoding options into a single argument 715374721
704396229 https://github.com/pydata/xarray/issues/4490#issuecomment-704396229 https://api.github.com/repos/pydata/xarray/issues/4490 MDEyOklzc3VlQ29tbWVudDcwNDM5NjIyOQ== shoyer 1217238 2020-10-06T16:25:54Z 2020-10-06T16:25:54Z MEMBER

> I agree that we should add CF to the names. Can we use Decoders instead of DecodingOptions? So decode_cf and CFDecoders?

works for me!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Group together decoding options into a single argument 715374721
704360238 https://github.com/pydata/xarray/issues/4490#issuecomment-704360238 https://api.github.com/repos/pydata/xarray/issues/4490 MDEyOklzc3VlQ29tbWVudDcwNDM2MDIzOA== dcherian 2448579 2020-10-06T15:41:18Z 2020-10-06T15:41:18Z MEMBER

Totally in favour of this. Option 2 does seem like a good intermediate step.

This proposal would make error handling easier (https://github.com/pydata/xarray/issues/3020)

I agree that we should add CF to the names. Can we use Decoders instead of DecodingOptions? So decode_cf and CFDecoders?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Group together decoding options into a single argument 715374721

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);