
issue_comments


4 rows where author_association = "CONTRIBUTOR" and issue = 169274464 sorted by updated_at descending


user (2)

  • mcgibbon 3
  • dopplershift 1

issue (1)

  • Consider how to deal with the proliferation of decoder options on open_dataset · 4

author_association (1)

  • CONTRIBUTOR · 4
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort key, descending), author_association, body, reactions, performed_via_github_app, issue
id: 300838234
html_url: https://github.com/pydata/xarray/issues/939#issuecomment-300838234
issue_url: https://api.github.com/repos/pydata/xarray/issues/939
node_id: MDEyOklzc3VlQ29tbWVudDMwMDgzODIzNA==
user: dopplershift (221526)
created_at: 2017-05-11T16:08:19Z
updated_at: 2017-05-11T16:08:19Z
author_association: CONTRIBUTOR

I agree that having too many keyword arguments is poor design; it's representative of either failing to abstract anything away or having the object/function do too much. For a specific example, this jumps out at me as a problem:

    ds = conventions.decode_cf(
        store, mask_and_scale=mask_and_scale, decode_times=decode_times,
        concat_characters=concat_characters, decode_coords=decode_coords,
        drop_variables=drop_variables)

Already open_dataset takes 5 parameters just to pass on directly to another function. This means that to add a 6th to decode_cf, you have to update the code and docstring there, and then make those same changes to open_dataset. Now, you could argue that they're used again within open_dataset:

    token = tokenize(file_arg, group, decode_cf, mask_and_scale, decode_times,
                     concat_characters, decode_coords, engine, chunks,
                     drop_variables)

but again you're using all of these parameters together. If all of these variable values are needed to define the state, you already have an implicit object in your code; you're just not using the language syntax to help you by encapsulating it.

I'd be in favor of having lightweight classes (essentially mutable named tuples) rather than dictionaries. The former allows more discoverability of the interface (i.e. tab completion in IPython) as well as better up-front error checking (you could use __slots__ to permit only certain attributes). My experience with assembling dictionaries for options is a world of typo-prone pain; preventing that is especially important when teaching new users. You could still give this class the right hooks (e.g. __iter__, asdict) to allow it to be passed as **kwargs to decode_cf.
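The lightweight-class idea above can be sketched in plain Python; DecodeOptions and its asdict hook are hypothetical, not part of xarray's API:

```python
class DecodeOptions:
    """Sketch of a mutable options holder; __slots__ rejects misspelled fields."""
    __slots__ = ("mask_and_scale", "decode_times", "concat_characters",
                 "decode_coords", "drop_variables")

    def __init__(self, mask_and_scale=True, decode_times=True,
                 concat_characters=True, decode_coords=True,
                 drop_variables=None):
        self.mask_and_scale = mask_and_scale
        self.decode_times = decode_times
        self.concat_characters = concat_characters
        self.decode_coords = decode_coords
        self.drop_variables = drop_variables

    def asdict(self):
        """Hook so the object can be expanded as **opts.asdict()."""
        return {name: getattr(self, name) for name in self.__slots__}


opts = DecodeOptions(decode_times=False)
# Tab completion sees the declared fields, and a typo fails immediately:
# opts.decode_time = False  ->  AttributeError
```

A call like `conventions.decode_cf(store, **opts.asdict())` would then replace the five pass-through parameters with one object.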

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consider how to deal with the proliferation of decoder options on open_dataset 169274464
id: 300647473
html_url: https://github.com/pydata/xarray/issues/939#issuecomment-300647473
issue_url: https://api.github.com/repos/pydata/xarray/issues/939
node_id: MDEyOklzc3VlQ29tbWVudDMwMDY0NzQ3Mw==
user: mcgibbon (12307589)
created_at: 2017-05-11T00:16:34Z
updated_at: 2017-05-11T00:16:34Z
author_association: CONTRIBUTOR

It is considered poor software design to have 13 arguments in Java and other languages which do not have optional arguments. The same isn't necessarily true of Python, but I haven't seen much discussion or writing on this.

I'd much rather have pandas.read_csv the way it is right now than to have a ReadOptions object that would need to contain exactly the same documentation and be just as hard to understand as read_csv. That object would serve only to separate the documentation of the settings for read_csv from the docstring for read_csv. If you really want to cut down on arguments, open_dataset should be separated into multiple functions. I wouldn't necessarily encourage these, but some possibilities are:

  • Have a function which takes in an undecoded dataset and returns a CF-decoded dataset, instead of a decode_cf kwarg
  • Have a function which takes in an unmasked/unscaled dataset and returns a masked/scaled dataset, instead of mask_and_scale
  • Have a function which takes in a dataset with undecoded times and returns a decoded dataset, instead of decode_times
  • similarly for decode_coords, chunks, and drop_variables. Should chunks and drop_variables even exist as kwargs, given that the functions to do these to a dataset already exist?

All of that aside, the DecoderOptions object already exists if that's what you want - it's the dict.
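The dict-as-options pattern described here is plain Python; the open_dataset stand-in below is a sketch carrying only a few of the keyword names discussed in this thread:

```python
# Collect decoder settings once, then splat them wherever they're needed.
decode_options = {
    "mask_and_scale": True,
    "decode_times": False,
    "concat_characters": True,
}

def open_dataset(filename, mask_and_scale=True, decode_times=True,
                 concat_characters=True):
    """Stand-in for xarray.open_dataset, reduced to the signature at issue."""
    return {"filename": filename, "mask_and_scale": mask_and_scale,
            "decode_times": decode_times,
            "concat_characters": concat_characters}

ds = open_dataset("data.nc", **decode_options)
```

The same `**decode_options` dict can be passed unchanged to decode_cf, so the "options object" already round-trips through every function that needs it.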

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consider how to deal with the proliferation of decoder options on open_dataset 169274464
id: 300640372
html_url: https://github.com/pydata/xarray/issues/939#issuecomment-300640372
issue_url: https://api.github.com/repos/pydata/xarray/issues/939
node_id: MDEyOklzc3VlQ29tbWVudDMwMDY0MDM3Mg==
user: mcgibbon (12307589)
created_at: 2017-05-10T23:26:57Z
updated_at: 2017-05-10T23:26:57Z
author_association: CONTRIBUTOR

I would disagree with the form open_dataset(filename, decode_options=kwargs) over open_dataset(filename, **kwargs), because the former breaks normal Python style. It would make the documentation for the arguments somewhat awkward ("decode_options is a dictionary which can have any of the following keys [...]"). It also forces the user to use a dictionary instead of having the option to use a dictionary or the regular style of entering kwargs.

What do you mean when you say it's easier to do error checking on field names and values? The xarray implementation can still use fields instead of a dictionary, with the user saying open_dataset(filename, **kwargs) if they feel like it. I think I'm not understanding something here.
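The error-checking contrast raised here can be demonstrated directly: if the implementation keeps named parameters, a typo in a user-built dict fails loudly when splatted, whereas a decode_options=dict parameter would accept any keys unless the function validates them itself. The open_dataset stand-in is hypothetical:

```python
def open_dataset(filename, decode_times=True, mask_and_scale=True):
    """Stand-in with a fixed keyword signature, as the thread suggests."""
    return {"filename": filename, "decode_times": decode_times,
            "mask_and_scale": mask_and_scale}

good = {"decode_times": False}
bad = {"decode_time": False}        # typo: missing the trailing "s"

open_dataset("data.nc", **good)     # accepted

try:
    open_dataset("data.nc", **bad)  # typo rejected at the call site
except TypeError as err:
    print(err)                      # message names the unexpected keyword
```

With the alternative form `open_dataset(filename, decode_options=bad)`, the misspelled key would pass through silently unless open_dataset checked every key by hand.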

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consider how to deal with the proliferation of decoder options on open_dataset 169274464
id: 237664856
html_url: https://github.com/pydata/xarray/issues/939#issuecomment-237664856
issue_url: https://api.github.com/repos/pydata/xarray/issues/939
node_id: MDEyOklzc3VlQ29tbWVudDIzNzY2NDg1Ng==
user: mcgibbon (12307589)
created_at: 2016-08-04T19:55:10Z
updated_at: 2016-08-04T19:55:10Z
author_association: CONTRIBUTOR

We already have the dictionary. Users can make a decode_options dictionary, and then call what they want to with **decode_options.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consider how to deal with the proliferation of decoder options on open_dataset 169274464

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.345ms · About: xarray-datasette