html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4490#issuecomment-718346664,https://api.github.com/repos/pydata/xarray/issues/4490,718346664,MDEyOklzc3VlQ29tbWVudDcxODM0NjY2NA==,35919497,2020-10-29T04:07:46Z,2020-10-29T04:07:46Z,COLLABORATOR,"Taking into account the comments in this issue and the calls, I would propose this solution: https://github.com/pydata/xarray/pull/4547","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,715374721
https://github.com/pydata/xarray/issues/4490#issuecomment-705594621,https://api.github.com/repos/pydata/xarray/issues/4490,705594621,MDEyOklzc3VlQ29tbWVudDcwNTU5NDYyMQ==,226037,2020-10-08T14:06:35Z,2020-10-08T14:06:35Z,MEMBER,"@shoyer I favour option 2 as a stable solution, possibly with all arguments moved to keyword-only ones:

* users don't need to import an additional class
* users get argument completion on `open_dataset`
* *xarray* does validation and mangling in the class and passes to the backends only the non-default values

I'm for using the words decode/decoding but I'm against the use of CF. What the backend will do is map the on-disk representation of the data (typically optimised for space) to the in-memory representation preferred by *xarray* (typically optimised for ease of use). This mapping is especially tricky for time-like variables. CF is one possible way to specify the map, but it is not the only one. Both the GRIB format and all the spatial formats supported by *GDAL/rasterio* can encode rich data, and decoding has (typically) nothing to do with the CF conventions.
My preferred meaning for the `decode_`-options is:

* `True`: the backend attempts to map the data to the *xarray* natural data types (`np.datetime64`, `np.float` with mask and scale)
* `False`: the backend attempts to return a representation of the data as close as possible to the on-disk one

Typically, when a user asks the backend not to decode, they intend to inspect the content of the data file to understand why something is not mapping in the expected way. As an example: in GRIB, time-like values are represented as integers like `20190101`, but *cfgrib* is currently forced to convert them into a fake CF representation before passing them to *xarray*, so when using `decode_times=False` a GRIB user is presented with something that has nothing to do with the on-disk representation.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,715374721
https://github.com/pydata/xarray/issues/4490#issuecomment-704631854,https://api.github.com/repos/pydata/xarray/issues/4490,704631854,MDEyOklzc3VlQ29tbWVudDcwNDYzMTg1NA==,1217238,2020-10-07T01:02:41Z,2020-10-07T01:02:41Z,MEMBER,"I see this as complementary to @alexamici's proposal in https://github.com/pydata/xarray/issues/4309#issuecomment-697952300, which I also like. I guess the main difference is moving `decode_cf` into the explicit signature of `open_dataset`, rather than leaving it in `**kwargs`.

Auto-completion is one reason to prefer a class, but enforced error checking and consistency between backends in the data model are also good reasons. In particular, it is important that users get an error if they misspell an argument name, e.g., `open_dataset(path, decode_times=False)` vs `open_dataset(path, decode_time=False)`. We can definitely achieve this by putting decoding options into a `dict`, too, but we would need to be careful to always validate the set of dict keys.
I guess cfgrib is an example of a backend with its own CF decoding options? This is indeed a tricky design question. I don't know if it's possible to make `xarray.open_dataset()` directly extensible in this way -- this could still be a reason for a user to use a backend-specific `cfgrib.open_dataset()` function.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,715374721
https://github.com/pydata/xarray/issues/4490#issuecomment-704607745,https://api.github.com/repos/pydata/xarray/issues/4490,704607745,MDEyOklzc3VlQ29tbWVudDcwNDYwNzc0NQ==,35919497,2020-10-06T23:35:30Z,2020-10-06T23:35:30Z,COLLABORATOR,"I agree, `open_dataset()` currently has a very long signature that should be changed. The interface you proposed is obviously clearer, but a class could give the false impression that all backends support all the decoding options listed in the class. I see two other alternatives:

- Instead of a class, we could use a dictionary. Pros 1, 2 and 3 would still hold.
- With the interface proposed by @alexamici in #4309, pros 2 and 3 would still hold, and pro 1 partially (since the `open_dataset` interface would be greatly simplified).

With either of these proposals, we would lose tab autocompletion but, on the other hand, the user would be relieved of managing a class.

Finally, I'm not sure the separation between `backend_kwargs` and `decode` would be clear to users, since both contain arguments that are passed to the backend. This is especially true if the backend needs more specific decoding options that must be set in `backend_kwargs`. In this sense, #4309 seems less error-prone.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,715374721
https://github.com/pydata/xarray/issues/4490#issuecomment-704396229,https://api.github.com/repos/pydata/xarray/issues/4490,704396229,MDEyOklzc3VlQ29tbWVudDcwNDM5NjIyOQ==,1217238,2020-10-06T16:25:54Z,2020-10-06T16:25:54Z,MEMBER,"> I agree that we should add CF to the names. Can we use `Decoders` instead of `DecodingOptions`? So `decode_cf` and `CFDecoders`?

works for me!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,715374721
https://github.com/pydata/xarray/issues/4490#issuecomment-704360238,https://api.github.com/repos/pydata/xarray/issues/4490,704360238,MDEyOklzc3VlQ29tbWVudDcwNDM2MDIzOA==,2448579,2020-10-06T15:41:18Z,2020-10-06T15:41:18Z,MEMBER,"Totally in favour of this. Option 2 does seem like a good intermediate step. This proposal would make error handling easier (https://github.com/pydata/xarray/issues/3020).

I agree that we should add CF to the names. Can we use `Decoders` instead of `DecodingOptions`? So `decode_cf` and `CFDecoders`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,715374721