issues: 715374721


id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
715374721	MDU6SXNzdWU3MTUzNzQ3MjE=	4490	Group together decoding options into a single argument	1217238	open	0			6	2020-10-06T06:15:18Z	2020-10-29T04:07:46Z		MEMBER				Is your feature request related to a problem? Please describe.

`open_dataset()` currently has a very long function signature. This makes it hard to keep track of everything it can do, and is particularly problematic for the authors of new backends (e.g., see https://github.com/pydata/xarray/pull/4477), who might need to know how to handle all these arguments.

Describe the solution you'd like

To simplify the interface, I propose to group together all the decoding options into a new `DecodingOptions` class. I'm thinking something like:
```python
from dataclasses import dataclass, field, asdict
from typing import Optional, List

@dataclass(frozen=True)
class DecodingOptions:
    mask: Optional[bool] = None
    scale: Optional[bool] = None
    datetime: Optional[bool] = None
    timedelta: Optional[bool] = None
    use_cftime: Optional[bool] = None
    concat_characters: Optional[bool] = None
    coords: Optional[bool] = None
    drop_variables: Optional[List[str]] = None

    @classmethod
    def disabled(cls):
        return cls(mask=False, scale=False, datetime=False, timedelta=False,
                   concat_characters=False, coords=False)

    def non_defaults(self):
        return {k: v for k, v in asdict(self).items() if v is not None}

    # add another method for creating default Variable Coder() objects,
    # e.g., those listed in encode_cf_variable()
```
The signature of `open_dataset` would then become:
```python
def open_dataset(
    filename_or_obj,
    group=None,
    *,
    engine=None,
    chunks=None,
    lock=None,
    cache=None,
    backend_kwargs=None,
    decode: Union[DecodingOptions, bool] = None,
    **deprecated_kwargs
):
    if decode is None:
        decode = DecodingOptions()
    if decode is False:
        decode = DecodingOptions.disabled()
    # handle deprecated_kwargs...
    ...
```
Question: are `decode` and `DecodingOptions` the right names?
Maybe these should still include the name "CF", e.g., `decode_cf` and `CFDecodingOptions`, given that these are specific to CF conventions?

Note: the current signature is `open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None, decode_timedelta=None)`

Usage with the new interface would look like `xr.open_dataset(filename, decode=False)` or `xr.open_dataset(filename, decode=xr.DecodingOptions(mask=False, scale=False))`.

This requires a little bit more typing than what we currently have, but it has a few advantages:

1. It's easier to understand the role of different arguments. Now there is a function with ~8 arguments and a class with ~8 arguments rather than a function with ~15 arguments.
2. It's easier to add new decoding arguments (e.g., for more advanced CF conventions), because they don't clutter the `open_dataset` interface. For example, I separated out `mask` and `scale` arguments, versus the current `mask_and_scale` argument.
3. If a new backend plugin for `open_dataset()` needs to handle every option supported by `open_dataset()`, this makes that task significantly easier. The only decoding options they need to worry about are non-default options that were explicitly set, i.e., those exposed by the `non_defaults()` method. If another decoding option wasn't explicitly set and isn't recognized by the backend, they can just ignore it.

Describe alternatives you've considered

For the overall approach:
We could keep the current design, with separate keyword arguments for decoding options, and just be very careful about passing around these arguments. This seems pretty painful for the backend refactor, though.
We could keep the current design only for the user facing `open_dataset()` interface, and then internally convert into the `DecodingOptions()` struct for passing to backend constructors. This would provide much needed flexibility for backend authors, but most users wouldn't benefit from the new interface. Perhaps this would make sense as an intermediate step?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/4490/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }			13221727	issue
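The intermediate step mentioned in the issue body (keeping the existing keyword arguments for users while converting them internally) might look roughly like the sketch below. It restates the proposed `DecodingOptions` dataclass so the example is self-contained. The helper `options_from_legacy_kwargs`, its `None` defaults, and the mapping from the current keyword names to the new fields are assumptions made for illustration, not part of xarray or of the proposal.

```python
from dataclasses import asdict, dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class DecodingOptions:
    # As proposed in the issue: None means "not explicitly set".
    mask: Optional[bool] = None
    scale: Optional[bool] = None
    datetime: Optional[bool] = None
    timedelta: Optional[bool] = None
    use_cftime: Optional[bool] = None
    concat_characters: Optional[bool] = None
    coords: Optional[bool] = None
    drop_variables: Optional[List[str]] = None

    @classmethod
    def disabled(cls):
        return cls(mask=False, scale=False, datetime=False, timedelta=False,
                   concat_characters=False, coords=False)

    def non_defaults(self):
        return {k: v for k, v in asdict(self).items() if v is not None}


def options_from_legacy_kwargs(decode_cf=True, mask_and_scale=None,
                               decode_times=None, decode_timedelta=None,
                               concat_characters=None, decode_coords=None,
                               use_cftime=None, drop_variables=None):
    """Hypothetical helper: map the current keyword names to DecodingOptions.

    Assumption: None stands for "not set by the caller", which differs from
    the True defaults in the current open_dataset signature.
    """
    if decode_cf is False:
        # Turn every decoding step off, but still honor drop_variables.
        return replace(DecodingOptions.disabled(), drop_variables=drop_variables)
    return DecodingOptions(
        mask=mask_and_scale,       # mask_and_scale feeds both new fields
        scale=mask_and_scale,
        datetime=decode_times,
        timedelta=decode_timedelta,
        use_cftime=use_cftime,
        concat_characters=concat_characters,
        coords=decode_coords,
        drop_variables=drop_variables,
    )
```

A backend receiving the result would then only need to look at `non_defaults()`, which contains just the options the caller explicitly set.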
