home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 1084220684

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1084220684 PR_kwDOAMm_X84wDPg5 6086 Type protocol for internal variable mapping 35968931 closed 0     9 2021-12-19T23:32:04Z 2023-12-06T17:20:48Z 2023-12-06T17:19:30Z MEMBER   1 pydata/xarray/pulls/6086

In #5961 and #6083 I've been experimenting extending Dataset to store variables in a custom mapping object (instead of always in a dict), so as to eventually fix this mutability problem with DataTree.

I've been writing out new storage class implementations in those PRs, but on Friday @shoyer suggested that I could instead simply alter the allowed type for ._variables in xarray.Dataset's type hints. That would allow me to mess about with storage class implementations outside of xarray, whilst guaranteeing type compatibility with xarray main itself with absolutely minimal changes (hopefully no runtime changes to Dataset at all!).

The idea is to define a protocol in xarray which specifies the structural subtyping behaviour of any custom variable storage class that I might want to set as Dataset._variables. The type hint for the ._variables attribute then refers to this protocol, and will be satisfied as long as whatever object is set as ._variables has compatibly-typed methods. Adding type hints to the ._construct_direct and ._replace constructors is enough to propagate this new type specification all over the codebase.

In practice this means writing a protocol which describes the type behaviour of all the methods on dict that currently get used by ._variable accesses.

So far I've written out a CopyableMutableMapping protocol which defines all the methods needed. The issues I'm stuck on at the moment are:

1) The typing behaviour of overloaded methods, specifically update. (setdefault also has similar problems but I think I can safely omit that from the protocol definition because we don't call ._variables.setdefault() anywhere.) Mypy complains that CopyableMutableMapping is not a compatible type when Dict is specified because the type specification of overloaded methods isn't quite right somehow:

```
xarray/core/computation.py:410: error: Argument 1 to "_construct_direct" of "Dataset" has incompatible type "Dict[Hashable, Variable]"; expected "CopyableMutableMapping[Hashable, Variable]"  [arg-type]
xarray/core/computation.py:410: note: Following member(s) of "Dict[Hashable, Variable]" have conflicts:
xarray/core/computation.py:410: note:     Expected:
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, other: Mapping[Hashable, Variable], **kwargs: Variable) -> None
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, other: Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None
xarray/core/computation.py:410: note:         <1 more overload not shown>
xarray/core/computation.py:410: note:     Got:
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, Mapping[Hashable, Variable], **kwargs: Variable) -> None
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None
```
I don't understand what the inconsistency is because I literally looked up the exact way that [the type stubs](https://github.com/python/typeshed/blob/e6911530d4d52db0fbdf05be3aff89e520ee39bc/stdlib/typing.pyi#L490) for `Dict` were written (via `MutableMapping`).

2) Making functions which expect a Mapping accept my CopyableMutableMapping. I would have thought this would just work because I think my protocol defines all the methods which Mapping has, so CopyableMutableMapping should automatically become a subtype of Mapping. But instead I get errors like this with no further information as to what to do about it.

```xarray/core/dataset.py:785: error: Argument 1 to "Frozen" has incompatible type "CopyableMutableMapping[Hashable, Variable]"; expected "Mapping[Hashable, Variable]"  [arg-type]```

3) I'm expecting to get a runtime problem whenever we assert isinstance(ds._variables, dict), which happens in a few places. I'm no sure what the best way to deal with that is, but I'm hoping that simply adding @typing.runtime_checkable to the protocol class definition will be enough?

Once that passes mypy I will write a test that checks that if I define my own custom variable storage class I can _construct_direct a Dataset which uses it without any errors. At that point I can be confident that Dataset is general enough to hold whichever exact variable storage class I end up needing for DataTree.

@max-sixty this is entirely a typing challenge, so I'm tagging you in case you're interested :)

  • [ ] Would supercede #5961 and #6083
  • [ ] Tests added
  • [ ] Passes pre-commit run --all-files

EDIT: Also using Protocol at all is only available in Python 3.8+

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6086/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 pull

Links from other tables

  • 1 row from issues_id in issues_labels
  • 8 rows from issue in issue_comments
Powered by Datasette · Queries took 0.894ms · About: xarray-datasette