id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1945654275,PR_kwDOAMm_X85c7HL_,8319,Move parallelcompat and chunkmanagers to NamedArray,35968931,closed,0,,,9,2023-10-16T16:34:26Z,2024-02-12T22:09:24Z,2024-02-12T22:09:24Z,MEMBER,,0,pydata/xarray/pulls/8319,"@dcherian I got to this point before realizing that simply moving `parallelcompat.py` over isn't [what it says in the design doc](https://github.com/pydata/xarray/blob/main/design_notes/named_array_design_doc.md#appendix-implementation-details), which instead talks about > - Could this functionality be left in Xarray proper for now? Alternative array types like JAX also have some notion of ""chunks"" for parallel arrays, but the details differ in a number of ways from the Dask/Cubed. > - Perhaps variable.chunk/load methods should become functions defined in xarray that convert Variable objects. This is easy so long as xarray can reach in and replace `.data` I personally think that simply moving parallelcompat makes sense so long as you expect people to use chunked `NamedArray` objects. I see the chunked arrays as special cases of duck arrays, and my understanding is that `NamedArray` is supposed to have full support for duckarrays. cc @andersy005 - [x] As requested in #8238 - [ ] ~~Tests added~~ - [ ] ~~User visible changes (including notable bug fixes) are documented in `whats-new.rst`~~ - [ ] ~~New functions/methods are listed in `api.rst`~~ ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8319/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1084220684,PR_kwDOAMm_X84wDPg5,6086,Type protocol for internal variable mapping,35968931,closed,0,,,9,2021-12-19T23:32:04Z,2023-12-06T17:20:48Z,2023-12-06T17:19:30Z,MEMBER,,1,pydata/xarray/pulls/6086,"In #5961 and #6083 I've been experimenting extending `Dataset` to store variables in a custom mapping object (instead of always in a `dict`), so as to eventually fix [this mutability problem](https://github.com/TomNicholas/datatree/issues/38) with `DataTree`. I've been writing out new storage class implementations in those PRs, but on Friday @shoyer suggested that I could instead simply alter the allowed type for `._variables` in `xarray.Dataset`'s type hints. That would allow me to mess about with storage class implementations outside of xarray, whilst guaranteeing type compatibility with xarray `main` itself with absolutely minimal changes (hopefully no runtime changes to `Dataset` at all!). The idea is to define a [protocol](https://www.python.org/dev/peps/pep-0544/) in xarray which specifies the structural subtyping behaviour of any custom variable storage class that I might want to set as `Dataset._variables`. The type hint for the `._variables` attribute then refers to this protocol, and will be satisfied as long as whatever object is set as `._variables` has compatibly-typed methods. Adding type hints to the `._construct_direct` and `._replace` constructors is enough to propagate this new type specification all over the codebase. In practice this means writing a protocol which describes the type behaviour of all the methods on `dict` that currently get used by `._variable` accesses. So far I've written out a `CopyableMutableMapping` protocol which defines all the methods needed. The issues I'm stuck on at the moment are: 1) The typing behaviour of overloaded methods, specifically `update`. (`setdefault` also has similar problems but I think I can safely omit that from the protocol definition because we don't call `._variables.setdefault()` anywhere.) Mypy complains that `CopyableMutableMapping` is not a compatible type when `Dict` is specified because the type specification of overloaded methods isn't quite right somehow: ``` xarray/core/computation.py:410: error: Argument 1 to ""_construct_direct"" of ""Dataset"" has incompatible type ""Dict[Hashable, Variable]""; expected ""CopyableMutableMapping[Hashable, Variable]"" [arg-type] xarray/core/computation.py:410: note: Following member(s) of ""Dict[Hashable, Variable]"" have conflicts: xarray/core/computation.py:410: note: Expected: xarray/core/computation.py:410: note: @overload xarray/core/computation.py:410: note: def update(self, other: Mapping[Hashable, Variable], **kwargs: Variable) -> None xarray/core/computation.py:410: note: @overload xarray/core/computation.py:410: note: def update(self, other: Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None xarray/core/computation.py:410: note: <1 more overload not shown> xarray/core/computation.py:410: note: Got: xarray/core/computation.py:410: note: @overload xarray/core/computation.py:410: note: def update(self, Mapping[Hashable, Variable], **kwargs: Variable) -> None xarray/core/computation.py:410: note: @overload xarray/core/computation.py:410: note: def update(self, Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None ``` I don't understand what the inconsistency is because I literally looked up the exact way that [the type stubs](https://github.com/python/typeshed/blob/e6911530d4d52db0fbdf05be3aff89e520ee39bc/stdlib/typing.pyi#L490) for `Dict` were written (via `MutableMapping`). 2) Making functions which expect a `Mapping` accept my `CopyableMutableMapping`. I would have thought this would just work because I think my protocol defines all the methods which `Mapping` has, so `CopyableMutableMapping` should automatically become a subtype of `Mapping`. But instead I get errors like this with no further information as to what to do about it. ```xarray/core/dataset.py:785: error: Argument 1 to ""Frozen"" has incompatible type ""CopyableMutableMapping[Hashable, Variable]""; expected ""Mapping[Hashable, Variable]"" [arg-type]``` 3) I'm expecting to get a runtime problem whenever we `assert isinstance(ds._variables, dict)`, which happens in a few places. I'm no sure what the best way to deal with that is, but I'm hoping that simply [adding `@typing.runtime_checkable`](https://www.python.org/dev/peps/pep-0544/#runtime-checkable-decorator-and-narrowing-types-by-isinstance) to the protocol class definition will be enough? Once that passes mypy I will write a test that checks that if I define my own custom variable storage class I can `_construct_direct` a `Dataset` which uses it without any errors. At that point I can be confident that `Dataset` is general enough to hold whichever exact variable storage class I end up needing for `DataTree`. @max-sixty this is entirely a typing challenge, so I'm tagging you in case you're interested :) - [ ] Would supercede #5961 and #6083 - [ ] Tests added - [ ] Passes `pre-commit run --all-files` EDIT: Also using `Protocol` at all is only available in Python 3.8+","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6086/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 936045730,MDExOlB1bGxSZXF1ZXN0NjgyODYzMjgz,5568,Add to_numpy() and as_numpy() methods,35968931,closed,0,,,9,2021-07-02T20:17:40Z,2021-07-21T22:06:47Z,2021-07-21T21:42:48Z,MEMBER,,0,pydata/xarray/pulls/5568," - [x] Closes #3245 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5568/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull