issues

3 rows where comments = 9, type = "pull" and user = 35968931 sorted by updated_at descending


Facets: type = pull (3) · state = closed (3) · repo = xarray (3)
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at, closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type
1945654275 PR_kwDOAMm_X85c7HL_ 8319 Move parallelcompat and chunkmanagers to NamedArray TomNicholas 35968931 closed 0     9 2023-10-16T16:34:26Z 2024-02-12T22:09:24Z 2024-02-12T22:09:24Z MEMBER   0 pydata/xarray/pulls/8319

@dcherian I got to this point before realizing that simply moving parallelcompat.py over isn't what it says in the design doc, which instead talks about

  • Could this functionality be left in Xarray proper for now? Alternative array types like JAX also have some notion of "chunks" for parallel arrays, but the details differ in a number of ways from Dask/Cubed.
  • Perhaps variable.chunk/load methods should become functions defined in xarray that convert Variable objects. This is easy so long as xarray can reach in and replace .data.

I personally think that simply moving parallelcompat makes sense so long as you expect people to use chunked NamedArray objects. I see chunked arrays as special cases of duck arrays, and my understanding is that NamedArray is supposed to have full support for duck arrays.
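The "chunked arrays are special cases of duck arrays" idea can be sketched with a small structural check. This is a hypothetical illustration, not xarray's actual parallelcompat API: the names `ChunkedDuckArray`, `is_chunked_duck_array`, and `FakeChunkedArray` are made up for this example.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ChunkedDuckArray(Protocol):
    """A duck array that also carries chunk metadata (dask- or cubed-style)."""

    @property
    def chunks(self) -> tuple[tuple[int, ...], ...]: ...

    def __array__(self) -> Any: ...


def is_chunked_duck_array(x: Any) -> bool:
    # A chunked array is just a duck array with a ``.chunks`` attribute,
    # so a structural isinstance check is enough to dispatch on it.
    return isinstance(x, ChunkedDuckArray)


class FakeChunkedArray:
    """Stand-in for a dask/cubed array: one axis split into two chunks."""

    chunks = ((2, 2),)

    def __array__(self):
        return [1, 2, 3, 4]


print(is_chunked_duck_array(FakeChunkedArray()))  # True
print(is_chunked_duck_array([1, 2, 3]))  # False
```

Under this view, NamedArray would not need to know about any particular chunked backend; anything satisfying the protocol can be dispatched to its chunk manager.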

cc @andersy005

  • [x] As requested in #8238
  • [ ] ~~Tests added~~
  • [ ] ~~User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8319/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1084220684 PR_kwDOAMm_X84wDPg5 6086 Type protocol for internal variable mapping TomNicholas 35968931 closed 0     9 2021-12-19T23:32:04Z 2023-12-06T17:20:48Z 2023-12-06T17:19:30Z MEMBER   1 pydata/xarray/pulls/6086

In #5961 and #6083 I've been experimenting with extending Dataset to store variables in a custom mapping object (instead of always in a dict), so as to eventually fix this mutability problem with DataTree.

I've been writing out new storage class implementations in those PRs, but on Friday @shoyer suggested that I could instead simply alter the allowed type for ._variables in xarray.Dataset's type hints. That would allow me to mess about with storage class implementations outside of xarray, whilst guaranteeing type compatibility with xarray main itself with absolutely minimal changes (hopefully no runtime changes to Dataset at all!).

The idea is to define a protocol in xarray which specifies the structural subtyping behaviour of any custom variable storage class that I might want to set as Dataset._variables. The type hint for the ._variables attribute then refers to this protocol, and will be satisfied as long as whatever object is set as ._variables has compatibly-typed methods. Adding type hints to the ._construct_direct and ._replace constructors is enough to propagate this new type specification all over the codebase.

In practice this means writing a protocol which describes the type behaviour of all the methods on dict that currently get used by ._variable accesses.
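A minimal sketch of what such a protocol looks like (the method set here is illustrative, not the PR's actual `CopyableMutableMapping` definition, and `construct_direct` is a hypothetical stand-in for `Dataset._construct_direct`):

```python
from typing import Hashable, Iterator, Protocol, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class CopyableMutableMapping(Protocol[K, V]):
    """Sketch: the subset of dict's interface that ._variables accesses rely on."""

    def __getitem__(self, key: K) -> V: ...
    def __setitem__(self, key: K, value: V) -> None: ...
    def __delitem__(self, key: K) -> None: ...
    def __iter__(self) -> Iterator[K]: ...
    def __len__(self) -> int: ...
    def copy(self) -> "CopyableMutableMapping[K, V]": ...


def construct_direct(variables: CopyableMutableMapping[Hashable, str]) -> int:
    # A plain dict satisfies the protocol structurally -- no inheritance needed,
    # so existing call sites that pass a dict keep type-checking unchanged.
    return len(variables)


print(construct_direct({"temperature": "var1", "pressure": "var2"}))  # 2
```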

So far I've written out a CopyableMutableMapping protocol which defines all the methods needed. The issues I'm stuck on at the moment are:

1) The typing behaviour of overloaded methods, specifically update. (setdefault also has similar problems but I think I can safely omit that from the protocol definition because we don't call ._variables.setdefault() anywhere.) Mypy complains that CopyableMutableMapping is not a compatible type when Dict is specified because the type specification of overloaded methods isn't quite right somehow:

```
xarray/core/computation.py:410: error: Argument 1 to "_construct_direct" of "Dataset" has incompatible type "Dict[Hashable, Variable]"; expected "CopyableMutableMapping[Hashable, Variable]"  [arg-type]
xarray/core/computation.py:410: note: Following member(s) of "Dict[Hashable, Variable]" have conflicts:
xarray/core/computation.py:410: note:     Expected:
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, other: Mapping[Hashable, Variable], **kwargs: Variable) -> None
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, other: Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None
xarray/core/computation.py:410: note:         <1 more overload not shown>
xarray/core/computation.py:410: note:     Got:
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, Mapping[Hashable, Variable], **kwargs: Variable) -> None
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None
```
I don't understand what the inconsistency is because I literally looked up the exact way that [the type stubs](https://github.com/python/typeshed/blob/e6911530d4d52db0fbdf05be3aff89e520ee39bc/stdlib/typing.pyi#L490) for `Dict` were written (via `MutableMapping`).
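One plausible reading of the error (my guess, not confirmed in the PR): in the "Got" overloads the parameters have no names, which is mypy's way of printing positional-only parameters — typeshed declares the first parameter of `MutableMapping.update` positional-only. A protocol whose `update` takes a *named* parameter `other` promises callers may pass it by keyword, which dict cannot honour, so dict fails to match. Declaring the protocol's parameter positional-only as well (PEP 570 `/`) would line the signatures up; `SupportsUpdate` and `merge_into` below are hypothetical names:

```python
from typing import Iterable, Mapping, Protocol, Tuple, TypeVar, overload

K = TypeVar("K")
V = TypeVar("V")


class SupportsUpdate(Protocol[K, V]):
    # The ``/`` makes the first parameter positional-only, matching how
    # typeshed writes MutableMapping.update, so dict can satisfy the protocol.
    @overload
    def update(self, other: Mapping[K, V], /, **kwargs: V) -> None: ...
    @overload
    def update(self, other: Iterable[Tuple[K, V]], /, **kwargs: V) -> None: ...
    def update(self, other=(), /, **kwargs): ...


def merge_into(target: SupportsUpdate[str, int]) -> None:
    target.update({"a": 1}, b=2)


d: dict = {}
merge_into(d)  # a plain dict should both type-check and run
print(d)  # {'a': 1, 'b': 2}
```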

2) Making functions which expect a Mapping accept my CopyableMutableMapping. I would have thought this would just work because I think my protocol defines all the methods which Mapping has, so CopyableMutableMapping should automatically become a subtype of Mapping. But instead I get errors like this with no further information as to what to do about it.

```
xarray/core/dataset.py:785: error: Argument 1 to "Frozen" has incompatible type "CopyableMutableMapping[Hashable, Variable]"; expected "Mapping[Hashable, Variable]"  [arg-type]
```

3) I'm expecting to get a runtime problem whenever we assert isinstance(ds._variables, dict), which happens in a few places. I'm not sure what the best way to deal with that is, but I'm hoping that simply adding @typing.runtime_checkable to the protocol class definition will be enough?
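For what it's worth, @typing.runtime_checkable does make isinstance work against a protocol, with the caveat that it only checks attribute *presence*, never signatures or value types (the `Copyable` protocol below is a toy example, not the PR's class):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Copyable(Protocol):
    def copy(self): ...


# isinstance against a runtime_checkable protocol only verifies that the
# named attributes exist on the object; it does not validate how they behave.
print(isinstance({}, Copyable))        # True: dict defines .copy
print(isinstance(object(), Copyable))  # False: no .copy attribute
```

So replacing `isinstance(ds._variables, dict)` with an isinstance check against the protocol would keep those assertions working, at the cost of a weaker guarantee than the nominal dict check.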

Once that passes mypy I will write a test that checks that if I define my own custom variable storage class I can _construct_direct a Dataset which uses it without any errors. At that point I can be confident that Dataset is general enough to hold whichever exact variable storage class I end up needing for DataTree.

@max-sixty this is entirely a typing challenge, so I'm tagging you in case you're interested :)

  • [ ] Would supersede #5961 and #6083
  • [ ] Tests added
  • [ ] Passes pre-commit run --all-files

EDIT: Also, typing.Protocol itself is only available in Python 3.8+.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6086/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
936045730 MDExOlB1bGxSZXF1ZXN0NjgyODYzMjgz 5568 Add to_numpy() and as_numpy() methods TomNicholas 35968931 closed 0     9 2021-07-02T20:17:40Z 2021-07-21T22:06:47Z 2021-07-21T21:42:48Z MEMBER   0 pydata/xarray/pulls/5568
  • [x] Closes #3245
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5568/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · About: xarray-datasette