
issue_comments


7 rows where author_association = "CONTRIBUTOR" and issue = 807764700 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
813277878 https://github.com/pydata/xarray/pull/4904#issuecomment-813277878 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDgxMzI3Nzg3OA== rhkleijn 32801740 2021-04-05T08:41:24Z 2021-04-05T08:41:24Z CONTRIBUTOR

Thanks, there was one more minor merge artifact. Should be fixed now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
799648923 https://github.com/pydata/xarray/pull/4904#issuecomment-799648923 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc5OTY0ODkyMw== rhkleijn 32801740 2021-03-15T18:27:18Z 2021-03-15T19:01:07Z CONTRIBUTOR

I forgot to reply to one question of yours — let's use this PR, if you want to replace the current code with the gen_ops branch.

done

Out of interest, was there a reason we don't make a single .py file with the typing info too? (tbc, no need to redo!)

I have tried that but PyCharm would not handle the @overloads correctly in that case.

I still have one question — if we remove a line like this, does mypy not recognize it can use the method on a Dataset?

```python
class DataArrayOpsMixin:
    def __add__(self, other: T_Dataset) -> T_Dataset: ...
```

If we remove that line, it incorrectly infers the result of da + ds to be of type DataArray. Mypy does not recognize that it could fall back to the reflexive method on Dataset. Pylance does get it right.

With NoReturn mypy correctly errors in that case. Without NoReturn mypy incorrectly infers the result to be Variable.

That's surprising — I wonder where it's getting that information from. I would have expected that it wouldn't be able to find any compatible implementation of a GroupBy + Variable.

Agreed that this is surprising behaviour. I suspect mypy might have an implicit internal rule to fall back to the type of the first operand of a binary operator. That would be consistent with the behaviour in both of the above cases.

There seem to be quite a few open issues here with no indication that they are actively being worked on. I am not using PyCharm myself, but as it is widely used I would also expect quite a few xarray users to be using it.

I would vote not to prioritize a tool if it's not compatible — this code will last for a long time, and I expect PyCharm will catch up to the language specs soon.

I can relate to that. On the other hand, there may also be a pragmatic reason to go the extra mile: if we don't, users may think xarray's newly added typing is to blame for unexpected type inferences and report issues here. When these tools do catch up at some point in the future, it will be straightforward to adapt the script and regenerate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
798793062 https://github.com/pydata/xarray/pull/4904#issuecomment-798793062 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc5ODc5MzA2Mg== rhkleijn 32801740 2021-03-13T22:07:41Z 2021-03-13T22:07:41Z CONTRIBUTOR

The gen-ops branch looks really cool @rhkleijn.

Thanks!

I have a few questions, but they don't need to hold back merging if others agree — in particular on the .py and .pyi design, we could merge now and then iterate.

  • Above you mentioned: The mypy documentation states that if a directory contains both a .py and a .pyi file for the same module, the .pyi file takes precedence. It is my understanding that stub files work on a file-by-file basis and you cannot mix them for a single file.

    And here we have both a xarray/core/_typed_ops.pyi and xarray/core/_typed_ops.py. Does that mean the .pyi file is obscuring the .py file?

It takes precedence only during type checking, as the source of static type information. The generated .py and .pyi files are by construction in sync, and since the .py file does not contain typing information, nothing gets obscured here. At runtime the .pyi file is ignored.
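As a rough illustration of that division of labour (hypothetical module name, not the actual generated files): the interpreter only ever imports the .py, while type checkers read the .pyi in its place.

```python
# _ops.py -- what actually runs; no type annotations are required here
class OpsMixin:
    def double(self, other):
        return other * 2


# _ops.pyi -- what mypy/pyright read instead of _ops.py during type checking
# class OpsMixin:
#     def double(self, other: int) -> int: ...
```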

  • Do you know the extent to which PyCharm not picking up the @overloads is because PyCharm hasn't implemented a PEP yet, vs. an issue with the proposed code? If PyCharm is going to catch up soon, I don't think we need to engineer around it.

There seem to be quite a few open issues here with no indication that they are actively being worked on. I am not using PyCharm myself, but as it is widely used I would also expect quite a few xarray users to be using it.

  • Python typing has run ahead of me recently, so forgive the uninformed question, but do you know whether it's possible to define a parent class to be generic over a type, and avoid repeating methods like this for each Mixin class?

    ```python
    def __neg__(self: T_Dataset) -> T_Dataset: ...
    def __pos__(self: T_DataArray) -> T_DataArray: ...
    def __neg__(self: T_Variable) -> T_Variable: ...
    ```

    Or are there only a few of these that are common across the classes, such that it's not worth having a parent class for the Mixins?

For these unary ops with simple type hints it would probably work, but for most other methods with multiple overloads the typing ecosystem is not yet mature enough. I have commented on it in this issue. I hope one day we will be able to refactor this into some generic parent classes (one for unary ops, one for binary (and reflexive) ops, and one for in-place ops) from which we can inherit. Using a generic parent class only for the unary ops and not for the others would make the script more complicated. For now, I'd say this somewhat low-tech solution gets the job done.
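For the unary case alone, such a generic parent class could look roughly like this (a sketch with hypothetical names, not code from the PR):

```python
from typing import TypeVar

T = TypeVar("T", bound="UnaryOpsMixin")


class UnaryOpsMixin:
    # Annotating `self` with a TypeVar makes the return type follow the
    # subclass, so one definition covers Dataset, DataArray and Variable alike.
    def __neg__(self: T) -> T: ...
    def __pos__(self: T) -> T: ...
    def __abs__(self: T) -> T: ...
```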

  • Do we need to define methods for the DataArray's ops with a Dataset — e.g. here — or can we rely on Dataset's implementation? I think that's how the runtime handles it.

Unfortunately we have to define them both ways for typing purposes. This comment in generate_ops.py contains some explanation:

```
Note: in some of the overloads below the return value in reality is NotImplemented,
which cannot accurately be expressed with type hints, e.g. Literal[NotImplemented]
or type(NotImplemented) are not allowed and NoReturn has a different meaning.
In such cases we are lending the type checkers a hand by specifying the return type
of the corresponding reflexive method on other which will be called instead.
```

Also here, my hope is that someday this will not be needed anymore.
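Concretely, the trick amounts to something like the following (a simplified, hypothetical sketch, not the generated stub itself): the overload's annotation names the type that Dataset.__radd__ will produce, even though the method itself returns NotImplemented for that operand.

```python
from typing import overload


class Dataset:
    def __radd__(self, other: object) -> "Dataset":
        return Dataset()


class DataArrayOpsMixin:
    @overload
    def __add__(self, other: Dataset) -> Dataset: ...
    @overload
    def __add__(self, other: float) -> "DataArrayOpsMixin": ...
    def __add__(self, other):
        # At runtime the Dataset case returns NotImplemented, so Python falls
        # back to Dataset.__radd__; the first overload simply declares that
        # reflexive method's return type up front.
        if isinstance(other, Dataset):
            return NotImplemented
        return self
```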

  • Do we need to define these NoReturn methods? I would have thought that if there weren't a valid type signature we'd get an error anyway.

The NoReturn reflects the exception raised from the GroupBy._binary_op method here. It might help users trying to e.g. multiply a GroupBy object and a Variable. With NoReturn mypy correctly errors in that case. Without NoReturn mypy incorrectly infers the result to be Variable. It seems to be a shortcoming of mypy, since Pylance does infer Any in that case (which doesn't rule out a NoReturn).
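A minimal sketch of that pattern (hypothetical classes, not the actual stub): the NoReturn annotation mirrors the TypeError raised at runtime, so per the discussion above mypy flags e.g. groupby * variable instead of inferring Variable.

```python
from typing import NoReturn


class Variable: ...


class GroupByOpsMixin:
    def __mul__(self, other: Variable) -> NoReturn:
        # Mirrors the runtime behaviour: this combination is rejected outright.
        raise TypeError("cannot multiply a GroupBy object and a Variable")
```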

Using python to template the methods works pretty well! Better than I would have expected relative to a templating library like jinja.

I was also pleasantly surprised by it. And even black leaves the generated files unchanged.
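The templating being referred to is plain Python string formatting; a much-simplified flavour of it (hypothetical template and names, not the actual generate_ops.py contents) is:

```python
# A binary operator in the generated module just delegates to _binary_op.
TEMPLATE_BINOP = """\
    def {method}(self, other):
        return self._binary_op(other, operator.{func})"""

BINOPS = [("__add__", "add"), ("__sub__", "sub"), ("__mul__", "mul")]

for method, func in BINOPS:
    print(TEMPLATE_BINOP.format(method=method, func=func))
```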

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
798771631 https://github.com/pydata/xarray/pull/4904#issuecomment-798771631 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc5ODc3MTYzMQ== rhkleijn 32801740 2021-03-13T19:24:08Z 2021-03-13T19:25:00Z CONTRIBUTOR

You can now view the branch I mentioned in my previous message at https://github.com/pydata/xarray/compare/master...rhkleijn:gen-ops . It would get rid of the need for dynamically generating (injecting) unary and binary operators at runtime.

Note that its script generates both the module containing the arithmetic methods and its accompanying stub file. This setup turned out to be more robust than both:

  • the approach taken in this PR so far (dynamically adding the methods at runtime and generating only a stub file). PyCharm would be happy with the typing information but it turns out that it does complain about missing methods.
  • directly attaching the typing information to the methods in the generated module. I have tried it but PyCharm would not handle the @overloads correctly in that case.

In all cases mypy and pylance work nicely.
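Conceptually, the generation step described above boils down to one template pass emitting two files that stay in sync (a schematic sketch, not the branch's actual script):

```python
from pathlib import Path


def render(stubs: bool) -> str:
    # With stubs=True emit `...`-bodied signatures carrying the type overloads;
    # with stubs=False emit the real method implementations. Both outputs come
    # from the same template pass, which keeps them consistent.
    header = "# Generated file -- do not edit by hand\n"
    body = "class DatasetOpsMixin: ...\n"  # template expansion would go here
    return header + body


# Destinations as in the PR: xarray/core/_typed_ops.py and _typed_ops.pyi.
Path("_typed_ops.py").write_text(render(stubs=False))
Path("_typed_ops.pyi").write_text(render(stubs=True))
```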

I prefer the rhkleijn:gen-ops branch over the rhkleijn:ops-typing branch on which this PR is based, but I am curious to hear your opinions and further guidance on it.

Also, I am not sure what to do with the rhkleijn:gen-ops branch. Should I open a new (draft) PR here from the rhkleijn:gen-ops branch or would it be better to merge it into rhkleijn:ops-typing so it would appear here in this PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
790995670 https://github.com/pydata/xarray/pull/4904#issuecomment-790995670 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc5MDk5NTY3MA== rhkleijn 32801740 2021-03-04T22:39:14Z 2021-03-04T22:39:14Z CONTRIBUTOR

I vote for ScalarOrArray.

Done

Just FYI #4881 and #4878 are done

Locally it seems to work fine. Let's see what happens in CI now.

Let me know if there's anything from my end that would be helpful.

Not yet. I am working on generating the methods themselves on a local branch. It required changing all _binary_op, _unary_op and _inplace_binary_op methods from their current closure approach, which wraps the operator, to just exposing the inner function to be called by the generated operators. With mypy and VS Code/pylance/pyright it seems mostly fine, but I have not yet gotten around to ironing out the wrinkles with PyCharm typing. Once I have done that I'll share it and come back to ask for your feedback and opinion on that approach.
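Roughly, the change being described is from a closure that manufactures each operator at import time to a plain method that the generated operators call (an illustrative sketch with hypothetical helpers, not the actual xarray code):

```python
import operator


# Before (schematic): a closure wraps the operator and is injected at runtime,
# e.g. DataArray.__add__ = _make_binary_op(operator.add).
def _make_binary_op(f):
    def binary_op(self, other):
        ...  # apply f to self and other
    return binary_op


# After (schematic): the inner function is an ordinary method, and the
# generated operators are real, statically visible methods that call it.
class DataArrayOpsMixin:
    def _binary_op(self, other, f):
        ...  # apply f to self and other

    def __add__(self, other):
        return self._binary_op(other, operator.add)
```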

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
779404893 https://github.com/pydata/xarray/pull/4904#issuecomment-779404893 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc3OTQwNDg5Mw== rhkleijn 32801740 2021-02-15T19:10:56Z 2021-02-15T19:10:56Z CONTRIBUTOR

Thanks for the PR & your work. I'll need some time to digest that all... One thing I certainly would like to fix before merging this is #4881 (and merge #4878).

I agree. Both PRs you mention would be nice to see merged. I was still using numpy 1.19.

I'd also be interested in your motivation for the stub files. (It's just that we don't have any of these so far).

I usually prefer inline type hints. The motivation for stub files was mostly that all other approaches I tried were not sufficient, see my comment in #4054. It turns out we might not even need a stub file, see my reply above to max-sixty's comments.

I am sure you saw this comment: ... so there would be an argument to create mixin-classes*. Another motivation would be that currently (with the inject approach) mypy does not know which methods DataArray actually has. This is kind of hidden by __getattr__ (see #4601 & #4616). Now I am not saying you have to do this here, but I also wanted to mention it.

  • Well I just realised that your stub files are kind of mixin classes...

That is correct. It can currently be considered a mixin class during type checking and a no-op at compile and runtime. If I can get max-sixty's suggestion to work, then we would quite literally be following up on that TODO by using mixin classes instead of the unintuitive "inject" functions.
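Schematically, following up on that TODO means the generated class is inherited rather than having operators attached after the fact (hypothetical class names; the generated module is the xarray/core/_typed_ops.py discussed above):

```python
# xarray/core/_typed_ops.py (generated): the mixin owns the operator methods.
class DatasetOpsMixin:
    def __add__(self, other): ...
    def __neg__(self): ...


# xarray/core/dataset.py: inherit the mixin instead of injecting methods at
# import time, so static tools can see which methods Dataset really has.
class Dataset(DatasetOpsMixin):
    ...
```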

What are all the # type: ignore[misc] you are suppressing?

Those are mypy warnings for overlapping overload signatures with incompatible return types. In practice it is not a problem, as the first matching overload gets picked. One reason for overlapping is that with numpy <= 1.20 numpy.ndarray is typed as Any, which makes it overlap with other overload signatures. Another one is due to mypy checking the consistency of binops and their reflexive counterparts.
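The kind of overlap being silenced looks roughly like this (hypothetical, pared-down signatures): with older numpy, ndarray is Any, so an Any overload collides with the Dataset one while returning a different type, and the stub carries a # type: ignore[misc] on such lines.

```python
from typing import Any, overload


class Dataset: ...
class Variable: ...


class VariableOpsMixin:
    @overload
    def __add__(self, other: Dataset) -> Dataset: ...  # type: ignore[misc]
    @overload
    def __add__(self, other: Any) -> Variable: ...  # ndarray is typed as Any on numpy <= 1.20
    def __add__(self, other): ...
```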

I was slightly confused by the T_ of T_Other etc because this is not a TypeVar (but an alias) so I'd prefer another name.

Agreed. What name would work in this case? Something like OtherType or just Other (or ScalarOrArray)? Anything else?

For T_Dataset = TypeVar("T_Dataset", bound=Dataset) etc - why do you use a bound? To allow subclassing? As we discourage subclassing we may have to discuss if we should allow this.

I think it would be nice to allow some simple kinds of subclassing (some (many?) methods already preserve the subclass in the return value, but the type hint mentions the base class as the return type). The particular use case I have in mind involves having a custom accessor registered with xr.register_dataarray_accessor. Currently I have found no way to add meaningful type hints for these dynamically added attributes on the DataArray or Dataset classes, and I always wish I had signature information and type hints available when working with code that uses the accessor. One way this could be achieved is by using a simple wrapper class during type checking which includes a type declaration to signal both the existence and the type of the accessor, e.g. something along the lines of:

```python
from typing import TYPE_CHECKING

import xarray as xr

if TYPE_CHECKING:
    class Dataset(xr.Dataset):
        myaccessor: MyAccessor
else:
    Dataset = xr.Dataset  # note that at runtime we just work with xr.Dataset

ds = Dataset(...)
ds2 = (ds + 1).sel(x=1)  # some operations (ideally preserving subclass!)

# In the line below I would like my IDE to be able to show me the docstrings,
# method signatures and typing information for ds2.myaccessor and for
# ds2.myaccessor.method and the inferred type of result
result = ds2.myaccessor.method()
```

I will also bring this use case up in #3980. Until that is resolved, I am open to using either bound TypeVars or just "Dataset" and "DataArray" here.
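For reference, the two options mentioned look like this side by side (a minimal sketch with hypothetical method names):

```python
from typing import TypeVar

T_Dataset = TypeVar("T_Dataset", bound="Dataset")


class Dataset:
    # Bound TypeVar: a subclass calling this keeps its own type in the result.
    def roundtrip_typed(self: T_Dataset) -> T_Dataset: ...

    # Plain annotation: always reported as Dataset, even when called on a subclass.
    def roundtrip_plain(self) -> "Dataset": ...
```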

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
779295725 https://github.com/pydata/xarray/pull/4904#issuecomment-779295725 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc3OTI5NTcyNQ== rhkleijn 32801740 2021-02-15T15:30:51Z 2021-02-15T15:30:51Z CONTRIBUTOR

Did you ever consider replacing the injections with this macro-like approach to writing out the python methods, rather than using the macro to generate the stub files? (I don't bring this up with a view that it's better, but the injections have always seemed awkward)

I did not consider it but that is a very promising idea. It is probably feasible to adapt the script to generate the methods themselves together with their type hints into a regular Python module instead of a stub file. I will give it a try this week.

It would have the same disadvantage of needing the script for generating the file. There are several advantages: the stub file approach, which may feel a bit odd, would not even be necessary; it could replace the current injection logic; implementation and type hints would be in a single place; and it could simplify the class hierarchy a bit.

Is there anything we should be aware of around the interactions between stub files and inline definitions?

The mypy documentation states that if a directory contains both a .py and a .pyi file for the same module, the .pyi file takes precedence. It is my understanding that stub files work on a file-by-file basis and you cannot mix them for a single file. There should be no interactions here, as a _typed_ops.py module doesn't even exist in this case, only the _typed_ops.pyi stub file.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);