
issue_comments


25 rows where user = 32801740 sorted by updated_at descending


issue 15

  • add typing to unary and binary arithmetic operators 7
  • astype method lost its order parameter 4
  • New deep copy behavior in 2022.9.0 causes maximum recursion error 2
  • How should xarray serialize bytes/unicode strings across Python/netCDF versions? 1
  • Changing dtype on v0.13.0 causes Dataset attributes to be lost 1
  • Type checking fails for multiplication 1
  • Attribute style access is slow 1
  • bump the dependencies 1
  • Add unique method 1
  • NameError: name '_DType_co' is not defined 1
  • improve typing of DataArray and Dataset reductions 1
  • Switch to T_DataArray and T_Dataset in concat 1
  • xarray.concat for datasets fails when object is a child class of Dataset 1
  • ⚠️ Nightly upstream-dev CI failed ⚠️ 1
  • .sel return errors when using floats for no apparent reason 1

user 1

  • rhkleijn · 25 ✖

author_association 1

  • CONTRIBUTOR 25
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1264322604 https://github.com/pydata/xarray/issues/7111#issuecomment-1264322604 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXAgs rhkleijn 32801740 2022-10-01T10:40:48Z 2022-10-01T10:40:48Z CONTRIBUTOR

To avoid code duplication, you could move all logic from the copy methods into new _copy methods, extend those with an optional memo argument, and have the copy, __copy__ and __deepcopy__ methods be thin wrappers around them.
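A minimal sketch of that layout (hypothetical signatures; the real xarray methods take more arguments):

```python
class Dataset:
    def _copy(self, deep=False, memo=None):
        # single shared implementation; memo is threaded through so that
        # recursive structures are handled by the deepcopy machinery
        ...

    def copy(self, deep=False):
        return self._copy(deep=deep)

    def __copy__(self):
        return self._copy(deep=False)

    def __deepcopy__(self, memo=None):
        return self._copy(deep=True, memo=memo)
```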

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264014472 https://github.com/pydata/xarray/issues/7111#issuecomment-1264014472 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LV1SI rhkleijn 32801740 2022-09-30T20:52:28Z 2022-09-30T20:54:25Z CONTRIBUTOR

> Is there some feature that python uses to check whether a data structure is recursive when it's copying, which we're not taking advantage of? I can look more later.

Yes, `def __deepcopy__(self, memo=None)` has the memo argument exactly for the purpose of dealing with recursion; see https://docs.python.org/3/library/copy.html. Currently, xarray's __deepcopy__ methods do not pass on the memo argument when deepcopying their components.
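For illustration (a generic example, not xarray's actual code), the memo dict maps id(obj) to its already-created copy, which is what breaks the recursion:

```python
import copy

class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None  # may point back to an ancestor, creating a cycle

    def __deepcopy__(self, memo=None):
        if memo is None:
            memo = {}
        if id(self) in memo:        # this object was already copied higher up
            return memo[id(self)]
        new = Node(self.value)
        memo[id(self)] = new        # register *before* recursing into children
        new.parent = copy.deepcopy(self.parent, memo)
        return new
```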

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1263689036 https://github.com/pydata/xarray/issues/7108#issuecomment-1263689036 https://api.github.com/repos/pydata/xarray/issues/7108 IC_kwDOAMm_X85LUl1M rhkleijn 32801740 2022-09-30T15:01:42Z 2022-09-30T15:01:42Z CONTRIBUTOR

Pandas docs seem stricter than the implementation. From this snippet of pandas source code, monotonicity is only required after get_loc fails.

My concern with checking first is that code like the example below will stop working (if I understand correctly). It has unique but (alphabetically) unsorted coords (although their order may have meaning for the user). I regularly select a slice by specifying the labels corresponding to the first and last elements I want to extract.

In this case I would suggest just trying, catching any KeyError, and re-raising with a nicer message, instead of always checking first.

```python
import xarray as xr

da = xr.DataArray([0, 1, 2, 3], coords={'x': ['zero', 'one', 'two', 'three']})
da.sel(x=slice('zero', 'two'))
```
```
Out[1]:
<xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
  * x        (x) <U5 'zero' 'one' 'two'
```
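A sketch of that "try first, check only on failure" idea (a hypothetical helper, not xarray's actual code):

```python
import pandas as pd

def slice_locs_with_nice_error(index: pd.Index, start, stop):
    """Try the label-based lookup first; explain only if it actually fails."""
    try:
        return index.slice_locs(start, stop)
    except KeyError as err:
        raise KeyError(
            f"cannot slice with labels {start!r} and {stop!r}: for inexact "
            "matches the index must be monotonic"
        ) from err
```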

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  .sel return errors when using floats for no apparent reason 1391699976
1263288581 https://github.com/pydata/xarray/issues/6941#issuecomment-1263288581 https://api.github.com/repos/pydata/xarray/issues/6941 IC_kwDOAMm_X85LTEEF rhkleijn 32801740 2022-09-30T08:43:47Z 2022-09-30T08:50:06Z CONTRIBUTOR

The last run reports (development) version 2022.9.1.dev3+g341f1302 of xarray (derived from the most recent tag 2022.9.0).

I assume you have seen version 0.18.x in your personal fork. I have seen similar problems where version 0.16.x of xarray was shown. The culprit seems to be that the GitHub button for syncing your fork with the upstream repo doesn't update tags: the fork only has the tags from back when you created it. I updated the tags manually using something like

```
git fetch --tags upstream
git push --tags origin
```

which did the trick for me.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ⚠️ Nightly upstream-dev CI failed ⚠️ 1345644028
1195752253 https://github.com/pydata/xarray/issues/6827#issuecomment-1195752253 https://api.github.com/repos/pydata/xarray/issues/6827 IC_kwDOAMm_X85HRbs9 rhkleijn 32801740 2022-07-26T17:10:02Z 2022-07-26T18:15:38Z CONTRIBUTOR

In the example, the naming and meaning of the arguments of the __init__ method differ from those of the Dataset base class.

Subclassing is often more reliable when it adheres to the Liskov Substitution Principle, which states that objects of a superclass should be replaceable with objects of its subclasses; in other words, a subclass should only add functionality, not reduce it.

There are some other workarounds though:

If all you need is a convenient custom constructor then you could add a classmethod factory method to your subclass or even just use a plain function which returns a Dataset.
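For instance (hypothetical names, sketching both variants):

```python
import xarray as xr

class MyDataset(xr.Dataset):
    __slots__ = ()

    @classmethod
    def from_raw(cls, raw_data, coord1):
        # custom construction logic lives here instead of in __init__
        return cls({"data": ("coord1", raw_data)}, coords={"coord1": coord1})

# or simply a plain function that returns an ordinary Dataset:
def make_dataset(raw_data, coord1):
    return xr.Dataset({"data": ("coord1", raw_data)}, coords={"coord1": coord1})
```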

Another option is to convert your MyDataset instances to plain Dataset instances before passing them into xarray.concat, e.g. `xarray.concat([ds.to_dataset() for ds in [ds1, ds2, ds3]], dim="coord1")`,

where your subclass may define a convenience method:

```python
def to_dataset(self):
    """Convert to plain Dataset."""
    return xarray.Dataset(self, attrs=self.attrs)
```

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.concat for datasets fails when object is a child class of Dataset 1318173644
1186138098 https://github.com/pydata/xarray/pull/6784#issuecomment-1186138098 https://api.github.com/repos/pydata/xarray/issues/6784 IC_kwDOAMm_X85Gswfy rhkleijn 32801740 2022-07-16T09:58:05Z 2022-07-16T09:58:05Z CONTRIBUTOR

> Is there any trick to avoid this or should I just close this for now?

Casting `arr.rename(name)` to `T_DataArray` using `typing.cast` might work until the mypy bug is fixed.
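Something along these lines (a sketch; the helper name and TypeVar are illustrative):

```python
from typing import TypeVar, cast

import xarray as xr

T_DataArray = TypeVar("T_DataArray", bound=xr.DataArray)

def rename_typed(arr: T_DataArray, name: str) -> T_DataArray:
    # the cast works around mypy losing the TypeVar through rename()
    return cast(T_DataArray, arr.rename(name))
```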

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Switch to T_DataArray and T_Dataset in concat 1305351232
1173044870 https://github.com/pydata/xarray/pull/6746#issuecomment-1173044870 https://api.github.com/repos/pydata/xarray/issues/6746 IC_kwDOAMm_X85F6z6G rhkleijn 32801740 2022-07-03T09:19:36Z 2022-07-03T09:19:36Z CONTRIBUTOR

Agreed, I just reached the same conclusion.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  improve typing of DataArray and Dataset reductions 1292142108
885867225 https://github.com/pydata/xarray/issues/5631#issuecomment-885867225 https://api.github.com/repos/pydata/xarray/issues/5631 IC_kwDOAMm_X840zULZ rhkleijn 32801740 2021-07-23T19:27:45Z 2021-07-23T19:35:44Z CONTRIBUTOR

From https://github.com/numpy/numpy/issues/19521: installing typing-extensions may help when running on python 3.7.
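i.e., in a pip-based environment:

```
pip install typing-extensions
```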

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  NameError: name '_DType_co' is not defined 951644054
813277878 https://github.com/pydata/xarray/pull/4904#issuecomment-813277878 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDgxMzI3Nzg3OA== rhkleijn 32801740 2021-04-05T08:41:24Z 2021-04-05T08:41:24Z CONTRIBUTOR

Thanks, there was one more minor merge artifact. Should be fixed now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
811306431 https://github.com/pydata/xarray/pull/5091#issuecomment-811306431 https://api.github.com/repos/pydata/xarray/issues/5091 MDEyOklzc3VlQ29tbWVudDgxMTMwNjQzMQ== rhkleijn 32801740 2021-03-31T18:20:30Z 2021-03-31T21:40:58Z CONTRIBUTOR

> We don't have precedent for methods that return a numpy array. But it's unclear what xarray object this would return.

There is a precedent in xarray (it might be the only one):

```python
In [2]: xr.DataArray([1, 2, 3, 4, 5]).searchsorted([-10, 10, 2, 3])
Out[2]: array([0, 5, 1, 2])
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add unique method 843961481
799648923 https://github.com/pydata/xarray/pull/4904#issuecomment-799648923 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc5OTY0ODkyMw== rhkleijn 32801740 2021-03-15T18:27:18Z 2021-03-15T19:01:07Z CONTRIBUTOR

> I forgot to reply to one question of yours — let's use this PR, if you want to replace the current code with the gen_ops branch.

done

> Out of interest, was there a reason we don't make a single .py file with the typing info too? (tbc, no need to redo!)

I have tried that but PyCharm would not handle the @overloads correctly in that case.

> I still have one question — if we remove a line like this, does mypy not recognize it can use the method on a Dataset?
>
> ```python
> class DataArrayOpsMixin:
>     def __add__(self, other: T_Dataset) -> T_Dataset: ...
> ```

If we remove that line, mypy incorrectly infers the result of `da + ds` to be of type DataArray; it does not recognize that it could fall back to the reflexive method on Dataset. Pylance does get it right.

> > With NoReturn mypy correctly errors in that case. Without NoReturn mypy incorrectly infers the result to be Variable.

> That's surprising — I wonder where it's getting that information from. I would have expected that it wouldn't be able to find any compatible implementation of a GroupBy + Variable.

Agreed that this is surprising behaviour. I suspect mypy might have something like an implicit internal rule to fall back to the type of the first operand of a binary operator. That would be consistent with the behaviour in both of the above cases.

> > There seem to be quite a few open issues here with no indication they are currently being actively worked on. I am not using PyCharm myself, but as it is widely used I would expect quite a few xarray users to be using it.

> I would vote not to prioritize a tool if it's not compatible — this code will last for a long time, and I expect PyCharm will catch up to the language specs soon.

I can relate to that. On the other hand, there may also be a pragmatic reason to go the extra mile: if we don't, users may think xarray's newly added typing is to blame for unexpected type inferences and report issues here. When these tools do catch up at some point in the future, it is straightforward to adapt the script and regenerate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
798793062 https://github.com/pydata/xarray/pull/4904#issuecomment-798793062 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc5ODc5MzA2Mg== rhkleijn 32801740 2021-03-13T22:07:41Z 2021-03-13T22:07:41Z CONTRIBUTOR

> The gen-ops branch looks really cool @rhkleijn.

thanks!

> I have a few questions, but they don't need to hold back merging — if others agree — in particular on the .py and .pyi design — we could merge now and then iterate.

>   • Above you mentioned: "The mypy documentation states that if a directory contains both a .py and a .pyi file for the same module, the .pyi file takes precedence. It is my understanding that stub files work on a file by file basis and you cannot mix them for a single file."
>
>     And here we have both a xarray/core/_typed_ops.pyi and xarray/core/_typed_ops.py. Does that mean the .pyi file is obscuring the .py file?

It takes precedence only during type checking as the source for static type information. The generated .py and .pyi files are by their construction in sync and since the .py file does not contain typing information nothing gets obscured here. At runtime the .pyi file is ignored.

>   • Do you know the extent to which PyCharm not picking up the @overloads is because PyCharm hasn't implemented a PEP yet, vs. an issue with the proposed code? If PyCharm is going to catch up soon, I don't think we need to engineer around it.

There seem to be quite a few open issues here with no indication they are currently being actively worked on. I am not using PyCharm myself, but as it is widely used I would expect quite a few xarray users to be using it.

>   • Python typing has run ahead of me recently, so forgive the uninformed question, but do you know whether it's possible to define a parent class to be generic over a type, and avoid repeating methods like this for each Mixin class?
>
>     ```python
>     def __neg__(self: T_Dataset) -> T_Dataset: ...
>     def __pos__(self: T_DataArray) -> T_DataArray: ...
>     def __neg__(self: T_Variable) -> T_Variable: ...
>     ```
>
>     Or are there only a few of these that are common across the classes, such that it's not worth having a parent class for the Mixins?

For these unary ops with simple type hints it would probably work, but for most other methods with multiple overloads the typing ecosystem is not yet mature enough. I have commented on this in this issue. I hope one day we will be able to refactor this into some generic parent classes (one for unary ops, one for binary (and reflexive) ops, and one for inplace ops) from which we can inherit. Using a generic parent class only for the unary ops and not for the others would make the script more complicated. For now, I'd say this somewhat low-tech solution gets the job done.
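A sketch of what such a generic parent class for the unary ops could look like (hypothetical names, not code from this PR):

```python
from typing import TypeVar

T_Self = TypeVar("T_Self", bound="UnaryOpsMixin")

class UnaryOpsMixin:
    # each subclass (Dataset, DataArray, Variable) gets its own type back
    def __neg__(self: T_Self) -> T_Self: ...
    def __pos__(self: T_Self) -> T_Self: ...
    def __abs__(self: T_Self) -> T_Self: ...
```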

>   • Do we need to define methods for the DataArray's ops with a Dataset — e.g. here — or can we rely on Dataset's implementation? I think that's how the runtime handles it.

Unfortunately we have to define them both ways for typing purposes. This comment in generate_ops.py contains some explanation:

```
Note: in some of the overloads below the return value in reality is NotImplemented,
which cannot accurately be expressed with type hints, e.g. Literal[NotImplemented]
or type(NotImplemented) are not allowed and NoReturn has a different meaning.
In such cases we are lending the type checkers a hand by specifying the return type
of the corresponding reflexive method on `other`, which will be called instead.
```

Also here, my hope is that someday this will no longer be needed.
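Concretely, the trick looks roughly like this (an illustrative sketch, not the generated file itself):

```python
from typing import overload

class DataArrayOpsMixin:
    @overload
    def __add__(self, other: "Dataset") -> "Dataset": ...
    # at runtime this branch returns NotImplemented, so Python falls back to
    # Dataset.__radd__, which is what actually produces the Dataset
    @overload
    def __add__(self, other: "DataArray") -> "DataArray": ...
```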

>   • Do we need to define these NoReturn methods? I would have thought that if there weren't a valid type signature we'd get an error anyway.

The NoReturn reflects the exception raised from the GroupBy._binary_op method here. It might help users trying to e.g. multiply a GroupBy object and a Variable. With NoReturn mypy correctly errors in that case; without NoReturn mypy incorrectly infers the result to be Variable. It seems to be a shortcoming of mypy, since Pylance does infer Any in that case (which doesn't rule out a NoReturn).
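As a sketch of such a stub (hypothetical; the generated file covers all operators):

```python
from typing import NoReturn

class GroupByOpsMixin:
    # the runtime implementation raises for GroupBy * Variable, so the stub
    # declares NoReturn and mypy flags the expression instead of inferring Variable
    def __mul__(self, other: "Variable") -> NoReturn: ...
```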

> Using python to template the methods works pretty well! Better than I would have expected relative to a templating library like jinja.

I was also pleasantly surprised by it. And even black leaves the generated files unchanged.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
798771631 https://github.com/pydata/xarray/pull/4904#issuecomment-798771631 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc5ODc3MTYzMQ== rhkleijn 32801740 2021-03-13T19:24:08Z 2021-03-13T19:25:00Z CONTRIBUTOR

You can now view the branch I mentioned in my previous message at https://github.com/pydata/xarray/compare/master...rhkleijn:gen-ops . It would get rid of the need for dynamically generating (injecting) unary and binary operators at runtime.

Note that its script generates both the module containing the arithmetic methods and its accompanying stub file. This setup turned out to be more robust than both:

  • the approach taken in this PR so far (dynamically adding the methods at runtime and generating only a stub file). PyCharm would be happy with the typing information but it turns out that it does complain about missing methods.
  • directly attaching the typing information to the methods in the generated module. I have tried it but PyCharm would not handle the @overloads correctly in that case.

In all cases mypy and pylance work nicely.

I prefer the rhkleijn:gen-ops branch over the rhkleijn:ops-typing branch on which this PR is based, but I am curious to hear your opinions and further guidance on it.

Also, I am not sure what to do with the rhkleijn:gen-ops branch. Should I open a new (draft) PR here from the rhkleijn:gen-ops branch or would it be better to merge it into rhkleijn:ops-typing so it would appear here in this PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
790995670 https://github.com/pydata/xarray/pull/4904#issuecomment-790995670 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc5MDk5NTY3MA== rhkleijn 32801740 2021-03-04T22:39:14Z 2021-03-04T22:39:14Z CONTRIBUTOR

> I vote for ScalarOrArray.

Done

> Just FYI #4881 and #4878 are done

Locally it seems to work fine. Let's see what happens in CI now.

> Let me know if there's anything from my end that would be helpful.

Not yet. I am working on generating the methods themselves on a local branch. It required changing all _binary_op, _unary_op and _inplace_binary_op methods from their current closure approach (which wraps the operator) to just exposing the inner function to be called by the generated operators. With mypy and VS Code/Pylance/pyright it seems mostly fine, but I have not yet gotten around to ironing out the wrinkles with PyCharm typing. Once I have, I'll share it and ask for your feedback and opinion on that approach.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
783507609 https://github.com/pydata/xarray/pull/4942#issuecomment-783507609 https://api.github.com/repos/pydata/xarray/issues/4942 MDEyOklzc3VlQ29tbWVudDc4MzUwNzYwOQ== rhkleijn 32801740 2021-02-22T16:41:44Z 2021-02-22T16:41:44Z CONTRIBUTOR

If the minimum version of numpy is increased to 1.17 you may want to change the PendingDeprecationWarning in ufuncs.py to DeprecationWarning: https://github.com/pydata/xarray/blob/200c2b2df28bd477dbec863ac0837b901535c955/xarray/ufuncs.py#L47-L53

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bump the dependencies 813654286
779404893 https://github.com/pydata/xarray/pull/4904#issuecomment-779404893 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc3OTQwNDg5Mw== rhkleijn 32801740 2021-02-15T19:10:56Z 2021-02-15T19:10:56Z CONTRIBUTOR

> Thanks for the PR & your work. I'll need some time to digest that all... One thing I certainly would like to fix before merging this is #4881 (and merge #4878).

I agree. Both PRs you mention would be nice to see merged. I was still using numpy 1.19.

> I'd also be interested in your motivation for the stub files. (It's just that we don't have any of these so far).

I usually prefer inline type hints. The motivation for stub files was mostly that all other approaches I tried were not sufficient; see my comment in #4054. It turns out we might not even need a stub file, see my reply above to max-sixty's comments.

> I am sure you saw this comment: ... so there would be an argument to create mixin-classes*. Another motivation would be that currently (with the inject approach) mypy does not know which methods DataArray actually has. This is kind of hidden by __getattr__ (see #4601 & #4616). Now I am not saying you have to do this here but also wanted to mention it.
>
>   • Well I just realised that your stub files are kind of mixin classes...

That is correct. It can currently be considered a mixin class during type checking and a no-op at compile and runtime. If I can get max-sixty's suggestion to work, then we would quite literally be following up on that TODO by using mixin classes instead of the unintuitive "inject" functions.

> What are all the `# type: ignore[misc]` you are suppressing?

Those are mypy warnings for overlapping overload signatures with incompatible return types. In practice this is not a problem, as the first matching overload gets picked. One reason for the overlap is that with numpy <= 1.20 numpy.ndarray is typed as Any, which makes it overlap with other overload signatures. Another is mypy checking the consistency of binops and their reflexive counterparts.

> I was slightly confused by the T_ of T_Other etc because this is not a TypeVar (but an alias) so I'd prefer another name.

Agreed. What name would work in this case? Something like OtherType or just Other (or ScalarOrArray)? Anything else?

> For `T_Dataset = TypeVar("T_Dataset", bound=Dataset)` etc - why do you use a bound? To allow subclassing? As we discourage subclassing we may have to discuss if we should allow this.

I think it would be nice to allow some simple kinds of subclassing (some (many?) methods already preserve the subclass in the return value, but the type hint gives the base class as return type). The particular use case I have in mind involves having a custom accessor registered with xr.register_dataarray_accessor. Currently I have found no way to add meaningful type hints for these dynamically added attributes on the DataArray or Dataset classes, and I always wish I had signature information and type hints available when working on code using the accessor. One way this could be achieved is by using a simple wrapper class during type checking which includes a type declaration to signal both the existence and the type of the accessor, e.g. something along the lines of:

```python
import xarray as xr

if TYPE_CHECKING:
    class Dataset(xr.Dataset):
        myaccessor: MyAccessor
else:
    Dataset = xr.Dataset  # note that at runtime we just work with xr.Dataset

ds = Dataset(...)
ds2 = (ds + 1).sel(x=1)  # some operations (ideally preserving subclass!)

# In the line below I would like my IDE to be able to show me the docstrings,
# method signatures and typing information for ds2.myaccessor and for
# ds2.myaccessor.method and the inferred type of result
result = ds2.myaccessor.method()
```

I will also bring this use case up in #3980. Until that is resolved, I am open to here use either bound TypeVars or just "Dataset" and "DataArray".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
779295725 https://github.com/pydata/xarray/pull/4904#issuecomment-779295725 https://api.github.com/repos/pydata/xarray/issues/4904 MDEyOklzc3VlQ29tbWVudDc3OTI5NTcyNQ== rhkleijn 32801740 2021-02-15T15:30:51Z 2021-02-15T15:30:51Z CONTRIBUTOR

> Did you ever consider replacing the injections with this macro-like approach to writing out the python methods, rather than using the macro to generate the stub files? (I don't bring this up with a view that it's better, but the injections have always seemed awkward)

I did not consider it but that is a very promising idea. It is probably feasible to adapt the script to generate the methods themselves together with their type hints into a regular Python module instead of a stub file. I will give it a try this week.

It would have the same disadvantage of needing the script for generating the file. There are several advantages: the stub file approach, which may feel a bit odd, would not even be necessary; it could replace the current injection logic; implementation and type hints would be in a single place; and it could simplify the class hierarchy a bit.

> Is there anything we should be aware of around the interactions between stub files and inline definitions?

The mypy documentation states that if a directory contains both a .py and a .pyi file for the same module, the .pyi file takes precedence. It is my understanding that stub files work on a file by file basis and you cannot mix them for a single file. There should be no interactions here, as a _typed_ops.py module doesn't even exist in this case, only the _typed_ops.pyi stub file.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add typing to unary and binary arithmetic operators 807764700
778628502 https://github.com/pydata/xarray/issues/4054#issuecomment-778628502 https://api.github.com/repos/pydata/xarray/issues/4054 MDEyOklzc3VlQ29tbWVudDc3ODYyODUwMg== rhkleijn 32801740 2021-02-13T14:40:51Z 2021-02-13T14:40:51Z CONTRIBUTOR

I have implemented a possible solution to add typing for all unary and binary operators, which works for mypy and also displays correctly inferred type hints on hovering over the code in Pylance (with VS Code) and PyCharm. The approach involves generating a stub file using a simple helper script. I'll open a draft PR so you can see this approach and to solicit feedback.

I have been experimenting with all kinds of other approaches to solve this issue but this was the only one with a satisfactory result. For reference, and jotting down from memory, I think I tried the following approaches without much success:

  • add the typing information to the not_implemented function in https://github.com/pydata/xarray/blob/971bad7c21551f297d3e1e24e3779488ea1b9565/xarray/core/arithmetic.py#L83-L105
  • adding type declarations in a Generic mixin class parameterized on the return type of the methods. This doesn't handle dependencies of the return type on the type of the second argument of binary operators.
  • adding type declarations in a Generic mixin class parameterized on a Callback Protocol (PEP 544). This is too much magic for typing tools to grasp.
  • adding type declarations directly using a Callback Protocol. This seems to work fine for regular functions, but for methods the type of self of the class implementing the Protocol gets confused with the type of self in the Protocol itself.
  • using an overloaded abstract method for one operator (and then assigning the method to the dunders of all other operators to avoid a lot of repetition) results in mypy errors because it does not recognize that the concrete implementations are added dynamically. Making them concrete and returning NotImplemented doesn't work as type checkers are not sufficiently aware of the fallback logic of calling the corresponding reflexive method on the other argument of the binary operation. And raising causes some type checkers to think NoReturn is the intended return type.
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Type checking fails for multiplication 617140674
752674453 https://github.com/pydata/xarray/issues/4741#issuecomment-752674453 https://api.github.com/repos/pydata/xarray/issues/4741 MDEyOklzc3VlQ29tbWVudDc1MjY3NDQ1Mw== rhkleijn 32801740 2020-12-30T16:06:45Z 2020-12-30T16:06:45Z CONTRIBUTOR

With some modest refactoring in https://github.com/rhkleijn/xarray/tree/faster-attr-access I managed to speed up attribute-style access, dir() and _ipython_key_completions_ (in this case ~100-fold) by using a lazier approach, especially avoiding the eager {d: self[d] for d in self.dims}, which constructs many (mostly unneeded) DataArray objects.

```
%timeit ds.var0
468 µs ± 1.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit dir(ds)
499 µs ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit ds._ipython_key_completions_()
242 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
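For illustration, the kind of lazy pattern meant above (a hypothetical sketch, not the code on that branch):

```python
from collections.abc import Mapping

class LazyMapping(Mapping):
    """Expose a dataset's variables without eagerly building DataArray objects."""

    def __init__(self, ds):
        self._ds = ds

    def __getitem__(self, key):
        return self._ds[key]  # a DataArray is only constructed on access

    def __iter__(self):
        return iter(self._ds.variables)

    def __len__(self):
        return len(self._ds.variables)
```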

Shall I open a pull request for this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Attribute style access is slow 776520994
750929476 https://github.com/pydata/xarray/issues/3348#issuecomment-750929476 https://api.github.com/repos/pydata/xarray/issues/3348 MDEyOklzc3VlQ29tbWVudDc1MDkyOTQ3Ng== rhkleijn 32801740 2020-12-24T17:12:15Z 2020-12-24T17:12:15Z CONTRIBUTOR

Dataset.astype() preserves attributes by default since xarray 0.16.1 with #4314. Running the scripts above with 0.16.2 I get the output below, which confirms attributes are now preserved. Can this issue be closed?

Output 1:

```
<xarray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
  * location  (location) <U2 'IA' 'IN' 'IL'
Data variables:
    tmin      (time, location) float32 -8.03737 -1.7884412 ... -4.543927
    tmax      (time, location) float32 12.980549 3.3104093 ... 3.8052793
Attributes:
    CRS:      EPSG:4326
```

Output 2:

```
Original xr.DataArray
attrs:  {'at_str': 'at1_value', 'at_float': 123.0, 'at_int': 123}
dtype:  int64

After casting to int64:    {'at_str': 'at1_value', 'at_float': 123.0, 'at_int': 123}
After casting to int32:    {'at_str': 'at1_value', 'at_float': 123.0, 'at_int': 123}
After casting to float32:  {'at_str': 'at1_value', 'at_float': 123.0, 'at_int': 123}

Changing the np.array
attrs:  {'at_str': 'at1_value', 'at_float': 123.0, 'at_int': 123}
dtype:  float32
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Changing dtype on v0.13.0 causes Dataset attributes to be lost 499196320
743129921 https://github.com/pydata/xarray/issues/4644#issuecomment-743129921 https://api.github.com/repos/pydata/xarray/issues/4644 MDEyOklzc3VlQ29tbWVudDc0MzEyOTkyMQ== rhkleijn 32801740 2020-12-11T11:05:04Z 2020-12-11T11:08:57Z CONTRIBUTOR

Working on a unit test for the subok argument, I found no way to create a DataArray backed by a subclass of np.ndarray (even with NEP 18 enabled).

The DataArray constructor calls xr.core.variable.as_compatible_data. This function:

  • it converts np.ma instances to np.ndarray with fill values applied;
  • its NEP 18 specific branch, which would keep the supplied data argument (and its type), is not reachable for (subclasses of) np.ndarray;
  • it converts these subclasses to plain np.ndarray.

Is this behaviour intentional?

What to do with astype? If the behaviour is not intentional, we could add the subok unit test but mark it as xfail until it is fixed. If it is intentional, we could omit the subok argument here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  astype method lost its order parameter 755607271
743129841 https://github.com/pydata/xarray/issues/4644#issuecomment-743129841 https://api.github.com/repos/pydata/xarray/issues/4644 MDEyOklzc3VlQ29tbWVudDc0MzEyOTg0MQ== rhkleijn 32801740 2020-12-11T11:04:53Z 2020-12-11T11:04:53Z CONTRIBUTOR

I will also include xarray's recently added keep_attrs argument. Since the signature has been in flux in version 0.16.1 and now again, I intend to make all arguments following dtype keyword-only, to avoid introducing unnoticed bugs when user code calls it with positional arguments:

```python
def astype(self, dtype, *, order=None, casting=None, subok=None, copy=None, keep_attrs=True)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  astype method lost its order parameter 755607271
738946975 https://github.com/pydata/xarray/issues/4644#issuecomment-738946975 https://api.github.com/repos/pydata/xarray/issues/4644 MDEyOklzc3VlQ29tbWVudDczODk0Njk3NQ== rhkleijn 32801740 2020-12-04T18:32:46Z 2020-12-04T18:32:46Z CONTRIBUTOR

That is a nicer solution. I have never contributed a PR before, but I'll try to work on this next week. It looks like a good first issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  astype method lost its order parameter 755607271
737899772 https://github.com/pydata/xarray/issues/4644#issuecomment-737899772 https://api.github.com/repos/pydata/xarray/issues/4644 MDEyOklzc3VlQ29tbWVudDczNzg5OTc3Mg== rhkleijn 32801740 2020-12-03T12:03:42Z 2020-12-03T12:03:42Z CONTRIBUTOR

IIUC, before PR #4314 numpy.ndarray.astype was used and the order parameter was just part of that. Looking through PR #4314, it seems the casting parameter required some special casing in duck_array_ops.py since not all duck arrays supported it (in particular, sparse arrays only gained it very recently). In the current case, the order parameter is not supported at all for sparse arrays.

I am thinking it might not even be worthwhile to add support for an order parameter similar to the numpy.ndarray.astype method:

  • It would add more special casing logic to xarray.
  • To the user it would result in either noisy warnings or silently dropped parameters.
  • To me, the kind of contiguousness seems more like an implementation detail of the array type of the underlying data than a property of the encapsulating xarray object.
  • For my use case (with numpy arrays as the underlying data) I can quite easily work around this issue by using something like da.copy(data=np.asfortranarray(da.data)) for the example above (or np.ascontiguousarray for C contiguousness).

Another option might be to allow arbitrary **kwargs which will be passed through as-is to the astype method of the underlying array, making it the responsibility of the user to only supply parameters that make sense for that particular array type.
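A sketch of that pass-through option (simplified and hypothetical; the real method would also handle keep_attrs and dispatching):

```python
def astype(self, dtype, **kwargs):
    # forward everything as-is to the underlying duck array's astype; the user
    # is responsible for passing only parameters that this array type supports
    return self.copy(data=self.data.astype(dtype, **kwargs))
```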

What are your thoughts? Does xarray have some kind of policy for supporting parameters which might not make sense for all types of duck arrays?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  astype method lost its order parameter 755607271
412095066 https://github.com/pydata/xarray/issues/2059#issuecomment-412095066 https://api.github.com/repos/pydata/xarray/issues/2059 MDEyOklzc3VlQ29tbWVudDQxMjA5NTA2Ng== rhkleijn 32801740 2018-08-10T14:14:12Z 2018-08-10T14:14:12Z CONTRIBUTOR

Currently, the dtype does not seem to roundtrip faithfully. When I write np.unicode_ / str to file, it gets transformed to object when I subsequently read it from disk. I am using xarray 0.10.8 with Python 3 on Windows.

This can be reproduced by inserting the following line in the script above (and adjusting the print statement accordingly)

```python
with xr.open_dataset(filename) as ds:
    read_dtype = ds['data'].dtype
```

which gives:

| Python version | NetCDF version | NumPy datatype | NetCDF datatype | NumPy datatype (read) |
| -- | -- | -- | -- | -- |
| Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding | object |
| Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING | object |
| Python 3 | NETCDF3 | object bytes/bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF4 | object bytes/bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding | object |
| Python 3 | NETCDF4 | object unicode/str | NC_STRING | object |

Also, object bytes/bytes does not seem to roundtrip nicely either, as it appears to be converted to np.string_ / bytes.

Is it possible to preserve dtype when persisting xarray Datasets/DataArrays to disk?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray serialize bytes/unicode strings across Python/netCDF versions? 314444743


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);