html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7721#issuecomment-1516494141,https://api.github.com/repos/pydata/xarray/issues/7721,1516494141,IC_kwDOAMm_X85aY909,98330,2023-04-20T15:04:17Z,2023-04-20T15:04:17Z,NONE,"> So really, my question is: how do we support python scalars for libraries that only implement `__array_namespace__`, given that stopping to do so would be a major breaking change?
I was considering this question for SciPy (xref [scipy#18286](https://github.com/scipy/scipy/issues/18286)) this week, and I think I'm happy with this strategy:
1. Cast all ""array-like"" inputs, such as Python scalars, lists/sequences, and generators, to `numpy.ndarray`.
2. Require ""same array type"" inputs and forbid mixing numpy-cupy, numpy-pytorch, cupy-pytorch, etc. - mixing will raise an exception.
3. As a result, cupy-pyscalar and pytorch-pyscalar will _also_ raise an exception (a sketch of this check follows below).
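A minimal sketch of how that input handling could look (the helper below is illustrative only, not actual SciPy code; `xp` is assumed to be the array namespace of the main array argument):
```
import numpy as np

def coerce_input(x, xp):
    # xp: the array namespace of the main array argument
    if xp is np:
        # backwards-compatible path: scalars, lists, etc. become ndarrays
        return np.asarray(x)
    # strict path for other array libraries: no Python scalars, no mixing
    if not hasattr(x, ""__array_namespace__"") or x.__array_namespace__() is not xp:
        raise TypeError(
            ""expected an array from the same library; convert scalars ""
            ""explicitly with xp.asarray(a_scalar) if needed""
        )
    return x
```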
The result is an API that's backwards-compatible for numpy and array-like usage, and much stricter when using other array libraries. To me that strictness is a good thing, because:
- that's what CuPy, PyTorch & co themselves do, and it works well there
- it avoids the complexity raised by arbitrary mixing, which results in questions like the one raised in this issue.
- if you do need to use a scalar inside a function in your own library, convert it explicitly to the desired array type with `xp.asarray(a_scalar)`, which gives you a 0-D array of the correct type (add `dtype=x.dtype` to make sure dtypes match if that matters)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1655290694
https://github.com/pydata/xarray/issues/5648#issuecomment-925025457,https://api.github.com/repos/pydata/xarray/issues/5648,925025457,IC_kwDOAMm_X843IsSx,98330,2021-09-22T15:12:14Z,2021-09-22T19:25:49Z,NONE,"There are also some relevant and very interesting PyTorch development discussions; there are a lot of Tensor subclasses in the making and thought being put into how those interact:
- https://dev-discuss.pytorch.org/t/state-of-pytorch-core-september-2021-edition/332#alternative-tensors-5
- https://pytorch-dev-podcast.simplecast.com/episodes/tensor-subclasses-and-liskov-substitution-principle
- https://dev-discuss.pytorch.org/t/functorch-levels-as-dynamically-allocated-classes/294","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,956103236
https://github.com/pydata/xarray/issues/5648#issuecomment-890310954,https://api.github.com/repos/pydata/xarray/issues/5648,890310954,IC_kwDOAMm_X841EREq,98330,2021-07-31T08:24:02Z,2021-07-31T08:24:02Z,NONE,"> Interesting - could you say a bit more? Looking at these two issues, it seemed more like the question was simply on hold until someone who wanted it badly enough came along?
There is a significant backwards compatibility break when a library adds `__array_ufunc__` and `__array_function__`. JAX maintainers were not comfortable with that. @shoyer wrote https://numpy.org/neps/nep-0037-array-module.html as a follow-up largely because of that. That NEP is effectively superseded by the array API standard (https://data-apis.org/array-api/latest/ and NEP 47). PyTorch has decided to adopt it, and implementation is well underway. Experimental support in NumPy will land next week (complete except for `linalg`). CuPy and MXNet plan to follow that NumPy implementation. JAX and Dask haven't decided yet, I believe, but are likely to follow NumPy as well.
> Canonical/minimal API of a ""duck array"" and how to detect it (though may be superseded by NEPs 30 and 47 among others)
This is basically what the array API standard provides (most functions follow NumPy, with deviations mostly where other libraries were already deviating because they could implement something on GPU, with a JIT, or with a non-strided memory model). `__array_function__` has worked quite well for CuPy and Dask, because they follow the NumPy API almost to the letter, with only a couple of exceptions (e.g. 0-D arrays instead of array scalars in CuPy). JAX, PyTorch and MXNet all deviate much more, and since the NumPy API is not very well-defined (there are 1500+ public objects plus more semi-public ones), you'd have no guarantees about what works and what doesn't.
That said, `__array_ufunc__` and `__array_function__` aren't going anywhere. The RAPIDS ecosystem is invested in them and, I believe, largely happy with them. So adding Xarray and Pint to the mix sounds potentially interesting.","{""total_count"": 2, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,956103236
https://github.com/pydata/xarray/issues/5648#issuecomment-889875756,https://api.github.com/repos/pydata/xarray/issues/5648,889875756,IC_kwDOAMm_X841Cm0s,98330,2021-07-30T12:59:04Z,2021-07-30T12:59:04Z,NONE,"I'm happy to join, seems interesting. And yes, I can say something about PyTorch. There probably isn't much to say though - PyTorch is unlikely to adopt `__array_function__` at this point, just like JAX. And it doesn't seem critical for this hierarchy anyway - the fundamental array objects (PyTorch/CuPy/NumPy/Sparse/JAX arrays or tensors) do not have or need a class hierarchy, they are all at the bottom and should not be mixed.
The key thing here seems to be Dask <-> Xarray <-> Pint, unless I'm missing something?","{""total_count"": 2, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,956103236
https://github.com/pydata/xarray/issues/3232#issuecomment-769656592,https://api.github.com/repos/pydata/xarray/issues/3232,769656592,MDEyOklzc3VlQ29tbWVudDc2OTY1NjU5Mg==,98330,2021-01-29T08:26:23Z,2021-01-29T08:26:23Z,NONE,"> I'm starting to suspect not because that would involve data_array being _both_ `DataArray` and a `Torch.Tensor` object. It seems what I'm in fact enabling is that `DataArray.data` is a `Torch.Tensor`.
`some_sum` is still a `DataArray`, which doesn't have a `backward` method. You could use
```
data_array = xr.DataArray(
    xr_tsr,
    coords=dict(a=[""a1"", ""a2"", ""a3""], b=[""b1"", ""b1""]),
    dims=[""a"", ""b""],
    name=""dummy"",
    attrs={""grad"": xr_tsr.grad, ""backward"": xr_tsr.backward},
)
```
and your example should work (I assume you meant `.grad` not `.grid`).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,482543307
https://github.com/pydata/xarray/issues/3232#issuecomment-766669784,https://api.github.com/repos/pydata/xarray/issues/3232,766669784,MDEyOklzc3VlQ29tbWVudDc2NjY2OTc4NA==,98330,2021-01-25T09:12:51Z,2021-01-25T09:12:51Z,NONE,"> Does this mean I shouldn't fill out `__array_function__` in my subclass? Or is this just a forward looking expectation?
No, adding it should be perfectly fine. The dispatch mechanism itself isn't going anywhere, it's part of numpy and it works. Whether or not `torch.Tensor` itself has an `__array_function__` method isn't too relevant for your subclass.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,482543307
https://github.com/pydata/xarray/issues/3232#issuecomment-765906982,https://api.github.com/repos/pydata/xarray/issues/3232,765906982,MDEyOklzc3VlQ29tbWVudDc2NTkwNjk4Mg==,98330,2021-01-23T11:12:59Z,2021-01-23T11:12:59Z,NONE,"Note that the main work in adding `__array_function__` is not the dispatch mechanism, but mapping to 100% compatible APIs. That job should have gotten a lot easier now compared to 9 months ago. PyTorch now has a completely matching `fft` module, and a ~70% complete `linalg` module in master. And functions in the main namespace have gained dtype keywords, integer-to-float promotion, and other NumPy compat changes. So it should be feasible to write your custom subclass.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,482543307
https://github.com/pydata/xarray/issues/3232#issuecomment-765905229,https://api.github.com/repos/pydata/xarray/issues/3232,765905229,MDEyOklzc3VlQ29tbWVudDc2NTkwNTIyOQ==,98330,2021-01-23T10:57:48Z,2021-01-23T11:09:52Z,NONE,"> Create a custom subclass of PyTorch's Tensors which meets the [duck array](http://xarray.pydata.org/en/latest/internals.html) required methods and attributes. Since this isn't officially supported, looks like I could run into issues getting this subclass to persist through tensor operations.
If you use PyTorch 1.7.1 or later, then Tensor subclasses are much better preserved through pytorch functions and operations like slicing. So a custom subclass that adds the attributes and methods Xarray requires for a duck array should be feasible (a rough sketch follows below).
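A rough sketch of what such a subclass could look like (illustrative only, not an officially supported recipe; it assumes the NumPy dispatch protocols plus the `shape`/`dtype`/`ndim` attributes that `torch.Tensor` already has are what's needed):
```
import torch

class DuckTensor(torch.Tensor):
    # torch.Tensor already provides .shape, .dtype and .ndim; what's missing
    # for duck-array use are the NumPy dispatch protocols. The fallbacks below
    # naively convert to NumPy, which is enough to sketch the idea.
    def __array_function__(self, func, types, args, kwargs):
        args = [a.detach().cpu().numpy() if isinstance(a, torch.Tensor) else a for a in args]
        return func(*args, **kwargs)
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        inputs = [a.detach().cpu().numpy() if isinstance(a, torch.Tensor) else a for a in inputs]
        return getattr(ufunc, method)(*inputs, **kwargs)

# construct instances from an existing tensor:
# duck = torch.arange(6.0).as_subclass(DuckTensor)
```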
> `data = as_compatible_data(data)`
It looks like you need to patch that internally just a bit, probably by adding pytorch to `NON_NUMPY_SUPPORTED_ARRAY_TYPES`.
Note that I no longer expect that we'll be adding `__array_function__` to `torch.Tensor`, and certainly not any time soon. My current expectation is that the ""get the correct namespace from an array/tensor object directly"" approach from https://numpy.org/neps/nep-0037-array-module.html#how-to-use-get-array-module and https://data-apis.github.io/array-api/latest/ will turn out to be a much better design long-term.
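For reference, a minimal sketch of that pattern (assuming the `__array_namespace__()` entry point from the array API standard; the helper and function names here are made up):
```
import numpy as np

def get_namespace(x):
    # conforming array objects expose their namespace (the module holding
    # their functions) directly; fall back to NumPy otherwise
    if hasattr(x, ""__array_namespace__""):
        return x.__array_namespace__()
    return np

def library_function(x):
    xp = get_namespace(x)
    return xp.sum(x)  # dispatches to the right library without __array_function__
```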
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,482543307
https://github.com/pydata/xarray/issues/3232#issuecomment-523101805,https://api.github.com/repos/pydata/xarray/issues/3232,523101805,MDEyOklzc3VlQ29tbWVudDUyMzEwMTgwNQ==,98330,2019-08-20T16:53:40Z,2019-08-20T16:53:40Z,NONE,"> This is a definite downside of reusing NumPy's existing namespace.
We didn't discuss an alternative very explicitly, I think, but at least we'll have wide adoption fast. Hopefully the pain will be limited.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,482543307
https://github.com/pydata/xarray/issues/3232#issuecomment-522824647,https://api.github.com/repos/pydata/xarray/issues/3232,522824647,MDEyOklzc3VlQ29tbWVudDUyMjgyNDY0Nw==,98330,2019-08-20T02:18:59Z,2019-08-20T02:18:59Z,NONE,"> Personally, I think the most viable way to achieve seamless integration with deep learning libraries would be to support integration with JAX, which already implements NumPy's API almost exactly.
I'm less familiar with that, but PyTorch does have experimental XLA support, so that's a start.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,482543307
https://github.com/pydata/xarray/issues/3232#issuecomment-522824210,https://api.github.com/repos/pydata/xarray/issues/3232,522824210,MDEyOklzc3VlQ29tbWVudDUyMjgyNDIxMA==,98330,2019-08-20T02:16:32Z,2019-08-20T02:16:32Z,NONE,"> I think there has been some discussion about this, but I don't know the current status (CC @rgommers).
The PyTorch team is definitely receptive to the idea of adding `__array_function__` and `__array_ufunc__`, as well as expanding the API for better NumPy compatibility.
Also, they want a `Tensor.__torch_function__` styled after `__array_function__` so they can make their own API overridable.
The tracking issue for all of this is https://github.com/pytorch/pytorch/issues/22402
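For context, the override hook they're proposing mirrors the `__array_function__` signature; roughly like this (an illustrative sketch only - see the tracking issue for the actual design):
```
import torch

class MyTensor(torch.Tensor):
    # hypothetical subclass: __torch_function__ lets it intercept torch.* calls
    # made on its instances, analogous to NumPy's __array_function__
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # inspect or modify the call here, then defer to the default behaviour
        return super().__torch_function__(func, types, args, kwargs)
```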
> The biggest challenge for pytorch would be defining the translation layer that implements NumPy's API.
Agreed. No one is working on `__array_function__` at the moment. Implementing it has some backwards compat concerns as well, because people may be relying on `np.somefunc(some_torch_tensor)` being coerced to `ndarray`. It's not a small project, but implementing a prototype with a few functions in the `torch` namespace that don't exactly match the NumPy API would be a useful way to start pushing this forward.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,482543307
https://github.com/pydata/xarray/issues/1375#issuecomment-511121578,https://api.github.com/repos/pydata/xarray/issues/1375,511121578,MDEyOklzc3VlQ29tbWVudDUxMTEyMTU3OA==,98330,2019-07-13T13:18:34Z,2019-07-13T13:18:34Z,NONE,"I haven't talked to anyone at SciPy'19 yet who was interested in sparse arrays, but I'll keep an eye out today.
And yes, this is a fun issue to work on and would be really nice to have!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1613#issuecomment-422090593,https://api.github.com/repos/pydata/xarray/issues/1613,422090593,MDEyOklzc3VlQ29tbWVudDQyMjA5MDU5Mw==,98330,2018-09-17T16:53:59Z,2018-09-17T16:53:59Z,NONE,"> Requiring monotonicity lets us guarantee that the result is a NumPy view rather than possibly being a copy.
Sure, but if a user happens to have non-monotonic data, it just requires her to make that copy first anyway. It's still a good thing overall for performance, but there'll be cases where it's just an extra thing for the user to understand without any performance gain.
Anyway, the non-monotonic case is less relevant, because it's harder to run into in practice. The decreasing case, however, is easy - there is standard geo software (looking at you, ArcGIS) that writes GeoTIFFs with monotonically decreasing indices. That's how I ran into this. Rewriting multi-GB source data that I didn't produce is not an option, so I'm left with manual monotonicity checks and juggling label-based slices.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,263403430
https://github.com/pydata/xarray/issues/1613#issuecomment-422083146,https://api.github.com/repos/pydata/xarray/issues/1613,422083146,MDEyOklzc3VlQ29tbWVudDQyMjA4MzE0Ng==,98330,2018-09-17T16:31:51Z,2018-09-17T16:31:51Z,NONE,"> Consider using `index.is_monotonic_ascending` and `index.is_monotonic_descending` instead of subtracting the first few values -- those are exactly the checks that pandas uses.
Thanks, that's nicer, will do. And thanks for digging up the background/rationales.
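As a concrete version of that check (note that pandas spells these attributes `is_monotonic_increasing` / `is_monotonic_decreasing`; `data`, `lower_lon` and `upper_lon` are as in my earlier snippet in this thread):
```
import pandas as pd

idx = pd.Index(data.x.values)
if idx.is_monotonic_increasing:
    x_index = slice(lower_lon, upper_lon)
elif idx.is_monotonic_decreasing:
    x_index = slice(upper_lon, lower_lon)
else:
    raise ValueError(""x coordinate is not monotonic"")
```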
> (1) Integer slicing by position and by label with positive integers should work the same for an index with values given by `range(N)`.
This I don't think I agree with. Slicing by position and by label are semantically very different operations.
(2) is correct, but irrelevant to label-based indexing.
(3) yes, agree that's a mistake
(4) indeed
(5) I'd say that it's less important in practice, because users normally won't do `slice(high, low)`, but for consistency I agree that there should be symmetry in behaviour. Making `slice(high, low)` return the same as `slice(low, high)` on a monotonically increasing index seems reasonable.
Additionally: arguably monotonicity should not be required. When one uses labels, the semantics are clear without monotonicity. This doesn't have a position-based equivalent.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,263403430
https://github.com/pydata/xarray/issues/1613#issuecomment-421676579,https://api.github.com/repos/pydata/xarray/issues/1613,421676579,MDEyOklzc3VlQ29tbWVudDQyMTY3NjU3OQ==,98330,2018-09-16T02:39:50Z,2018-09-16T02:39:50Z,NONE,"In case it helps anyone else, I ended up doing:
```
# Note that xarray is fiddly with indexing - if x or y values are ordered
# high to low, then the slice bounds need to be reversed. So check that
x_ordered_low2high = data.x.values[-1] - data.x.values[0] > 0
y_ordered_low2high = data.y.values[-1] - data.y.values[0] > 0
if x_ordered_low2high:
    x_index = slice(lower_lon, upper_lon)
else:
    x_index = slice(upper_lon, lower_lon)
if y_ordered_low2high:
    y_index = slice(lower_lat, upper_lat)
else:
    y_index = slice(upper_lat, lower_lat)
subset = data.sel(x=x_index, y=y_index)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,263403430
https://github.com/pydata/xarray/issues/1613#issuecomment-421676344,https://api.github.com/repos/pydata/xarray/issues/1613,421676344,MDEyOklzc3VlQ29tbWVudDQyMTY3NjM0NA==,98330,2018-09-16T02:37:40Z,2018-09-16T02:37:40Z,NONE,"The only related issues I can find are:
- https://github.com/pandas-dev/pandas/issues/14316
- https://github.com/pandas-dev/pandas/issues/7860
They don't look identical, though. I don't really have the time to dive into that further now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,263403430
https://github.com/pydata/xarray/issues/1613#issuecomment-420296389,https://api.github.com/repos/pydata/xarray/issues/1613,420296389,MDEyOklzc3VlQ29tbWVudDQyMDI5NjM4OQ==,98330,2018-09-11T14:35:19Z,2018-09-11T14:35:19Z,NONE,"Ah okay, that makes sense. I'm sure there's a related pandas issue (or many), will try to find that later.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,263403430
https://github.com/pydata/xarray/issues/1613#issuecomment-420084948,https://api.github.com/repos/pydata/xarray/issues/1613,420084948,MDEyOklzc3VlQ29tbWVudDQyMDA4NDk0OA==,98330,2018-09-10T22:40:15Z,2018-09-10T22:41:58Z,NONE,"> The justification for this logic is that sel/loc would work like normal indexing [] on a Python list, and if you supply slice limits in the opposite order on a Python list, you get an empty result.
Given that `sel` is label/value-based, I don't see how the analogy with positional list-based indexing applies. This behavior looks like a functional bug. If you have decided to keep it as is, I'd suggest at the very least adding a warning when an empty dataset is returned because the values in an attribute are ordered high to low.
EDIT: in that case, it would also be best to close this issue as wontfix.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,263403430