home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

18 rows where user = 98330 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: issue_url, reactions, created_at (date), updated_at (date)

issue 5

  • Use pytorch as backend for xarrays 7
  • Should sel with slice objects care about underlying coordinate order? 6
  • Duck array compatibility meeting 3
  • Sparse arrays 1
  • `as_shared_dtype` converts scalars to 0d `numpy` arrays if chunked `cupy` is involved 1

user 1

  • rgommers · 18 ✖

author_association 1

  • NONE 18
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1516494141 https://github.com/pydata/xarray/issues/7721#issuecomment-1516494141 https://api.github.com/repos/pydata/xarray/issues/7721 IC_kwDOAMm_X85aY909 rgommers 98330 2023-04-20T15:04:17Z 2023-04-20T15:04:17Z NONE

So really, my question is: how do we support python scalars for libraries that only implement __array_namespace__, given that stopping to do so would be a major breaking change?

I was considering this question for SciPy (xref scipy#18286) this week, and I think I'm happy with this strategy: 1. Cast all "array-like" inputs like Python scalars, lists/sequences, and generators, to numpy.ndarray. 2. Require "same array type" input, forbid mixing numpy-cupy, numpy-pytorch, cupy-pytorch, etc. - this will raise an exception 3. As a result, cupy-pyscalar and pytorch-pyscalar will also raise an exception.

What that results in is an API that's backwards-compatible for numpy and array-like usage, and much stricter when using other array libraries. That strictness to me is a good thing, because: - that's what CuPy, PyTorch & co themselves do, and it works well there - it avoids the complexity raised by arbitrary mixing, which results in questions like the one raised in this issue. - in case you do need to use a scalar from within a function inside your own library, just convert it explicitly to the desired array type with xp.asarray(a_scalar) giving you a 0-D array of the correct type (add dtype=x.dtype to make sure dtypes match if that matters)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `as_shared_dtype` converts scalars to 0d `numpy` arrays if chunked `cupy` is involved 1655290694
925025457 https://github.com/pydata/xarray/issues/5648#issuecomment-925025457 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X843IsSx rgommers 98330 2021-09-22T15:12:14Z 2021-09-22T19:25:49Z NONE

There are also some relevant and very interesting PyTorch development discussions; there are a lot of Tensor subclasses in the making and thought being put into how those interact: - https://dev-discuss.pytorch.org/t/state-of-pytorch-core-september-2021-edition/332#alternative-tensors-5 - https://pytorch-dev-podcast.simplecast.com/episodes/tensor-subclasses-and-liskov-substitution-principle - https://dev-discuss.pytorch.org/t/functorch-levels-as-dynamically-allocated-classes/294

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
890310954 https://github.com/pydata/xarray/issues/5648#issuecomment-890310954 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X841EREq rgommers 98330 2021-07-31T08:24:02Z 2021-07-31T08:24:02Z NONE

Interesting - could you say a bit more? Looking at these two issues, it seemed more like the question was simply on hold until someone who wanted it badly enough came along?

There is a significant backwards compatibility break when a library adds __array_ufunc__ and __array_function__. JAX maintainers were not comfortable with that. @shoyer wrote https://numpy.org/neps/nep-0037-array-module.html as a follow-up largely because of that. That NEP is effectively superceded by the array API standard (https://data-apis.org/array-api/latest/ and NEP 47). PyTorch has decided to adopt that and implementation is well underway. Experimental support in NumPy will land next week (complete except for linalg). CuPy and MXNet plan to follow that NumPy implementation. JAX and Dask not yet decided I believe, but likely to follow NumPy as well.

Canonical/minimal API of a "duck array" and how to detect it (though may be superseded by NEPs 30 and 47 among others)

This is basically what the array API standard provides (most functions follow NumPy, with deviations mostly where other libraries were already deviating because they could implement something on GPU, with a JIT, or with a non-strided memory model). __array_function__ has worked quite well for CuPy and Dask, because they follow the NumPy API almost to the letter, with only a couple of exceptions (e.g. 0-D array instead of array scalars in CuPy). JAX, PyTorch and MXNet all deviate much more, and since the NumPy API is not very well-defined (there's 1500+ public objects plus more semi-public ones), you'd have no guarantees about what works and what doesn't.

That said, __array_ufunc__ and __array_function__ aren't going anywhere. The RAPIDS ecosystem is invested in it and I believe largely happy with it. So adding Xarray and Pint to the mix sounds potentially interesting.

{
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  Duck array compatibility meeting 956103236
889875756 https://github.com/pydata/xarray/issues/5648#issuecomment-889875756 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X841Cm0s rgommers 98330 2021-07-30T12:59:04Z 2021-07-30T12:59:04Z NONE

I'm happy to join, seems interesting. And yes, I can say something about PyTorch. There probably isn't much to say though - PyTorch is unlikely to adopt __array_function__ at this point, just like JAX. And it doesn't seem critical for this hierarchy anyway - the fundamental array objects (PyTorch/CuPy/NumPy/Sparse/JAX arrays or tensors) do not have or need a class hierarchy, they are all at the bottom and should not be mixed.

The key thing here seems to be Dask <-> Xarray <-> Pint, unless I'm missing something?

{
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
769656592 https://github.com/pydata/xarray/issues/3232#issuecomment-769656592 https://api.github.com/repos/pydata/xarray/issues/3232 MDEyOklzc3VlQ29tbWVudDc2OTY1NjU5Mg== rgommers 98330 2021-01-29T08:26:23Z 2021-01-29T08:26:23Z NONE

I'm starting to suspect not because that would involve data_array being both DataArray and a Torch.Tensor object. It seems what I'm in fact enabling is that DataArray.data is a Torch.Tensor.

some_sum is still a DataArray, which doesn't have a backward method. You could use data_array = xr.DataArray( xr_tsr, coords=dict(a=["a1", "a2", "a3"], b=["b1", "b1"]), dims=["a", "b"], name="dummy", attrs={"grad": xr_tsr.grad, "backward": xr_tsr.backward}, ) and your example should work (I assume you meant .grad not .grid).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
766669784 https://github.com/pydata/xarray/issues/3232#issuecomment-766669784 https://api.github.com/repos/pydata/xarray/issues/3232 MDEyOklzc3VlQ29tbWVudDc2NjY2OTc4NA== rgommers 98330 2021-01-25T09:12:51Z 2021-01-25T09:12:51Z NONE

Does this mean I shouldn't fill out __array_function__ in my subclass? Or is this just a forward looking expectation?

No, adding it should be perfectly fine. The dispatch mechanism itself isn't going anywhere, it's part of numpy and it works. Whether or not torch.Tensor itself has an __array_function__ method isn't too relevant for your subclass.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
765906982 https://github.com/pydata/xarray/issues/3232#issuecomment-765906982 https://api.github.com/repos/pydata/xarray/issues/3232 MDEyOklzc3VlQ29tbWVudDc2NTkwNjk4Mg== rgommers 98330 2021-01-23T11:12:59Z 2021-01-23T11:12:59Z NONE

Note that your the main work in adding __array_function__ is not the dispatch mechanism, but mapping to 100% compatible APIs. That job should have gotten a lot easier now compared to 9 months ago. PyTorch now has a completely matching fft module, and a ~70% complete linalg module in master. And functions in the main namespace have gained dtype keywords, integer-to-float promotion, and other NumPy compat changes. So it should be feasible to write your custom subclass.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
765905229 https://github.com/pydata/xarray/issues/3232#issuecomment-765905229 https://api.github.com/repos/pydata/xarray/issues/3232 MDEyOklzc3VlQ29tbWVudDc2NTkwNTIyOQ== rgommers 98330 2021-01-23T10:57:48Z 2021-01-23T11:09:52Z NONE

Create a custom subclass of PyTorch's Tensors which meets the duck array required methods and attributes. Since this isn't officially supported, looks like I could run into issues getting this subclass to persist through tensor operations.

If you use PyTorch 1.7.1 or later, then Tensor subclasses are much better preserved through pytorch functions and operations like slicing. So a custom subclass, adding the attributes and methods Xarray requires for a duck array should be feasible.

data = as_compatible_data(data)

Looks like you need to patch that internally just a bit, probably adding pytorch to NON_NUMPY_SUPPORTED_ARRAY_TYPES.

Note that I do not expect anymore that we'll be adding __array_function__ to torch.Tensor, and certainly not any time soon. My current expectation is that the "get the correct namespace from an array/tensor object directly" from https://numpy.org/neps/nep-0037-array-module.html#how-to-use-get-array-module and https://data-apis.github.io/array-api/latest/ will turn out to be a much better design long-term.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
523101805 https://github.com/pydata/xarray/issues/3232#issuecomment-523101805 https://api.github.com/repos/pydata/xarray/issues/3232 MDEyOklzc3VlQ29tbWVudDUyMzEwMTgwNQ== rgommers 98330 2019-08-20T16:53:40Z 2019-08-20T16:53:40Z NONE

This is a definite downside of reusing NumPy's existing namespace.

We didn't discuss an alternative very explicitly I think, but at least we'll have wide adoption fast. Hopefully the pain is limited ....

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
522824647 https://github.com/pydata/xarray/issues/3232#issuecomment-522824647 https://api.github.com/repos/pydata/xarray/issues/3232 MDEyOklzc3VlQ29tbWVudDUyMjgyNDY0Nw== rgommers 98330 2019-08-20T02:18:59Z 2019-08-20T02:18:59Z NONE

Personally, I think the most viable way to achieve seamless integration with deep learning libraries would be to support integration with JAX, which already implements NumPy's API almost exactly.

Less familiar with that, but pytorch does have experimental XLA support, so that's a start.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
522824210 https://github.com/pydata/xarray/issues/3232#issuecomment-522824210 https://api.github.com/repos/pydata/xarray/issues/3232 MDEyOklzc3VlQ29tbWVudDUyMjgyNDIxMA== rgommers 98330 2019-08-20T02:16:32Z 2019-08-20T02:16:32Z NONE

I think there has been some discussion about this, but I don't know the current status (CC @rgommers).

The PyTorch team is definitely receptive to the idea of adding __array_function__ and __array_ufunc__, as well as expanding the API for better NumPy compatibility.

Also, they want a Tensor.__torch_function__ styled after __array_function__ so they can make their own API overridable.

The tracking issue for all of this is https://github.com/pytorch/pytorch/issues/22402

The biggest challenge for pytorch would be defining the translation layer that implements NumPy's API.

Agreed. No one is working on __array_function__ at the moment. Implementing it has some backwards compat concerns as well, because people may be relying on np.somefunc(some_torch_tensor) to be coerced to ndarray. It's not a small project, but implementing a prototype with a few function in the torch namespace that are not exactly matching the NumPy API would be a useful way to start pushing this forward.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
511121578 https://github.com/pydata/xarray/issues/1375#issuecomment-511121578 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUxMTEyMTU3OA== rgommers 98330 2019-07-13T13:18:34Z 2019-07-13T13:18:34Z NONE

I haven't talked to anyone at SciPy'19 yet who was interested in sparse arrays, but I'll keep an eye out today.

And yes, this is a fun issue to work on and would be really nice to have!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
422090593 https://github.com/pydata/xarray/issues/1613#issuecomment-422090593 https://api.github.com/repos/pydata/xarray/issues/1613 MDEyOklzc3VlQ29tbWVudDQyMjA5MDU5Mw== rgommers 98330 2018-09-17T16:53:59Z 2018-09-17T16:53:59Z NONE

Requiring monotonicity lets us guarantee that the result is a NumPy view rather than possibly being a copy.

Sure, but if a users happens to have non-monotonic data it just requires her to then make that copy first anyway. Still a good thing overall for performance, but there'll be cases where it's just an extra thing to understand for the user without any performance gain.

Anyway, the non-monotonic case is less relevant, because it's harder to run into in practice. The decreasing case however is easy - there is standard geo software (looking at you ArcGIS) that can write geoTiff's with monotonic decreasing indices. That's how I ran into this. Rewriting multi-GB source data that I didn't produce is not an option, so I'm left with the manual monotonicity checks and juggling label-based slices.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should sel with slice objects care about underlying coordinate order? 263403430
422083146 https://github.com/pydata/xarray/issues/1613#issuecomment-422083146 https://api.github.com/repos/pydata/xarray/issues/1613 MDEyOklzc3VlQ29tbWVudDQyMjA4MzE0Ng== rgommers 98330 2018-09-17T16:31:51Z 2018-09-17T16:31:51Z NONE

Consider using index.is_monotonic_ascending and index.is_monotonic_descending instead of subtracting the first few values -- those are exactly the checks that pandas uses.

Thanks, that's nicer, will do. And thanks for digging up the background/rationales.

(1)Integer slicing by position and by label with positive integers should work the same for an index with values given by range(N).

This I don't think I agree with. Slicing by position and by label are semantically very different operations.

(2) is correct, but irrelevant to label-based indexing.

(3) yes, agree that's a mistake

(4) indeed

(5) I'd say that it's in practice less important, because users normally won't do slice(high, low), but for consistency I agree that there should be symmetry in behaviour. Making slice(high, low) return the same as slice(low, high) on a monotonic increasing index seems reasonable.

Additionally: arguably monotonicity should not be required. When one uses labels, the semantics are clear without monotonicity. This doesn't have a position-based equivalent.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should sel with slice objects care about underlying coordinate order? 263403430
421676579 https://github.com/pydata/xarray/issues/1613#issuecomment-421676579 https://api.github.com/repos/pydata/xarray/issues/1613 MDEyOklzc3VlQ29tbWVudDQyMTY3NjU3OQ== rgommers 98330 2018-09-16T02:39:50Z 2018-09-16T02:39:50Z NONE

In case it helps anyone else, I ended up doing: ``` # Note that xarray is fiddly with indexing - if x or y values are ordered # high to low, then the slice bounds need to be reversed. So check that x_ordered_low2high = data.x.values[-1] - data.x.values[0] > 0 y_ordered_low2high = data.y.values[-1] - data.y.values[0] > 0

if x_ordered_low2high:
    x_index = slice(lower_lon, upper_lon)
else:
    x_index = slice(upper_lon, lower_lon)

if y_ordered_low2high:
    y_index = slice(lower_lat, upper_lat)
else:
    y_index = slice(upper_lat, lower_lat)

subset = data.sel(x=x_index, y=y_index)

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should sel with slice objects care about underlying coordinate order? 263403430
421676344 https://github.com/pydata/xarray/issues/1613#issuecomment-421676344 https://api.github.com/repos/pydata/xarray/issues/1613 MDEyOklzc3VlQ29tbWVudDQyMTY3NjM0NA== rgommers 98330 2018-09-16T02:37:40Z 2018-09-16T02:37:40Z NONE

The only related issues I can find are:

  • https://github.com/pandas-dev/pandas/issues/14316
  • https://github.com/pandas-dev/pandas/issues/7860

They don't look identical though. Don't really have the time to dive into that further now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should sel with slice objects care about underlying coordinate order? 263403430
420296389 https://github.com/pydata/xarray/issues/1613#issuecomment-420296389 https://api.github.com/repos/pydata/xarray/issues/1613 MDEyOklzc3VlQ29tbWVudDQyMDI5NjM4OQ== rgommers 98330 2018-09-11T14:35:19Z 2018-09-11T14:35:19Z NONE

Ah okay, that makes sense. I'm sure there's a related pandas issue (or many), will try to find that later.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should sel with slice objects care about underlying coordinate order? 263403430
420084948 https://github.com/pydata/xarray/issues/1613#issuecomment-420084948 https://api.github.com/repos/pydata/xarray/issues/1613 MDEyOklzc3VlQ29tbWVudDQyMDA4NDk0OA== rgommers 98330 2018-09-10T22:40:15Z 2018-09-10T22:41:58Z NONE

The justification for this logic is that sel/loc would work like normal indexing [] on a Python list, and if you supply slice limits in the opposite order on a Python list, you get an empty result.

Given that sel is label/value-based, I don't see how the analogy with positional list-based indexing applies. This behavior looks like a functional bug. If you have decided to keep it as is, I'd suggest at the very least to add a warning if an empty dataset is returned because values in an attribute are ordered high...low.

EDIT: also then best to close this issue as wontfix

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should sel with slice objects care about underlying coordinate order? 263403430

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 18.528ms · About: xarray-datasette