
issue_comments


6 rows where issue = 1655290694 (`as_shared_dtype` converts scalars to 0d `numpy` arrays if chunked `cupy` is involved, xarray issue #7721), sorted by updated_at descending




Comment 1516494141 · rgommers (98330) · NONE · created 2023-04-20T15:04:17Z · updated 2023-04-20T15:04:17Z
https://github.com/pydata/xarray/issues/7721#issuecomment-1516494141

> So really, my question is: how do we support python scalars for libraries that only implement `__array_namespace__`, given that dropping that support would be a major breaking change?

I was considering this question for SciPy (xref scipy#18286) this week, and I think I'm happy with this strategy:

1. Cast all "array-like" inputs, such as Python scalars, lists/sequences, and generators, to `numpy.ndarray`.
2. Require "same array type" input and forbid mixing numpy-cupy, numpy-pytorch, cupy-pytorch, etc.; this will raise an exception.
3. As a result, cupy-pyscalar and pytorch-pyscalar will also raise an exception.

What that results in is an API that's backwards-compatible for numpy and array-like usage, and much stricter when using other array libraries. That strictness to me is a good thing, because:

- that's what CuPy, PyTorch & co themselves do, and it works well there,
- it avoids the complexity raised by arbitrary mixing, which results in questions like the one raised in this issue,
- in case you do need to use a scalar from within a function inside your own library, just convert it explicitly to the desired array type with `xp.asarray(a_scalar)`, giving you a 0-D array of the correct type (add `dtype=x.dtype` to make sure dtypes match, if that matters).
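A minimal sketch of that strategy (the helper name `coerce_inputs` is hypothetical, not SciPy's actual API):

```python
import numpy as np

def coerce_inputs(*args):
    # Hypothetical helper illustrating the three rules above.
    coerced = []
    for a in args:
        if hasattr(a, "__array_namespace__") or isinstance(a, np.ndarray):
            coerced.append(a)  # already an array: keep as-is
        else:
            coerced.append(np.asarray(a))  # rule 1: array-likes -> numpy
    # rules 2 and 3: forbid mixing array types; a Python scalar passed
    # alongside a cupy/pytorch array became numpy above, so it raises too
    first = type(coerced[0])
    if any(type(a) is not first for a in coerced[1:]):
        raise TypeError("mixing array types is not supported")
    return coerced
```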

Comment 1516251985 · keewis (14808389) · MEMBER · created 2023-04-20T12:35:30Z · updated 2023-04-20T12:36:55Z
https://github.com/pydata/xarray/issues/7721#issuecomment-1516251985

there are two things that happen in `as_shared_dtype` (which may not be good design, and we should probably consider splitting it into `as_shared_dtype` and `as_compatible_arrays` or something): first we cast everything to an array, then we decide on a common dtype and cast everything to that.
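A rough sketch of that hypothetical split (the names come from the comment above; this is not xarray's actual implementation):

```python
import numpy as np

def as_compatible_arrays(*values):
    # step 1: turn everything, scalars included, into arrays
    return tuple(np.asarray(v) for v in values)

def as_shared_dtype(*values):
    # step 2: pick a common dtype and cast all arrays to it
    arrays = as_compatible_arrays(*values)
    dtype = np.result_type(*arrays)
    return tuple(a.astype(dtype, copy=False) for a in arrays)
```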

The latter could easily be done by using numpy scalars, which as far as I can tell would be supported by most array libraries, including cupy. However, the reason we need to cast to arrays is that the array API (i.e. `__array_namespace__`) does not allow using scalars of any type, e.g. in `np.array_api.where` (this is important for libraries that don't implement `__array_ufunc__` / `__array_function__`). To clarify, what we're trying to support is something like

```python
import numpy.array_api as np

np.where(cond, cupy_array, python_scalar)
```

which (intentionally?) does not work.

At the moment, `as_shared_dtype` (or, really, the hypothetical `as_compatible_arrays`) correctly casts `python_scalar` to a 0-d `cupy.array` for the example above, but if we were to replace `cupy_array` with `chunked_cupy_array` or `chunked_cupy_array_with_units`, the special casing for cupy stops working and scalars will be cast to 0-d `numpy.array`. Conceptually, I tend to think of 0-d arrays as equivalent to scalars, hence the suggestion to have cupy treat numpy scalars and 0-d `numpy.array` the same way (I don't follow the array API closely enough to know whether that was already discussed and rejected).

So really, my question is: how do we support python scalars for libraries that only implement `__array_namespace__`, given that dropping that support would be a major breaking change?

Of course, I would prefer removing the special casing for specific libraries altogether, but I wouldn't be opposed to keeping the existing one. I guess as a short-term fix we could just pull `_meta` out of duck dask arrays and determine the common array type from that (the downside is that we'd add another special case for dask, which another PR is actually trying to remove).
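A sketch of that short-term fix (`common_array_sample` is a hypothetical name; dask arrays really do expose a 0-d `_meta` describing the wrapped array type):

```python
import dask.array as da

def common_array_sample(values):
    # replace each dask array by its `_meta` so that the subsequent
    # "which library should scalars be cast to?" decision sees the
    # wrapped array type (e.g. cupy) instead of dask
    return [v._meta if isinstance(v, da.Array) else v for v in values]
```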

As a long-term fix I guess we'd need to revive the stalled nested duck array discussion.

Comment 1515600072 · leofang (5534781) · NONE · created 2023-04-20T01:50:58Z · updated 2023-04-20T01:50:58Z
https://github.com/pydata/xarray/issues/7721#issuecomment-1515600072

Thanks, Justus, for expanding on this. It sounds to me like the question is "how do we cast dtypes when multiple array libraries are participating in the same computation?", and I am not sure I am knowledgeable enough to comment.

From the array API point of view, long, long ago we decided that this is UB (undefined behavior), meaning it's completely up to each library to decide what to do. A library can raise, or it can come up with a special rule that makes sense to it.

It sounds like Xarray has some machinery to deal with this situation, but you'd rather not keep special-casing a certain array library? Am I understanding it right?

Comment 1510169910 · keewis (14808389) · MEMBER · created 2023-04-16T08:09:09Z · updated 2023-04-16T08:09:09Z
https://github.com/pydata/xarray/issues/7721#issuecomment-1510169910

The issue is that here: https://github.com/pydata/xarray/blob/d4db16699f30ad1dc3e6861601247abf4ac96567/xarray/core/duck_array_ops.py#L193-L206 we try to convert everything to the same dtype, casting numpy and python scalars to an array. The latter is important because e.g. `numpy.array_api.where` only accepts arrays as input.

However, detecting cupy beneath (multiple) layers of duck arrays is not easy, which means that, for example, passing a `pint(dask(cupy))` array together with scalars will currently cast the scalars to 0-d `numpy` arrays, while passing a `cupy` array instead will result in 0-d `cupy` arrays.
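For illustration, finding the innermost array type means peeling the wrappers off one by one; `.magnitude` (pint) and `._meta` (dask) are real attributes, but the helper itself is hypothetical and far from complete:

```python
def innermost_array_type(x):
    # walk down known duck-array wrapper attributes until none match
    for attr in ("magnitude", "_meta"):  # pint, dask
        inner = getattr(x, attr, None)
        if inner is not None:
            return innermost_array_type(inner)
    return type(x)
```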

My naive suggestion was to treat `np.int64(0)` and `np.array(0, dtype="int64")` the same, where at the moment the latter would fail for the same reason as `np.array([0], dtype="int64")`.
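Concretely, the asymmetry described above looks roughly like this (a sketch; running it requires a GPU with cupy installed):

```python
import cupy as cp
import numpy as np

x = cp.arange(3)

x + np.int64(1)                    # numpy scalars are accepted by cupy
# x + np.array(1, dtype="int64")   # 0-d numpy array: raises TypeError today
# x + np.array([1], dtype="int64") # fails for the same reason
```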

Reactions: 👍 1

Comment 1510016072 · leofang (5534781) · NONE · created 2023-04-16T01:25:53Z · updated 2023-04-16T01:25:53Z
https://github.com/pydata/xarray/issues/7721#issuecomment-1510016072

Sorry that I missed the ping, Jacob, but I'd need more context to offer any suggestions or answers 😅 Is the question about why CuPy wouldn't return scalars?

Comment 1498773949 · jacobtomlinson (1610850) · CONTRIBUTOR · created 2023-04-06T09:39:43Z · updated 2023-04-06T09:39:43Z
https://github.com/pydata/xarray/issues/7721#issuecomment-1498773949

Ping @leofang in case you have thoughts?


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);