
issue_comments


2 rows where author_association = "MEMBER", issue = 1655290694 and user = 14808389 sorted by updated_at descending

keewis (MEMBER) · created 2023-04-20T12:35:30Z · updated 2023-04-20T12:36:55Z
https://github.com/pydata/xarray/issues/7721#issuecomment-1516251985

There are two things that happen in `as_shared_dtype` (which may not be good design; we should probably consider splitting it into `as_shared_dtype` and `as_compatible_arrays` or something): first we cast everything to an array, then we decide on a common dtype and cast everything to that.
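For illustration, a minimal sketch of what that split could look like (the name `as_compatible_arrays` and the scalar handling are hypothetical, not xarray's actual code):

```python
import numpy as np

def as_compatible_arrays(*values):
    # Hypothetical first step: turn python/numpy scalars into arrays so
    # that array-API functions (which reject bare scalars) accept them.
    return [np.asarray(v) if np.isscalar(v) else v for v in values]

def as_shared_dtype(*values):
    # Hypothetical second step: find a common dtype and cast everything.
    arrays = as_compatible_arrays(*values)
    dtype = np.result_type(*arrays)
    return [a.astype(dtype, copy=False) for a in arrays]
```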

The latter could easily be done using `numpy` scalars, which as far as I can tell are supported by most array libraries, including `cupy`. However, the reason we need to cast to arrays is that the array API (i.e. `__array_namespace__`) does not allow passing scalars of any type to functions like `np.array_api.where` (this is important for libraries that don't implement `__array_ufunc__` / `__array_function__`). To clarify, what we're trying to support is something like

```python
import numpy.array_api as np

np.where(cond, cupy_array, python_scalar)
```

which (intentionally?) does not work.
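A quick check against `numpy`'s experimental implementation of the standard (available as `numpy.array_api` in `numpy` 1.22+; the exact exception raised may vary between versions):

```python
import numpy.array_api as xp  # emits a UserWarning: experimental module

cond = xp.asarray([True, False])
x = xp.asarray([1.0, 2.0])

try:
    xp.where(cond, x, 0.0)  # python scalar is not a valid array-API argument
except Exception as e:
    print(type(e).__name__, e)

# wrapping the scalar in a 0-d array of the same namespace does work
print(xp.where(cond, x, xp.asarray(0.0)))
```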

At the moment, `as_shared_dtype` (or, really, the hypothetical `as_compatible_arrays`) correctly casts `python_scalar` to a 0-d `cupy` array for the example above, but if we were to replace `cupy_array` with `chunked_cupy_array` or `chunked_cupy_array_with_units`, the special casing for `cupy` stops working and scalars are cast to 0-d `numpy` arrays instead. Conceptually, I tend to think of 0-d arrays as equivalent to scalars, hence the suggestion to have `cupy` treat `numpy` scalars and 0-d `numpy` arrays the same way (I don't follow the array API closely enough to know whether that was already discussed and rejected).

So really, my question is: how do we keep supporting python scalars for libraries that only implement `__array_namespace__`, given that dropping that support would be a major breaking change?

Of course, I would prefer removing the special casing for specific libraries, but I wouldn't be opposed to keeping the existing one. I guess as a short-term fix we could just pull `_meta` out of duck `dask` arrays and determine the common array type from that (the downside is that we'd add another special case for `dask`, which another PR is actually trying to remove). A rough sketch of that idea follows.
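A minimal sketch of that short-term fix (hypothetical helpers; `dask.array.Array` does expose a prototype of its chunks as `_meta`, but everything else here is made up for illustration):

```python
import numpy as np

def underlying_array_type(value):
    # Hypothetical: duck dask arrays carry a prototype of their chunks
    # in `_meta`, which tells us which library backs the data.
    meta = getattr(value, "_meta", None)
    return type(meta) if meta is not None else type(value)

def common_array_type(*values):
    # Prefer any non-numpy array type (e.g. cupy.ndarray) over numpy.
    types = {underlying_array_type(v) for v in values if not np.isscalar(v)}
    non_numpy = types - {np.ndarray}
    return non_numpy.pop() if non_numpy else np.ndarray
```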

As a long-term fix I guess we'd need to revive the stalled nested duck array discussion.

Reactions: none
keewis (MEMBER) · created 2023-04-16T08:09:09Z
https://github.com/pydata/xarray/issues/7721#issuecomment-1510169910

The issue is that here: https://github.com/pydata/xarray/blob/d4db16699f30ad1dc3e6861601247abf4ac96567/xarray/core/duck_array_ops.py#L193-L206 we try to convert everything to the same dtype, casting `numpy` and python scalars to arrays. The latter is important because e.g. `numpy.array_api.where` only accepts arrays as input.

However, detecting `cupy` beneath (multiple) layers of duck arrays is not easy, which means that, for example, passing a `pint(dask(cupy))` array together with scalars will currently cast the scalars to 0-d `numpy` arrays, while passing a bare `cupy` array instead will result in 0-d `cupy` arrays. See the sketch below for one way such nested detection might look.
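For illustration, a rough sketch of peeling those layers (the attribute names cover `pint`'s `.magnitude` and dask's `._meta`; a real implementation would need to know about every wrapper type):

```python
def innermost_array_type(value):
    # Recursively unwrap duck-array layers: pint.Quantity wraps its data
    # in `.magnitude`, dask.array.Array describes its chunks via `._meta`.
    for attr in ("magnitude", "_meta"):
        inner = getattr(value, attr, None)
        if inner is not None:
            return innermost_array_type(inner)
    return type(value)

def is_cupy_backed(value):
    # Compare by module name so cupy stays an optional import.
    return innermost_array_type(value).__module__.startswith("cupy")
```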

My naive suggestion was to treat `np.int64(0)` and `np.array(0, dtype="int64")` the same, whereas at the moment the latter would fail for the same reason as `np.array([0], dtype="int64")`.
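For example (assuming a CUDA-capable machine with `cupy` installed; the exact error type and message depend on the `cupy` version):

```python
import numpy as np
import cupy as cp

gpu = cp.asarray([1, 2, 3])

print(gpu + np.int64(1))  # numpy scalar: works

for rhs in (np.array(1, dtype="int64"), np.array([1], dtype="int64")):
    try:
        gpu + rhs  # 0-d and 1-d numpy arrays both fail the same way
    except Exception as e:
        print(f"{rhs!r}: {type(e).__name__}")
```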

Reactions: +1 × 1

Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```