home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

3 rows where issue = 224878728 and user = 1217238 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 1

  • shoyer · 3 ✖

issue 1

  • argmin / argmax behavior doesn't match documentation · 3 ✖

author_association 1

  • MEMBER 3
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
362951129 https://github.com/pydata/xarray/issues/1388#issuecomment-362951129 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDM2Mjk1MTEyOQ== shoyer 1217238 2018-02-04T23:54:10Z 2018-02-04T23:54:10Z MEMBER

I think it would be fine to add a special case to preserve coordinates corresponding to min/max values with xarray's min() and max() methods, but I don't feel strongly about this. The exact coordinates could be surprising if there are multiple min/max values.

I agree that it does not make sense to preserve coordinates along aggregated dimensions for argmin/argmax, but we can preserve other coordinates. So I like @fujiisoup's example behavior above.

I suppose we now have two candidate APIs for returning multiple indices from a method like argmin/argmax: 1. Add an additional dimension, e.g., argmaxdim for keeping track of multiple indices. 2. Return a dict or Dataset with multiple indices.

I think my favorite option is (2) with da.argmin_indices() returning a Dataset, which will allow da[da.argmin_indices()] to work after we finish switching __iter__ to only iterate over data variables (https://github.com/pydata/xarray/issues/884#issuecomment-338445322). One downside of this approach is that it is not obvious how we woudl define argmin_indices() to work on a Dataset, but that's probably OK given that you cannot use a Dataset (yet) for indexing.

My concern with adding an additional dimension is that it is always a little surprising and error-prone when we invent new dimension names not supplied by the user (for example, this can lead to conflicting names). Also, consolidating indices will not work as well with idxmin(), which might put indices of different dtypes in the same array.

Either way, I would like a separate dedicated method for returning multiple indexing arrays. It's convenient (and what users expect) for argmax to return a single array if taking the max only over one dimension. However, if we switch to add an argmaxdim or return a dict/Dataset for multiple dimensions, then we will end up with an annoying inconsistency between the 1D and N-D versions. It would be better to say argmax(dim) is only for one dimension (and raise an error if this is not true) and have the separate argmax_indices(dims) that is consistently defined for any number of dimensions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
298256260 https://github.com/pydata/xarray/issues/1388#issuecomment-298256260 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDI5ODI1NjI2MA== shoyer 1217238 2017-04-30T20:50:49Z 2017-04-30T20:50:49Z MEMBER

I agree that arr[arr.argmin(dim)] == arr.min(dim) is a useful invariant, but currently xarray's indexing works a little differently from NumPy. Probably arr.isel_points(**arr.argmin(dim)) == arg.min(dim) is the better invariant for now, and in the future arr[arr.argmin(dim)] will change to work consistently (#974).

The main downside of returning a tuple or dict from argmin() is that it makes the common case of taking the max/min over one dimension a little harder. So possibly it would be better to write two methods:

  • argmin would work like it does currently (returning an xarray object), but error if reducing over multiple dimensions.
  • argmin_indices would return a dict suitable for use in indexing.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
297845435 https://github.com/pydata/xarray/issues/1388#issuecomment-297845435 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDI5Nzg0NTQzNQ== shoyer 1217238 2017-04-27T21:33:06Z 2017-04-27T21:33:06Z MEMBER

Agreed, the current implementation of argmin() only gives the correct result when given one dimension. It's not entirely obvious what argmin() should yield when done to multiple dimensions, certainly this is not very useful. I would prefer raising an error to this current behavior.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 183.11ms · About: xarray-datasette