home / github / issue_comments

Menu
  • Search all tables
  • GraphQL API

issue_comments: 362951129

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1388#issuecomment-362951129 https://api.github.com/repos/pydata/xarray/issues/1388 362951129 MDEyOklzc3VlQ29tbWVudDM2Mjk1MTEyOQ== 1217238 2018-02-04T23:54:10Z 2018-02-04T23:54:10Z MEMBER

I think it would be fine to add a special case to preserve coordinates corresponding to min/max values with xarray's min() and max() methods, but I don't feel strongly about this. The exact coordinates could be surprising if there are multiple min/max values.

I agree that it does not make sense to preserve coordinates along aggregated dimensions for argmin/argmax, but we can preserve other coordinates. So I like @fujiisoup's example behavior above.

I suppose we now have two candidate APIs for returning multiple indices from a method like argmin/argmax: 1. Add an additional dimension, e.g., argmaxdim for keeping track of multiple indices. 2. Return a dict or Dataset with multiple indices.

I think my favorite option is (2) with da.argmin_indices() returning a Dataset, which will allow da[da.argmin_indices()] to work after we finish switching __iter__ to only iterate over data variables (https://github.com/pydata/xarray/issues/884#issuecomment-338445322). One downside of this approach is that it is not obvious how we woudl define argmin_indices() to work on a Dataset, but that's probably OK given that you cannot use a Dataset (yet) for indexing.

My concern with adding an additional dimension is that it is always a little surprising and error-prone when we invent new dimension names not supplied by the user (for example, this can lead to conflicting names). Also, consolidating indices will not work as well with idxmin(), which might put indices of different dtypes in the same array.

Either way, I would like a separate dedicated method for returning multiple indexing arrays. It's convenient (and what users expect) for argmax to return a single array if taking the max only over one dimension. However, if we switch to add an argmaxdim or return a dict/Dataset for multiple dimensions, then we will end up with an annoying inconsistency between the 1D and N-D versions. It would be better to say argmax(dim) is only for one dimension (and raise an error if this is not true) and have the separate argmax_indices(dims) that is consistently defined for any number of dimensions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  224878728
Powered by Datasette · Queries took 0.824ms · About: xarray-datasette