home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

11 rows where issue = 224878728 sorted by updated_at descending

✖
✖

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 6

  • shoyer 3
  • fujiisoup 3
  • gajomi 2
  • dcherian 1
  • lamorton 1
  • stale[bot] 1

author_association 3

  • MEMBER 7
  • CONTRIBUTOR 2
  • NONE 2

issue 1

  • argmin / argmax behavior doesn't match documentation · 11 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
656361865 https://github.com/pydata/xarray/issues/1388#issuecomment-656361865 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDY1NjM2MTg2NQ== dcherian 2448579 2020-07-09T21:32:52Z 2020-07-09T21:32:52Z MEMBER

I think this is fixed now thanks to @johnomotani

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
571137508 https://github.com/pydata/xarray/issues/1388#issuecomment-571137508 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDU3MTEzNzUwOA== stale[bot] 26384082 2020-01-06T13:26:42Z 2020-01-06T13:26:42Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
363072151 https://github.com/pydata/xarray/issues/1388#issuecomment-363072151 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDM2MzA3MjE1MQ== gajomi 244887 2018-02-05T12:35:10Z 2018-02-05T12:35:10Z CONTRIBUTOR

@fujiisoup and @shoyer

Really enlightening comments above. I think I am starting to get the dao of xarray a bit better :)

I was thinking whether such aggregation methods (including argmin) should propagate the coordinate.

Agreed it would be nice to have a consistent and well reasoned rule for coordinate propagation in aggregation methods. I think a key point here, which gets brought up in your example is that it might make sense to have different subrules depending on the semantics of the operation. Functions like argmax are explicitly concerned with underlying "indices" (dimensions or otherwise) and so may call for different behavior from the mean, which explicitly is explicitly invariant under permutations of the underlying indices. The max/min/median functions are interesting case to think about, in that they are also invariant with under change of underlying indices, but can have potentially more than one index that they are associated with and do not "destroy" information about the value at those indices.

My concern with adding an additional dimension is that it is always a little surprising and error-prone when we invent new dimension names not supplied by the user (for example, this can lead to conflicting names)

Yeah, I felt a little dirty appending '_argmax'.

I think my favorite option is (2) with da.argmin_indices() returning a Dataset, which will allow da[da.argmin_indices()]

OK. I think I understand now why @fujiisoup proposed output a Dataset rather than an array. That's a natural syntax for getting the values from the indices.

Either way, I would like a separate dedicated method for returning multiple indexing arrays.

+1 to dedicated adding more methods if needed, since I think even if it isn;t needed the associated docs will need to make sure users are aware of the analogous idx* methods if they get added.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
362951129 https://github.com/pydata/xarray/issues/1388#issuecomment-362951129 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDM2Mjk1MTEyOQ== shoyer 1217238 2018-02-04T23:54:10Z 2018-02-04T23:54:10Z MEMBER

I think it would be fine to add a special case to preserve coordinates corresponding to min/max values with xarray's min() and max() methods, but I don't feel strongly about this. The exact coordinates could be surprising if there are multiple min/max values.

I agree that it does not make sense to preserve coordinates along aggregated dimensions for argmin/argmax, but we can preserve other coordinates. So I like @fujiisoup's example behavior above.

I suppose we now have two candidate APIs for returning multiple indices from a method like argmin/argmax: 1. Add an additional dimension, e.g., argmaxdim for keeping track of multiple indices. 2. Return a dict or Dataset with multiple indices.

I think my favorite option is (2) with da.argmin_indices() returning a Dataset, which will allow da[da.argmin_indices()] to work after we finish switching __iter__ to only iterate over data variables (https://github.com/pydata/xarray/issues/884#issuecomment-338445322). One downside of this approach is that it is not obvious how we woudl define argmin_indices() to work on a Dataset, but that's probably OK given that you cannot use a Dataset (yet) for indexing.

My concern with adding an additional dimension is that it is always a little surprising and error-prone when we invent new dimension names not supplied by the user (for example, this can lead to conflicting names). Also, consolidating indices will not work as well with idxmin(), which might put indices of different dtypes in the same array.

Either way, I would like a separate dedicated method for returning multiple indexing arrays. It's convenient (and what users expect) for argmax to return a single array if taking the max only over one dimension. However, if we switch to add an argmaxdim or return a dict/Dataset for multiple dimensions, then we will end up with an annoying inconsistency between the 1D and N-D versions. It would be better to say argmax(dim) is only for one dimension (and raise an error if this is not true) and have the separate argmax_indices(dims) that is consistently defined for any number of dimensions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
362902669 https://github.com/pydata/xarray/issues/1388#issuecomment-362902669 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDM2MjkwMjY2OQ== fujiisoup 6815844 2018-02-04T12:20:33Z 2018-02-04T12:52:29Z MEMBER

@gajomi

Sorry for my late response and thank you for the proposal.

But aside from my previous proposal, I was thinking whether such aggregation methods (including argmin) should propagate the coordinate. For example, as you pointed out, in theory, we may be able to track x-coordinate at the argmin index after da.argmin(dim='x'). But it is not reasonable for da.mean(dim='x'). It may be reasonable for da.max(dim='x') but not for da.median(dim='x').

Such specific rules may be confusing and bring additional complexity. I think the rule we do not track coordinates after aggregations would be much simpler and easier to understand.

If we adopt the above rule, I think the argmin would give just an array of indices, ```python In [1]: import xarray as xr ...: da = xr.DataArray([[0, 3, 2], [2, 1, 4]], dims=['x', 'y'], ...: coords={'x': [1, 2], 'y': ['a', 'b', 'c']}) ...:

In [4]: da.argmin(dim='x') Out[4]: <xarray.DataArray (y: 3)> array([0, 1, 0]) Coordinates: * y (y) <U1 'a' 'b' 'c'

In [3]: da.isel(x=da.argmin(dim='x')) Out[3]: <xarray.DataArray (y: 3)> array([0, 1, 2]) Coordinates: x (y) int64 1 2 1 * y (y) <U1 'a' 'b' 'c'

```

I think your logic would be useful even we do not track the coordinate.

I would appreciate any feedback.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
362717912 https://github.com/pydata/xarray/issues/1388#issuecomment-362717912 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDM2MjcxNzkxMg== gajomi 244887 2018-02-02T21:50:05Z 2018-02-02T21:50:05Z CONTRIBUTOR

I just came across the various argmax/idxmax (and related min) related issues recently in a project I have been working on. In addition to agreeing that docs should be updated when appropriate here are my two or three cents:

  • As someone new to xarray I like the idea of having both argmax/argmin and argmax_indices/argmin_indices, with the former returning the coordinate indices and the latter the underlying numpy indices analogous to numpy.argmax/numpy.argmin methods. This makes migrating from numpy ndarrays data and collection of associated index arrays obvious (a common path into the xarray world I think).
  • I can also get that idxmax/idxmin might make a better name given that one can have multi-indexed coordinates. If both argmax and idxmax methods are retained probably good to have docs cross reference.
  • In any case, to respond to @fujiisoup's above proposal, I like the idea of retaining the dimension names in the output, and adding a dimension to hold argmax dims, but think it might make more sense to output a DataArray. By way of example, if I had something like: ```python size = (2,2,2,2) dims = list("wxyz") data = np.random.rand(*size) coords = {dim:["{0}_{1}".format(dim,s) for s in range(s)] for dim,s in zip(dims,size)} da = xr.DataArray(data, dims=dims, coords=coords)

da <xarray.DataArray (w: 2, x: 2, y: 2, z: 2)> array([[[[ 0.149945, 0.230338], [ 0.626969, 0.299918]],

    [[ 0.351764,  0.286436],
     [ 0.130604,  0.982152]]],


   [[[ 0.262667,  0.950426],
     [ 0.76655 ,  0.681631]],

    [[ 0.635468,  0.735071],
     [ 0.901116,  0.601303]]]])

Coordinates: * w (w) <U3 'w_0' 'w_1' * x (x) <U3 'x_0' 'x_1' * y (y) <U3 'y_0' 'y_1' * z (z) <U3 'z_0' 'z_1' ```

I would like to get something like the following

```python

argmax(da) <xarray.DataArray '_argmax' (argmaxdim: 4)> array(['w_0', 'x_1', 'y_1', 'z_1'], dtype='<U3') Coordinates: * argmaxdim (argmaxdim) <U1 'w' 'x' 'y' 'z'

argmax(da, dim=list("wy")) <xarray.DataArray '_argmax' (x: 2, z: 2, argmaxdim: 2)> array([[['w_1', 'y_1'], ['w_1', 'y_0']],

   [['w_1', 'y_1'],
    ['w_0', 'y_1']]], dtype=object)

Coordinates: * x (x) object 'x_0' 'x_1' * z (z) object 'z_0' 'z_1' * argmaxdim (argmaxdim) <U1 'w' 'y' where the order of the dims in the unreduced and argmax cases are in the right order as above. For reference, just in case that these examples aren't enough to generalize, a horribly inefficient implementation of above (assuming unique maximum indices):python def _argmaxstackeddim(dastacked, ind): keepdims = dastacked.indexes['keepdims'].names values = dastacked.keepdims.values[ind] coords = {keepdim:[val] for keepdim,val in zip(keepdims,values)} result = dastacked.sel(keepdims=values)\ .pipe(argmax)\ .expand_dims(keepdims)\ .assign_coords(**coords) return result

def argmax(da, dim=None): daname = "" if da.name is None else da.name name = daname+"_argmax" if dim is None: maxda = da.where(da == da.max(),drop=True) dims = list(maxda.dims) dimmaxvals = [maxda.coords[dim].values[0] for dim in dims] result = xr.DataArray(dimmaxvals, dims='argmaxdim', coords={'argmaxdim':dims}, name = name) return result else: if isinstance(dim,str): dim = [dim] keepdims = [d for d in da.dims if d not in dim] dastacked = da.stack(keepdims = keepdims) slices = [_argmaxstackeddim(dastacked,i) for i in range(len(dastacked.keepdims))] return xr.merge(slices)[name] ```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
338397633 https://github.com/pydata/xarray/issues/1388#issuecomment-338397633 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDMzODM5NzYzMw== fujiisoup 6815844 2017-10-21T13:51:04Z 2017-10-21T13:51:04Z MEMBER

I am thinking again how argmin should work with our new vectorizing indexing #1639 . It would be great if arr.isel(**arr.argmin(dim)) == arr.min(dim) could be satisfied even with a multi-dimensional array, although the behavior is different from numpy.argmin. (Maybe our current min should be replaced by arr.isel(**arr.argmin(dim)) so that it preserves the coordinates.)

(We discussed the name for this new method in #1469 but here I just use argmin for the simplicity.)

For example with a three dimensional array with dims=['x', 'y', 'z'], such as arr = xr.DataArray(np.random.randn(4, 3, 2), dims=['x', 'y', 'z']) I am thinking that... + arr.argmin() would return a xr.Dataset which contains 'x', 'y', 'z' as its data_vars. 1. ds = arr.argmin(dims=None) case: - ds['x'], ds['y'], ds['z'] would be 0d-integers. 2. ds = arr.argmin(dims=['x', 'y']) case: - ds['x'], ds['y'], ds['z'] would be 1d-integer-arrays. - The dimension of these three arrays would be 'z_argmin', where ds['z_argmin'] == arr['z']. 3. ds = arr.argmin(dims='x') case: - ds['x'], ds['y'], ds['z'] would be 2d-integer-arrays. - The dimensions of these three arrays are 'y_argmin' and 'z_argmin', where ds['y_argmin'] == arr['y'] and ds['z_argmin'] == arr['z'].

The above proposal for ii (and iii) is not quite clean, as if it is used as an argument of isel, it appends a new coordinate 'z_argmin', which is just a duplicate of 'arr['z']', i.e. arr.isel(**arr.argmin(dims=['x', 'y']))['z_argmin'] == arr['z'].

Any thoughts are welcome.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
309280411 https://github.com/pydata/xarray/issues/1388#issuecomment-309280411 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDMwOTI4MDQxMQ== fujiisoup 6815844 2017-06-18T14:18:40Z 2017-06-19T12:56:03Z MEMBER

I'm working to fix this and I would like to make some design decisions;

  1. What should max() look like? I guess this method should work also for multi-dimensional data. To satisfy the arr.isel_points(**arr.argmin_indices(dim)) == arg.min(dim) relation, the result array should have proper coordinates?

  2. Multiple dim arguments Currently, doc says argmin accepts multiple axes, but np.argmin does not. Can we limit argmin's arguments only str not sequence of strs?

Edit:

  1. Multi-dimensional array to isel_points Currently, isel_points only accepts 1-dimensional array, while the result of argmin_indexes can be multi-dimensional, e.g. python xr.DataArray(np.random.randn(4, 3, 2), dims=['x', 'y', 'z']).argmin_indexes(dims=['x']) Do we need special treatment for this (maybe in isel_points) or just raise an Error (current behavior)?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
298256260 https://github.com/pydata/xarray/issues/1388#issuecomment-298256260 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDI5ODI1NjI2MA== shoyer 1217238 2017-04-30T20:50:49Z 2017-04-30T20:50:49Z MEMBER

I agree that arr[arr.argmin(dim)] == arr.min(dim) is a useful invariant, but currently xarray's indexing works a little differently from NumPy. Probably arr.isel_points(**arr.argmin(dim)) == arg.min(dim) is the better invariant for now, and in the future arr[arr.argmin(dim)] will change to work consistently (#974).

The main downside of returning a tuple or dict from argmin() is that it makes the common case of taking the max/min over one dimension a little harder. So possibly it would be better to write two methods:

  • argmin would work like it does currently (returning an xarray object), but error if reducing over multiple dimensions.
  • argmin_indices would return a dict suitable for use in indexing.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
298253809 https://github.com/pydata/xarray/issues/1388#issuecomment-298253809 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDI5ODI1MzgwOQ== lamorton 23484003 2017-04-30T20:08:25Z 2017-04-30T20:08:25Z NONE

Well, xarray at least agrees with numpy's implementation of that function, but that's not to say it is 'correct.' It would be nice if numpy.argmin worked intuitively. That aside, it seems to me that applying min() to a xr.DataArray should return a reduced array with length 1 in each dimension; then you could just query this object and find the coordinate/dimension values. Perhaps then argmin() would just get a tuple of axis indices, such that arr[*arr.argmin()] == arr.min() would hold.

The next question is, what happens if you start supplying coordinate/dimension optional arguments to argmin? It doesn't make sense to minimize over a coordinate, so only dimensions should be accepted. This should result in a tuple of lists, the way numpy.where does.

Does that seem reasonable?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728
297845435 https://github.com/pydata/xarray/issues/1388#issuecomment-297845435 https://api.github.com/repos/pydata/xarray/issues/1388 MDEyOklzc3VlQ29tbWVudDI5Nzg0NTQzNQ== shoyer 1217238 2017-04-27T21:33:06Z 2017-04-27T21:33:06Z MEMBER

Agreed, the current implementation of argmin() only gives the correct result when given one dimension. It's not entirely obvious what argmin() should yield when done to multiple dimensions, certainly this is not very useful. I would prefer raising an error to this current behavior.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  argmin / argmax behavior doesn't match documentation 224878728

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 27.085ms · About: xarray-datasette
  • Sort ascending
  • Sort descending
  • Facet by this
  • Hide this column
  • Show all columns
  • Show not-blank rows