
issue_comments


7 rows where author_association = "MEMBER" and issue = 29136905 sorted by updated_at descending




user 2

  • shoyer 6
  • mathause 1

issue 1

  • Implement DataArray.idxmax() · 7

author_association 1

  • MEMBER · 7
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
598493015 https://github.com/pydata/xarray/issues/60#issuecomment-598493015 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDU5ODQ5MzAxNQ== shoyer 1217238 2020-03-13T00:43:48Z 2020-03-13T00:43:48Z MEMBER

idxmax() should return the coordinate labels, not integer positions, corresponding to the max.

e.g., xr.DataArray([1, 3, 2], dims=['x'], coords={'x': [10, 20, 30]}).idxmax() should return 20 (but probably inside an xarray.DataArray)
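The requested distinction can be sketched with plain NumPy (an illustration of the desired behavior, not xarray code):

```python
import numpy as np

values = np.array([1, 3, 2])
labels = np.array([10, 20, 30])  # stands in for the 'x' coordinate

pos = values.argmax()    # argmax: the integer position of the max
label = labels[pos]      # idxmax: the coordinate label at that position

print(pos, label)  # 1 20
```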

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
598487450 https://github.com/pydata/xarray/issues/60#issuecomment-598487450 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDU5ODQ4NzQ1MA== mathause 10194086 2020-03-13T00:16:32Z 2020-03-13T00:16:32Z MEMBER

How would idxmax be different to argmax? E.g.

```python
import xarray as xr
xr.DataArray([1, 3, 2]).argmax()
```

Could this be closed?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
457059732 https://github.com/pydata/xarray/issues/60#issuecomment-457059732 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDQ1NzA1OTczMg== shoyer 1217238 2019-01-24T04:05:17Z 2019-01-24T04:05:17Z MEMBER

This is still relevant

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
276543337 https://github.com/pydata/xarray/issues/60#issuecomment-276543337 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDI3NjU0MzMzNw== shoyer 1217238 2017-02-01T01:01:27Z 2017-02-01T01:01:27Z MEMBER

> Would using obj.fillna(0) not mess with argmax if, for instance, all the data is negative? Could fill with the min value instead?

Indeed, fillna(0) won't work right. For what I was thinking of, we could use the three-argument version of where (#576) here, e.g., obj.where(allna, 0). But fillna with the min value could also work -- that's actually exactly how np.nanargmax works.

> Ah yes, true. I was slightly anticipating e.g. filling with NaT if the dim was time-like, though time types are not something I am familiar with.

Yes, ideally we would detect the dtype and find an appropriate fill or minimum value, similar to _maybe_promote. The argument to fillna would either be a scalar (for a DataArray) or a dict (for a Dataset).
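A small NumPy sketch of the concern (illustrative only, not xarray's implementation): with all-negative data, filling NaN with 0 makes the filled position the max, while filling with the dtype's minimum preserves the right answer.

```python
import numpy as np

vals = np.array([-3.0, np.nan, -1.0])

# Filling NaN with 0 makes position 1 the (wrong) argmax:
wrong = np.argmax(np.where(np.isnan(vals), 0.0, vals))

# Filling with the dtype's minimum keeps position 2, the true max:
fill = np.finfo(vals.dtype).min
right = np.argmax(np.where(np.isnan(vals), fill, vals))

print(wrong, right)  # 1 2
```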

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
276538303 https://github.com/pydata/xarray/issues/60#issuecomment-276538303 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDI3NjUzODMwMw== shoyer 1217238 2017-02-01T00:30:32Z 2017-02-01T00:30:32Z MEMBER

Yes, that looks pretty reasonable. Two minor concerns:

  • obj.fillna(-np.inf) converts all dtypes to float. It would be better to stick to obj.fillna(0), though integers can't have NaNs anyways.
  • I'm pretty sure .fillna(np.nan) is a no-op, filling in NaNs with NaN.
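The float-promotion point can be checked with NumPy directly (an illustration, not xarray's fillna internals):

```python
import numpy as np

ints = np.array([1, 2, 3])       # integer dtype
mask = np.zeros(3, dtype=bool)   # nothing actually needs filling here

# Mixing integers with -inf forces promotion to float64:
filled = np.where(mask, -np.inf, ints)
print(ints.dtype, filled.dtype)  # e.g. int64 float64
```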

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
276235524 https://github.com/pydata/xarray/issues/60#issuecomment-276235524 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDI3NjIzNTUyNA== shoyer 1217238 2017-01-31T00:21:35Z 2017-01-31T00:21:35Z MEMBER

take is a NumPy function that only handles scalar or 1d arguments: https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html#numpy.take

I just merged #1237 -- see if it works with that.

> multiple maxes is presumably fine as long as the user is aware it just takes the first.

Yeah, that's not a problem here, only for the where-based implementation.

> However, nanargmax is probably the actual desired function here, but it looks like it will raise on all-NaN slices. Would dropping these and then re-aligning be too much overhead?

This behavior for nanargmax is unfortunate. The "right" behavior for xarray is probably to use NaN or NaT to mark the index in such locations, but numpy makes this tricky. I think this could be achieved, though, with some mix of where, isnull and other vectorized operations. Basically you need to replace all-NaN slices with some placeholder value before calculating nanargmax, and then use the locations of the all-NaN slices again to replace the results of nanargmax with the appropriate fill value.
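That placeholder strategy can be sketched in NumPy (the -1 here is just a stand-in for the NaN/NaT marker discussed above):

```python
import numpy as np

arr = np.array([[np.nan, 1.0, 2.0],
                [np.nan, np.nan, np.nan]])

allna = np.isnan(arr).all(axis=1)          # rows that are entirely NaN
safe = np.where(allna[:, None], 0.0, arr)  # placeholder so nanargmax won't raise
idx = np.nanargmax(safe, axis=1)
result = np.where(allna, -1, idx)          # re-mark the all-NaN rows afterwards

print(result)  # [ 2 -1]
```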

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
275960531 https://github.com/pydata/xarray/issues/60#issuecomment-275960531 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDI3NTk2MDUzMQ== shoyer 1217238 2017-01-30T00:54:09Z 2017-01-30T17:30:48Z MEMBER

See http://stackoverflow.com/questions/40179593/how-to-get-the-coordinates-of-the-maximum-in-xarray for examples of how to do this with the current version of xarray. @MaximilianR's answer using where is pretty clean, but maybe not the most efficient or exactly what we want. (I think it breaks in a few edge cases, such as if the max value appears multiple times, or the array is all NaN.)

@jcmgray Your proposal looks pretty close to me. But to handle higher-dimension arrays, instead of take(y, indx), I think you need to use NumPy-style fancy indexing, y[indx,]. That doesn't work with dask, so you'll need to write a function that uses dask.array.map_blocks when necessary.

I think something like the following would work:

```python
def _index_from_1d_array(array, indices):
    return array[indices,]


def gufunc_idxmax(x, y, axis=None):
    # note: y is always a numpy.ndarray, because IndexVariable objects
    # always have their data loaded into memory
    indx = argmax(x, axis)
    func = functools.partial(_index_from_1d_array, y)

    if isinstance(indx, dask_array_type):
        import dask.array as da
        return da.map_blocks(func, indx, dtype=indx.dtype)
    else:
        return func(indx)
```
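The fancy-indexing step can be illustrated with plain NumPy (a sketch of the idea, not the eventual xarray implementation):

```python
import numpy as np

x = np.array([[1, 3, 2],
              [4, 0, 5]])
y = np.array([10, 20, 30])  # coordinate labels along the indexed axis

indx = x.argmax(axis=-1)    # integer positions, one per row
labels = y[indx,]           # NumPy-style fancy indexing: labels, not positions

print(indx, labels)  # [1 2] [20 30]
```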

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.286ms · About: xarray-datasette