
issue_comments


4 rows where issue = 483280810 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
524075569 https://github.com/pydata/xarray/issues/3237#issuecomment-524075569 https://api.github.com/repos/pydata/xarray/issues/3237 MDEyOklzc3VlQ29tbWVudDUyNDA3NTU2OQ== ulijh 13190237 2019-08-22T21:00:34Z 2019-08-22T21:00:34Z CONTRIBUTOR

Thanks @shoyer. Cool, then this was easier than I expected. I added the patch and `nanargmax`/`nanargmin` to nputils in #3244. What do you think?

  ``argmax()`` causes dask to compute 483280810
523991579 https://github.com/pydata/xarray/issues/3237#issuecomment-523991579 https://api.github.com/repos/pydata/xarray/issues/3237 MDEyOklzc3VlQ29tbWVudDUyMzk5MTU3OQ== shoyer 1217238 2019-08-22T17:03:27Z 2019-08-22T17:03:27Z MEMBER

Thanks for sharing the patch! I dropped into a debugger by adding `--pdb` to the pytest command, which revealed what is going on here:

```
>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>
> /Users/shoyer/dev/xarray/xarray/core/duck_array_ops.py(295)f()
-> raise NotImplementedError(msg)
(Pdb) values
array(['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000',
       '2000-01-03T00:00:00.000000000'], dtype='datetime64[ns]')
(Pdb) func
<function nanargmax at 0x1156ff730>
(Pdb) func(values, axis=axis, **kwargs)
*** AttributeError: module 'xarray.core.nputils' has no attribute 'nanargmax'
```

So it looks like nputils doesn't have nanargmax defined. Instead we need to use nanargmax from NumPy.
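The missing helpers are straightforward to supply; a minimal sketch of what NumPy-backed `nanargmax`/`nanargmin` wrappers could look like (the module layout here is an assumption for illustration, not xarray's actual code):

```python
import numpy as np

# Hypothetical stand-ins for the helpers missing from xarray.core.nputils:
# they delegate to NumPy, which already skips NaNs and raises
# ValueError("All-NaN slice encountered") on all-NaN input.

def nanargmax(a, axis=None):
    return np.nanargmax(a, axis=axis)

def nanargmin(a, axis=None):
    return np.nanargmin(a, axis=axis)
```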

523952983 https://github.com/pydata/xarray/issues/3237#issuecomment-523952983 https://api.github.com/repos/pydata/xarray/issues/3237 MDEyOklzc3VlQ29tbWVudDUyMzk1Mjk4Mw== ulijh 13190237 2019-08-22T15:22:33Z 2019-08-22T15:22:33Z CONTRIBUTOR

Those little changes do solve the MCVE, but break at least one test. I don't have enough of an understanding of the (nan)ops logic in xarray to get around the issue. But maybe this helps:

The change

``` diff
diff --git a/xarray/core/nanops.py b/xarray/core/nanops.py
index 9ba4eae2..784a1d01 100644
--- a/xarray/core/nanops.py
+++ b/xarray/core/nanops.py
@@ -91,17 +91,9 @@ def nanargmin(a, axis=None):
     fill_value = dtypes.get_pos_infinity(a.dtype)
     if a.dtype.kind == "O":
         return _nan_argminmax_object("argmin", fill_value, a, axis=axis)
-    a, mask = _replace_nan(a, fill_value)
-    if isinstance(a, dask_array_type):
-        res = dask_array.argmin(a, axis=axis)
-    else:
-        res = np.argmin(a, axis=axis)
-
-    if mask is not None:
-        mask = mask.all(axis=axis)
-        if mask.any():
-            raise ValueError("All-NaN slice encountered")
-    return res
+    module = dask_array if isinstance(a, dask_array_type) else nputils
+    return module.nanargmin(a, axis=axis)


 def nanargmax(a, axis=None):
@@ -109,17 +101,8 @@ def nanargmax(a, axis=None):
     if a.dtype.kind == "O":
         return _nan_argminmax_object("argmax", fill_value, a, axis=axis)
-
-    a, mask = _replace_nan(a, fill_value)
-    if isinstance(a, dask_array_type):
-        res = dask_array.argmax(a, axis=axis)
-    else:
-        res = np.argmax(a, axis=axis)
-
-    if mask is not None:
-        mask = mask.all(axis=axis)
-        if mask.any():
-            raise ValueError("All-NaN slice encountered")
-    return res
+    module = dask_array if isinstance(a, dask_array_type) else nputils
+    return module.nanargmax(a, axis=axis)


 def nansum(a, axis=None, dtype=None, out=None, min_count=None):
```
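The dispatch in the patch keys off the array type; the same pattern can be sketched standalone (using `np` in place of the `nputils` fallback and guarding the dask import, both of which are assumptions for illustration):

```python
import numpy as np

try:
    import dask.array as dask_array
    dask_array_type = (dask_array.Array,)
except ImportError:  # dask is optional
    dask_array_type = ()

def nanargmax(a, axis=None):
    # Pick the implementation that matches the array type, as in the patch:
    # dask arrays dispatch to dask.array, everything else to NumPy.
    module = dask_array if isinstance(a, dask_array_type) else np
    return module.nanargmax(a, axis=axis)
```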

The failing test

``` python
...
__________________ TestVariable.test_reduce __________________
...

def f(values, axis=None, skipna=None, **kwargs):
    if kwargs.pop("out", None) is not None:
        raise TypeError("`out` is not valid for {}".format(name))

    values = asarray(values)

    if coerce_strings and values.dtype.kind in "SU":
        values = values.astype(object)

    func = None
    if skipna or (skipna is None and values.dtype.kind in "cfO"):
        nanname = "nan" + name
        func = getattr(nanops, nanname)
    else:
        func = _dask_or_eager_func(name)

    try:
        return func(values, axis=axis, **kwargs)
    except AttributeError:
        if isinstance(values, dask_array_type):
            try:  # dask/dask#3133 dask sometimes needs dtype argument
                # if func does not accept dtype, then raises TypeError
                return func(values, axis=axis, dtype=values.dtype, **kwargs)
            except (AttributeError, TypeError):
                msg = "%s is not yet implemented on dask arrays" % name
        else:
            msg = (
                "%s is not available with skipna=False with the "
                "installed version of numpy; upgrade to numpy 1.12 "
                "or newer to use skipna=True or skipna=None" % name
            )
    raise NotImplementedError(msg)

E NotImplementedError: argmax is not available with skipna=False with the installed version of numpy; upgrade to numpy 1.12 or newer to use skipna=True or skipna=None
...
```

Note: I have numpy 1.17 installed, so the error message here seems misleading.
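The misleading message falls out of the broad `except AttributeError` in `f()` above: the AttributeError raised by the missing `nputils.nanargmax` is caught and re-reported as a NumPy-version problem. A stripped-down illustration of that failure mode (names are invented for the sketch, not xarray's):

```python
import types

# Stand-in for a helper module whose function is missing, as in the traceback.
nputils = types.SimpleNamespace()  # deliberately has no nanargmax

def f(values):
    try:
        return nputils.nanargmax(values)  # raises AttributeError
    except AttributeError:
        # The real cause (a missing helper) is swallowed here and
        # replaced with an unrelated complaint about the NumPy version.
        raise NotImplementedError(
            "argmax is not available with the installed version of numpy"
        )
```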

523659811 https://github.com/pydata/xarray/issues/3237#issuecomment-523659811 https://api.github.com/repos/pydata/xarray/issues/3237 MDEyOklzc3VlQ29tbWVudDUyMzY1OTgxMQ== shoyer 1217238 2019-08-21T21:38:26Z 2019-08-21T21:38:26Z MEMBER

Yes, this is definitely a bug -- thanks for the clear example to reproduce it!

These helper functions were originally added back in https://github.com/pydata/xarray/pull/1883 to handle object dtype arrays properly.

So it would be nice to fix this for object arrays in dask, but for the much more common non-object dtype arrays we should really just be using `dask.array.nanargmax`.
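The object-dtype path mentioned above can be sketched in plain NumPy: replace NaNs with an infinity sentinel so an ordinary argmax works, and raise on all-NaN input (the helper name and signature here are illustrative, not xarray's actual `_nan_argminmax_object`):

```python
import numpy as np

def nan_argmax_object(a, fill_value=-np.inf):
    # np.nanargmax rejects object arrays, so mask NaNs by hand.
    a = np.asarray(a, dtype=object)
    mask = np.array([isinstance(v, float) and np.isnan(v) for v in a.ravel()])
    mask = mask.reshape(a.shape)
    if mask.all():
        raise ValueError("All-NaN slice encountered")
    # Fill NaN positions with -inf so they can never win the argmax.
    filled = np.where(mask, fill_value, a)
    return int(np.argmax(filled))
```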


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 1363.041ms · About: xarray-datasette