
issue_comments: 456195815


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1251#issuecomment-456195815 https://api.github.com/repos/pydata/xarray/issues/1251 456195815 MDEyOklzc3VlQ29tbWVudDQ1NjE5NTgxNQ== 1217238 2019-01-21T20:52:07Z 2019-01-21T20:52:07Z MEMBER

I don't think we should consider ourselves beholden to pandas's bad names, but we should definitely try to preserve backwards compatibility and interpretability for users.

Going back to Python itself:

- apply(func, args, kwargs) (from Python 2.x) is equivalent to func(*args, **kwargs)
- map() maps a function over each element of an iterable
- functools.reduce() applies a binary function repeatedly to convert an iterable into a single element
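To make the three Python meanings concrete, here is a minimal sketch using only the standard library. The helper `add3` and the sample inputs are made up for illustration.

```python
from functools import reduce
import operator


def add3(a, b, c=0):
    """Illustrative helper: sum two positional arguments and one keyword argument."""
    return a + b + c


args = (1, 2)
kwargs = {"c": 3}

# Python 2's apply(func, args, kwargs) is written with unpacking in Python 3.
applied = add3(*args, **kwargs)

# map() calls the function once for each element of the iterable.
mapped = list(map(abs, [-1, 2, -3]))

# functools.reduce() folds a binary function over the iterable,
# collapsing it into a single value.
reduced = reduce(operator.add, [1, 2, 3, 4])
```

Note that only reduce() requires a binary function; this is the sense in which an aggregation that takes a whole array at once differs from functools.reduce.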

For xarray, we need:

1. a method for wrapping functions that work on unlabeled arrays
2. a method for mapping functions over each element of a Dataset or grouped object
3. (possibly) a method for wrapping aggregation functions that act on unlabeled arrays

Currently, we call both (1) and (2) apply(), which is pretty confusing, and use reduce() for (3) even though it could potentially be a special case of (1) with a bit of extra magic and is quite unlike functools.reduce. In contrast, pandas calls both (1) and (2) apply() (using raw=True/raw=False to distinguish), and calls (3) aggregate or agg.

So long term, it could make sense to rename the current Dataset.apply()/GroupBy.apply() (case 2) to .map, and also rename .reduce() to the more generic .aggregate().

That said, I'm trying to imagine what the transition process for switching to new behavior for Dataset.apply looks like. We already re-add dimensions to the output from calling functions in apply(), but at some point we have to do a hard cut-off from passing DataArray objects to the function in apply() to passing in a raw array.

I suppose we could do this by adding a raw keyword-only argument to .apply():

- If raw=False (current default), we would raise a warning about changing behavior and would pass on DataArray objects to the applied function. Users would be encouraged to use .map() instead.
- If raw=True (future default behavior), we would pass in raw numpy/dask arrays to the applied function.
- The dim argument might only be supported with raw=True.
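A rough sketch of how that transition could look, using toy stand-in classes rather than xarray itself. `ToyArray`, `ToyDataset`, and the warning text are all hypothetical, plain lists stand in for numpy/dask arrays, and the dim argument is left out entirely.

```python
import warnings


class ToyArray:
    """Hypothetical stand-in for a labeled array: dimension names plus raw values."""

    def __init__(self, dims, values):
        self.dims = dims
        self.values = values


class ToyDataset:
    """Hypothetical stand-in for a Dataset: a mapping of names to ToyArray objects."""

    def __init__(self, variables):
        self.variables = dict(variables)

    def map(self, func):
        # Case 2: func receives each labeled array and returns a labeled array.
        return ToyDataset({name: func(arr) for name, arr in self.variables.items()})

    def apply(self, func, *, raw=False):
        if not raw:
            # Current default: keep passing labeled objects, but warn about the change.
            warnings.warn(
                "apply() will pass raw arrays in the future; use map() to keep "
                "receiving labeled arrays",
                FutureWarning,
                stacklevel=2,
            )
            return self.map(func)
        # Case 1: func sees only the unlabeled values; dimensions are re-added after.
        return ToyDataset(
            {
                name: ToyArray(arr.dims, func(arr.values))
                for name, arr in self.variables.items()
            }
        )


ds = ToyDataset({"t": ToyArray(("x",), [1, 2, 3])})
doubled = ds.apply(lambda values: [2 * v for v in values], raw=True)
```

Making raw keyword-only means existing positional calls keep working unchanged, and callers who opt in early with raw=True see no change when the default flips.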

We would end up with an extraneous raw argument, which we could deprecate and remove at our leisure.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  205455788