issue_comments: 456195815

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1251#issuecomment-456195815	https://api.github.com/repos/pydata/xarray/issues/1251	456195815	MDEyOklzc3VlQ29tbWVudDQ1NjE5NTgxNQ==	1217238	2019-01-21T20:52:07Z	2019-01-21T20:52:07Z	MEMBER	I don't think we should consider ourselves beholden to pandas's bad names, but we should definitely try to preserve backwards compatibility and interpretability for users. Going back to Python itself: - `apply(func, args, kwargs)` (from Python 2.x) is equivalent to `func(*args, **kwargs)` - `map()` maps a function over each element of an iterable - `functools.reduce()` applies a binary function repeatedly to convert an iterable into a single element For xarray, we need: 1. a method for wrapping functions that work on unlabeled arrays 2. a method for mapping functions over each element of a Dataset or grouped object. 3. (possibly) a method for wrapping aggregation functions that act on unlabeled arrays Currently, we call both (1) and (2) `apply()`, which is pretty confusing, and use `reduce()` for (3) even though it could potentially be a special case of (1) with a bit of extra magic and is quite unlike `functools.reduce`. In contrast, pandas calls both (1) and (2) `apply()` (using `raw=True`/`raw=False` to distinguish), and calls (3) `aggregate` or `agg`. So long term, it could make sense to rename the current `Dataset.apply()`/`GroupBy.apply()` (case 2) to `.map`, and also rename `.reduce()` to the more generic `.aggregate()`. That said, I'm trying to imagine what the transition process for switching to new behavior for `Dataset.apply` looks like. We already will re-add dimensions to the output from calling functions in `apply()`, but at some point we have to do a hard cut-off from passing `DataArray` objects to the function in `apply` to passing in a raw array. 
I suppose we could do this by adding a `raw` keyword-only argument to `.apply()`: - If `raw=False` (current default), we would raise a warning about changing behavior and would pass on `DataArray` objects to the applied function. Users would be encouraged to use `.map()` instead. - If `raw=True` (future default behavior), we would pass in raw numpy/dask arrays to the applied function. - The `dim` argument might only be supported with `raw=True`. We would end up with an extraneous `raw` argument, which we could remove/deprecate at our leisure.	{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		205455788
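
The `raw` transition proposed in the comment above could be sketched roughly as follows. This is plain Python and not xarray's actual API: `Labeled`, `apply`, and `map_` are hypothetical stand-ins (for `DataArray`, `Dataset.apply()`, and the suggested `.map()`), and the warning text is invented for illustration.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Labeled:
    """Hypothetical stand-in for a labeled array such as DataArray."""
    name: str
    values: list


def apply(variables, func, *, raw=False):
    """Hypothetical sketch of the proposed keyword-only `raw` switch."""
    if not raw:
        # Current default: warn about the upcoming change, keep old behavior.
        warnings.warn(
            "apply() will pass raw arrays in the future; "
            "use map() to keep receiving labeled objects",
            FutureWarning,
            stacklevel=2,
        )
        return {key: func(var) for key, var in variables.items()}
    # Proposed future default: hand the unlabeled data to the function.
    return {key: func(var.values) for key, var in variables.items()}


def map_(variables, func):
    """Hypothetical `.map()`: always passes labeled objects, never warns."""
    return {key: func(var) for key, var in variables.items()}


data = {"t": Labeled("t", [1, 2, 3]), "p": Labeled("p", [4, 5])}

raw_result = apply(data, sum, raw=True)                # func sees plain lists
labeled_result = map_(data, lambda v: len(v.values))   # func sees Labeled objects
```

Making `raw` keyword-only means the argument can later be dropped without disturbing positional callers, which seems to match the "remove/deprecate at our leisure" step.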