issues: 499477363
id: 499477363
node_id: MDU6SXNzdWU0OTk0NzczNjM=
number: 3349
title: Implement polyfit?
user: 1197350
state: closed
locked: 0
assignee:
milestone:
comments: 25
created_at: 2019-09-27T14:25:14Z
updated_at: 2020-03-25T17:17:45Z
closed_at: 2020-03-25T17:17:45Z
author_association: MEMBER
active_lock_reason:
draft:
pull_request:
body:

Fitting a line (or curve) to data along a specified axis is a long-standing need of xarray users. There are many blog posts and SO questions about how to do it:

- http://atedstone.github.io/rate-of-change-maps/
- https://gist.github.com/luke-gregor/4bb5c483b2d111e52413b260311fbe43
- https://stackoverflow.com/questions/38960903/applying-numpy-polyfit-to-xarray-dataset
- https://stackoverflow.com/questions/52094320/with-xarray-how-to-parallelize-1d-operations-on-a-multidimensional-dataset
- https://stackoverflow.com/questions/36275052/applying-a-function-along-an-axis-of-a-dask-array

The main use case in my domain is finding the temporal trend of a 3D variable (e.g. temperature in time, lon, lat).

Yes, you can do it with apply_ufunc, but apply_ufunc is inaccessibly complex for many users. Much of our existing API could be removed and replaced with apply_ufunc calls, but that doesn't mean we should do it.

I am proposing we add a DataArray method called `polyfit`. It would work like this:

```python
import numpy as np
import xarray as xr

x_ = np.linspace(0, 1, 10)
y_ = np.arange(5)
a_ = np.cos(y_)

x = xr.DataArray(x_, dims=['x'], coords={'x': x_})
a = xr.DataArray(a_, dims=['y'])
f = a * x

p = f.polyfit(dim='x', deg=1)

# equivalent numpy code
p_ = np.polyfit(x_, f.values.transpose(), 1)
np.testing.assert_allclose(p_[0], a_)
```

NumPy's polyfit function is already vectorized in the sense that it accepts 1D x and 2D y, performing the fit independently over each column of y. To extend this to ND, we would just need to reshape the data going in and out of the function. We do this already in other packages. For dask, we could simply require that the dimension over which the fit is calculated be contiguous, and then call map_blocks. Thoughts?
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3349/reactions", "total_count": 9, "+1": 9, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
performed_via_github_app:
state_reason: completed
repo: 13221727
type: issue
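The issue body above proposes extending np.polyfit to N-dimensional data by reshaping to 2D before the fit and back afterwards. Below is a minimal sketch of that idea; `polyfit_along_dim` is a hypothetical helper invented for illustration (not the polyfit API xarray ultimately shipped), and it assumes the DataArray has a coordinate along the fit dimension.

```python
import numpy as np
import xarray as xr

def polyfit_along_dim(da, dim, deg):
    # Hypothetical helper sketching the reshape approach described in the issue body;
    # not the polyfit implementation that xarray eventually added.
    # Assumes `da` has a coordinate along `dim` to use as the x values.
    ordered = [d for d in da.dims if d != dim] + [dim]
    da_t = da.transpose(*ordered)
    other_dims = ordered[:-1]
    other_shape = [da_t.sizes[d] for d in other_dims]
    n = da_t.sizes[dim]

    # Collapse all non-fit dimensions into one so y is 2D: (n_samples, n_fits),
    # which np.polyfit handles natively (one independent fit per column).
    y2d = da_t.values.reshape(-1, n).T
    coeffs = np.polyfit(da_t[dim].values, y2d, deg)  # shape: (deg + 1, n_fits)

    # Restore the original non-fit dimensions, adding a leading "degree" dimension.
    out = coeffs.reshape([deg + 1] + other_shape)
    coords = {"degree": np.arange(deg, -1, -1)}  # np.polyfit orders highest degree first
    coords.update({d: da_t[d] for d in other_dims if d in da_t.coords})
    return xr.DataArray(out, dims=["degree"] + other_dims, coords=coords)

# Reusing the example from the issue body: the slope of f along "x" should recover a_.
x_ = np.linspace(0, 1, 10)
y_ = np.arange(5)
a_ = np.cos(y_)
f = xr.DataArray(a_, dims=["y"]) * xr.DataArray(x_, dims=["x"], coords={"x": x_})

p = polyfit_along_dim(f, dim="x", deg=1)
np.testing.assert_allclose(p.sel(degree=1).values, a_)
```

For dask-backed arrays, the issue suggests requiring the fit dimension to be contiguous and then calling map_blocks; in terms of this sketch, that would roughly mean rechunking with something like `da.chunk({dim: -1})` before mapping the helper over the remaining chunks.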