{"database": "github", "table": "issues", "is_view": false, "human_description_en": "where state_reason = \"completed\", \"updated_at\" is on date 2021-03-31 and user = 35968931 sorted by updated_at descending", "rows": [[671609109, "MDU6SXNzdWU2NzE2MDkxMDk=", 4300, "General curve fitting method", 35968931, "closed", 0, null, null, 9, "2020-08-02T12:35:49Z", "2021-03-31T16:55:53Z", "2021-03-31T16:55:53Z", "MEMBER", null, null, null, "Xarray should have a general curve-fitting function as part of its main API.\r\n\r\n## Motivation\r\n\r\nYesterday I wanted to fit a simple decaying exponential function to the data in a DataArray and realised there currently isn't an immediate way to do this in xarray. You have to either pull out the `.values` (losing the power of dask), or use `apply_ufunc` (complicated).\r\n\r\nThis is an incredibly common, domain-agnostic task, so although I don't think we should support various kinds of unusual optimisation procedures (which could always go in an extension package instead), I think a basic fitting method is within scope for the main library. There are [SO questions](https://stackoverflow.com/questions/62987617/using-scipy-curve-fit-with-dask-xarray) asking how to achieve this.\r\n\r\nWe already have [`.polyfit` and `polyval` anyway](https://github.com/pydata/xarray/pull/3733/files#), which are more specific. (@AndrewWilliams3142 and @aulemahal I expect you will have thoughts on how implement this generally.)\r\n\r\n## Proposed syntax\r\n\r\nI want something like this to work:\r\n\r\n```python\r\ndef exponential_decay(xdata, A=10, L=5):\r\n    return A*np.exp(-xdata/L)\r\n\r\n# returns a dataset containing the optimised values of each parameter\r\nfitted_params = da.fit(exponential_decay)\r\n\r\nfitted_line = exponential_decay(da.x, A=fitted_params['A'], L=fitted_params['L'])\r\n\r\n# Compare\r\nda.plot(ax)\r\nfitted_line.plot(ax)\r\n```\r\n\r\nIt would also be nice to be able to fit in multiple dimensions. 
That means both for example fitting a 2D function to 2D data:\r\n\r\n```python\r\ndef hat(xdata, ydata, h=2, r0=1):\r\n    r = xdata**2 + ydata**2\r\n    return h*np.exp(-r/r0)\r\n\r\nfitted_params = da.fit(hat)\r\n\r\nfitted_hat = hat(da.x, da.y, h=fitted_params['h'], r0=fitted_params['r0'])\r\n```\r\n\r\nbut also repeatedly fitting a 1D function to 2D data:\r\n\r\n```python\r\n# da now has a y dimension too\r\nfitted_params = da.fit(exponential_decay, fit_along=['x'])\r\n\r\n# As fitted_params now has y-dependence, broadcasting means fitted_lines does too\r\nfitted_lines = exponential_decay(da.x, A=fitted_params.A, L=fitted_params.L)\r\n```\r\nThe latter would be useful for fitting the same curve to multiple model runs, but means we need some kind of `fit_along` or `dim` argument, which would default to all dims.\r\n\r\nSo the method docstring would end up like\r\n```python\r\ndef fit(self, f, fit_along=None, skipna=None, full=False, cov=False):\r\n    \"\"\"\r\n    Fits the function f to the DataArray.\r\n\r\n    Expects the function f to have a signature like\r\n    `result = f(*coords, **params)`\r\n    for example\r\n    `result_da = f(da.xcoord, da.ycoord, da.zcoord, A=5, B=None)`\r\n    The names of the `**params` kwargs will be used to name the output variables.\r\n\r\n    Returns\r\n    -------\r\n    fit_results - A single dataset which contains the variables (for each parameter in the fitting function):\r\n    `param1`\r\n        The optimised fit coefficients for parameter one.\r\n    `param1_residuals`\r\n        The residuals of the fit for parameter one.\r\n    ...\r\n    \"\"\"\r\n\r\n```\r\n\r\n## Questions\r\n\r\n1) Should it wrap `scipy.optimize.curve_fit`, or reimplement it? 
\r\n\r\n    Wrapping it is simpler, but as it just calls `least_squares` [under the hood](https://github.com/scipy/scipy/blob/v1.5.2/scipy/optimize/minpack.py#L532-L834) then reimplementing it would mean we could use the dask-powered version of `least_squares` (like [`da.polyfit does`](https://github.com/pydata/xarray/blob/9058114f70d07ef04654d1d60718442d0555b84b/xarray/core/dataset.py#L5987)).\r\n\r\n2) What form should we expect the curve-defining function to come in?\r\n\r\n    `scipy.optimize.curve_fit` expects the curve to act as `ydata = f(xdata, *params) + eps`, but in xarray then `xdata` could be one or multiple coords or dims, not necessarily a single array. Might it work to require a signature like `result_da = f(da.xcoord, da.ycoord, da.zcoord, ..., **params)`? Then the `.fit` method would work out how many coords to pass to `f` based on the dimension of the `da` and the `fit_along` argument. But then the order of coord arguments in the signature of `f` would matter, which doesn't seem very xarray-like.\r\n\r\n3) Is it okay to inspect parameters of the curve-defining function?\r\n\r\n    If we tell the user the curve-defining function has to have a signature like `da = func(*coords, **params)`, then we could read the names of the parameters by inspecting the function kwargs. Is that a good idea or might it end up being unreliable? Is the `inspect` standard library module the right thing to use for that? 
This could also be used to provide default guesses for the fitting parameters.", "{\"url\": \"https://api.github.com/repos/pydata/xarray/issues/4300/reactions\", \"total_count\": 4, \"+1\": 3, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 1}", null, "completed", 13221727, "issue"]], "truncated": false, "filtered_table_rows_count": 1, "expanded_columns": [], "expandable_columns": [[{"column": "repo", "other_table": "repos", "other_column": "id"}, "name"], [{"column": "milestone", "other_table": "milestones", "other_column": "id"}, "title"], [{"column": "assignee", "other_table": "users", "other_column": "id"}, "login"], [{"column": "user", "other_table": "users", "other_column": "id"}, "login"]], "columns": ["id", "node_id", "number", "title", "user", "state", "locked", "assignee", "milestone", "comments", "created_at", "updated_at", "closed_at", "author_association", "active_lock_reason", "draft", "pull_request", "body", "reactions", "performed_via_github_app", "state_reason", "repo", "type"], "primary_keys": ["id"], "units": {}, "query": {"sql": "select id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at, closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type from issues where \"state_reason\" = :p0 and date(\"updated_at\") = :p1 and \"user\" = :p2 order by updated_at desc limit 101", "params": {"p0": "completed", "p1": "2021-03-31", "p2": "35968931"}}, "facet_results": {"state": {"name": "state", "type": "column", "hideable": false, "toggle_url": "/github/issues.json?state_reason=completed&updated_at__date=2021-03-31&user=35968931", "results": [{"value": "closed", "label": "closed", "count": 1, "toggle_url": "http://xarray-datasette.fly.dev/github/issues.json?state_reason=completed&updated_at__date=2021-03-31&user=35968931&state=closed", "selected": false}], "truncated": false}, 
"repo": {"name": "repo", "type": "column", "hideable": false, "toggle_url": "/github/issues.json?state_reason=completed&updated_at__date=2021-03-31&user=35968931", "results": [{"value": 13221727, "label": "xarray", "count": 1, "toggle_url": "http://xarray-datasette.fly.dev/github/issues.json?state_reason=completed&updated_at__date=2021-03-31&user=35968931&repo=13221727", "selected": false}], "truncated": false}, "type": {"name": "type", "type": "column", "hideable": false, "toggle_url": "/github/issues.json?state_reason=completed&updated_at__date=2021-03-31&user=35968931", "results": [{"value": "issue", "label": "issue", "count": 1, "toggle_url": "http://xarray-datasette.fly.dev/github/issues.json?state_reason=completed&updated_at__date=2021-03-31&user=35968931&type=issue", "selected": false}], "truncated": false}}, "suggested_facets": [{"name": "created_at", "type": "date", "toggle_url": "http://xarray-datasette.fly.dev/github/issues.json?state_reason=completed&updated_at__date=2021-03-31&user=35968931&_facet_date=created_at"}, {"name": "updated_at", "type": "date", "toggle_url": "http://xarray-datasette.fly.dev/github/issues.json?state_reason=completed&updated_at__date=2021-03-31&user=35968931&_facet_date=updated_at"}, {"name": "closed_at", "type": "date", "toggle_url": "http://xarray-datasette.fly.dev/github/issues.json?state_reason=completed&updated_at__date=2021-03-31&user=35968931&_facet_date=closed_at"}], "next": null, "next_url": null, "private": false, "allow_execute_sql": true, "query_ms": 30.466917902231216}