home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 1043746973

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1043746973 PR_kwDOAMm_X84uC1vs 5933 Reimplement `.polyfit()` with `apply_ufunc` 39069044 open 0     6 2021-11-03T15:29:58Z 2022-10-06T21:42:09Z   CONTRIBUTOR   0 pydata/xarray/pulls/5933
  • [x] Closes #4554
  • [x] Closes #5629
  • [x] Closes #5644
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

Reimplement polyfit using apply_ufunc rather than dask.array.linalg.lstsq. This should solve a number of issues with memory usage and chunking that were reported on the current version of polyfit. The main downside is that variables chunked along the fitting dimension cannot be handled with this approach.

There is a bunch of fiddly code here for handling the differing outputs from np.polyfit depending on the values of the full and cov args. Depending on the performance implications, we could simplify some by keeping these in apply_ufunc and dropping later. Much of this parsing would still be required though, because the only way to get the covariances is to set cov=True, full=False.

A few minor departures from the previous implementation: 1. The rank and singular_values diagnostic variables returned by np.polyfit are now returned on a pointwise basis, since these can change depending on skipped nans. np.polyfit also returns the rcond used for each fit which I've included here. 2. As mentioned above, this breaks fitting done along a chunked dimension. To avoid regression, we could set allow_rechunk=True and warn about memory implications. 3. Changed default skipna=True, since the previous behavior seemed to be a limitation of the computational method. 4. For consistency with the previous version, I included a transpose operation to put degree as the first dimension. This is arbitrary though, and actually the opposite of how curvefit returns ordering. So we could match up with curvefit but it would be breaking for polyfit.

No new tests have been added since the previous suite was fairly comprehensive. Would be great to get some performance reports on real-world data such as the climate model detrending application in #5629.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5933/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    13221727 pull

Links from other tables

  • 3 rows from issues_id in issues_labels
  • 6 rows from issue in issue_comments
Powered by Datasette · Queries took 0.819ms · About: xarray-datasette