
pull_requests: 772496364

id: 772496364
node_id: PR_kwDOAMm_X84uC1vs
number: 5933
state: open
locked: 0
title: Reimplement `.polyfit()` with `apply_ufunc`
user: 39069044
body:

- [x] Closes #4554
- [x] Closes #5629
- [x] Closes #5644
- [ ] Tests added
- [x] Passes `pre-commit run --all-files`
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

Reimplement `polyfit` using `apply_ufunc` rather than `dask.array.linalg.lstsq`. This should solve a number of issues with memory usage and chunking that were reported on the current version of `polyfit`. The main downside is that variables chunked along the fitting dimension cannot be handled with this approach.

There is a bunch of fiddly code here for handling the differing outputs from `np.polyfit` depending on the values of the `full` and `cov` args. Depending on the performance implications, we could simplify some by keeping these in `apply_ufunc` and dropping later. Much of this parsing would still be required though, because the only way to get the covariances is to set `cov=True, full=False`.

A few minor departures from the previous implementation:

1. The `rank` and `singular_values` diagnostic variables returned by `np.polyfit` are now returned on a pointwise basis, since these can change depending on skipped nans. `np.polyfit` also returns the `rcond` used for each fit which I've included here.
2. As mentioned above, this breaks fitting done along a chunked dimension. To avoid regression, we could set `allow_rechunk=True` and warn about memory implications.
3. Changed default `skipna=True`, since the previous behavior seemed to be a limitation of the computational method.
4. For consistency with the previous version, I included a `transpose` operation to put `degree` as the first dimension. This is arbitrary though, and actually the opposite of how `curvefit` returns ordering. So we could match up with `curvefit` but it would be breaking for polyfit.

No new tests have been added since the previous suite was fairly comprehensive. Would be great to get some performance reports on real-world data such as the climate model detrending application in #5629.

created_at: 2021-11-03T15:29:58Z
updated_at: 2022-10-06T21:42:09Z
closed_at:
merged_at:
merge_commit_sha: 3381c5aacaaeecc8a00357896b29f32a95be0b20
assignee:
milestone:
draft: 0
head: 62b4637d4fcc688a8e2e2c5eece80f64c0605229
base: d1e4164f3961d7bbb3eb79037e96cae14f7182f8
author_association: CONTRIBUTOR
auto_merge:
repo: 13221727
url: https://github.com/pydata/xarray/pull/5933
merged_by:
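
The PR body mentions that `np.polyfit` returns differently shaped results depending on `full` and `cov`, and that diagnostics such as `rank` and `rcond` can vary per series once NaNs are skipped. The following is a minimal NumPy-only sketch of that parsing, not the code from the PR: `fit_series` and its dictionary keys are hypothetical names chosen for illustration. A per-series function of this kind is roughly what one might hand to `apply_ufunc` for each 1-D slice along the fitting dimension.

```python
import numpy as np


def fit_series(x, y, deg, full=False, cov=False, skipna=True):
    """Hypothetical helper: fit one 1-D series and normalize np.polyfit's
    varying return values into a single dict."""
    if skipna:
        # Drop NaNs for this series only, so each series is fitted pointwise.
        keep = ~np.isnan(y)
        x, y = x[keep], y[keep]

    out = {}
    if full:
        # full=True returns a 5-tuple of diagnostics; no covariance is returned.
        coeffs, residuals, rank, sing_vals, rcond = np.polyfit(x, y, deg, full=True)
        out.update(residuals=residuals, rank=rank,
                   singular_values=sing_vals, rcond=rcond)
    elif cov:
        # The covariance matrix is only available with cov=True and full=False.
        coeffs, covariance = np.polyfit(x, y, deg, cov=True)
        out["covariance"] = covariance
    else:
        coeffs = np.polyfit(x, y, deg)
    out["coefficients"] = coeffs
    return out


# Illustrative data (an assumption, not taken from the PR): a straight line,
# once complete and once with a single missing value.
x = np.arange(10.0)
y_clean = 2.0 * x + 1.0
y_gappy = y_clean.copy()
y_gappy[3] = np.nan

plain = fit_series(x, y_gappy, deg=1)
with_cov = fit_series(x, y_gappy, deg=1, cov=True)
diag_clean = fit_series(x, y_clean, deg=1, full=True)
diag_gappy = fit_series(x, y_gappy, deg=1, full=True)
```

Because `rcond` defaults to the number of points times machine epsilon, `diag_clean["rcond"]` and `diag_gappy["rcond"]` differ here, which illustrates why the PR reports these diagnostics on a pointwise basis. Obtaining both the diagnostics and the covariance for one series would take two separate calls in this sketch.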
