pull_requests: 772496364
This data as json
id | node_id | number | state | locked | title | user | body | created_at | updated_at | closed_at | merged_at | merge_commit_sha | assignee | milestone | draft | head | base | author_association | auto_merge | repo | url | merged_by |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
772496364 | PR_kwDOAMm_X84uC1vs | 5933 | open | 0 | Reimplement `.polyfit()` with `apply_ufunc` | 39069044 | - [x] Closes #4554 - [x] Closes #5629 - [x] Closes #5644 - [ ] Tests added - [x] Passes `pre-commit run --all-files` - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` Reimplement `polyfit` using `apply_ufunc` rather than `dask.array.linalg.lstsq`. This should solve a number of issues with memory usage and chunking that were reported on the current version of `polyfit`. The main downside is that variables chunked along the fitting dimension cannot be handled with this approach. There is a bunch of fiddly code here for handling the differing outputs from `np.polyfit` depending on the values of the `full` and `cov` args. Depending on the performance implications, we could simplify some by keeping these in `apply_ufunc` and dropping later. Much of this parsing would still be required though, because the only way to get the covariances is to set `cov=True, full=False`. A few minor departures from the previous implementation: 1. The `rank` and `singular_values` diagnostic variables returned by `np.polyfit` are now returned on a pointwise basis, since these can change depending on skipped nans. `np.polyfit` also returns the `rcond` used for each fit which I've included here. 2. As mentioned above, this breaks fitting done along a chunked dimension. To avoid regression, we could set `allow_rechunk=True` and warn about memory implications. 3. Changed default `skipna=True`, since the previous behavior seemed to be a limitation of the computational method. 4. For consistency with the previous version, I included a `transpose` operation to put `degree` as the first dimension. This is arbitrary though, and actually the opposite of how `curvefit` returns ordering. So we could match up with `curvefit` but it would be breaking for polyfit. No new tests have been added since the previous suite was fairly comprehensive. Would be great to get some performance reports on real-world data such as the climate model detrending application in #5629. | 2021-11-03T15:29:58Z | 2022-10-06T21:42:09Z | 3381c5aacaaeecc8a00357896b29f32a95be0b20 | 0 | 62b4637d4fcc688a8e2e2c5eece80f64c0605229 | d1e4164f3961d7bbb3eb79037e96cae14f7182f8 | CONTRIBUTOR | 13221727 | https://github.com/pydata/xarray/pull/5933 |
Links from other tables
- 3 rows from pull_requests_id in labels_pull_requests