pull_requests: 799424119

id node_id number state locked title user body created_at updated_at closed_at merged_at merge_commit_sha assignee milestone draft head base author_association auto_merge repo url merged_by
799424119 PR_kwDOAMm_X84vpj53 6059 closed 0 Weighted quantile 488992

- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`

This is a follow-up to https://github.com/pydata/xarray/pull/5870/, which adds a weighted `quantile` function.

The question of how to precisely define the weighted quantile function is surprisingly complex, and this implementation offers a compromise in terms of simplicity and compatibility:

* The only interpolation method supported is the so-called "Type 7", as explained in https://aakinshin.net/posts/weighted-quantiles/, which proposes an R implementation that I have adapted
* Type 7 appears to be the most "popular" one, at least in the Python world: it corresponds to the default `linear` interpolation option of `numpy.quantile` (https://numpy.org/doc/stable/reference/generated/numpy.quantile.html), which is also the basis of xarray's existing non-weighted `quantile` function
* I have taken care to ensure that the results of this new function, with equal weights, are equivalent to those of the existing non-weighted function (when used with its default interpolation option)

The interpolation question is so complex and confusing that entire articles have been written about it, as mentioned in the blog post above. One in particular establishes the "nine types" taxonomy used, implicitly or not, by many software packages: https://doi.org/10.2307/2684934. The situation seems even more complex in the NumPy world, where many discussions and suggestions are aimed at improving the consistency of the API.
The current non-weighted function has the 9 options, as well as 4 extra legacy ones: https://github.com/numpy/numpy/blob/376ad691fe4df77e502108d279872f56b30376dc/numpy/lib/function_base.py#L4177-L4203

This PR cuts the Gordian knot by offering only one interpolation option. However, because its implementation is based on `apply_ufunc` (much like xarray's existing non-weighted `quantile` function, which uses `apply_ufunc` with `np.quantile`), it would be very easy to swap in `np.quantile` should it ever gain a `weights` keyword argument. That way, xarray's weighted `quantile` could lose a little bit of code and gain a plethora of interpolation options.

2021-12-10T01:11:36Z 2022-03-27T20:36:22Z 2022-03-27T20:36:22Z 2022-03-27T20:36:22Z 8a2fbb89ea6880ef43362281792dce2005b6d08a     0 c298bd0b63e8502e06ed219a39e4f1eebd14dba4 8f42bfd3a5fd0b1a351b535be207ed4771b02c8b CONTRIBUTOR   13221727 https://github.com/pydata/xarray/pull/6059  
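
To make the "Type 7" idea in the PR description more concrete, here is a minimal pure-Python sketch of a weighted Type 7 quantile, loosely following the R approach described in the blog post linked above (Kish's effective sample size plus a clipped linear CDF). It is not the code merged in this PR: the function name `weighted_quantile_type7` and its signature are hypothetical, and it handles only a flat list of values rather than xarray objects. The check at the end illustrates the equal-weights property mentioned in the description, using `statistics.quantiles` with `method="inclusive"` as a stand-in for the unweighted linear-interpolation quantile.

```python
import math
import statistics
from itertools import accumulate


def weighted_quantile_type7(values, weights, p):
    """Hypothetical helper: weighted Type 7 quantile of a flat sequence.

    Sketch only; assumes non-negative weights with a positive sum.
    """
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and equal in length")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    # Sort the observations and carry their weights along.
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    norm = [w / total for _, w in pairs]

    # Kish's effective sample size; equals len(values) for equal weights.
    n_eff = 1.0 / sum(w * w for w in norm)

    # Cumulative normalized weights, starting at 0.
    cdf = [0.0] + list(accumulate(norm))

    # Type 7 position, then a linear ramp clipped to [(h - 1) / n, h / n].
    h = p * (n_eff - 1.0) + 1.0
    lo, hi = (h - 1.0) / n_eff, h / n_eff
    q = [min(max(c, lo), hi) * n_eff - h + 1.0 for c in cdf]

    # Each sorted value contributes the increment of q over its weight span.
    return sum((q[i + 1] - q[i]) * x for i, (x, _) in enumerate(pairs))


# With equal weights, the result should match the unweighted Type 7 quantile.
data = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6]
equal_weights = [1.0] * len(data)
reference = statistics.quantiles(data, n=4, method="inclusive")
ours = [weighted_quantile_type7(data, equal_weights, p) for p in (0.25, 0.5, 0.75)]
assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(ours, reference))
```

In an xarray setting, a function of this shape would be the kind of per-vector kernel that `apply_ufunc` applies along the reduced dimension, which is why swapping in a future weighted `np.quantile` would be a small change.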
