issues: 221366244

id: 221366244
node_id: MDU6SXNzdWUyMjEzNjYyNDQ=
number: 1371
title: Weighted quantile
user: 5572303
state: open
locked: 0
comments: 8
created_at: 2017-04-12T19:29:04Z
updated_at: 2019-03-20T22:34:22Z
author_association: CONTRIBUTOR

For our work we frequently need to compute weighted quantiles. This is especially important when we need to weigh data from recent years more heavily in making predictions.

I've put together a function (called weighted_quantile) largely based on the source code of np.percentile. It allows one to input weights along a single dimension, as a dict w_dict. Below are some manual tests:

When all weights equal 1, the result is identical to np.nanpercentile (called here through DataArray.quantile):

```
>>> ar0
<xarray.DataArray (x: 3, y: 4)>
array([[3, 4, 8, 1],
       [5, 3, 7, 9],
       [4, 9, 6, 2]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 0 1 2 3
>>> ar0.quantile(q=[0.25, 0.5, 0.75], dim='y')
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,1,1,1]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
```

Now different weights:

```
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,2,3,4.0]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 3.25    ,  5.666667,  4.333333],
       [ 4.      ,  7.      ,  5.333333],
       [ 6.      ,  8.      ,  6.75    ]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
```
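The issue does not include the source of weighted_quantile, so the following is only a minimal 1-D sketch of one weighting rule that is consistent with the numbers above. It assumes each sorted value is placed at the cumulative weight of the values up to it, shifted so the smallest value sits at position 0, and that the quantile is read off by linear interpolation at q times the last position. The name weighted_quantile_1d and this rule are assumptions made for illustration, not the author's implementation.

```python
from itertools import accumulate


def weighted_quantile_1d(values, weights, q):
    """Linearly interpolated weighted quantile of a 1-D sample (hypothetical helper)."""
    # Sort the values and carry each weight along with its value.
    pairs = sorted(zip(values, weights))
    vals = [v for v, _ in pairs]
    wts = [w for _, w in pairs]
    # Position of each sorted value: cumulative weight, shifted so the
    # smallest value sits at 0. Equal weights give positions 0, 1, ..., n-1,
    # the index scale used by plain linear-interpolation percentiles.
    positions = [c - wts[0] for c in accumulate(wts)]
    target = q * positions[-1]
    # Find the first position at or beyond the target and interpolate.
    for i in range(1, len(vals)):
        if target <= positions[i]:
            span = positions[i] - positions[i - 1]
            frac = (target - positions[i - 1]) / span if span else 0.0
            return vals[i - 1] + frac * (vals[i] - vals[i - 1])
    return vals[-1]  # single-value sample


rows = {"a": [3, 4, 8, 1], "b": [5, 3, 7, 9], "c": [4, 9, 6, 2]}
for label, row in rows.items():
    print(label, [round(weighted_quantile_1d(row, [1, 2, 3, 4], q), 6)
                  for q in (0.25, 0.5, 0.75)])
```

Under this assumed rule, the three rows of ar0 with weights [1, 2, 3, 4] give the same values as the weighted output shown above, and equal weights reduce to the unweighted result.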

It also handles NaN values like np.nanpercentile:

```
>>> ar
<xarray.DataArray (x: 2, y: 2, z: 2)>
array([[[ nan,   3.],
        [ nan,   5.]],

       [[  8.,   1.],
        [ nan,   0.]]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 0 1
  * z        (z) int64 8 9
>>> da_stacked = ar.stack(mi=['x', 'y'])
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]})
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
```
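To make the example above easier to follow, here is a small sketch of how weights given along x alone might be expanded when reducing over both x and y, and how NaN cells could be dropped together with their weights. This is an assumption about the behaviour, written with plain lists rather than xarray; stacked_sample is a hypothetical helper, not part of the proposed function.

```python
import math

nan = math.nan
# Same values as `ar` above, indexed as ar[x][y][z].
ar = [[[nan, 3.0], [nan, 5.0]],
      [[8.0, 1.0], [nan, 0.0]]]
w_x = [1.0, 1.0]  # weights along x only, as in w_dict={'x': [1, 1]}


def stacked_sample(data, weights_x, z):
    """Collect (value, weight) pairs over the stacked (x, y) cells for one z index.

    Each cell inherits the weight of its x label; NaN cells are dropped
    together with their weight, so they do not influence the quantile.
    """
    return [
        (cell[z], weights_x[ix])
        for ix, plane in enumerate(data)
        for cell in plane
        if not math.isnan(cell[z])
    ]


print(stacked_sample(ar, w_x, z=0))  # [(8.0, 1.0)]
print(stacked_sample(ar, w_x, z=1))  # [(3.0, 1.0), (5.0, 1.0), (1.0, 1.0), (0.0, 1.0)]
```

At the first z index only the value 8 survives, so every quantile is 8. At the second, the equally weighted values 0, 1, 3, 5 give 0.75, 2 and 3.5 under linear interpolation, which matches both outputs above.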

Lastly, other interpolation schemes give results consistent with DataArray.quantile:

```
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]}, interpolation='nearest')
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi', interpolation='nearest')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
```

We wonder if it's ok to make this part of xarray. If so, the most logical place to implement it would seem to be in Variable.quantile(). Another option is to make it a utility function, to be called as xr.weighted_quantile().

reactions: 0 total (https://api.github.com/repos/pydata/xarray/issues/1371/reactions)
repo: 13221727
type: issue
