issues: 221366244


id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
221366244	MDU6SXNzdWUyMjEzNjYyNDQ=	1371	Weighted quantile	5572303	open	0			8	2017-04-12T19:29:04Z	2019-03-20T22:34:22Z		CONTRIBUTOR				For our work we frequently need to compute weighted quantiles. This is especially important when we need to weigh data from recent years more heavily in making predictions.

I've put together a function (called `weighted_quantile`) largely based on the source code of `np.percentile`. It allows one to input weights along a single dimension, as a dict `w_dict`. Below are some manual tests:

When all weights = 1, it's identical to using `np.nanpercentile`:

```
ar0
<xarray.DataArray (x: 3, y: 4)>
array([[3, 4, 8, 1],
       [5, 3, 7, 9],
       [4, 9, 6, 2]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 0 1 2 3

ar0.quantile(q=[0.25, 0.5, 0.75], dim='y')
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75

weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,1,1,1]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
```

Now different weights:

```
weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,2,3,4.0]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 3.25    ,  5.666667,  4.333333],
       [ 4.      ,  7.      ,  5.333333],
       [ 6.      ,  8.      ,  6.75    ]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
```

Also handles nan values like `np.nanpercentile`:

```
ar
<xarray.DataArray (x: 2, y: 2, z: 2)>
array([[[ nan,   3.],
        [ nan,   5.]],

       [[  8.,   1.],
        [ nan,   0.]]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 0 1
  * z        (z) int64 8 9

da_stacked = ar.stack(mi=['x', 'y'])
out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]})
out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75

da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
```

Lastly, different interpolation schemes are consistent:

```
out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]}, interpolation='nearest')
out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75

da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi', interpolation='nearest')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
```

We wonder if it's ok to make this part of xarray. If so, the most logical place to implement it would seem to be in `Variable.quantile()`. Another option is to make it a utility function, to be called as `xr.weighted_quantile()`.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1371/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }			13221727	issue
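
The issue describes `weighted_quantile` but does not include its source. As a rough illustration of the general idea only, here is a minimal 1-D sketch in plain NumPy. The name `weighted_quantile_1d` and its signature are hypothetical, and the interpolation rule (weighted midpoint positions rescaled to [0, 1]) is just one of several possible conventions. It is not the function from the issue and is not expected to reproduce the weighted results shown there; it only shares the two properties the issue tests for: equal weights reduce to `np.percentile` with linear interpolation, and NaN values are skipped.

```python
import numpy as np


def weighted_quantile_1d(values, weights, q):
    """Hypothetical 1-D weighted quantile with linear interpolation.

    Assumes strictly positive weights. NaN entries in `values` are dropped
    together with their weights, similar in spirit to np.nanpercentile.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    q = np.atleast_1d(np.asarray(q, dtype=float))

    # Drop NaN observations and their weights.
    keep = ~np.isnan(values)
    values, weights = values[keep], weights[keep]
    if values.size == 0:
        return np.full(q.shape, np.nan)
    if values.size == 1:
        return np.full(q.shape, values[0])

    # Sort the observations and carry the weights along.
    order = np.argsort(values)
    values, weights = values[order], weights[order]

    # Place each observation at the midpoint of its weight interval, then
    # rescale so the smallest value sits at 0 and the largest at 1.
    # With equal weights this gives positions k / (n - 1), which is the
    # convention np.percentile uses for linear interpolation.
    positions = np.cumsum(weights) - 0.5 * weights
    positions -= positions[0]
    positions /= positions[-1]

    return np.interp(q, positions, values)


# Equal weights: agrees with np.percentile([3, 4, 8, 1], [25, 50, 75]).
unweighted = weighted_quantile_1d([3, 4, 8, 1], [1, 1, 1, 1], [0.25, 0.5, 0.75])

# Unequal weights: later observations count more heavily.
weighted = weighted_quantile_1d([3, 4, 8, 1], [1, 2, 3, 4.0], [0.25, 0.5, 0.75])
```

Applying such a function along one dimension of a `DataArray`, as the issue proposes, would additionally require broadcasting the weights against the data, which this sketch leaves out.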

Links from other tables

0 rows from issues_id in issues_labels
8 rows from issue in issue_comments