html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3810#issuecomment-973623524,https://api.github.com/repos/pydata/xarray/issues/3810,973623524,IC_kwDOAMm_X846CFDk,25071375,2021-11-19T01:00:11Z,2021-11-19T15:09:10Z,CONTRIBUTOR,"Is it possible to add the option of modifying what happens when there is a tie in the rank? (If you want I can create a separate issue for this)
I think this can be done by using the scipy `rankdata` function instead of the bottleneck rank functions (though I think adding a `method` option to the bottleneck package would also be possible).
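To make the tie-handling options concrete, here is a minimal pure-Python sketch of what each `method` choice means. The helper `rank_with_ties` is hypothetical and written only for illustration; it follows the semantics documented for `scipy.stats.rankdata` but is not the scipy or bottleneck implementation.

```py
def rank_with_ties(values, method='average'):
    # Hypothetical helper for illustration only; not scipy's implementation.
    # sorted() is stable, so tied values keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    dense = 0  # number of distinct values seen so far
    start = 0
    while start < len(order):
        # Find the last sorted position holding the same value as `start`.
        stop = start
        while (stop + 1 < len(order)
               and values[order[stop + 1]] == values[order[start]]):
            stop += 1
        dense += 1
        for pos in range(start, stop + 1):
            if method == 'ordinal':
                rank = pos + 1  # ties broken by order of appearance
            elif method == 'min':
                rank = start + 1  # every tied value gets the lowest rank
            elif method == 'max':
                rank = stop + 1  # every tied value gets the highest rank
            elif method == 'dense':
                rank = dense  # like 'min', but with no gaps between groups
            elif method == 'average':
                rank = (start + stop + 2) / 2  # mean of the ordinal ranks
            else:
                raise ValueError(f'unknown method: {method}')
            ranks[order[pos]] = float(rank)
        start = stop + 1
    return ranks


data = [10, 20, 20, 30]
for method in ('average', 'min', 'max', 'dense', 'ordinal'):
    print(method, rank_with_ties(data, method))
```

For `[10, 20, 20, 30]`, the two tied values get 2.5 and 2.5 with `'average'`, 2 and 2 with `'min'`, 3 and 3 with `'max'`, 2 and 2 with `'dense'` (and the last value gets 3 instead of 4), and 2 and 3 with `'ordinal'`.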
Small example:
```py
import dask.array
import xarray
from scipy.stats import rankdata

arr = xarray.DataArray(
    dask.array.random.random((11, 10), chunks=(3, 2)),
    coords={'a': list(range(11)), 'b': list(range(10))},
    dims=('a', 'b')
)

def rank(x: xarray.DataArray, dim: str, method: str):
    # This version generates fewer tasks; I don't know why
    axis = x.dims.index(dim)
    return xarray.DataArray(
        dask.array.apply_along_axis(
            rankdata,
            axis,
            x.data,
            dtype=float,
            shape=(x.sizes[dim], ),
            method=method
        ),
        coords=x.coords,
        dims=x.dims
    )

def rank2(x: xarray.DataArray, dim: str, method: str):
    axis = x.dims.index(dim)
    return xarray.apply_ufunc(
        rankdata,
        # Rechunk so the ranked dimension sits in a single chunk
        x.chunk({dim: x.sizes[dim]}),
        dask='parallelized',
        kwargs={'method': method, 'axis': axis},
        meta=x.data._meta
    )

arr_rank1 = rank(arr, 'a', 'ordinal')
arr_rank2 = rank2(arr, 'a', 'ordinal')
assert arr_rank1.equals(arr_rank2)
```
```py
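# This can probably work for ranking arrays with NaN values
import numpy as np
from scipy.stats import rankdata

def _nanrankdata1(a, method):
    # Rank only the non-NaN entries; NaN positions stay NaN
    y = np.empty(a.shape, dtype=np.float64)
    y.fill(np.nan)
    idx = ~np.isnan(a)
    y[idx] = rankdata(a[idx], method=method)
    return y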
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480