html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3810#issuecomment-973623524,https://api.github.com/repos/pydata/xarray/issues/3810,973623524,IC_kwDOAMm_X846CFDk,25071375,2021-11-19T01:00:11Z,2021-11-19T15:09:10Z,CONTRIBUTOR,"Is it possible to add the option of modifying what happens when there is a tie in the rank? (If you want I can create a separate issue for this)
I think this can be done using the scipy `rankdata` function instead of the bottleneck rank (though adding a `method` option to the bottleneck package should also be possible).
Small example:
```py
import dask.array
import xarray
from scipy.stats import rankdata

arr = xarray.DataArray(
    dask.array.random.random((11, 10), chunks=(3, 2)),
    coords={'a': list(range(11)), 'b': list(range(10))},
    dims=('a', 'b')
)

def rank(x: xarray.DataArray, dim: str, method: str):
    # This option generates fewer tasks, I don't know why
    axis = x.dims.index(dim)
    return xarray.DataArray(
        dask.array.apply_along_axis(
            rankdata,
            axis,
            x.data,
            dtype=float,
            shape=(x.sizes[dim], ),
            method=method
        ),
        coords=x.coords,
        dims=x.dims
    )

def rank2(x: xarray.DataArray, dim: str, method: str):
    axis = x.dims.index(dim)
    return xarray.apply_ufunc(
        rankdata,
        # The ranked dimension has to live in a single chunk
        x.chunk({dim: x.sizes[dim]}),
        dask='parallelized',
        kwargs={'method': method, 'axis': axis},
        meta=x.data._meta
    )

arr_rank1 = rank(arr, 'a', 'ordinal')
arr_rank2 = rank2(arr, 'a', 'ordinal')
assert arr_rank1.equals(arr_rank2)
```
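To make the tie question concrete, here is a minimal sketch of how the `method` argument of `scipy.stats.rankdata` changes the result. The input values are made up purely for illustration:

```python
import numpy as np
from scipy.stats import rankdata

# Made-up data: the two 20s are tied
values = np.array([10, 20, 20, 30])

# Rank the same data once per tie-breaking method
ranks_by_method = {
    method: rankdata(values, method=method).tolist()
    for method in ('average', 'min', 'max', 'dense', 'ordinal')
}

for method, ranks in ranks_by_method.items():
    print(method, ranks)
```

Only the ranks of the tied entries differ between methods; the untied entries keep the same relative order.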
```py
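# Probably this can work for ranking arrays with nan values
import numpy as np
from scipy.stats import rankdata

def _nanrankdata1(a, method):
    # Start from all-NaN output so NaN inputs stay NaN
    y = np.empty(a.shape, dtype=np.float64)
    y.fill(np.nan)
    # Rank only the non-NaN entries
    idx = ~np.isnan(a)
    y[idx] = rankdata(a[idx], method=method)
    return y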
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592738965,https://api.github.com/repos/pydata/xarray/issues/3810,592738965,MDEyOklzc3VlQ29tbWVudDU5MjczODk2NQ==,5635139,2020-02-28T21:33:35Z,2020-02-28T21:33:35Z,MEMBER,"Yeah, unfortunately I'm fairly confident about this; have a go with moderately large arrays for `sum` and you'll quickly see the performance cliff ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592737661,https://api.github.com/repos/pydata/xarray/issues/3810,592737661,MDEyOklzc3VlQ29tbWVudDU5MjczNzY2MQ==,7441788,2020-02-28T21:29:58Z,2020-02-28T21:31:31Z,CONTRIBUTOR,"Note that with the `apply_ufunc` implementation we're only reshaping `dims`-sized `ndarray`s, not (necessarily) the whole DataArray, so maybe it's not too bad? It might be better to first sort `dims` to be in the same order as `self.dims`. i.e. `dims = [dim_ for dim_ in self.dims if dim_ in dims]`. But I'm just speculating.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592721162,https://api.github.com/repos/pydata/xarray/issues/3810,592721162,MDEyOklzc3VlQ29tbWVudDU5MjcyMTE2Mg==,5635139,2020-02-28T20:47:33Z,2020-02-28T20:47:33Z,MEMBER,"Great -- that's cool and a good implementation of `apply_ufunc`. As above, we wouldn't want to replace `rank` with that given the reshaping (we'd need a function that computes over multiple dimensions)
We could use something similar for groupbys though?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592715925,https://api.github.com/repos/pydata/xarray/issues/3810,592715925,MDEyOklzc3VlQ29tbWVudDU5MjcxNTkyNQ==,7441788,2020-02-28T20:33:43Z,2020-02-28T20:35:57Z,CONTRIBUTOR,"A few minor tweaks needed:
```
In [20]: import bottleneck
In [21]: xr.apply_ufunc(
    ...:     lambda x: bottleneck.rankdata(x).reshape(x.shape),
    ...:     d,
    ...:     input_core_dims=[['xyz', 'abc']],
    ...:     output_core_dims=[['xyz', 'abc']],
    ...:     vectorize=True
    ...: ).transpose(*d.dims)
Out[21]:
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [10., 11., 12.]])
Dimensions without coordinates: abc, xyz
```
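For anyone who wants to try this without bottleneck, here is a minimal sketch of the same pattern with `scipy.stats.rankdata` swapped in, so it is not what xarray does internally. The helper name `rank_over_dims` and the example array are made up for illustration, and the core dims are ordered as they appear in `da.dims`:

```python
import numpy as np
import xarray as xr
from scipy.stats import rankdata

def rank_over_dims(da, dims, method='average'):
    # Hypothetical helper, not part of xarray.
    # Keep the core dims in the same order as da.dims.
    core = [dim for dim in da.dims if dim in dims]

    def _rank_block(block):
        # block only spans the core dims: rank the flattened
        # values, then restore the original block shape.
        return rankdata(block.ravel(), method=method).reshape(block.shape)

    return xr.apply_ufunc(
        _rank_block,
        da,
        input_core_dims=[core],
        output_core_dims=[core],
        vectorize=True,
    ).transpose(*da.dims)

# Assumed example input: 4 x 3 array with increasing values
d = xr.DataArray(np.arange(12.0).reshape(4, 3), dims=('abc', 'xyz'))
ranked = rank_over_dims(d, ['xyz', 'abc'])
print(ranked.values)
```

Because the block is flattened and reshaped inside the function, the reshape only touches the core dims, not any remaining dimensions of the array.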
Despite what the docs say, `bottleneck.{nan}rankdata(a)` returns a 1-dimensional ndarray, not an array with the same shape as `a`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592708353,https://api.github.com/repos/pydata/xarray/issues/3810,592708353,MDEyOklzc3VlQ29tbWVudDU5MjcwODM1Mw==,5635139,2020-02-28T20:13:51Z,2020-02-28T20:13:51Z,MEMBER,Could you try running that?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592672463,https://api.github.com/repos/pydata/xarray/issues/3810,592672463,MDEyOklzc3VlQ29tbWVudDU5MjY3MjQ2Mw==,7441788,2020-02-28T18:51:18Z,2020-02-28T18:52:29Z,CONTRIBUTOR,"What's wrong with the following? (Still need to deal with `pct` and `keep_attrs`.)
````
apply_ufunc(
    bottleneck.{nan}rankdata,
    self,
    input_core_dims=[dims],
    output_core_dims=[dims],
    vectorize=True
)
````
Per https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.rankdata, ""The default (axis=None) is to rank the elements of the flattened array.""","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592665711,https://api.github.com/repos/pydata/xarray/issues/3810,592665711,MDEyOklzc3VlQ29tbWVudDU5MjY2NTcxMQ==,5635139,2020-02-28T18:34:44Z,2020-02-28T18:34:44Z,MEMBER,"Yes, we can always reshape as a way of running numerical operations over multiple dimensions. But reshaping can be an expensive operation, so doing it as part of a numerical operation can cause surprises. (if you're interested, try running a sum over multiple dimensions and comparing to a reshape + a sum over the single reshaped dimension).
Instead, users can do this themselves, giving them context and control.
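The comparison suggested above can be sketched roughly as follows. The array shape and dimension names are arbitrary assumptions, and the timings will vary with the machine and the xarray version:

```python
import time

import numpy as np
import xarray as xr

# Assumed example array; make it larger to see a clearer difference
da = xr.DataArray(np.random.rand(200, 300, 40), dims=('x', 'y', 'z'))

# Sum directly over two dimensions
start = time.perf_counter()
direct = da.sum(('x', 'y'))
t_direct = time.perf_counter() - start

# Reshape (stack) the two dimensions into one, then sum over it
start = time.perf_counter()
via_stack = da.stack(xy=('x', 'y')).sum('xy')
t_stack = time.perf_counter() - start

print(f'direct sum:  {t_direct:.4f} s')
print(f'stack + sum: {t_stack:.4f} s')
```

Both routes give the same numbers; the difference is only in how much work the reshape adds.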
Reshaping is OK to do in `groupby` though (I think), so adding `rank` to groupby would be one way of accomplishing this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592654794,https://api.github.com/repos/pydata/xarray/issues/3810,592654794,MDEyOklzc3VlQ29tbWVudDU5MjY1NDc5NA==,7441788,2020-02-28T18:06:57Z,2020-02-28T18:06:57Z,CONTRIBUTOR,"Assuming `dims` is a non-empty list of dimensions, the following code seems to work:
```
temp_dim = '__temp_dim__'
return da.stack(**{temp_dim: dims}).\
    rank(temp_dim, pct=pct, keep_attrs=keep_attrs).\
    unstack(temp_dim).transpose(*da.dims).\
    drop_vars([dim_ for dim_ in dims if dim_ not in da.coords])
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592645335,https://api.github.com/repos/pydata/xarray/issues/3810,592645335,MDEyOklzc3VlQ29tbWVudDU5MjY0NTMzNQ==,5635139,2020-02-28T17:43:05Z,2020-02-28T17:43:05Z,MEMBER,"This would be great. The underlying numerical library we use, bottleneck, [doesn't support multiple dimensions](https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.rankdata). If there were another option, or someone wanted to write one in numbagg, that would be a welcome addition.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480