html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3810#issuecomment-973623524,https://api.github.com/repos/pydata/xarray/issues/3810,973623524,IC_kwDOAMm_X846CFDk,25071375,2021-11-19T01:00:11Z,2021-11-19T15:09:10Z,CONTRIBUTOR,"Is it possible to add an option to control what happens when there is a tie in the rank? (If you want, I can create a separate issue for this.) I think this can be done using the scipy `rankdata` function instead of the bottleneck rank (though I also think adding a `method` option to the bottleneck package is possible).

Small example:

```py
import dask.array
import xarray
from scipy.stats import rankdata

arr = xarray.DataArray(
    dask.array.random.random((11, 10), chunks=(3, 2)),
    coords={'a': list(range(11)), 'b': list(range(10))}
)


def rank(x: xarray.DataArray, dim: str, method: str):
    # This option generates fewer tasks; I'm not sure why
    axis = x.dims.index(dim)
    return xarray.DataArray(
        dask.array.apply_along_axis(
            rankdata, axis, x.data, dtype=float, shape=(x.sizes[dim], ), method=method
        ),
        coords=x.coords,
        dims=x.dims
    )


def rank2(x: xarray.DataArray, dim: str, method: str):
    axis = x.dims.index(dim)
    return xarray.apply_ufunc(
        rankdata,
        x.chunk({dim: x.sizes[dim]}),  # one chunk along the ranked dimension
        dask='parallelized',
        kwargs={'method': method, 'axis': axis},
        meta=x.data._meta
    )


arr_rank1 = rank(arr, 'a', 'ordinal')
arr_rank2 = rank2(arr, 'a', 'ordinal')
assert arr_rank1.equals(arr_rank2)
```

```py
# This could probably work for ranking arrays with NaN values
import numpy as np
from scipy.stats import rankdata


def _nanrankdata1(a, method):
    y = np.empty(a.shape, dtype=np.float64)
    y.fill(np.nan)  # NaN positions stay NaN in the output
    idx = ~np.isnan(a)
    y[idx] = rankdata(a[idx], method=method)
    return y
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592738965,https://api.github.com/repos/pydata/xarray/issues/3810,592738965,MDEyOklzc3VlQ29tbWVudDU5MjczODk2NQ==,5635139,2020-02-28T21:33:35Z,2020-02-28T21:33:35Z,MEMBER,"Yeah, unfortunately I'm fairly confident about this; have a go with moderately large arrays for `sum` and you'll quickly see the performance cliff.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592737661,https://api.github.com/repos/pydata/xarray/issues/3810,592737661,MDEyOklzc3VlQ29tbWVudDU5MjczNzY2MQ==,7441788,2020-02-28T21:29:58Z,2020-02-28T21:31:31Z,CONTRIBUTOR,"Note that with the `apply_ufunc` implementation we're only reshaping `dims`-sized `ndarray`s, not (necessarily) the whole DataArray, so maybe it's not too bad? It might be better to first sort `dims` into the same order as `self.dims`, i.e. `dims = [dim_ for dim_ in self.dims if dim_ in dims]`. But I'm just speculating.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592721162,https://api.github.com/repos/pydata/xarray/issues/3810,592721162,MDEyOklzc3VlQ29tbWVudDU5MjcyMTE2Mg==,5635139,2020-02-28T20:47:33Z,2020-02-28T20:47:33Z,MEMBER,"Great -- that's cool and a good implementation of `apply_ufunc`.
As above, we wouldn't want to replace `rank` with that given the reshaping (we'd need a function that computes over multiple dimensions). We could use something similar for groupbys though?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592715925,https://api.github.com/repos/pydata/xarray/issues/3810,592715925,MDEyOklzc3VlQ29tbWVudDU5MjcxNTkyNQ==,7441788,2020-02-28T20:33:43Z,2020-02-28T20:35:57Z,CONTRIBUTOR,"A few minor tweaks are needed:

```
In [20]: import bottleneck

In [21]: xr.apply_ufunc(
    ...:     lambda x: bottleneck.rankdata(x).reshape(x.shape),
    ...:     d,
    ...:     input_core_dims=[['xyz', 'abc']],
    ...:     output_core_dims=[['xyz', 'abc']],
    ...:     vectorize=True
    ...: ).transpose(*d.dims)
Out[21]:
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [10., 11., 12.]])
Dimensions without coordinates: abc, xyz
```

Despite what the docs say, `bottleneck.{nan}rankdata(a)` returns a 1-dimensional ndarray, not an array with the same shape as `a`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592708353,https://api.github.com/repos/pydata/xarray/issues/3810,592708353,MDEyOklzc3VlQ29tbWVudDU5MjcwODM1Mw==,5635139,2020-02-28T20:13:51Z,2020-02-28T20:13:51Z,MEMBER,Could you try running that?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592672463,https://api.github.com/repos/pydata/xarray/issues/3810,592672463,MDEyOklzc3VlQ29tbWVudDU5MjY3MjQ2Mw==,7441788,2020-02-28T18:51:18Z,2020-02-28T18:52:29Z,CONTRIBUTOR,"What's wrong with the following? (Still need to deal with `pct` and `keep_attrs`.)

````
apply_ufunc(
    bottleneck.{nan}rankdata,
    self,
    input_core_dims=[dims],
    output_core_dims=[dims],
    vectorize=True
)
````

Per https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.rankdata, ""The default (axis=None) is to rank the elements of the flattened array.""","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592665711,https://api.github.com/repos/pydata/xarray/issues/3810,592665711,MDEyOklzc3VlQ29tbWVudDU5MjY2NTcxMQ==,5635139,2020-02-28T18:34:44Z,2020-02-28T18:34:44Z,MEMBER,"Yes, we can always reshape as a way of running numerical operations over multiple dimensions. But reshaping can be an expensive operation, so doing it as part of a numerical operation can cause surprises. (If you're interested, try running a sum over multiple dimensions and comparing it to a reshape plus a sum over the single reshaped dimension.) Instead, users can do this themselves, giving them context and control.
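For reference, a minimal NumPy sketch of that comparison. The array shape and the choice of axes are arbitrary assumptions, and timings depend on the machine: it sums over two non-adjacent axes directly, then does the same by transposing, merging those axes with `reshape`, and summing the merged axis.

```python
import time

import numpy as np

# Arbitrary example array; the shape is an assumption for illustration only.
a = np.random.default_rng(0).random((50, 200, 300))

# Option 1: sum over two non-adjacent axes (0 and 2) directly, no reshape.
t0 = time.perf_counter()
direct = a.sum(axis=(0, 2))
t_direct = time.perf_counter() - t0

# Option 2: move axes 0 and 2 next to each other, merge them into a single
# axis, then sum that axis. The merged axes are not adjacent in memory,
# so reshape has to copy the whole array into a new buffer.
t0 = time.perf_counter()
merged = a.transpose(1, 0, 2).reshape(a.shape[1], -1)
via_reshape = merged.sum(axis=1)
t_reshape = time.perf_counter() - t0

# Same numbers, up to floating-point summation order...
assert np.allclose(direct, via_reshape)
# ...but the merged array is a full copy, not a view of `a`.
assert not np.shares_memory(a, merged)

# By contrast, merging the trailing axes of a C-contiguous array is a view.
assert a.reshape(a.shape[0], -1).base is a

print(f'direct sum: {t_direct:.4f}s, transpose+reshape+sum: {t_reshape:.4f}s')
```

Both paths give the same numbers, but the reshape path first materializes a full copy of the array; merging axes that are already adjacent in a C-contiguous array is a free view, so whether a reshape is expensive depends on which dimensions are being combined.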
Reshaping is OK to do in `groupby` though (I think), so adding `rank` to groupby would be one way of accomplishing this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592654794,https://api.github.com/repos/pydata/xarray/issues/3810,592654794,MDEyOklzc3VlQ29tbWVudDU5MjY1NDc5NA==,7441788,2020-02-28T18:06:57Z,2020-02-28T18:06:57Z,CONTRIBUTOR,"Assuming `dims` is a non-empty list of dimensions, the following code seems to work:

```
temp_dim = '__temp_dim__'
return da.stack(**{temp_dim: dims}).\
    rank(temp_dim, pct=pct, keep_attrs=keep_attrs).\
    unstack(temp_dim).transpose(*da.dims).\
    drop_vars([dim_ for dim_ in dims if dim_ not in da.coords])
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480
https://github.com/pydata/xarray/issues/3810#issuecomment-592645335,https://api.github.com/repos/pydata/xarray/issues/3810,592645335,MDEyOklzc3VlQ29tbWVudDU5MjY0NTMzNQ==,5635139,2020-02-28T17:43:05Z,2020-02-28T17:43:05Z,MEMBER,"This would be great. The underlying numerical library we use, bottleneck, [doesn't support multiple dimensions](https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.rankdata). If there were another option, or someone wanted to write one in numbagg, that would be a welcome addition.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572875480