
issue_comments


5 rows where author_association = "CONTRIBUTOR" and issue = 572875480 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
973623524 https://github.com/pydata/xarray/issues/3810#issuecomment-973623524 https://api.github.com/repos/pydata/xarray/issues/3810 IC_kwDOAMm_X846CFDk josephnowak 25071375 2021-11-19T01:00:11Z 2021-11-19T15:09:10Z CONTRIBUTOR

Is it possible to add the option of modifying what happens when there is a tie in the rank? (If you want I can create a separate issue for this)

I think this can be done using the scipy rankdata function instead of the bottleneck rank (though adding a method option to the bottleneck package should also be possible).
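To make the request concrete, here is a minimal sketch of how `scipy.stats.rankdata` resolves a tie under each of its `method` values. The sample values and the `tie_ranks` name are made up for illustration.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical sample with one tie (the two 20s).
values = np.array([10, 20, 20, 30])

# Each method resolves the tied pair differently.
tie_ranks = {
    method: rankdata(values, method=method).tolist()
    for method in ("average", "min", "max", "dense", "ordinal")
}

for method, ranks in tie_ranks.items():
    print(f"{method:>8}: {ranks}")
```

With "average" the tied pair shares rank 2.5, while "ordinal" assigns 2 and 3 in order of appearance.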

Small example:

```py
import dask.array
import xarray
from scipy.stats import rankdata

arr = xarray.DataArray(
    dask.array.random.random((11, 10), chunks=(3, 2)),
    coords={'a': list(range(11)), 'b': list(range(10))}
)


def rank(x: xarray.DataArray, dim: str, method: str):
    # This option generates fewer tasks, I don't know why
    axis = x.dims.index(dim)
    return xarray.DataArray(
        dask.array.apply_along_axis(
            rankdata,
            axis,
            x.data,
            dtype=float,
            shape=(x.sizes[dim], ),
            method=method
        ),
        coords=x.coords,
        dims=x.dims
    )


def rank2(x: xarray.DataArray, dim: str, method: str):
    axis = x.dims.index(dim)
    return xarray.apply_ufunc(
        rankdata,
        # The ranked dimension must live in a single chunk
        x.chunk({dim: x.sizes[dim]}),
        dask='parallelized',
        kwargs={'method': method, 'axis': axis},
        meta=x.data._meta
    )


arr_rank1 = rank(arr, 'a', 'ordinal')
arr_rank2 = rank2(arr, 'a', 'ordinal')

assert arr_rank1.equals(arr_rank2)
```

```py
import numpy as np
from scipy.stats import rankdata


# Probably this can work for ranking arrays with nan values
def _nanrankdata1(a, method):
    y = np.empty(a.shape, dtype=np.float64)
    y.fill(np.nan)
    idx = ~np.isnan(a)
    y[idx] = rankdata(a[idx], method=method)
    return y
```
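A quick check of the NaN idea on a small 1-D array might look like the sketch below. `nan_rank` is a hypothetical restatement of the helper so that the block runs on its own, and the sample data is made up.

```python
import numpy as np
from scipy.stats import rankdata


def nan_rank(a: np.ndarray, method: str = "average") -> np.ndarray:
    """Rank the non-NaN entries of a 1-D array; NaN positions stay NaN."""
    out = np.full(a.shape, np.nan, dtype=np.float64)
    valid = ~np.isnan(a)
    out[valid] = rankdata(a[valid], method=method)
    return out


sample = np.array([3.0, np.nan, 1.0, 3.0])  # made-up data
ranked = nan_rank(sample, method="min")
print(ranked)
```

Only the three valid entries take part in the ranking, so the NaN does not consume a rank.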

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  {DataArray,Dataset}.rank() should support an optional list of dimensions 572875480
592737661 https://github.com/pydata/xarray/issues/3810#issuecomment-592737661 https://api.github.com/repos/pydata/xarray/issues/3810 MDEyOklzc3VlQ29tbWVudDU5MjczNzY2MQ== seth-p 7441788 2020-02-28T21:29:58Z 2020-02-28T21:31:31Z CONTRIBUTOR

Note that with the apply_ufunc implementation we're only reshaping dims-sized ndarrays, not (necessarily) the whole DataArray, so maybe it's not too bad? It might be better to first sort dims into the same order as self.dims, i.e. dims = [dim_ for dim_ in self.dims if dim_ in dims]. But I'm just speculating.
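As a small illustration of that reordering, here is a sketch in plain Python. The dimension names and the `self_dims` and `ordered_dims` variables are hypothetical stand-ins, not part of xarray.

```python
# Hypothetical dimension names standing in for self.dims.
self_dims = ("time", "lat", "lon")
# Dimensions requested by the caller, in arbitrary order.
dims = ["lon", "time"]

# Keep only the requested dims, in the order they appear in self_dims.
ordered_dims = [dim_ for dim_ in self_dims if dim_ in dims]
print(ordered_dims)
```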

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  {DataArray,Dataset}.rank() should support an optional list of dimensions 572875480
592715925 https://github.com/pydata/xarray/issues/3810#issuecomment-592715925 https://api.github.com/repos/pydata/xarray/issues/3810 MDEyOklzc3VlQ29tbWVudDU5MjcxNTkyNQ== seth-p 7441788 2020-02-28T20:33:43Z 2020-02-28T20:35:57Z CONTRIBUTOR

A few minor tweaks needed:

```
In [20]: import bottleneck

In [21]: xr.apply_ufunc(
    ...:     lambda x: bottleneck.rankdata(x).reshape(x.shape),
    ...:     d,
    ...:     input_core_dims=[['xyz', 'abc']],
    ...:     output_core_dims=[['xyz', 'abc']],
    ...:     vectorize=True
    ...: ).transpose(*d.dims)
Out[21]:
<xarray.DataArray (abc: 4, xyz: 3)>
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [10., 11., 12.]])
Dimensions without coordinates: abc, xyz
```

Despite what the docs say, bottleneck.{nan}rankdata(a) returns a 1-dimensional ndarray, not an array with the same shape as a.
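The reshape matters because a rank over the flattened array comes back one-dimensional. The sketch below shows the same pattern with plain NumPy. `flat_rank` is a hypothetical stand-in for `bottleneck.rankdata(x)`, not the bottleneck function itself: it gives ordinal ranks and does not average ties.

```python
import numpy as np


def flat_rank(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: 1-based ordinal ranks of the flattened input."""
    flat = x.ravel()
    order = np.argsort(flat, kind="stable")
    ranks = np.empty(flat.size, dtype=np.float64)
    ranks[order] = np.arange(1, flat.size + 1)
    return ranks  # 1-D, regardless of the shape of x


x = np.array([[30.0, 10.0], [20.0, 40.0]])  # made-up data
ranks_1d = flat_rank(x)
ranks_2d = ranks_1d.reshape(x.shape)  # restore the original layout
print(ranks_1d.shape, ranks_2d.shape)
```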

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  {DataArray,Dataset}.rank() should support an optional list of dimensions 572875480
592672463 https://github.com/pydata/xarray/issues/3810#issuecomment-592672463 https://api.github.com/repos/pydata/xarray/issues/3810 MDEyOklzc3VlQ29tbWVudDU5MjY3MjQ2Mw== seth-p 7441788 2020-02-28T18:51:18Z 2020-02-28T18:52:29Z CONTRIBUTOR

What's wrong with the following? (Still need to deal with pct and keep_attrs.)

    apply_ufunc(
        bottleneck.{nan}rankdata,
        self,
        input_core_dims=[dims],
        output_core_dims=[dims],
        vectorize=True
    )

Per https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.rankdata, "The default (axis=None) is to rank the elements of the flattened array."

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  {DataArray,Dataset}.rank() should support an optional list of dimensions 572875480
592654794 https://github.com/pydata/xarray/issues/3810#issuecomment-592654794 https://api.github.com/repos/pydata/xarray/issues/3810 MDEyOklzc3VlQ29tbWVudDU5MjY1NDc5NA== seth-p 7441788 2020-02-28T18:06:57Z 2020-02-28T18:06:57Z CONTRIBUTOR

Assuming dims is a non-empty list of dimensions, the following code seems to work:

    temp_dim = '__temp_dim__'
    return da.stack(**{temp_dim: dims}).\
        rank(temp_dim, pct=pct, keep_attrs=keep_attrs).\
        unstack(temp_dim).transpose(*da.dims).\
        drop_vars([dim_ for dim_ in dims if dim_ not in da.coords])
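For readers less familiar with stack and unstack, the same idea can be sketched on a bare NumPy array: merge the ranked axes into one, rank along it, then split and restore the axis order. This is an illustration under assumptions, not how xarray implements rank. `rank_over_axes` is a hypothetical helper, and it uses `scipy.stats.rankdata` (ties averaged by default) rather than bottleneck.

```python
import numpy as np
from scipy.stats import rankdata


def rank_over_axes(a: np.ndarray, axes: tuple) -> np.ndarray:
    """Rank jointly over `axes`, independently for every other index."""
    tail = tuple(range(-len(axes), 0))
    # Move the ranked axes to the end and merge them into one axis ("stack").
    moved = np.moveaxis(a, axes, tail)
    merged = moved.reshape(moved.shape[: a.ndim - len(axes)] + (-1,))
    # Rank along the merged axis, then split it again ("unstack").
    ranked = rankdata(merged, axis=-1).reshape(moved.shape)
    # Put the axes back where they started ("transpose").
    return np.moveaxis(ranked, tail, axes)


sample = np.arange(8)[::-1].reshape(2, 2, 2)  # made-up data
result = rank_over_axes(sample, (1, 2))
print(result.shape)
```

The result keeps the input's shape, and each slice along the untouched axis is ranked on its own.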

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  {DataArray,Dataset}.rank() should support an optional list of dimensions 572875480


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);