

issue_comments


3 rows where issue = 785329941 and user = 10194086 sorted by updated_at descending




Comment 759780514 · mathause (MEMBER) · created 2021-01-13T22:32:47Z · updated 2021-01-14T01:15:02Z
https://github.com/pydata/xarray/issues/4804#issuecomment-759780514

@aaronspring I had a quick look at your version - do you have an idea why it is faster? Does yours also work for dask arrays?

  • In `a, b = xr.broadcast(a, b, exclude=dim)`, why can you exclude `dim`?
  • I think you could also use `a, b = xr.align(a, b, exclude=dim)` (`broadcast` has `join="outer"`, which fills with NA values that then get ignored; `align` uses `join="inner"`)
  • Does your version work if the weights contain NA?
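The NA handling the last bullet asks about can be sketched with plain numpy (hypothetical toy arrays; xarray's `notnull`/`where` correspond to `~np.isnan`/`np.where` here):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, 4.0])
b = np.array([2.0, 5.0, np.nan, 8.0])

# keep only positions where *both* arrays are valid,
# mirroring valid_values = da_a.notnull() & da_b.notnull()
valid = ~np.isnan(a) & ~np.isnan(b)
a_masked = np.where(valid, a, np.nan)
b_masked = np.where(valid, b, np.nan)

print(valid.sum())  # number of pairwise-valid samples: 2
```

A position is dropped as soon as either array (or, by extension, the weights) is NA there, which is the behaviour the question probes.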
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Issue: Improve performance of xarray.corr() on big datasets (785329941)
Comment 759795213 · mathause (MEMBER) · created 2021-01-13T22:52:19Z · updated 2021-01-13T22:52:19Z
https://github.com/pydata/xarray/issues/4804#issuecomment-759795213

Another possibility is to replace

https://github.com/pydata/xarray/blob/cc53a77ff0c8aaf8686f0b0bd7f75985b74e2054/xarray/core/computation.py#L1327

with `xr.dot`. However, to do so, you need to replace NA with 0 (and I am not sure that's worth it). Also, `min_count` needs to be addressed (but that should not be too difficult).
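The NA-to-0 replacement mentioned above can be checked with a small numpy sketch (toy data; a plain dot product stands in for `xr.dot`):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 5.0, 4.0])

valid = ~np.isnan(a) & ~np.isnan(b)

# nan-aware sum of products, as the masked computation produces it
masked = np.nansum(np.where(valid, a * b, np.nan))

# replacing NA with 0 lets a plain dot product give the same result,
# because the zeroed positions contribute nothing to the sum
dotted = np.where(valid, a, 0.0) @ np.where(valid, b, 0.0)

print(masked, dotted)  # both 14.0
```

What the zero-fill cannot provide is the count of valid samples, which is why `min_count` still needs separate handling.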

Comment 759745055 · mathause (MEMBER) · created 2021-01-13T21:17:34Z · updated 2021-01-13T21:17:34Z
https://github.com/pydata/xarray/issues/4804#issuecomment-759745055

Yes, `if not valid_values.all()` is not lazy. That's the same problem as #4541, so #4559 can be an inspiration for how to tackle this. It would be good to test whether the check also makes this slower for numpy arrays; if so, it could be removed entirely. That would be counter-intuitive to me, but it seems to be faster for dask arrays...
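A minimal numpy sketch of the branch-free alternative (with dask the point is that no data-dependent `if` forces an eager compute; numpy is used here only to show the result is unchanged):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])  # no NA values at all
valid = ~np.isnan(a)

# branchy version: `if not valid.all():` would force dask to
# compute `valid` just to decide whether to mask.
# branch-free version: mask unconditionally; when everything is
# valid this is a no-op, and the graph stays lazy end to end.
masked = np.where(valid, a, np.nan)

assert np.array_equal(masked, a)
```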

Other improvements:

  • I am not sure if `/=` avoids a copy, but if so, that's also a possibility to make it faster.
  • We could add a shortcut for `skipna=False` (this would require adding the option) or for dtypes that cannot have NA values, as follows:

```python
if skipna:
    # 2. Ignore the nans
    valid_values = da_a.notnull() & da_b.notnull()

    if not valid_values.all():
        da_a = da_a.where(valid_values)
        da_b = da_b.where(valid_values)

    valid_count = valid_values.sum(dim) - ddof

else:
    # shortcut for skipna=False
    # da_a and da_b are aligned, so they have the same dims and shape
    axis = da_a.get_axis_num(dim)
    valid_count = np.take(da_a.shape, axis).prod() - ddof
```
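The `valid_count` shortcut in the `else` branch can be exercised with plain numpy (toy shape; an explicit axis index stands in for `da_a.get_axis_num(dim)`):

```python
import numpy as np

da_a = np.arange(24.0).reshape(2, 3, 4)
axis = 1   # stand-in for da_a.get_axis_num(dim)
ddof = 1

# number of elements along the reduced axis, minus the delta
# degrees of freedom -- no validity mask is computed at all
valid_count = np.take(da_a.shape, axis).prod() - ddof

print(valid_count)  # 3 - 1 = 2
```

This is what makes the shortcut attractive: for `skipna=False` the count is known from the shape alone, so no reduction over the data is needed.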



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette