
issue_comments


4 rows where author_association = "CONTRIBUTOR", issue = 1575938277 and user = 1492047 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1532601237 https://github.com/pydata/xarray/issues/7516#issuecomment-1532601237 https://api.github.com/repos/pydata/xarray/issues/7516 IC_kwDOAMm_X85bWaOV Thomas-Z 1492047 2023-05-03T07:58:22Z 2023-05-03T07:58:22Z CONTRIBUTOR

Hello,

I'm not sure the performance problems were fully addressed (we're now forced to fully compute/load the selection expression), but the changes made in the last versions make this issue irrelevant and I think we can close it.

Thank you!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.where performances regression. 1575938277
1451754167 https://github.com/pydata/xarray/issues/7516#issuecomment-1451754167 https://api.github.com/repos/pydata/xarray/issues/7516 IC_kwDOAMm_X85WiAK3 Thomas-Z 1492047 2023-03-02T11:59:47Z 2023-03-02T11:59:47Z CONTRIBUTOR

The `.variable` computation is fast, but it cannot be used directly as you suggest:

```
dsx.where(sel.variable, drop=True)

TypeError: cond argument is <xarray.Variable (num_lines: 5761870, num_pixels: 71)> ... but must be a <class 'xarray.core.dataset.Dataset'> or <class 'xarray.core.dataarray.DataArray'>
```

Doing it like this seems to work correctly (and is fast enough):

```
dsx["x"] = sel.variable.compute()
dsx.where(dsx["x"], drop=True)
```
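For context, the workaround above can be sketched on a small numpy-backed dataset (the names `ds`, `ssh`, and `mask` are illustrative stand-ins; in the issue the data was dask-backed, so `.compute()` actually materializes the mask before `where` runs):

```python
import numpy as np
import xarray as xr

# Small stand-in for the large swath dataset discussed in the issue.
ds = xr.Dataset(
    {"ssh": (("num_lines", "num_pixels"), np.arange(12.0).reshape(4, 3))},
    coords={
        "longitude": (
            ("num_lines", "num_pixels"),
            np.linspace(-50, 150, 12).reshape(4, 3),
        )
    },
)

# Build the boolean selection, materialize it, then pass it to where().
sel = (ds["longitude"] > 0) & (ds["longitude"] < 100)
ds["mask"] = sel.compute()  # for dask-backed data this loads the mask
subset = ds.where(ds["mask"], drop=True)
```

With `drop=True`, labels that are masked everywhere along a dimension are dropped, so only the lines whose longitudes fall in (0, 100) survive.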

The _nadir variables have the same chunks and are much faster to read than the other ones (they are a lot smaller).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.where performances regression. 1575938277
1449714522 https://github.com/pydata/xarray/issues/7516#issuecomment-1449714522 https://api.github.com/repos/pydata/xarray/issues/7516 IC_kwDOAMm_X85WaONa Thomas-Z 1492047 2023-03-01T09:43:27Z 2023-03-01T09:43:27Z CONTRIBUTOR

```
sel = (dsx["longitude"] > 0) & (dsx["longitude"] < 100)
sel.compute()
```

This `compute` finishes and takes more than 80 seconds on both versions, with huge memory consumption (it loads the 4 coordinates and the result itself).

I know xarray has to keep more information regarding coordinates and dimensions, but doing this (just dask arrays):

```
sel2 = (dsx["longitude"].data > 0) & (dsx["longitude"].data < 100)
sel2.compute()
```

takes less than 6 seconds.
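A minimal numpy-backed sketch of the two expressions being compared (the `lon` array is a hypothetical stand-in; in the issue the arrays were dask-backed, where the coordinate bookkeeping is far more costly):

```python
import numpy as np
import xarray as xr

# Illustrative stand-in for dsx["longitude"].
lon = xr.DataArray(np.array([-10.0, 5.0, 50.0, 120.0]), dims="num_lines")

# DataArray-level comparison: carries dims and coordinates through each step.
sel = (lon > 0) & (lon < 100)

# Raw-array comparison via .data: the same boolean mask, no xarray bookkeeping.
sel2 = (lon.data > 0) & (lon.data < 100)

# Either way the resulting mask is identical.
assert (sel.values == sel2).all()
```

The masks agree element for element; the difference the comment measures is purely the overhead of propagating coordinates through the DataArray-level operations.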

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.where performances regression. 1575938277
1447798846 https://github.com/pydata/xarray/issues/7516#issuecomment-1447798846 https://api.github.com/repos/pydata/xarray/issues/7516 IC_kwDOAMm_X85WS6g- Thomas-Z 1492047 2023-02-28T08:54:16Z 2023-02-28T11:24:11Z CONTRIBUTOR

Just tried it and it does not seem identical at all to what was happening earlier.

This is the kind of dataset I'm working with:

With this selection:

```
sel = (dsx["longitude"] > 0) & (dsx["longitude"] < 100)
```

The old xarray takes a little less than 1 minute and less than 6 GB of memory. The new xarray with compute did not finish and had to be stopped before it consumed my 16 GB of memory.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.where performances regression. 1575938277

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.125ms · About: xarray-datasette