issue_comments

2 rows where author_association = "NONE", issue = 416962458 and user = 61931826, sorted by updated_at descending

Row 1

  • id: 1306386310
  • html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-1306386310
  • issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
  • node_id: IC_kwDOAMm_X85N3d-G
  • user: openSourcerer9000 (61931826)
  • created_at: 2022-11-07T23:53:17Z
  • updated_at: 2022-11-07T23:53:17Z
  • author_association: NONE
  • issue: Performance: numpy indexes small amounts of data 1000 faster than xarray (416962458)

body:

So a workaround I was able to use was to load the whole thing into a NumPy array (18 GB!) in about 1 minute with da.values, index 15 nodes in 0.4 seconds (this was taking ~5 min in xarray), then load it back into a DataArray. Not so accommodating for machines with less memory, but it worked in my case.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
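
For context, a minimal runnable sketch of the workaround this comment describes, with the array scaled well down from 18 GB (the sizes and node indices here are made up for illustration):

import numpy as np
import xarray as xr

# Small stand-in for the large ('node', 'time') DataArray in the comment.
da = xr.DataArray(np.random.rand(10_000, 1_500), dims=('node', 'time'))

node_ids = [3, 17, 42]      # positional indices of the nodes to pull
arr = da.values             # load everything into a plain NumPy array
subset = arr[node_ids, :]   # NumPy fancy indexing along the 'node' axis
out = xr.DataArray(subset, dims=('node', 'time'))  # wrap the result back up
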
Row 2

  • id: 1306300937
  • html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-1306300937
  • issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
  • node_id: IC_kwDOAMm_X85N3JIJ
  • user: openSourcerer9000 (61931826)
  • created_at: 2022-11-07T22:16:55Z
  • updated_at: 2022-11-07T22:16:55Z
  • author_association: NONE
  • issue: Performance: numpy indexes small amounts of data 1000 faster than xarray (416962458)

body:

I'm really not understanding why indexing is so slow. My DataArray has 2 dims, one axis 1.5 million long ('node') and the other 1500 ('time'). Trying to pull a single timeseries by indexing 1 node takes 16 seconds. The Variable workaround and playing around with chunking don't change anything. The only thing loading into memory should be an array of 1500 values.

Not sure what's going on under the hood, but there may be a way to specify that you're only looking to optimize indexing along 1 dim. Once it gets indexed it becomes a very tiny dataset. I would think chunks={'node': 1} would do exactly this, but I guess not.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
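
A minimal sketch of the access pattern this comment describes, with sizes scaled down so it runs quickly; the data.nc path in the commented-out line is hypothetical:

import numpy as np
import xarray as xr

# Stand-in for the commenter's array: dims ('node', 'time'),
# scaled down from 1.5 million nodes.
da = xr.DataArray(np.random.rand(10_000, 1_500), dims=('node', 'time'))

# The access pattern that was slow: one node's full timeseries.
series = da.isel(node=1_234)   # should only materialize ~1500 values

# The chunking idea from the comment (requires dask; path is hypothetical):
# ds = xr.open_dataset('data.nc', chunks={'node': 1})
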

Schema
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
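
The filter at the top of this page corresponds to a simple query against the schema above; a minimal sketch using Python's sqlite3, assuming a hypothetical local copy of the database named github.db:

import sqlite3

conn = sqlite3.connect('github.db')   # hypothetical local copy of this database
rows = conn.execute(
    """
    SELECT id, [user], created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE author_association = 'NONE'
      AND issue = 416962458
      AND [user] = 61931826
    ORDER BY updated_at DESC
    """
).fetchall()
for row in rows:
    print(row[0], row[2], row[5][:60])   # id, created_at, start of body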