
issue_comments


6 rows where author_association = "MEMBER", issue = 331668890, and user = 1217238, sorted by updated_at descending




Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
1464180874 https://github.com/pydata/xarray/issues/2227#issuecomment-1464180874 https://api.github.com/repos/pydata/xarray/issues/2227 IC_kwDOAMm_X85XRaCK shoyer 1217238 2023-03-10T18:04:23Z 2023-03-10T18:04:23Z MEMBER

@dschwoerer are you sure that you are actually calculating the same thing in both cases? What exactly do the values of slc[d] look like? I would test things on smaller inputs to verify. My guess is that you are inadvertently calculating something different, recalling that Xarray's broadcasting rules differ slightly from NumPy's.
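The broadcasting difference mentioned here can be sketched as follows (a minimal illustration, not code from the thread): NumPy broadcasts by position and shape, while Xarray broadcasts by dimension name.

```python
import numpy as np
import xarray as xr

# NumPy broadcasts positionally: a (3,) array times a (3, 1) array
# produces an outer product of shape (3, 3).
a = np.arange(3)
b = np.arange(3).reshape(3, 1)
assert (a * b).shape == (3, 3)

# Xarray broadcasts by dimension *name*: two arrays sharing the
# dimension 'x' are combined elementwise, not outer-multiplied.
xa = xr.DataArray(np.arange(3), dims='x')
xb = xr.DataArray(np.arange(3), dims='x')
assert (xa * xb).dims == ('x',)      # elementwise, shape (3,)

# Only arrays with *different* dimension names broadcast against
# each other, producing the outer product.
xc = xr.DataArray(np.arange(3), dims='y')
assert (xa * xc).dims == ('x', 'y')  # shape (3, 3)
```

So an expression that is elementwise in NumPy can silently become an outer product in Xarray (or vice versa) when dimension names differ, which is one way to end up "calculating something different".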

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
533193480 https://github.com/pydata/xarray/issues/2227#issuecomment-533193480 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDUzMzE5MzQ4MA== shoyer 1217238 2019-09-19T15:49:24Z 2019-09-19T15:49:24Z MEMBER

Yes, align checks index.equals(other) first, which has a shortcut for the same object.

The real mystery here is why time_filter.indexes['time'] and ds.indexes['time'] are not the same object. I guess this is likely due to lazy initialization of indexes, and should be fixed eventually by the explicit indexes refactor.
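The identity shortcut described above can be seen directly with pandas (a minimal sketch; variable names are illustrative):

```python
import pandas as pd

idx = pd.RangeIndex(1_000_000)
alias = idx         # the same object: equals() can short-circuit on identity
copy = idx.copy()   # equal values, but a distinct object

# Fast path: no element-by-element comparison is needed.
assert idx.equals(alias)
# Slow path: the underlying values must actually be compared.
assert idx.equals(copy)

# Object identity distinguishes the two cases; if lazy index
# initialization produces a fresh index object each time, the
# fast path above is never taken.
assert alias is idx
assert copy is not idx
```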

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
532804542 https://github.com/pydata/xarray/issues/2227#issuecomment-532804542 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDUzMjgwNDU0Mg== shoyer 1217238 2019-09-18T18:17:22Z 2019-09-18T18:17:22Z MEMBER

https://github.com/pydata/xarray/pull/3319 gives us about a 2x performance boost. It could likely be much faster, but at least this fixes the regression.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
532787342 https://github.com/pydata/xarray/issues/2227#issuecomment-532787342 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDUzMjc4NzM0Mg== shoyer 1217238 2019-09-18T17:33:38Z 2019-09-18T17:33:38Z MEMBER

Yes, I'm seeing similar numbers, about 10x slower indexing in a DataArray. This seems to have gotten slower over time. It would be good to track this down and add a benchmark!
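A benchmark along these lines could be sketched in the asv style used by xarray's benchmark suite (class and method names here are illustrative, not the actual benchmark that was later added):

```python
import numpy as np
import xarray as xr

class IselBooleanMask:
    """asv-style benchmark: boolean isel through a DataArray vs. a raw mask."""

    def setup(self):
        n = 1_000_000
        self.ds = xr.Dataset(
            {'a': ('time', np.random.randn(n))},
            coords={'time': np.arange(n)},
        )
        self.mask = self.ds.time > n // 2  # boolean DataArray mask

    def time_isel_boolean_dataarray(self):
        self.ds.a.isel(time=self.mask)

    def time_isel_boolean_numpy(self):
        self.ds.a.isel(time=self.mask.values)
```

asv would time each `time_*` method repeatedly after calling `setup`, which is what makes regressions like this one visible over time.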

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
424549023 https://github.com/pydata/xarray/issues/2227#issuecomment-424549023 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDQyNDU0OTAyMw== shoyer 1217238 2018-09-26T00:54:24Z 2018-09-26T00:54:24Z MEMBER

@WeatherGod does adding something like da = da.chunk({'time': 1}) reproduce this with your example?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
396725591 https://github.com/pydata/xarray/issues/2227#issuecomment-396725591 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDM5NjcyNTU5MQ== shoyer 1217238 2018-06-12T20:38:47Z 2018-06-12T20:38:47Z MEMBER

My measurements:

```
%timeit ds.a.isel(time=time_filter)
1 loop, best of 3: 906 ms per loop

%timeit ds.a.isel(time=time_filter.values)
1 loop, best of 3: 447 ms per loop

%timeit ds.a.values[time_filter]
10 loops, best of 3: 169 ms per loop
```

Given the size of this gap, I suspect this could be improved with some investigation and profiling, but there is certainly an upper limit on the possible performance gain.

One simple example is that indexing the dataset needs to index both 'a' and 'time', so it's going to be at least twice as slow as only indexing 'a'. So the second indexing expression ds.a.isel(time=time_filter.values) is only 447/(169*2) = 1.32 times slower than the best case scenario.
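The setup behind these measurements can be reconstructed roughly as follows (a hypothetical sketch of the issue's example with a made-up size; the ratio arithmetic is 447 / (169 × 2) ≈ 1.32):

```python
import numpy as np
import xarray as xr

# Hypothetical reconstruction of the example in this issue.
n = 1_000_000
ds = xr.Dataset(
    {'a': ('time', np.random.randn(n))},
    coords={'time': np.arange(n)},
)
time_filter = ds.time > n // 2  # boolean mask as a DataArray

# The three expressions timed above. Indexing through the Dataset also
# indexes the 'time' coordinate, so it does roughly double the work of
# indexing the raw NumPy array once.
r_xr = ds.a.isel(time=time_filter)              # ~906 ms in the thread
r_mask = ds.a.isel(time=time_filter.values)     # ~447 ms
r_np = ds.a.values[time_filter.values]          # ~169 ms

# All three select the same values.
np.testing.assert_array_equal(r_xr.values, r_np)
np.testing.assert_array_equal(r_mask.values, r_np)
```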

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890

Table schema:
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);