issue_comments

3 rows where issue = 462049420 and user = 6213168 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
520048334 https://github.com/pydata/xarray/pull/3054#issuecomment-520048334 https://api.github.com/repos/pydata/xarray/issues/3054 MDEyOklzc3VlQ29tbWVudDUyMDA0ODMzNA== crusaderky 6213168 2019-08-09T20:10:29Z 2019-08-09T20:29:23Z MEMBER

Mh. Actually it looks like ndarray.flat is the fastest way to iterate over a numpy array. It is still considerably slower than a CPython iterator, though.

```python
import numpy

N = 1000000
a = numpy.arange(N)

def exhaust(it):
    for _ in it:
        pass

%timeit exhaust(a)
24.8 ms ± 723 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit exhaust(a.flat)
20.4 ms ± 701 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit exhaust(a.tolist())
27.2 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit exhaust(range(N))
10.5 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
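For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard-library timeit module instead of the %timeit magic. The helper name time_iteration and its parameters are made up for illustration, and the numbers will vary by machine and numpy version.

```python
import timeit

import numpy


def exhaust(it):
    """Consume an iterable, discarding every element."""
    for _ in it:
        pass


def time_iteration(n=1_000_000, number=10):
    """Return the mean seconds per loop for each iteration strategy.

    Hypothetical helper, not part of numpy or xarray.
    """
    a = numpy.arange(n)
    cases = {
        "ndarray": lambda: exhaust(a),
        "ndarray.flat": lambda: exhaust(a.flat),
        # The list conversion is timed too, as in the %timeit run.
        "ndarray.tolist()": lambda: exhaust(a.tolist()),
        "range": lambda: exhaust(range(n)),
    }
    return {
        name: timeit.timeit(fn, number=number) / number
        for name, fn in cases.items()
    }


if __name__ == "__main__":
    for name, seconds in time_iteration().items():
        print(f"{name:>18}: {seconds * 1e3:.1f} ms per loop")
```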

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flat iteration over DataArray 462049420
520016328 https://github.com/pydata/xarray/pull/3054#issuecomment-520016328 https://api.github.com/repos/pydata/xarray/issues/3054 MDEyOklzc3VlQ29tbWVudDUyMDAxNjMyOA== crusaderky 6213168 2019-08-09T18:19:05Z 2019-08-09T18:19:05Z MEMBER

@yohai Iterating point by point in pure python over numpy data is horribly slow. numpy.ndarray.flat is there mostly to be used within cython/numba code. In a DataArray it's much worse than in a plain ndarray, because every time you invoke the slice operator to fetch a single element it's being applied to all coordinates too.

If you just need to iterate over the values of a DataArray, then DataArray.values.ravel().tolist() is the fastest option that I know of (ndarray.tolist() is much faster than list(ndarray)!). If you need the coords as well, then I suspect you may be doing it wrong - could you show a simple example of your use case?
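As a small illustration of the ravel().tolist() route, the sketch below uses a plain ndarray as a stand-in for DataArray.values (an assumption made to keep the example numpy-only). It also shows one difference from list(ndarray): tolist() hands back native Python scalars, while list() yields numpy scalar objects.

```python
import numpy

# Stand-in for DataArray.values: a small 2-D ndarray.
values = numpy.arange(6).reshape(2, 3)

# One bulk conversion; the result holds native Python ints.
fast = values.ravel().tolist()

# Element-by-element conversion; each item is a numpy scalar.
slow = list(values.ravel())

# Iterate over the plain Python list.
for v in fast:
    pass
```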

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flat iteration over DataArray 462049420
519875542 https://github.com/pydata/xarray/pull/3054#issuecomment-519875542 https://api.github.com/repos/pydata/xarray/issues/3054 MDEyOklzc3VlQ29tbWVudDUxOTg3NTU0Mg== crusaderky 6213168 2019-08-09T10:53:59Z 2019-08-09T10:58:00Z MEMBER

Indeed this is extremely inefficient. I'm afraid it's a -1 from me.

You can get the same result with a much faster one-liner: a.stack(__flat=a.dims).reset_index('__flat') (although admittedly it's more RAM-intensive). On related notes:

- stack() could use a set_index=True optional parameter that lets you avoid going through a MultiIndex if you don't need one
- stack() should accept non-string hashables; this would allow avoiding potential collisions (e.g. flat = object())
- there is an issue with the stack -> reset_index round-trip where it converts unicode variables to object #907
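To make the one-liner concrete, here is a minimal sketch on a toy 2x3 DataArray. The array contents and coordinate values are invented for illustration, and the exact coordinate layout after reset_index may differ between xarray versions.

```python
import numpy
import xarray

# Toy input: dims and coordinate values are made up for this example.
a = xarray.DataArray(
    numpy.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": [1, 2, 3]},
)

# Collapse all dims into one, then drop the resulting MultiIndex so that
# x and y become ordinary coordinates along the new __flat dimension.
flat = a.stack(__flat=a.dims).reset_index("__flat")

# Values and their coordinates can now be walked together as plain lists.
rows = list(
    zip(flat["x"].values.tolist(), flat["y"].values.tolist(), flat.values.tolist())
)
```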

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flat iteration over DataArray 462049420

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 38.552ms · About: xarray-datasette