issue_comments

6 rows where issue = 462049420 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
520210672 https://github.com/pydata/xarray/pull/3054#issuecomment-520210672 https://api.github.com/repos/pydata/xarray/issues/3054 MDEyOklzc3VlQ29tbWVudDUyMDIxMDY3Mg== coroa 2552981 2019-08-11T08:36:19Z 2019-08-11T08:36:41Z CONTRIBUTOR

@yohai: In short, no. It does not make sense to add a built-in function for iteration if it cannot improve on the low-level functionality.

I'd recommend closing this PR!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flat iteration over DataArray 462049420
520048334 https://github.com/pydata/xarray/pull/3054#issuecomment-520048334 https://api.github.com/repos/pydata/xarray/issues/3054 MDEyOklzc3VlQ29tbWVudDUyMDA0ODMzNA== crusaderky 6213168 2019-08-09T20:10:29Z 2019-08-09T20:29:23Z MEMBER

Hm. Actually, it looks like ndarray.flat is the fastest way to iterate over a numpy array. It is still considerably slower than a native CPython iterator such as range, though:

```python
import numpy

N = 1000000
a = numpy.arange(N)

def exhaust(it):
    for _ in it:
        pass

%timeit exhaust(a)
24.8 ms ± 723 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit exhaust(a.flat)
20.4 ms ± 701 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit exhaust(a.tolist())
27.2 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit exhaust(range(N))
10.5 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
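Outside IPython, the same comparison can be reproduced with the standard-library `timeit` module. The following is a minimal sketch, not part of the original comment: the `bench` helper, the smaller `N`, and the repeat counts are assumptions chosen to keep the run short, and absolute timings will vary with the machine and the NumPy version.

```python
import timeit

import numpy

N = 100_000  # assumption: smaller than the original 1,000,000 to keep the run short
a = numpy.arange(N)


def exhaust(it):
    """Consume an iterable without keeping its elements."""
    for _ in it:
        pass


def bench(make_iterable, number=5):
    """Hypothetical helper: best per-call time in seconds over 3 repeats."""
    timer = timeit.Timer(lambda: exhaust(make_iterable()))
    return min(timer.repeat(repeat=3, number=number)) / number


# Each entry builds a fresh iterable per call, mirroring the %timeit lines.
results = {
    "ndarray": bench(lambda: a),
    "ndarray.flat": bench(lambda: a.flat),
    "ndarray.tolist()": bench(lambda: a.tolist()),
    "range(N)": bench(lambda: range(N)),
}

for name, seconds in results.items():
    print(f"{name:>18}: {seconds * 1e3:.2f} ms")
```

Note that, as in the original session, the `tolist()` timing includes the cost of the conversion itself, not only the iteration over the resulting list.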

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flat iteration over DataArray 462049420
520016328 https://github.com/pydata/xarray/pull/3054#issuecomment-520016328 https://api.github.com/repos/pydata/xarray/issues/3054 MDEyOklzc3VlQ29tbWVudDUyMDAxNjMyOA== crusaderky 6213168 2019-08-09T18:19:05Z 2019-08-09T18:19:05Z MEMBER

@yohai Iterating point by point in pure python over numpy data is horribly slow. numpy.ndarray.flat is there mostly to be used within cython/numba code. In a DataArray it's much worse than in a plain ndarray, because every time you invoke the slice operator to fetch a single element it's being applied to all coordinates too.

If you just need to iterate over the values of a DataArray, then DataArray.values.ravel().tolist() is the fastest option that I know of (ndarray.tolist() is much faster than list(ndarray)!). If you need the coords as well, then I suspect you may be doing it wrong - could you show a simple example of your use case?
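The difference between `ndarray.tolist()` and `list(ndarray)` mentioned above largely comes down to what each one produces: `tolist()` converts the whole buffer to native Python scalars in one call, while `list()` wraps every element in a NumPy scalar object. A minimal sketch, using a plain ndarray as a stand-in for `DataArray.values` (the array contents are made up for illustration):

```python
import numpy as np

# Stand-in for DataArray.values: a small 2-D array (illustrative data).
values = np.arange(6).reshape(2, 3)

# ravel() flattens in C order; tolist() converts to native Python ints at once.
as_python = values.ravel().tolist()

# list() iterates the array and wraps each element in a NumPy scalar object.
as_numpy_scalars = list(values.ravel())

print(as_python)                  # [0, 1, 2, 3, 4, 5]
print(type(as_python[0]))         # <class 'int'>
print(type(as_numpy_scalars[0]))  # a NumPy integer type, platform dependent
```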

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flat iteration over DataArray 462049420
519932051 https://github.com/pydata/xarray/pull/3054#issuecomment-519932051 https://api.github.com/repos/pydata/xarray/issues/3054 MDEyOklzc3VlQ29tbWVudDUxOTkzMjA1MQ== yohai 6164157 2019-08-09T14:04:57Z 2019-08-09T14:04:57Z CONTRIBUTOR

@crusaderky @coroa Thanks for your comments, glad to see that there's a more efficient way to do it. The question is whether you think it's useful enough to justify adding it as a built-in function. I end up using my solution quite often.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flat iteration over DataArray 462049420
519875542 https://github.com/pydata/xarray/pull/3054#issuecomment-519875542 https://api.github.com/repos/pydata/xarray/issues/3054 MDEyOklzc3VlQ29tbWVudDUxOTg3NTU0Mg== crusaderky 6213168 2019-08-09T10:53:59Z 2019-08-09T10:58:00Z MEMBER

Indeed this is extremely inefficient. I'm afraid it's a -1 from me.

You can get the same with a much faster one-liner: a.stack(__flat=a.dims).reset_index('__flat') (although admittedly it's more RAM-intensive). On related notes:

- stack() could use a set_index=True optional parameter that lets you avoid going through a MultiIndex if you don't need one
- stack() should accept non-string hashables; this would allow avoiding potential collisions (e.g. flat = object())
- there is an issue with the stack -> reset_index round-trip where it converts unicode variables to object #907
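As a rough illustration of the one-liner above, the sketch below applies it to a small made-up 2-D DataArray. The dimension names `x` and `y`, the coordinate values, and the variable names are assumptions for the example; the exact set of coordinates left after `reset_index` may differ between xarray versions.

```python
import numpy as np
import xarray as xr

# Illustrative 2x3 array with labelled coordinates (made-up data).
a = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

# Collapse all dimensions into one, then turn the MultiIndex levels
# back into ordinary coordinates along the new "__flat" dimension.
flat = a.stack(__flat=a.dims).reset_index("__flat")

# Each position along "__flat" carries one value plus its x and y labels.
for value, x, y in zip(flat.values, flat["x"].values, flat["y"].values):
    print(x, y, value)
```

The stacked result holds a full copy of the data plus one coordinate array per original dimension, which is where the extra RAM goes.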

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flat iteration over DataArray 462049420
508407631 https://github.com/pydata/xarray/pull/3054#issuecomment-508407631 https://api.github.com/repos/pydata/xarray/issues/3054 MDEyOklzc3VlQ29tbWVudDUwODQwNzYzMQ== coroa 2552981 2019-07-04T09:15:14Z 2019-07-04T09:15:14Z CONTRIBUTOR

@yohai It's a lot more efficient to simply iterate over the underlying array, i.e. da.values.flat, if you can afford to hold everything in memory.

If you are instead using streaming computation based on dask, then you would have to do something similar on a per-chunk basis.
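A sketch of both approaches under stated assumptions: a plain ndarray stands in for `da.values`, and `iter_flat_chunked` is a hypothetical helper that accepts any iterable of in-memory NumPy blocks. It does not use dask's API; with dask, each block would first have to be computed into memory before being passed in.

```python
from typing import Iterable, Iterator

import numpy as np


def iter_flat_chunked(chunks: Iterable[np.ndarray]) -> Iterator:
    """Hypothetical helper: yield elements chunk by chunk via ndarray.flat.

    Only one chunk needs to be in memory at a time, provided the
    iterable produces its chunks lazily.
    """
    for chunk in chunks:
        yield from chunk.flat


# In-memory case: iterate over the underlying array directly.
values = np.arange(6).reshape(2, 3)  # stand-in for da.values
in_memory = [int(v) for v in values.flat]

# Chunked case: np.array_split stands in for a lazy source of chunks.
chunked = [int(v) for v in iter_flat_chunked(np.array_split(values, 2, axis=0))]

print(in_memory)  # [0, 1, 2, 3, 4, 5]
print(chunked)    # [0, 1, 2, 3, 4, 5]
```

The two orders agree here only because the array is split along its leading axis; chunking along other axes visits the elements in a different order.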

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flat iteration over DataArray 462049420

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);