
issue_comments


4 rows where issue = 416962458 (Performance: numpy indexes small amounts of data 1000 faster than xarray) and user = 90008 (hmaarrfk), sorted by updated_at descending; all four comments have author_association CONTRIBUTOR

1306327743 · hmaarrfk (90008) · CONTRIBUTOR · 2022-11-07T22:45:07Z · no reactions
https://github.com/pydata/xarray/issues/2799#issuecomment-1306327743

As I've been recently going down this performance rabbit hole, I think the discussion around https://github.com/pydata/xarray/issues/7045 is relevant and provides some additional historical context as to "why" this performance penalty might be happening.

786813358 · hmaarrfk (90008) · CONTRIBUTOR · 2021-02-26T18:19:28Z · no reactions
https://github.com/pydata/xarray/issues/2799#issuecomment-786813358

I hope the following can help users who struggle with the speed of xarray:

I've found that when doing numerical computation, I often use xarray to grab all the metadata relevant to my computation: scale, chromaticity, experimental information.

Eventually, I create a function that acts as a barrier (sketched below):
  • xarray input (high-level experimental data)
  • computation parameters output (low-level, implementation-relevant information)

The low-level implementation can then operate on the fast numpy arrays. I've found this to be the central struggle between high-level APIs that do things like sanitize inputs (xarray routines such as _validate_indexers and _broadcast_indexes) and low-level APIs that are simply interested in moving and computing data.

For the example that @nbren12 brought up originally, it might be better to add xarray routines (if they don't exist already) that provide fast iterators over the underlying numpy arrays for a given set of dimensions that the user cares about.

552652019 · hmaarrfk (90008) · CONTRIBUTOR · 2019-11-11T22:47:47Z · 1 reaction (+1)
https://github.com/pydata/xarray/issues/2799#issuecomment-552652019

Sure, I just wanted to note that this operation should be more or less constant time, as opposed to dependent on the size of the array. Somebody had mentioned it should increase with the size of the array.

552619589 · hmaarrfk (90008) · CONTRIBUTOR · 2019-11-11T21:16:36Z · no reactions
https://github.com/pydata/xarray/issues/2799#issuecomment-552619589

Hmm, slicing should basically be a no-op.

The fact that xarray makes it a few hundred times slower (~380x in the timings below) is a real killer. It seems from this conversation that it might be hard to work around.

```python
import xarray as xr
import numpy as np

n = np.zeros(shape=(1024, 1024))
x = xr.DataArray(n, dims=('y', 'x'))
the_slice = np.s_[256:512, 256:512]

%timeit n[the_slice]
# 186 ns ± 0.778 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit x[the_slice]
# 70.3 µs ± 593 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);