issue_comments


3 rows where issue = 365973662 and user = 5635139 sorted by updated_at descending


426646669 · max-sixty (5635139) · MEMBER
created_at: 2018-10-03T13:55:40Z · updated_at: 2018-10-03T16:13:41Z
html_url: https://github.com/pydata/xarray/issues/2459#issuecomment-426646669
issue_url: https://api.github.com/repos/pydata/xarray/issues/2459
node_id: MDEyOklzc3VlQ29tbWVudDQyNjY0NjY2OQ==

My working hypothesis is that pandas has a set of fast routines in C, such that it can stack without reindexing to the full index. The routines only work in 1-2 dimensions.

So without some hackery (i.e. converting multi-dimensional arrays down to the 1-2 dimensions pandas handles and back), the current implementation is reasonable*. The next step would be to write our own routines that can operate on multiple dimensions (numbagg!).

Is that consistent with others' views, particularly those who know this area well?

\* One small fix, which would improve performance of `series.to_xarray()` only, is the one in the comment above. Lmk if you think it's worth making that change.
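As an aside for readers: a tiny pandas-only illustration, with made-up data, of what "stack without reindexing to the full index" means here. `unstack` densifies a sparse MultiIndex directly, filling the missing combinations with NaN, without an explicit reindex over the cartesian product:

```python
import numpy as np
import pandas as pd

# A deliberately sparse MultiIndex: 3 of the 4 (x, y) combinations present.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.MultiIndex.from_tuples(
        [("a", 0), ("a", 1), ("b", 0)], names=["x", "y"]
    ),
)

# unstack produces the dense 2-D layout directly, inserting NaN for the
# missing ("b", 1) cell: no explicit reindex over the full product needed.
dense = s.unstack("y")

# The equivalent "slow path": reindex over the full cartesian product first.
full = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)
via_reindex = s.reindex(full).unstack("y")

assert dense.equals(via_reindex)
```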

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Stack + to_array before to_xarray is much faster that a simple to_xarray (365973662)
426483497 · max-sixty (5635139) · MEMBER
created_at: 2018-10-03T01:30:07Z · updated_at: 2018-10-03T01:30:07Z
html_url: https://github.com/pydata/xarray/issues/2459#issuecomment-426483497
issue_url: https://api.github.com/repos/pydata/xarray/issues/2459
node_id: MDEyOklzc3VlQ29tbWVudDQyNjQ4MzQ5Nw==

It's 3x faster to unstack & stack all-but-one level than to reindex over a filled-out index (and I think it always produces the same result).

Our current code takes the slow path.

I could make that change, but it strongly feels like I don't understand the root cause yet. I haven't spent much time with reshaping code, so lmk if anyone has ideas.

```python

idx = cropped.index
full_idx = pd.MultiIndex.from_product(idx.levels, names=idx.names)

reindexed = cropped.reindex(full_idx)

%timeit reindexed = cropped.reindex(full_idx)
# 1 loop, best of 3: 278 ms per loop

%%timeit
stack_unstack = (
    cropped
    .unstack(list('yz'))
    .stack(list('yz'), dropna=False)
)
# 10 loops, best of 3: 80.8 ms per loop

stack_unstack.equals(reindexed)
# True

```
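The snippet above relies on `cropped` from the profiling session, so it can't be run on its own. Here's a self-contained reconstruction of the equivalence claim, with made-up data and the same x/y/z level names (a sketch, not the original benchmark):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `cropped`: a series over a 3-level index with a few
# entries dropped, mirroring the x/y/z level names in the snippet above.
levels = pd.MultiIndex.from_product(
    [range(4), range(3), range(2)], names=list("xyz")
)
cropped = pd.Series(np.arange(24, dtype=float), index=levels).drop(
    [(0, 0, 0), (1, 2, 1), (3, 1, 0)]
)

# Slow path: reindex over the filled-out cartesian product.
full_idx = pd.MultiIndex.from_product(
    cropped.index.levels, names=cropped.index.names
)
reindexed = cropped.reindex(full_idx)

# Fast path: unstack all-but-one level, then stack back without dropping NaNs.
stack_unstack = cropped.unstack(list("yz")).stack(list("yz"), dropna=False)

# Same values, same index; only the code path differs.
assert stack_unstack.equals(reindexed)
```

On data this small the timing gap won't show; the point is only that the two paths agree.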

426408924 · max-sixty (5635139) · MEMBER
created_at: 2018-10-02T19:57:20Z · updated_at: 2018-10-02T19:57:20Z
html_url: https://github.com/pydata/xarray/issues/2459#issuecomment-426408924
issue_url: https://api.github.com/repos/pydata/xarray/issues/2459
node_id: MDEyOklzc3VlQ29tbWVudDQyNjQwODkyNA==

When I stepped through, the time was by and large all spent in https://github.com/pydata/xarray/blob/master/xarray/core/dataset.py#L3121. That's where the boxing & unboxing of the datetimes comes from.

I haven't yet discovered how the alternative path avoids this work. If anyone has priors, please lmk!
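For context, a minimal pandas-only sketch (synthetic data) of the "boxing" being referred to: casting a datetime64 column to object dtype wraps every value in a Python-level `pd.Timestamp`, which is exactly the kind of per-element work that dominates slow paths:

```python
import numpy as np
import pandas as pd

# A datetime64[ns] series is stored as one contiguous array of int64
# nanoseconds: no per-element Python objects.
s = pd.Series(pd.date_range("2018-10-01", periods=3, freq="D"))
assert s.dtype == np.dtype("datetime64[ns]")

# "Boxing": casting to object dtype wraps each value in a pd.Timestamp,
# one Python object per element.
boxed = s.astype(object)
assert all(isinstance(v, pd.Timestamp) for v in boxed)

# "Unboxing" is the reverse: collapsing those objects back into datetime64.
unboxed = boxed.astype("datetime64[ns]")
assert unboxed.equals(s)
```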



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
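Given this schema, the page's row selection can be reproduced with plain sqlite3. A sketch with only the three comment ids from this page inserted (the foreign-key REFERENCES clauses are omitted because the `users`/`issues` tables aren't recreated here):

```python
import sqlite3

# Rebuild the schema above in an in-memory database (REFERENCES clauses
# dropped, since the referenced tables don't exist in this sketch).
ddl = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(ddl)

# Metadata-only rows for the three comments shown on this page.
rows = [
    (426646669, "2018-10-03T16:13:41Z"),
    (426483497, "2018-10-03T01:30:07Z"),
    (426408924, "2018-10-02T19:57:20Z"),
]
conn.executemany(
    "insert into issue_comments ([id], [updated_at], [issue], [user]) "
    "values (?, ?, 365973662, 5635139)",
    rows,
)

# The query behind "3 rows where issue = 365973662 and user = 5635139
# sorted by updated_at descending".
ids = [
    row[0]
    for row in conn.execute(
        "select [id] from issue_comments "
        "where [issue] = 365973662 and [user] = 5635139 "
        "order by [updated_at] desc"
    )
]
```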
Powered by Datasette · Queries took 136.041ms · About: xarray-datasette