issue_comments
3 rows where issue = 365973662 and user = 5635139 sorted by updated_at descending
**Comment 426646669** (https://github.com/pydata/xarray/issues/2459#issuecomment-426646669)

- issue_url: https://api.github.com/repos/pydata/xarray/issues/2459
- node_id: MDEyOklzc3VlQ29tbWVudDQyNjY0NjY2OQ==
- user: max-sixty (5635139)
- created_at: 2018-10-03T13:55:40Z
- updated_at: 2018-10-03T16:13:41Z
- author_association: MEMBER
- reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
- performed_via_github_app: none
- issue: Stack + to_array before to_xarray is much faster that a simple to_xarray (365973662)

My working hypothesis is that pandas has a set of fast routines in C, such that it can stack without reindexing to the full index. The routines only work in 1-2 dimensions. So without some hackery (i.e. converting multi-dimensional arrays to pandas' size and back), the current implementation is reasonable\*.

Next step would be to write our own routines that can operate on multiple dimensions (numbagg!).

Is that consistent with others' views, particularly those who know this area well?

\* one small fix that would improve performance of …
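The hypothesis above is what the issue title points at: routing the reshape through pandas' 1-2D routines before handing off to xarray. Below is a minimal sketch of the two paths, with made-up data and index names, since the issue's own example is not reproduced in these comments:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a Series on a MultiIndex (sizes and names are made up).
idx = pd.MultiIndex.from_product(
    [pd.date_range("2000-01-01", periods=500), list("ab"), range(100)],
    names=["x", "y", "z"],
)
series = pd.Series(np.random.rand(len(idx)), index=idx)

# Simple path: reindexes against the full n-dimensional product index.
da_slow = series.to_xarray()

# Faster path per the hypothesis: let pandas' 2-D C routines do the reshape
# first, then convert. (dropna= mirrors the 2018-era pandas API used in
# these comments; newer pandas deprecates it in favor of future_stack.)
da_fast = series.unstack("z").stack("z", dropna=False).to_xarray()
```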
**Comment 426483497** (https://github.com/pydata/xarray/issues/2459#issuecomment-426483497)

- issue_url: https://api.github.com/repos/pydata/xarray/issues/2459
- node_id: MDEyOklzc3VlQ29tbWVudDQyNjQ4MzQ5Nw==
- user: max-sixty (5635139)
- created_at: 2018-10-03T01:30:07Z
- updated_at: 2018-10-03T01:30:07Z
- author_association: MEMBER
- reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
- performed_via_github_app: none
- issue: Stack + to_array before to_xarray is much faster that a simple to_xarray (365973662)

It's 3x faster to unstack & stack all-but-one level, vs reindexing over a filled-out index (and I think always produces the same result). Our current code takes the slow path.

I could make that change, but that strongly feels like I don't understand the root cause. I haven't spent much time with reshaping code - lmk if anyone has ideas.

```python
idx = cropped.index
full_idx = pd.MultiIndex.from_product(idx.levels, names=idx.names)

reindexed = cropped.reindex(full_idx)

%timeit reindexed = cropped.reindex(full_idx)
# 1 loop, best of 3: 278 ms per loop

%%timeit
stack_unstack = (
    cropped
    .unstack(list('yz'))
    .stack(list('yz'), dropna=False)
)
# 10 loops, best of 3: 80.8 ms per loop

stack_unstack.equals(reindexed)
# True
```
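A self-contained version of that benchmark, runnable outside IPython: `cropped` is never defined in the comment, so the data below is a hypothetical stand-in (a half-filled MultiIndex, so the reindex has missing entries to fill in):

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for `cropped`: a Series whose MultiIndex covers only
# half of the full x/y/z product.
full = pd.MultiIndex.from_product(
    [range(200), range(50), range(40)], names=["x", "y", "z"]
)
cropped = (
    pd.Series(np.random.rand(len(full)), index=full)
    .sample(frac=0.5, random_state=0)
    .sort_index()
)

idx = cropped.index
full_idx = pd.MultiIndex.from_product(idx.levels, names=idx.names)

# Slow path: one big n-dimensional reindex against the product index.
t_reindex = timeit.timeit(lambda: cropped.reindex(full_idx), number=10)

# Fast path from the comment: unstack & stack all-but-one level.
t_stack = timeit.timeit(
    lambda: cropped.unstack(list("yz")).stack(list("yz"), dropna=False),
    number=10,
)
print(f"reindex: {t_reindex:.2f}s  unstack/stack: {t_stack:.2f}s")
```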
**Comment 426408924** (https://github.com/pydata/xarray/issues/2459#issuecomment-426408924)

- issue_url: https://api.github.com/repos/pydata/xarray/issues/2459
- node_id: MDEyOklzc3VlQ29tbWVudDQyNjQwODkyNA==
- user: max-sixty (5635139)
- created_at: 2018-10-02T19:57:20Z
- updated_at: 2018-10-02T19:57:20Z
- author_association: MEMBER
- reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
- performed_via_github_app: none
- issue: Stack + to_array before to_xarray is much faster that a simple to_xarray (365973662)

When I stepped through, it was by and large all taken up by https://github.com/pydata/xarray/blob/master/xarray/core/dataset.py#L3121. That's where the boxing & unboxing of the datetimes comes from. I haven't yet discovered how the alternative path avoids this work. If anyone has priors, please lmk!
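For intuition on the boxing cost mentioned above, here is a rough, hypothetical illustration (not xarray's actual code path): element-wise handling of a datetime64 array creates one Python-level `pd.Timestamp` object per value, while vectorized operations stay in numpy's native representation:

```python
import numpy as np
import pandas as pd

# A million nanosecond-precision datetimes as a raw numpy array.
times = pd.date_range("2000-01-01", periods=1_000_000, freq="min").values

# Boxed: one pd.Timestamp object per element (Python-level loop, slow).
boxed = [pd.Timestamp(t) for t in times]

# Unboxed: a single vectorized operation on the datetime64 array (fast).
shifted = times + np.timedelta64(1, "D")
```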
Table schema:

```sql
CREATE TABLE [issue_comments] (
    [html_url] TEXT,
    [issue_url] TEXT,
    [id] INTEGER PRIMARY KEY,
    [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]),
    [created_at] TEXT,
    [updated_at] TEXT,
    [author_association] TEXT,
    [body] TEXT,
    [reactions] TEXT,
    [performed_via_github_app] TEXT,
    [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```
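Given that schema, the filter described at the top of this page (3 rows where issue = 365973662 and user = 5635139, sorted by updated_at descending) corresponds to a query like the following sketch; the SQLite filename is hypothetical:

```python
import sqlite3

# Hypothetical database file containing the issue_comments table above.
con = sqlite3.connect("github.db")
rows = con.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE issue = ? AND [user] = ?
    ORDER BY updated_at DESC
    """,
    (365973662, 5635139),
).fetchall()
```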