issue_comments
3 rows where issue = 365973662 and user = 5635139 sorted by updated_at descending
id: 426646669
html_url: https://github.com/pydata/xarray/issues/2459#issuecomment-426646669
issue_url: https://api.github.com/repos/pydata/xarray/issues/2459
node_id: MDEyOklzc3VlQ29tbWVudDQyNjY0NjY2OQ==
user: max-sixty (5635139)
created_at: 2018-10-03T13:55:40Z
updated_at: 2018-10-03T16:13:41Z
author_association: MEMBER
issue: Stack + to_array before to_xarray is much faster that a simple to_xarray (365973662)
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
body:

My working hypothesis is that pandas has a set of fast routines in C, such that it can stack without reindexing to the full index. The routines only work in 1-2 dimensions. So without some hackery (i.e. converting multi-dimensional arrays to pandas' size and back), the current implementation is reasonable*. Next step would be to write our own routines that can operate on multiple dimensions (numbagg!). Is that consistent with others' views, particularly those who know this area well?

* one small fix that would improve performance of
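
Not from the thread, but as a minimal sketch of the hypothesis above: contrast pandas' compiled unstack routine with reindexing onto the full product index. The Series, level names, and sizes here are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented data: a Series on a sparse two-level MultiIndex, i.e. only half
# of the possible (x, y) combinations are actually present.
idx = pd.MultiIndex.from_tuples(
    [(i, j) for i in range(2000) for j in range(50) if (i + j) % 2 == 0],
    names=["x", "y"],
)
s = pd.Series(np.random.randn(len(idx)), index=idx)

# pandas' fast compiled reshape path: pivot the 'y' level into columns
# without first materialising the full cartesian product of the levels.
wide = s.unstack("y")

# The route being contrasted: build the filled-out product index and
# reindex onto it, allocating every (x, y) combination up front.
full = pd.MultiIndex.from_product(idx.levels, names=idx.names)
dense = s.reindex(full)
```
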
id: 426483497
html_url: https://github.com/pydata/xarray/issues/2459#issuecomment-426483497
issue_url: https://api.github.com/repos/pydata/xarray/issues/2459
node_id: MDEyOklzc3VlQ29tbWVudDQyNjQ4MzQ5Nw==
user: max-sixty (5635139)
created_at: 2018-10-03T01:30:07Z
updated_at: 2018-10-03T01:30:07Z
author_association: MEMBER
issue: Stack + to_array before to_xarray is much faster that a simple to_xarray (365973662)
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
body:

It's 3x faster to unstack & stack all-but-one level, vs reindexing over a filled-out index (and I think it always produces the same result). Our current code takes the slow path. I could make that change, but it strongly feels like I don't understand the root cause. I haven't spent much time with reshaping code - lmk if anyone has ideas.

```python
idx = cropped.index
full_idx = pd.MultiIndex.from_product(idx.levels, names=idx.names)
reindexed = cropped.reindex(full_idx)

%timeit reindexed = cropped.reindex(full_idx)
# 1 loop, best of 3: 278 ms per loop

%%timeit
stack_unstack = (
    cropped
    .unstack(list('yz'))
    .stack(list('yz'), dropna=False)
)
# 10 loops, best of 3: 80.8 ms per loop

stack_unstack.equals(reindexed)
# True
```
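
The snippet above depends on a `cropped` Series that the comment doesn't define. Below is an assumed stand-in (the level names x/y/z and the sizes are guesses) so the two routes can be run end to end; the `dropna=` keyword matches the 2018-era pandas stack API used in the comment.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for `cropped`: a Series on a three-level MultiIndex
# ('x', 'y', 'z') with roughly half of the combinations missing.
levels = pd.MultiIndex.from_product(
    [range(200), range(50), range(20)], names=list("xyz")
)
cropped = (
    pd.Series(np.random.randn(len(levels)), index=levels)
    .sample(frac=0.5, random_state=0)
    .sort_index()
)

full_idx = pd.MultiIndex.from_product(
    cropped.index.levels, names=cropped.index.names
)

# Route 1: reindex onto the filled-out product index (the slow path above).
reindexed = cropped.reindex(full_idx)

# Route 2: unstack two levels into columns, then stack them back without
# dropping the NaNs introduced for the missing combinations.
stack_unstack = cropped.unstack(list("yz")).stack(list("yz"), dropna=False)

print(stack_unstack.equals(reindexed))  # True here, matching the check above
```
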
id: 426408924
html_url: https://github.com/pydata/xarray/issues/2459#issuecomment-426408924
issue_url: https://api.github.com/repos/pydata/xarray/issues/2459
node_id: MDEyOklzc3VlQ29tbWVudDQyNjQwODkyNA==
user: max-sixty (5635139)
created_at: 2018-10-02T19:57:20Z
updated_at: 2018-10-02T19:57:20Z
author_association: MEMBER
issue: Stack + to_array before to_xarray is much faster that a simple to_xarray (365973662)
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
body:

When I stepped through, it was by-and-large all taken up by https://github.com/pydata/xarray/blob/master/xarray/core/dataset.py#L3121. That's where the boxing & unboxing of the datetimes is from. I haven't yet discovered how the alternative path avoids this work. If anyone has priors please lmk!
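
One way to reproduce that kind of stepping-through is to profile the conversion. This is only an assumed sketch (the datetime level, the sizes, and the use of cProfile are my choices, not what the comment did), and it needs xarray installed:

```python
import cProfile
import numpy as np
import pandas as pd

# Assumed reproduction: a Series on a MultiIndex with a datetime level,
# converted to xarray and profiled to see where the time goes.
idx = pd.MultiIndex.from_product(
    [pd.date_range("2000-01-01", periods=500), range(200)],
    names=["time", "id"],
)
s = pd.Series(np.random.randn(len(idx)), index=idx)

prof = cProfile.Profile()
prof.enable()
s.to_xarray()  # pandas delegates to xarray.DataArray.from_series
prof.disable()
prof.print_stats(sort="cumtime")  # look for the reindex / datetime handling
```
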
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
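
For completeness, a sketch of how the "3 rows where ..." query above could be issued directly against the underlying SQLite file with Python's sqlite3; the github.db filename is an assumption (whatever file holds this issue_comments table):

```python
import sqlite3

# Assumed filename for the database that contains the issue_comments table.
conn = sqlite3.connect("github.db")

rows = conn.execute(
    """
    SELECT id, [user], created_at, updated_at, body
    FROM issue_comments
    WHERE issue = 365973662 AND [user] = 5635139
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, user_id, created_at, updated_at, body in rows:
    print(comment_id, updated_at, body[:60])
```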