issue_comments: 11 rows where issue = 255989233, sorted by updated_at descending
Issue: DataArray.unstack taking unreasonable amounts of memory (255989233) · 11 comments
shoyer 1217238 (MEMBER) · 2018-08-13T04:44:09Z
https://github.com/pydata/xarray/issues/1560#issuecomment-412407141

@maahn yes, that would look fine to me. Please add an ASV benchmark so we can monitor this for regressions: https://github.com/pydata/xarray/tree/master/asv_bench/benchmarks

It would be nice to push this optimization up into
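A minimal sketch of what the requested ASV benchmark could look like (the class name, file layout, and array sizes here are illustrative assumptions, not the actual contents of asv_bench/benchmarks):

```python
# Hypothetical ASV benchmark for unstack performance; ASV calls setup()
# before timing the body of each time_* method.
import numpy as np
import xarray as xr


class Unstacking:
    def setup(self):
        data = np.random.RandomState(0).randn(500, 1000)
        # stack the default dims into one MultiIndexed dim to unstack later
        self.da_full = xr.DataArray(data).stack(flat_dim=["dim_0", "dim_1"])

    def time_unstack_full(self):
        self.da_full.unstack("flat_dim")
```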
maahn 222557 (NONE) · 2018-08-09T16:21:41Z
https://github.com/pydata/xarray/issues/1560#issuecomment-411815694

What about a quick fix with

the modified routine takes 5.75 s in comparison to 6min 40s with xr 0.10.7 and pd 0.23.3. Not sure whether this is related to a newer version, but

or

it will fall back to the old method with
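maahn's code blocks did not survive the export, but the quick fix being described — check whether the MultiIndex already equals the full outer product of its levels and, if so, reshape the values directly instead of reindexing — can be sketched at the pandas level (the function name and details are assumptions, not the actual patch):

```python
import numpy as np
import pandas as pd


def unstack_quickly(series):
    """Sketch of the proposed shortcut: if the series' MultiIndex already
    equals the outer product of its levels, the values are dense and in
    row-major order, so a plain reshape avoids the expensive reindex."""
    idx = series.index
    full_idx = pd.MultiIndex.from_product(idx.levels, names=idx.names)
    shape = tuple(len(lev) for lev in idx.levels)
    if idx.equals(full_idx):
        # fast path: no missing entries, data already in C order
        return series.to_numpy().reshape(shape)
    # fall back to the old (reindex-based) method otherwise
    return series.reindex(full_idx).to_numpy().reshape(shape)
```

MultiIndex.equals is order-sensitive, so the fast path only fires when the data really can be reshaped in place.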
shoyer 1217238 (MEMBER) · 2017-09-07T19:12:50Z
https://github.com/pydata/xarray/issues/1560#issuecomment-327896138

Though possibly we should just be using
shoyer 1217238 (MEMBER) · 2017-09-07T19:10:03Z
https://github.com/pydata/xarray/issues/1560#issuecomment-327895477

@davidh-ssec Yes, but we need it for

Reactions: 👍 1
djhoese 1828519 (CONTRIBUTOR) · 2017-09-07T19:07:40Z
https://github.com/pydata/xarray/issues/1560#issuecomment-327894887

@shoyer As for the equals shortcut, isn't that what this line is doing: https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/multi.py#L1864
mraspaud 167802 (CONTRIBUTOR) · 2017-09-07T18:55:39Z
https://github.com/pydata/xarray/issues/1560#issuecomment-327891893

Yes, I have the latest version, still takes some time with a 9000x9000 array:
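The code block showing mraspaud's timing was lost in the export; a scaled-down illustration of the pattern being timed (1000x1000 rather than 9000x9000 — the sizes and variable names here are assumptions, not the original snippet) is:

```python
import time

import numpy as np
import pandas as pd

# Build a dense MultiIndexed series and time the unstack, mirroring the
# stacked-array round-trip discussed in this thread.
n = 1000
idx = pd.MultiIndex.from_product([np.arange(n), np.arange(n)],
                                 names=["y", "x"])
stacked = pd.Series(np.zeros(n * n), index=idx)

start = time.perf_counter()
result = stacked.unstack("x")
print(f"unstack of {n}x{n}: {time.perf_counter() - start:.2f}s, "
      f"shape {result.shape}")
```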
shoyer 1217238 (MEMBER) · 2017-09-07T18:50:50Z
https://github.com/pydata/xarray/issues/1560#issuecomment-327890644

The MultiIndex speed/memory improvements seem to be around even in pandas 0.20.3, the latest release. So definitely make sure your pandas install is up to date here.
shoyer 1217238 (MEMBER) · 2017-09-07T18:36:48Z
https://github.com/pydata/xarray/issues/1560#issuecomment-327886998

This is still somewhat annoyingly slow, but for an 8000 x 9000 MultiIndex on pandas 0.21-dev, I measure 41 seconds for

So a fast-path might still be a good idea, but to get to truly interactive speeds, we might need a faster way to validate a MultiIndex as equal to the outer-product of its levels. Potentially we could save some metadata in
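One way to validate the "index equals the outer product of its levels" property without materializing tuples is to compare the index's integer codes against the row-major enumeration directly. A sketch (hypothetical helper, written against the modern `MultiIndex.codes` attribute — pandas of this era called it `.labels`):

```python
import numpy as np
import pandas as pd


def is_full_outer_product(idx):
    """Return True if `idx` enumerates the full outer product of its
    levels in row-major (C) order, checking only the integer codes."""
    sizes = [len(lev) for lev in idx.levels]
    if len(idx) != int(np.prod(sizes)):
        return False
    # a from_product index has codes equal to the unraveled arange
    expected = np.unravel_index(np.arange(len(idx)), sizes)
    return all(np.array_equal(np.asarray(c), e)
               for c, e in zip(idx.codes, expected))
```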
shoyer 1217238 (MEMBER) · 2017-09-07T18:27:27Z
https://github.com/pydata/xarray/issues/1560#issuecomment-327884467

Actually, the timings above were with pandas 0.19. It's still somewhat slow using the dev version of pandas, but it's more like 10x slower rather than 100x slower:

```
In [4]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: %time idx1.get_indexer(idx2)
CPU times: user 215 ms, sys: 81.8 ms, total: 297 ms
Wall time: 319 ms
Out[4]: array([     0,      1,      2, ..., 999997, 999998, 999999])

In [5]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: %time idx1.equals(idx2)
CPU times: user 19.8 ms, sys: 9.29 ms, total: 29.1 ms
Wall time: 32.1 ms
Out[5]: True
```
shoyer 1217238 (MEMBER) · 2017-09-07T18:19:03Z
https://github.com/pydata/xarray/issues/1560#issuecomment-327882144

Indeed, unstack does seem to be quite slow on large dimensions. For 1000x1000, I measure only 10ms to stack, but 4 seconds to unstack:

Profiling suggests the culprit is the

And, in turn, the call to

```
CPU times: user 4.1 s, sys: 128 ms, total: 4.23 s
Wall time: 4.41 s
```

We do need this reindex for correctness, but we should have a separate fast-path of some sort (either here or in pandas) to speed this up when the two indexes are identical. For example, note:

```
idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
%time idx1.equals(idx2)
CPU times: user 19 ms, sys: 0 ns, total: 19 ms
Wall time: 18.5 ms
```

I'll file an issue on the pandas tracker.
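The fast path shoyer describes — skip the expensive alignment entirely when the two indexes are identical — can be sketched as a wrapper around get_indexer (the wrapper name is mine; the eventual fix could live in xarray's reindex machinery or in pandas itself):

```python
import numpy as np
import pandas as pd


def get_indexer_with_fast_path(source, target):
    """If the indexes are equal, the indexer is just arange, and
    Index.equals is orders of magnitude cheaper than get_indexer."""
    if source.equals(target):
        return np.arange(len(source))
    return source.get_indexer(target)
```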
djhoese 1828519 (CONTRIBUTOR) · 2017-09-07T16:15:06Z
https://github.com/pydata/xarray/issues/1560#issuecomment-327849071

I was able to reproduce this on my mac by watching Activity Monitor and saw a peak of ~8GB of memory during the
Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```