issue_comments
11 rows where issue = 255989233 sorted by updated_at descending
| id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue | 
|---|---|---|---|---|---|---|---|---|---|---|---|
| 412407141 | https://github.com/pydata/xarray/issues/1560#issuecomment-412407141 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDQxMjQwNzE0MQ== | shoyer 1217238 | 2018-08-13T04:44:09Z | 2018-08-13T04:44:09Z | MEMBER | @maahn yes, that would look fine to me. Please add an ASV benchmark so we can monitor this for regressions: https://github.com/pydata/xarray/tree/master/asv_bench/benchmarks It would be nice to push this optimization up into  | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 411815694 | https://github.com/pydata/xarray/issues/1560#issuecomment-411815694 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDQxMTgxNTY5NA== | maahn 222557 | 2018-08-09T16:21:41Z | 2018-08-09T16:21:41Z | NONE | What about a quick fix with  the modified routine takes 5.75 s in comparison to 6min 40s with xr 0.10.7 and pd 0.23.3. Not sure whether this is related to a newer version, but  or it will fall back to the old method with  | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 327896138 | https://github.com/pydata/xarray/issues/1560#issuecomment-327896138 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDMyNzg5NjEzOA== | shoyer 1217238 | 2017-09-07T19:12:50Z | 2017-09-07T19:12:50Z | MEMBER | Though possibly we should just be using  | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 327895477 | https://github.com/pydata/xarray/issues/1560#issuecomment-327895477 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDMyNzg5NTQ3Nw== | shoyer 1217238 | 2017-09-07T19:10:03Z | 2017-09-07T19:10:03Z | MEMBER | @davidh-ssec Yes, but we need it for  | {
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 327894887 | https://github.com/pydata/xarray/issues/1560#issuecomment-327894887 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDMyNzg5NDg4Nw== | djhoese 1828519 | 2017-09-07T19:07:40Z | 2017-09-07T19:07:40Z | CONTRIBUTOR | @shoyer As for the equals shortcut, isn't that what this line is doing: https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/multi.py#L1864 | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 327891893 | https://github.com/pydata/xarray/issues/1560#issuecomment-327891893 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDMyNzg5MTg5Mw== | mraspaud 167802 | 2017-09-07T18:55:39Z | 2017-09-07T18:55:39Z | CONTRIBUTOR | Yes, I have the latest version, still takes some time with a 9000x9000 array:
 | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 327890644 | https://github.com/pydata/xarray/issues/1560#issuecomment-327890644 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDMyNzg5MDY0NA== | shoyer 1217238 | 2017-09-07T18:50:50Z | 2017-09-07T18:50:50Z | MEMBER | The MultiIndex speed/memory improvements seem to be around even in pandas 0.20.3, the latest release. So definitely make sure your pandas install is up to date here. | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 327886998 | https://github.com/pydata/xarray/issues/1560#issuecomment-327886998 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDMyNzg4Njk5OA== | shoyer 1217238 | 2017-09-07T18:36:48Z | 2017-09-07T18:36:48Z | MEMBER | This is still somewhat annoyingly slow, but for an 8000 x 9000 MultiIndex on pandas 0.21-dev, I measure 41 seconds for  So a fast-path might still be a good idea, but to get to truly interactive speeds, we might need a faster way to validate a MultiIndex as equal to the outer-product of its levels. Potentially we could save some metadata in  | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 327884467 | https://github.com/pydata/xarray/issues/1560#issuecomment-327884467 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDMyNzg4NDQ2Nw== | shoyer 1217238 | 2017-09-07T18:27:27Z | 2017-09-07T18:27:27Z | MEMBER | Actually, the timings above were with pandas 0.19. It's still somewhat slow using the dev version of pandas, but it's more like 10x slower rather than 100x slower: ``` In [4]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) ...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) ...: %time idx1.get_indexer(idx2) ...: CPU times: user 215 ms, sys: 81.8 ms, total: 297 ms Wall time: 319 ms Out[4]: array([ 0, 1, 2, ..., 999997, 999998, 999999]) In [5]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) ...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) ...: %time idx1.equals(idx2) ...: CPU times: user 19.8 ms, sys: 9.29 ms, total: 29.1 ms Wall time: 32.1 ms Out[5]: True ``` | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 327882144 | https://github.com/pydata/xarray/issues/1560#issuecomment-327882144 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDMyNzg4MjE0NA== | shoyer 1217238 | 2017-09-07T18:19:03Z | 2017-09-07T18:19:03Z | MEMBER | Indeed, unstack does seem to be quite slow on large dimensions. For 1000x1000, I measure only 10ms to stack, but 4 seconds to unstack:
 Profiling suggests the culprit is the  And, in turn, the call to  CPU times: user 4.1 s, sys: 128 ms, total: 4.23 s Wall time: 4.41 s ``` We do need this reindex for correctness, but we should have a separate fast-path of some sort (either here or in pandas) to speed this up when the two indexes are identical. For example, note: ``` idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)]) %time idx1.equals(idx2) CPU times: user 19 ms, sys: 0 ns, total: 19 ms Wall time: 18.5 ms ``` I'll file an issue on the pandas tracker. | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | |
| 327849071 | https://github.com/pydata/xarray/issues/1560#issuecomment-327849071 | https://api.github.com/repos/pydata/xarray/issues/1560 | MDEyOklzc3VlQ29tbWVudDMyNzg0OTA3MQ== | djhoese 1828519 | 2017-09-07T16:15:06Z | 2017-09-07T16:15:06Z | CONTRIBUTOR | I was able to reproduce this on my mac by watching Activity Monitor and saw a peak of ~8GB of memory during the  | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | DataArray.unstack taking unreasonable amounts of memory 255989233 | 
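
The fast-path idea discussed in the comments above (skip the expensive reindex when the stacked index already equals the outer product of its levels) can be illustrated with a minimal pure-Python sketch. This is not xarray or pandas code: the function names `is_full_product` and `unstack`, the list-of-tuples index, and the two-level restriction are all assumptions made for illustration only.

```python
from itertools import product


def is_full_product(index, levels):
    """Return True if `index` lists the outer product of `levels`, in order."""
    expected_len = 1
    for level in levels:
        expected_len *= len(level)
    if len(index) != expected_len:
        return False
    # Cheap linear scan, analogous in spirit to an equality check on indexes.
    return all(a == b for a, b in zip(index, product(*levels)))


def unstack(values, index, levels, fill=None):
    """Reshape flat `values` into nested rows for a two-level index (hypothetical helper)."""
    rows, cols = levels
    if is_full_product(index, levels):
        # Fast path: data is already in row-major order, so slicing is enough.
        n = len(cols)
        return [list(values[i * n:(i + 1) * n]) for i in range(len(rows))]
    # Slow path: general reindex through a lookup table, filling missing entries.
    lookup = dict(zip(index, values))
    return [[lookup.get((r, c), fill) for c in cols] for r in rows]
```

The equality scan is linear in the index length, so it costs little next to the general lookup-based path. The slow path is still needed for correctness whenever entries are missing or out of order, which matches the point made in the thread.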
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);