issue_comments
7 rows where author_association = "MEMBER", issue = 255989233 and user = 1217238 sorted by updated_at descending
Issue: DataArray.unstack taking unreasonable amounts of memory (255989233) · 7 comments

id: 412407141 · node_id: MDEyOklzc3VlQ29tbWVudDQxMjQwNzE0MQ==
html_url: https://github.com/pydata/xarray/issues/1560#issuecomment-412407141
issue_url: https://api.github.com/repos/pydata/xarray/issues/1560
user: shoyer (1217238) · created_at: 2018-08-13T04:44:09Z · updated_at: 2018-08-13T04:44:09Z
author_association: MEMBER
body:
@maahn yes, that would look fine to me. Please add an ASV benchmark so we can monitor this for regressions: https://github.com/pydata/xarray/tree/master/asv_bench/benchmarks It would be nice to push this optimization up into […]
reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: 255989233
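
The ASV suggestion above maps onto xarray's existing benchmark layout; a minimal sketch of what such a benchmark might look like, assuming the file name, class name, and array sizes (none of which are specified in the thread):

```python
# asv_bench/benchmarks/unstacking.py -- hypothetical file name
import numpy as np
import xarray as xr


class Unstacking:
    def setup(self):
        # Stack a 1000x1000 array into a MultiIndexed dimension, the
        # size discussed later in this thread.
        data = np.random.RandomState(0).randn(1000, 1000)
        self.da = xr.DataArray(data, dims=["x", "y"]).stack(z=["x", "y"])

    def time_unstack(self):
        # Round-trips through the reindex path profiled in this thread.
        self.da.unstack("z")
```

ASV discovers `time_*` methods automatically and re-runs `setup` per benchmark, so a regression in the unstack path would show up as a timing jump between commits.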

id: 327896138 · node_id: MDEyOklzc3VlQ29tbWVudDMyNzg5NjEzOA==
html_url: https://github.com/pydata/xarray/issues/1560#issuecomment-327896138
issue_url: https://api.github.com/repos/pydata/xarray/issues/1560
user: shoyer (1217238) · created_at: 2017-09-07T19:12:50Z · updated_at: 2017-09-07T19:12:50Z
author_association: MEMBER
body:
Though possibly we should just be using […]
reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: 255989233

id: 327895477 · node_id: MDEyOklzc3VlQ29tbWVudDMyNzg5NTQ3Nw==
html_url: https://github.com/pydata/xarray/issues/1560#issuecomment-327895477
issue_url: https://api.github.com/repos/pydata/xarray/issues/1560
user: shoyer (1217238) · created_at: 2017-09-07T19:10:03Z · updated_at: 2017-09-07T19:10:03Z
author_association: MEMBER
body:
@davidh-ssec Yes, but we need it for […]
reactions: {"total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: 255989233

id: 327890644 · node_id: MDEyOklzc3VlQ29tbWVudDMyNzg5MDY0NA==
html_url: https://github.com/pydata/xarray/issues/1560#issuecomment-327890644
issue_url: https://api.github.com/repos/pydata/xarray/issues/1560
user: shoyer (1217238) · created_at: 2017-09-07T18:50:50Z · updated_at: 2017-09-07T18:50:50Z
author_association: MEMBER
body:
The MultiIndex speed/memory improvements seem to be present even in pandas 0.20.3, the latest release. So definitely make sure your pandas install is up to date here.
reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: 255989233
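
A trivial check, not from the thread, for confirming which pandas an environment actually resolves before profiling:

```python
import pandas as pd

# The comment above reports the improvements present in 0.20.3;
# anything older is worth upgrading before timing unstack.
print(pd.__version__)
```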

id: 327886998 · node_id: MDEyOklzc3VlQ29tbWVudDMyNzg4Njk5OA==
html_url: https://github.com/pydata/xarray/issues/1560#issuecomment-327886998
issue_url: https://api.github.com/repos/pydata/xarray/issues/1560
user: shoyer (1217238) · created_at: 2017-09-07T18:36:48Z · updated_at: 2017-09-07T18:36:48Z
author_association: MEMBER
body:
This is still somewhat annoyingly slow, but for an 8000 x 9000 MultiIndex on pandas 0.21-dev, I measure 41 seconds for […]

So a fast-path might still be a good idea, but to get to truly interactive speeds, we might need a faster way to validate a MultiIndex as equal to the outer-product of its levels. Potentially we could save some metadata in […]
reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: 255989233
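
The "validate a MultiIndex as equal to the outer-product of its levels" idea above can be checked without materializing a second MultiIndex. A sketch under stated assumptions: the function name is mine, it assumes the outer product is enumerated in lexicographic order, and it uses the modern `.codes` attribute (called `.labels` in the pandas of this thread's era):

```python
import numpy as np
import pandas as pd


def is_full_outer_product(idx: pd.MultiIndex) -> bool:
    """Check whether `idx` enumerates the full cross-product of its
    levels in lexicographic order, using only cheap array comparisons."""
    shape = [len(level) for level in idx.levels]
    if len(idx) != np.prod(shape):
        return False
    # For a lexicographic outer product, each level's codes follow the
    # tile/repeat pattern that np.indices reproduces row by row.
    expected = np.indices(shape).reshape(len(shape), -1)
    return all(
        np.array_equal(np.asarray(codes), exp)
        for codes, exp in zip(idx.codes, expected)
    )


idx = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
assert is_full_outer_product(idx)
```

For the sizes discussed here this is a handful of vectorized integer comparisons, rather than the hash-table lookups that `get_indexer` performs.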

id: 327884467 · node_id: MDEyOklzc3VlQ29tbWVudDMyNzg4NDQ2Nw==
html_url: https://github.com/pydata/xarray/issues/1560#issuecomment-327884467
issue_url: https://api.github.com/repos/pydata/xarray/issues/1560
user: shoyer (1217238) · created_at: 2017-09-07T18:27:27Z · updated_at: 2017-09-07T18:27:27Z
author_association: MEMBER
body:
Actually, the timings above were with pandas 0.19. It's still somewhat slow using the dev version of pandas, but it's more like 10x slower rather than 100x slower:

```
In [4]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: %time idx1.get_indexer(idx2)
CPU times: user 215 ms, sys: 81.8 ms, total: 297 ms
Wall time: 319 ms
Out[4]: array([     0,      1,      2, ..., 999997, 999998, 999999])

In [5]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: %time idx1.equals(idx2)
CPU times: user 19.8 ms, sys: 9.29 ms, total: 29.1 ms
Wall time: 32.1 ms
Out[5]: True
```
reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: 255989233

id: 327882144 · node_id: MDEyOklzc3VlQ29tbWVudDMyNzg4MjE0NA==
html_url: https://github.com/pydata/xarray/issues/1560#issuecomment-327882144
issue_url: https://api.github.com/repos/pydata/xarray/issues/1560
user: shoyer (1217238) · created_at: 2017-09-07T18:19:03Z · updated_at: 2017-09-07T18:19:03Z
author_association: MEMBER
body:
Indeed, unstack does seem to be quite slow on large dimensions. For 1000x1000, I measure only 10 ms to stack, but 4 seconds to unstack: […]

Profiling suggests the culprit is the reindex. And, in turn, the call to […]:

```
CPU times: user 4.1 s, sys: 128 ms, total: 4.23 s
Wall time: 4.41 s
```

We do need this reindex for correctness, but we should have a separate fast-path of some sort (either here or in pandas) to speed this up when the two indexes are identical. For example, note:

```
idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
%time idx1.equals(idx2)
CPU times: user 19 ms, sys: 0 ns, total: 19 ms
Wall time: 18.5 ms
```

I'll file an issue on the pandas tracker.
reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: 255989233
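
The fast-path idea running through the last two comments reduces to: check equality first, and only pay for `get_indexer` when the indexes actually differ. A minimal sketch of that wrapper, assuming a hypothetical function name (this is not xarray or pandas API):

```python
import numpy as np
import pandas as pd


def get_indexer_with_fastpath(target: pd.Index, source: pd.Index) -> np.ndarray:
    # Fast path: identical indexes need no hash-table lookup; the
    # indexer is just the identity permutation. Compare the ~20 ms
    # `.equals` timing above with the much slower `.get_indexer`.
    if target.equals(source):
        return np.arange(len(target))
    return target.get_indexer(source)


idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
indexer = get_indexer_with_fastpath(idx1, idx2)  # hits the fast path
```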
```sql
CREATE TABLE [issue_comments] (
    [html_url] TEXT,
    [issue_url] TEXT,
    [id] INTEGER PRIMARY KEY,
    [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]),
    [created_at] TEXT,
    [updated_at] TEXT,
    [author_association] TEXT,
    [body] TEXT,
    [reactions] TEXT,
    [performed_via_github_app] TEXT,
    [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```
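
Against this schema, the filter described at the top of the page corresponds to a straightforward query. A sketch using Python's stdlib sqlite3; the database file name is an assumption:

```python
import sqlite3

# "github.db" is a placeholder; use whatever file holds this table.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, created_at, body
    FROM issue_comments
    WHERE author_association = 'MEMBER'
      AND issue = 255989233
      AND [user] = 1217238
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, created_at, body in rows:
    print(comment_id, created_at, body[:60])
```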