issue_comments

11 rows where issue = 255989233 sorted by updated_at descending

user 4

  • shoyer 7
  • djhoese 2
  • mraspaud 1
  • maahn 1

author_association 3

  • MEMBER 7
  • CONTRIBUTOR 3
  • NONE 1

issue 1

  • DataArray.unstack taking unreasonable amounts of memory · 11
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
412407141 https://github.com/pydata/xarray/issues/1560#issuecomment-412407141 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDQxMjQwNzE0MQ== shoyer 1217238 2018-08-13T04:44:09Z 2018-08-13T04:44:09Z MEMBER

@maahn yes, that would look fine to me. Please add an ASV benchmark so we can monitor this for regressions: https://github.com/pydata/xarray/tree/master/asv_bench/benchmarks

It would be nice to push this optimization up into reindex_variables, but it's not necessary (and I'm not even sure it could be done as efficiently as the equals check in unstack).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
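The ASV benchmark requested above could be sketched roughly as follows. This is a minimal sketch in ASV's usual style (a class with `setup` and `time_*` methods); the class name, method names, and the small array sizes are illustrative, not the benchmark actually added to xarray's asv_bench.

```python
import numpy as np
import xarray as xr


class Unstacking:
    """Benchmark stack/unstack round-trips (sizes kept small here; a real
    benchmark would use something closer to the thread's 1000x1000 case)."""

    def setup(self):
        data = np.random.RandomState(0).randn(1, 250, 200)
        self.da = xr.DataArray(data).stack(flat_dim=["dim_1", "dim_2"])

    def time_unstack_fast(self):
        # The index is untouched after stack(), so the equals() fast path
        # discussed in this thread would apply.
        self.da.unstack("flat_dim")

    def time_unstack_slow(self):
        # Reversing the index defeats the fast path and falls back to reindex.
        self.da[:, ::-1].unstack("flat_dim")
```

ASV would time each `time_*` method repeatedly, so a regression in either path shows up separately.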
411815694 https://github.com/pydata/xarray/issues/1560#issuecomment-411815694 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDQxMTgxNTY5NA== maahn 222557 2018-08-09T16:21:41Z 2018-08-09T16:21:41Z NONE

What about a quick fix with index.equals like this (without the prints, of course): https://github.com/maahn/xarray/commit/cf83991a161fbd89af2029a69cb50f1e09a5ed45. For the example above,

```
arr = xr.DataArray(np.empty([1, 8996, 9223]))
arr = arr.stack(flat_dim=['dim_1', 'dim_2'])
%time arr.unstack('flat_dim')
```

the modified routine takes 5.75 s, compared to 6 min 40 s with xr 0.10.7 and pd 0.23.3. Not sure whether this is related to a newer version, but index.equals(full_idx) actually takes only 2e-4 s in that example. When slicing or reordering is applied to the MultiIndex,

```
arr = xr.DataArray(np.arange(20).reshape((1, 10, 2))).stack(flat_dim=['dim_1', 'dim_2'])
arr.isel(flat_dim=[1, 2]).unstack('flat_dim')
```

or

```
arr = xr.DataArray(np.arange(20).reshape((1, 10, 2))).stack(flat_dim=['dim_1', 'dim_2'])
arr[:, ::-1].unstack('flat_dim')
```

it will fall back to the old method with reindex.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
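The fast path described in the comment above can be paraphrased in pure pandas. This is a sketch of the idea behind the linked commit, not its actual code; `unstack_indexer` is a hypothetical helper name.

```python
import pandas as pd


def unstack_indexer(index):
    """Return None when unstack could skip reindexing, else the indexer
    that maps the full outer-product index onto `index`."""
    full_idx = pd.MultiIndex.from_product(index.levels, names=index.names)
    if index.equals(full_idx):
        # Fast path: the stacked index is still the complete, ordered
        # product of its levels, so the data can simply be reshaped.
        return None
    # Slow path, equivalent to what reindex does: the position of each
    # full-index entry in `index` (-1 where missing, i.e. NaN after reindex).
    return index.get_indexer(full_idx)


idx = pd.MultiIndex.from_product([range(3), range(2)], names=["a", "b"])
assert unstack_indexer(idx) is None            # untouched stack: fast path
assert unstack_indexer(idx[::-1]) is not None  # reordered: falls back
```

The cheap `equals` check is what makes the 5.75 s vs 6 min 40 s difference reported above plausible: the common case (unstack right after stack) never pays for `get_indexer` at all.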
327896138 https://github.com/pydata/xarray/issues/1560#issuecomment-327896138 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg5NjEzOA== shoyer 1217238 2017-09-07T19:12:50Z 2017-09-07T19:12:50Z MEMBER

Though possibly we should just be using Index.reindex directly inside reindex_variables (in xarray/core/alignment.py) instead of calling get_indexer.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
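For reference, the two pandas calls contrasted in the comment above look like this (tiny indexes so it runs instantly; the thread's benchmarks use 1000x1000 products). Whether `reindex` short-circuits for equal indexes is an internal pandas detail, so the sketch avoids depending on it.

```python
import numpy as np
import pandas as pd

source = pd.MultiIndex.from_product([np.arange(3), np.arange(2)])
target = pd.MultiIndex.from_product([np.arange(3), np.arange(2)])

# What reindex_variables effectively does today: an elementwise lookup of
# every target entry, even when the two indexes are identical.
indexer = source.get_indexer(target)

# The suggested alternative: Index.reindex returns the conformed index plus
# an indexer, letting pandas apply its own shortcuts internally.
new_index, reindexer = source.reindex(target)
```

Routing through `Index.reindex` would let any future pandas-side fast path benefit xarray automatically.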
327895477 https://github.com/pydata/xarray/issues/1560#issuecomment-327895477 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg5NTQ3Nw== shoyer 1217238 2017-09-07T19:10:03Z 2017-09-07T19:10:03Z MEMBER

@davidh-ssec Yes, but we need it for MultiIndex.get_indexer, not MultiIndex.reindex: https://github.com/pandas-dev/pandas/blob/ee6185e2fb9461632949f3ba52a28b37a1f7296e/pandas/core/indexes/multi.py#L1781

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
327894887 https://github.com/pydata/xarray/issues/1560#issuecomment-327894887 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg5NDg4Nw== djhoese 1828519 2017-09-07T19:07:40Z 2017-09-07T19:07:40Z CONTRIBUTOR

@shoyer As for the equals shortcut, isn't that what this line is doing: https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/multi.py#L1864

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
327891893 https://github.com/pydata/xarray/issues/1560#issuecomment-327891893 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg5MTg5Mw== mraspaud 167802 2017-09-07T18:55:39Z 2017-09-07T18:55:39Z CONTRIBUTOR

Yes, I have the latest version; it still takes some time with a 9000x9000 array:

```
In [4]: %time arr.unstack('flat_dim')
CPU times: user 26.1 s, sys: 7.8 s, total: 33.9 s
Wall time: 35.3 s
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
327890644 https://github.com/pydata/xarray/issues/1560#issuecomment-327890644 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg5MDY0NA== shoyer 1217238 2017-09-07T18:50:50Z 2017-09-07T18:50:50Z MEMBER

The MultiIndex speed/memory improvements seem to be around even in pandas 0.20.3, the latest release. So definitely make sure your pandas install is up to date here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
327886998 https://github.com/pydata/xarray/issues/1560#issuecomment-327886998 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg4Njk5OA== shoyer 1217238 2017-09-07T18:36:48Z 2017-09-07T18:36:48Z MEMBER

This is still somewhat annoyingly slow, but for an 8000 x 9000 MultiIndex on pandas 0.21-dev, I measure 41 seconds for get_indexer() vs 3.8 seconds for equals().

So a fast-path might still be a good idea, but to get to truly interactive speeds, we might need a faster way to validate a MultiIndex as equal to the outer-product of its levels. Potentially we could save some metadata in PandasIndexAdapter as part of stack() to indicate that the levels are from an outer product: https://github.com/pydata/xarray/blob/98a05f11c6f38489c82e86c9e9df796e7fb65fd2/xarray/core/indexing.py#L502-L505

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
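The metadata idea in the comment above could look roughly like this. Everything here is hypothetical: `ProductIndex`, `is_outer_product`, and the helper names are illustrative; xarray's PandasIndexAdapter has no such attribute.

```python
import pandas as pd


class ProductIndex:
    """Sketch of an index wrapper that remembers its provenance."""

    def __init__(self, index, is_outer_product=False):
        self.index = index
        self.is_outer_product = is_outer_product


def stack_levels(levels):
    # stack() builds its index with MultiIndex.from_product, so it knows by
    # construction that the result is a complete outer product...
    return ProductIndex(pd.MultiIndex.from_product(levels), is_outer_product=True)


def needs_reindex(wrapped):
    # ...letting unstack() skip even the equals() check, as long as any
    # selection or reordering in between clears the flag.
    return not wrapped.is_outer_product
```

This would get unstack to truly interactive speeds in the common case, since validating an 8e7-row MultiIndex with equals() still costs seconds.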
327884467 https://github.com/pydata/xarray/issues/1560#issuecomment-327884467 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg4NDQ2Nw== shoyer 1217238 2017-09-07T18:27:27Z 2017-09-07T18:27:27Z MEMBER

Actually, the timings above were with pandas 0.19. It's still somewhat slow using the dev version of pandas, but it's more like 10x slower rather than 100x slower:

```
In [4]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: %time idx1.get_indexer(idx2)
CPU times: user 215 ms, sys: 81.8 ms, total: 297 ms
Wall time: 319 ms
Out[4]: array([     0,      1,      2, ..., 999997, 999998, 999999])

In [5]: idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
   ...: %time idx1.equals(idx2)
CPU times: user 19.8 ms, sys: 9.29 ms, total: 29.1 ms
Wall time: 32.1 ms
Out[5]: True
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
327882144 https://github.com/pydata/xarray/issues/1560#issuecomment-327882144 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg4MjE0NA== shoyer 1217238 2017-09-07T18:19:03Z 2017-09-07T18:19:03Z MEMBER

Indeed, unstack does seem to be quite slow on large dimensions. For 1000x1000, I measure only 10 ms to stack, but 4 seconds to unstack:

```
%time arr = DataArray(np.empty([1, 1000, 1000])).stack(flat_dim=['dim_1', 'dim_2'])
%time arr.unstack('flat_dim')
```

Profiling suggests the culprit is the reindex call in unstack(): https://github.com/pydata/xarray/blob/98a05f11c6f38489c82e86c9e9df796e7fb65fd2/xarray/core/dataset.py#L1896

And, in turn, the call to pandas.MultiIndex.get_indexer(). To reproduce with pure pandas:

```
idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
%time idx1.get_indexer(idx2)
CPU times: user 4.1 s, sys: 128 ms, total: 4.23 s
Wall time: 4.41 s
```

We do need this reindex for correctness, but we should have a separate fast-path of some sort (either here or in pandas) to speed this up when the two indexes are identical. For example, note:

```
idx1 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
idx2 = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
%time idx1.equals(idx2)
CPU times: user 19 ms, sys: 0 ns, total: 19 ms
Wall time: 18.5 ms
```

I'll file an issue on the pandas tracker.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
327849071 https://github.com/pydata/xarray/issues/1560#issuecomment-327849071 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg0OTA3MQ== djhoese 1828519 2017-09-07T16:15:06Z 2017-09-07T16:15:06Z CONTRIBUTOR

I was able to reproduce this on my mac by watching Activity Monitor and saw a peak of ~8GB of memory during the unstack call.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);