github: issue_comments: 5 rows where issue = 675482176 sorted by updated

5 rows where issue = 675482176 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
1507213204	https://github.com/pydata/xarray/issues/4325#issuecomment-1507213204	https://api.github.com/repos/pydata/xarray/issues/4325	IC_kwDOAMm_X85Z1j-U	dcherian 2448579	2023-04-13T15:56:51Z	2023-04-13T15:56:51Z	MEMBER	Over in https://github.com/pydata/xarray/issues/7344#issuecomment-1336299057, @shoyer wrote: "That said -- we could also switch to smarter NumPy based algorithms to implement most moving window calculations, e.g., using np.nancumsum for moving window means." After some digging, this would involve using "summed area tables", which have been generalized to nD and can be used to compute all our built-in reductions (except median). Basically, we'd store the summed area table (repeated `np.cumsum`) and then calculate reductions using binary ops (mostly subtraction) on those tables. This would be an intermediate-level project, but we could implement it incrementally (start with `sum`, for example). One downside is the potential for floating point inaccuracies, because we're taking differences of potentially large numbers. cc @aulemahal	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Optimize ndrolling nanreduce 675482176
1507201606	https://github.com/pydata/xarray/issues/4325#issuecomment-1507201606	https://api.github.com/repos/pydata/xarray/issues/4325	IC_kwDOAMm_X85Z1hJG	tbloch1 34276374	2023-04-13T15:48:31Z	2023-04-13T15:48:31Z	NONE	I think I may have found a way to make the variance/standard deviation calculation more memory efficient, but I don't know enough about writing the sort of code that would be needed for a PR. I basically wrote out the calculation for variance, trying to use only the functions that have already been optimised. Derived from: $$ var = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 $$ $$ var = \frac{1}{n} \left( (x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \dots \right) $$ $$ var = \frac{1}{n} \left( x_1^2 - 2x_1\mu + \mu^2 + x_2^2 - 2x_2\mu + \mu^2 + x_3^2 - 2x_3\mu + \mu^2 + \dots \right) $$ $$ var = \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2 \right) $$ I coded this up, and it uses roughly a tenth of the peak memory of the current `.var()` implementation: ```python %load_ext memory_profiler import numpy as np import xarray as xr temp = xr.DataArray(np.random.randint(0, 10, (5000, 500)), dims=("x", "y")) def new_var(da, x=10, y=20): # Defining the re-used parts roll = da.rolling(x=x, y=y) mean = roll.mean() count = roll.count() # First term: sum of squared values term1 = (da**2).rolling(x=x, y=y).sum() # Second term: cross-term sum term2 = -2 * mean * roll.sum() # Third term: 'sum' of squared means term3 = count * mean**2 # Combining into the variance var = (term1 + term2 + term3) / count return var def old_var(da, x=10, y=20): roll = da.rolling(x=x, y=y) var = roll.var() return var %memit new_var(temp) %memit old_var(temp) ``` `peak memory: 429.77 MiB, increment: 134.92 MiB` (new) `peak memory: 5064.07 MiB, increment: 4768.45 MiB` (old) I wanted to double check that the calculation was working correctly: ```python print((var_o.where(~np.isnan(var_o), 0) == var_n.where(~np.isnan(var_n), 0)).all().values) print(np.allclose(var_o, var_n, equal_nan=True)) ``` `False True` I think the difference here is just due to floating point errors, but maybe someone who knows how to check that in more detail could have a look. If the approach works, the standard deviation follows trivially as the square root of this variance.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Optimize ndrolling nanreduce 675482176
716399575	https://github.com/pydata/xarray/issues/4325#issuecomment-716399575	https://api.github.com/repos/pydata/xarray/issues/4325	MDEyOklzc3VlQ29tbWVudDcxNjM5OTU3NQ==	mathause 10194086	2020-10-26T08:40:51Z	2021-02-18T15:39:40Z	MEMBER	This is already done for `counts`, correct? Here: https://github.com/pydata/xarray/blob/1597e3a91eaf96626725987d23bbda2a80d2bae7/xarray/core/rolling.py#L370-L382 This should work for most of the reductions (and is a bit similar to what is done in `weighted` for `mean` and `sum`): [x] `count`: `notnull()` -> `rolling` -> `sum` [x] `argmax`: `fillna(-inf)` -> `rolling` -> `argmax` [x] `argmin`: `fillna(inf)` -> `rolling` -> `argmin` [x] `max`: `fillna(-inf)` -> `rolling` -> `max` (not sure about this one, need to be careful with the dtype) [x] `min`: `fillna(inf)` -> `rolling` -> `min` (ditto) [x] `mean`: `fillna(0)` -> `rolling` -> `sum / count` (ensure nan if `count == 0`) [x] `prod`: `fillna(1)` -> `rolling` -> `prod` [x] `sum`: `fillna(0)` -> `rolling` -> `sum` [ ] `var`: `fillna(0)` -> `rolling` -> possible (?) but a bit more involved [ ] `std`: `sqrt(var)` [ ] `median`: probably not possible. I think this should not be too difficult; the thing is that rolling itself is already quite complicated.	{ "total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 2, "rocket": 0, "eyes": 0 }	Optimize ndrolling nanreduce 675482176
741734592	https://github.com/pydata/xarray/issues/4325#issuecomment-741734592	https://api.github.com/repos/pydata/xarray/issues/4325	MDEyOklzc3VlQ29tbWVudDc0MTczNDU5Mg==	mathause 10194086	2020-12-09T12:17:15Z	2020-12-09T12:17:15Z	MEMBER	I just saw that numpy 1.20 introduces `stride_tricks.sliding_window_view`. I have not looked at this yet. Just leaving this here for reference. https://numpy.org/devdocs/reference/generated/numpy.lib.stride_tricks.sliding_window_view.html#numpy.lib.stride_tricks.sliding_window_view https://numpy.org/devdocs/release/1.20.0-notes.html#sliding-window-view-provides-a-sliding-window-view-for-numpy-arrays https://github.com/numpy/numpy/pull/17394	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Optimize ndrolling nanreduce 675482176
717572036	https://github.com/pydata/xarray/issues/4325#issuecomment-717572036	https://api.github.com/repos/pydata/xarray/issues/4325	MDEyOklzc3VlQ29tbWVudDcxNzU3MjAzNg==	fujiisoup 6815844	2020-10-27T22:14:41Z	2020-10-27T22:14:41Z	MEMBER	@mathause Oh, I missed this issue. Yes, this is implemented only for count. Re "the thing is that rolling itself is already quite complicated": agreed. We need to clean this up. One possible option would be to drop support for bottleneck, which does not work for nd-rolling; if we implement the nd-nanreduce, the speed should be comparable with bottleneck.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Optimize ndrolling nanreduce 675482176
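
The fill-then-reduce recipe in mathause's checklist can be sketched in plain NumPy with `numpy.lib.stride_tricks.sliding_window_view` (NumPy >= 1.20), which mathause links above. This is a minimal sketch under stated assumptions, not xarray's implementation: `rolling_nanreduce` and `_FILL` are hypothetical names, only windows that fit entirely inside the array are returned (no padding, `center`, or `min_periods`), and the input is cast to float so that NaN and +/-inf fit in the dtype (one answer to the dtype concern noted for `max`/`min`).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical table: identity element used to fill NaNs before each plain reduction.
_FILL = {"sum": 0.0, "mean": 0.0, "max": -np.inf, "min": np.inf}


def rolling_nanreduce(arr, window, how):
    """Hypothetical NaN-aware rolling reduction over fully contained windows.

    `window` gives one window length per axis, e.g. (10, 20) for 2-D input.
    """
    arr = np.asarray(arr, dtype=float)  # float so that NaN and +/-inf fit
    valid = ~np.isnan(arr)
    axes = tuple(range(-arr.ndim, 0))  # window axes appended by sliding_window_view

    # count: notnull -> rolling -> sum
    count = sliding_window_view(valid, window).sum(axis=axes)
    if how == "count":
        return count

    # fillna(identity) -> rolling -> plain (non-nan) reduction
    windows = sliding_window_view(np.where(valid, arr, _FILL[how]), window)
    if how in ("sum", "mean"):
        out = windows.sum(axis=axes)
        if how == "mean":
            with np.errstate(invalid="ignore"):
                out = out / count  # 0 / 0 gives NaN for empty windows
    elif how == "max":
        out = windows.max(axis=axes)
    else:
        out = windows.min(axis=axes)
    # Windows without any valid value become NaN ("ensure nan if count == 0").
    return np.where(count > 0, out, np.nan)
```

For example, `rolling_nanreduce(np.array([np.nan, np.nan, 1.0, 2.0]), (2,), "mean")` yields NaN, 1.0 and 1.5. The strided view itself does not copy data, but each reduction still visits every element of every window, so the cost grows with the window size.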

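The summed-area-table idea from dcherian's comment and the sum-of-squares variance from tbloch1's comment fit together: with tables of `x`, `x**2` and the non-NaN count, each window's sums come from 2^d table lookups (d = number of dimensions), independent of the window size. A minimal NumPy sketch under stated assumptions: `summed_area_table`, `window_sums` and `rolling_nan_mean_var` are hypothetical names, only fully contained windows are returned, and the variance uses the 1/n (ddof=0) normalisation of the derivation above. Differences of prefix sums only recover reductions that can be "subtracted out", such as sums and counts (and therefore means and variances); max and min have no such inverse.

```python
import itertools

import numpy as np


def summed_area_table(arr):
    """Zero-padded nD summed area table: S[i, j, ...] == arr[:i, :j, ...].sum()."""
    sat = np.pad(arr, [(1, 0)] * arr.ndim)  # leading zeros on every axis
    for axis in range(arr.ndim):
        sat = np.cumsum(sat, axis=axis)  # "repeated np.cumsum"
    return sat


def window_sums(sat, window):
    """Sum over every fully contained window by inclusion-exclusion on the table."""
    total = 0.0
    # Each corner of the window box takes the window's end (True) or start (False)
    # index on every axis; corners with an odd number of starts are subtracted.
    for corner in itertools.product((True, False), repeat=sat.ndim):
        idx = tuple(
            slice(w, None) if end else slice(None, -w) for end, w in zip(corner, window)
        )
        total = total + (-1) ** corner.count(False) * sat[idx]
    return total


def rolling_nan_mean_var(arr, window):
    """Hypothetical NaN-aware rolling mean and (ddof=0) variance."""
    arr = np.asarray(arr, dtype=float)
    valid = ~np.isnan(arr)
    filled = np.where(valid, arr, 0.0)  # fillna(0)
    n = window_sums(summed_area_table(valid.astype(np.int64)), window)  # count
    s1 = window_sums(summed_area_table(filled), window)  # sum of x
    s2 = window_sums(summed_area_table(filled**2), window)  # sum of x**2
    empty = n == 0
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(empty, np.nan, s1 / n)
        # var = (sum x^2 - 2*mu*sum x + n*mu^2) / n, term by term as derived above
        var = np.where(empty, np.nan, (s2 - 2 * mean * s1 + n * mean**2) / n)
    # Cancellation between large prefix sums can leave tiny negative values.
    var = np.maximum(var, 0.0)  # np.maximum keeps NaN for empty windows
    return mean, var
```

Clipping at zero is only a cheap guard against the floating point cancellation dcherian mentions; it does not restore the lost precision when values are large relative to their spread, so centring the data first (variance is unchanged by a constant shift) would likely be a sensible extra step.
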
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
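
The listing at the top of this page corresponds to a simple filtered, sorted query on this schema. A minimal sketch using Python's standard-library `sqlite3` module and an in-memory database, assuming only the `id`, `updated_at` and `issue` values of the five rows above are loaded; `comments_for_issue` is a hypothetical helper name. Ordering by the `updated_at` text column works because every timestamp uses the same ISO 8601 format, so lexicographic order equals chronological order, and the `WHERE issue = ?` filter can be served by `idx_issue_comments_issue`.

```python
import sqlite3

# Schema copied from the CREATE TABLE / CREATE INDEX statements above.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""

# (id, updated_at) pairs from the five rows listed above, in arbitrary order.
ROWS = [
    (716399575, "2021-02-18T15:39:40Z"),
    (1507201606, "2023-04-13T15:48:31Z"),
    (717572036, "2020-10-27T22:14:41Z"),
    (1507213204, "2023-04-13T15:56:51Z"),
    (741734592, "2020-12-09T12:17:15Z"),
]


def comments_for_issue(conn, issue):
    """Return (id, updated_at) for one issue, most recently updated first."""
    cur = conn.execute(
        "SELECT id, updated_at FROM issue_comments "
        "WHERE issue = ? ORDER BY updated_at DESC",
        (issue,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")  # users/issues tables are not created here
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO issue_comments (id, updated_at, issue) VALUES (?, ?, 675482176)",
    ROWS,
)
```

Calling `comments_for_issue(conn, 675482176)` returns the ids in the same order as the table at the top of the page.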