issue_comments
5 rows where issue = 675482176 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1507213204 | https://github.com/pydata/xarray/issues/4325#issuecomment-1507213204 | https://api.github.com/repos/pydata/xarray/issues/4325 | IC_kwDOAMm_X85Z1j-U | dcherian 2448579 | 2023-04-13T15:56:51Z | 2023-04-13T15:56:51Z | MEMBER | Over in https://github.com/pydata/xarray/issues/7344#issuecomment-1336299057 @shoyer suggested an approach for this. After some digging, it would involve using "summed area tables", which have been generalized to nD and can be used to compute all of our built-in reductions (except median). Basically, we'd store the summed area table (built from repeated cumulative sums) and read each windowed reduction off it. This would be an intermediate-level project, but we could implement it incrementally; a toy 2D illustration of the summed-area-table trick follows this row. cc @aulemahal |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Optimize ndrolling nanreduce 675482176 | |
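To make the summed-area-table idea concrete, here is a minimal 2D sketch assuming trailing windows, no NaN handling, and full windows only; the function name and output shape are choices made for this example, not part of the proposal above.

```python
import numpy as np

def rolling_sum_2d(a, wx, wy):
    """Rolling sum over trailing (wx, wy) windows via a summed-area table.

    Only positions where a complete window fits are returned, so the output
    has shape (a.shape[0] - wx + 1, a.shape[1] - wy + 1).
    """
    # Summed-area table with a leading row/column of zeros so that
    # S[i, j] == a[:i, :j].sum() for every i, j.
    S = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    S[1:, 1:] = np.cumsum(np.cumsum(a, axis=0), axis=1)
    # Each window sum is recovered by inclusion-exclusion on four corners.
    return S[wx:, wy:] - S[:-wx, wy:] - S[wx:, :-wy] + S[:-wx, :-wy]

# Check against a direct (much slower) computation on a small array.
small = np.random.rand(30, 25)
expected = np.array(
    [[small[i:i + 10, j:j + 20].sum() for j in range(6)] for i in range(21)]
)
np.testing.assert_allclose(rolling_sum_2d(small, 10, 20), expected)
```

The table is built once with one cumulative sum per dimension, after which every window sum costs a fixed number of lookups regardless of window size; the trade-off is floating-point accumulation error in the cumulative sums, related to the precision caveat raised in the next comment.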
1507201606 | https://github.com/pydata/xarray/issues/4325#issuecomment-1507201606 | https://api.github.com/repos/pydata/xarray/issues/4325 | IC_kwDOAMm_X85Z1hJG | tbloch1 34276374 | 2023-04-13T15:48:31Z | 2023-04-13T15:48:31Z | NONE | I think I may have found a way to make the variance/standard deviation calculation more memory efficient, but I don't know enough about writing the sort of code that would be needed for a PR. I basically wrote out the calculation for variance, trying to only use the functions that have already been optimised. Derived from:

$$ var = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 $$

$$ var = \frac{1}{n} \left( (x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \dots \right) $$

$$ var = \frac{1}{n} \left( x_1^2 - 2x_1\mu + \mu^2 + x_2^2 - 2x_2\mu + \mu^2 + x_3^2 - 2x_3\mu + \mu^2 + \dots \right) $$

$$ var = \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2 \right) $$

I coded this up, and it uses approximately 10% of the memory of the current implementation:

```python
%load_ext memory_profiler
import numpy as np
import xarray as xr

temp = xr.DataArray(np.random.randint(0, 10, (5000, 500)), dims=("x", "y"))

def new_var(da, x=10, y=20):
    # Defining the re-used parts
    roll = da.rolling(x=x, y=y)
    mean = roll.mean()
    count = roll.count()

    # First term: sum of squared values
    term1 = (da**2).rolling(x=x, y=y).sum()
    # Second term: cross-term sum
    term2 = -2 * mean * roll.sum()
    # Third term: 'sum' of squared means
    term3 = count * mean**2

    # Combining into the variance
    var = (term1 + term2 + term3) / count
    return var

def old_var(da, x=10, y=20):
    roll = da.rolling(x=x, y=y)
    var = roll.var()
    return var

%memit new_var(temp)
%memit old_var(temp)
```
I wanted to double check that the calculation was working correctly (a rough numerical comparison is sketched after this row):
I think the difference here is just due to floating point errors, but maybe someone who knows how to check that in more detail could have a look. The standard deviation can be trivially implemented from this if the approach works. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Optimize ndrolling nanreduce 675482176 | |
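The output of that double check is not included above; a rough numerical comparison along the same lines, reusing `temp`, `new_var`, and `old_var` from the snippet in the comment, might look like this (a sketch only, not the original author's check):

```python
import numpy as np

new = new_var(temp)
old = old_var(temp)

# Worst-case absolute difference between the two rolling-variance results.
print(float(np.abs(new - old).max()))
# Agreement up to floating-point noise; equal_nan covers the NaN edges that
# rolling produces before a full window is available.
print(np.allclose(new, old, equal_nan=True))
```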
716399575 | https://github.com/pydata/xarray/issues/4325#issuecomment-716399575 | https://api.github.com/repos/pydata/xarray/issues/4325 | MDEyOklzc3VlQ29tbWVudDcxNjM5OTU3NQ== | mathause 10194086 | 2020-10-26T08:40:51Z | 2021-02-18T15:39:40Z | MEMBER | This is already done for `count`. This should work for most of the reductions (and is a bit similar to what is done in …).
I think this should not be too difficult; the thing is that rolling itself is already quite complicated. (A sketch along these lines follows this row.) |
{ "total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 2, "rocket": 0, "eyes": 0 } |
Optimize ndrolling nanreduce 675482176 | |
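The comments above refer to building the rolling nan-reductions from already-optimized pieces; as a rough illustration (not xarray's actual internals), a rolling nan-aware mean can be written with a rolling sum over NaN-filled data and the already-supported rolling count. The helper name and signature here are hypothetical.

```python
import numpy as np
import xarray as xr

def rolling_nanmean(da, **window_sizes):
    """Hypothetical helper: rolling nanmean from rolling sum plus rolling count."""
    # Filling NaNs with zero leaves the window sums unchanged.
    total = da.fillna(0.0).rolling(**window_sizes).sum()
    # Number of non-NaN values in each window.
    count = da.rolling(**window_sizes).count()
    # NaN-aware mean.
    return total / count

# Example usage on a small 2D array with some NaNs.
da = xr.DataArray(np.random.rand(100, 80), dims=("x", "y"))
da = da.where(da > 0.1)  # introduce NaNs
result = rolling_nanmean(da, x=5, y=7)
```

The same pattern generalizes to other reductions (for example the variance decomposition in the comment above), with each building block being a rolling sum or count rather than a reduction over a fully materialized windowed array.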
741734592 | https://github.com/pydata/xarray/issues/4325#issuecomment-741734592 | https://api.github.com/repos/pydata/xarray/issues/4325 | MDEyOklzc3VlQ29tbWVudDc0MTczNDU5Mg== | mathause 10194086 | 2020-12-09T12:17:15Z | 2020-12-09T12:17:15Z | MEMBER | I just saw that numpy 1.20 introduces `sliding_window_view` (in `numpy.lib.stride_tricks`); a short usage sketch follows this row. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Optimize ndrolling nanreduce 675482176 | |
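For context, a small sketch of `numpy.lib.stride_tricks.sliding_window_view` (added in numpy 1.20) applied to a windowed nan-reduction; the array and window sizes are arbitrary. The view itself is copy-free, although reducing over it may still allocate temporaries, so it is not automatically a memory win.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(20.0).reshape(4, 5)
a[1, 2] = np.nan

# All 2x3 windows as a strided view: shape (3, 3, 2, 3), no data copied.
windows = sliding_window_view(a, window_shape=(2, 3))

# A rolling nan-aware reduction over the two window dimensions.
rolling_nansum = np.nansum(windows, axis=(-2, -1))
print(rolling_nansum.shape)  # (3, 3)
```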
717572036 | https://github.com/pydata/xarray/issues/4325#issuecomment-717572036 | https://api.github.com/repos/pydata/xarray/issues/4325 | MDEyOklzc3VlQ29tbWVudDcxNzU3MjAzNg== | fujiisoup 6815844 | 2020-10-27T22:14:41Z | 2020-10-27T22:14:41Z | MEMBER | @mathause Oh, I missed this issue. Yes, this is implemented only for `count`.
Agreed. We need to clean this up. One possible option would be to drop support for bottleneck: it does not work for nd-rolling, and if we implement the nd nanreduce, the speed should be comparable with bottleneck. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Optimize ndrolling nanreduce 675482176 |
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
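As a usage note for the schema above, here is a sketch of reproducing this page's query with Python's sqlite3 module; the database filename is a placeholder, since the actual file is not named on this page.

```python
import sqlite3

conn = sqlite3.connect("github.db")  # placeholder filename (assumption)
rows = conn.execute(
    """
    select id, user, created_at, updated_at, author_association, body
    from issue_comments
    where issue = ?
    order by updated_at desc
    """,
    (675482176,),
).fetchall()

for comment_id, user_id, created, updated, assoc, body in rows:
    print(comment_id, updated, assoc)
```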