issue_comments

22 rows where issue = 437765416 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
601612380 https://github.com/pydata/xarray/pull/2922#issuecomment-601612380 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTYxMjM4MA== mathause 10194086 2020-03-20T09:45:23Z 2020-10-27T14:47:22Z MEMBER

tl;dr: if someone knows how to do memory profiling with reasonable effort, this can still be changed.

It's certainly not too late to change the "backend" of the weighting functions. I once tried to profile the memory usage but gave up at some point (I think I would have needed to annotate a ton of functions, including in numpy).

@fujiisoup suggested using xr.dot(a, b) (instead of (a * b).sum()) to reduce the memory footprint. ~This is done; however, it comes at a performance penalty, so an alternative code path might actually be beneficial.~ (edit: I now think xr.dot is actually faster, except for very small arrays).

Also, mask is an array of dtype bool, which should help.

I think it should not be very difficult to write something that can be passed to apply_ufunc, probably similar to:

https://github.com/pydata/xarray/blob/e8a284f341645a63a4d83676a6b268394c721bbc/xarray/tests/test_weighted.py#L161

So there would be three possibilities: (1) the current implementation (using xr.dot(a, b)); (2) something similar to expected_weighted (using (a * b).sum()); (3) xr.apply_ufunc(expected_weighted, a, b, ...). I assume (2) is fastest but has the largest memory footprint; I cannot tell about (1) and (3).
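
For illustration, a minimal sketch of the three code paths on a toy 1-D array (names and data are illustrative, not the PR's code):

import numpy as np
import xarray as xr

da = xr.DataArray(np.random.randn(1000), dims="x")
weights = xr.DataArray(np.random.rand(1000), dims="x")

# (1) current implementation: xr.dot dispatches to np.einsum
# (recent xarray releases spell the keyword `dim` instead of `dims`)
s1 = xr.dot(da, weights, dims="x")

# (2) multiply-then-sum: creates a temporary array as large as da
s2 = (da * weights).sum(dim="x")

# (3) pushing a plain-numpy reduction through apply_ufunc
s3 = xr.apply_ufunc(
    lambda a, w: (a * w).sum(axis=-1),
    da,
    weights,
    input_core_dims=[["x"], ["x"]],
)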

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601885539 https://github.com/pydata/xarray/pull/2922#issuecomment-601885539 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTg4NTUzOQ== seth-p 7441788 2020-03-20T19:57:54Z 2020-03-20T20:00:20Z CONTRIBUTOR

All good points:

What could be done, though, is to only do da = da.fillna(0.0) if da contains NaNs.

Good idea, though I don't know what the performance cost of the extra check would be (in the case that da does contain NaNs, the check is for naught).

I assume so. I don't know what kind of temporary variables np.einsum creates. Also, np.einsum is wrapped in xr.apply_ufunc, so all kinds of magic is going on.

Well, (da * weights) will be at least as large as da. I'm not certain, but I don't think np.einsum creates huge temporary arrays.

Do you want to leave it out for performance reasons? Because it was a deliberate decision to not support NaNs in the weights and I don't think this is going to change.

Yes. You can continue not to support NaNs in the weights, yet skip the explicit check that there are none (optionally, if the caller assures you that there are no NaNs).

None of your suggested functions support NaNs, so they won't work.

Correct. These have nothing to do with the NaNs issue.

For profiling memory usage, I use psutil.Process(os.getpid()).memory_info().rss for current usage and resource.getrusage(resource.RUSAGE_SELF).ru_maxrss for peak usage (on Linux).
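
For reference, a minimal sketch of that measurement (ru_maxrss is reported in kilobytes on Linux):

import os
import resource

import psutil

rss_now = psutil.Process(os.getpid()).memory_info().rss        # bytes, current usage
rss_peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # kilobytes on Linux, peak usage

print(f"current: {rss_now / 1e6:.1f} MB, peak: {rss_peak / 1e3:.1f} MB")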

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601824129 https://github.com/pydata/xarray/pull/2922#issuecomment-601824129 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTgyNDEyOQ== mathause 10194086 2020-03-20T17:31:15Z 2020-03-20T17:31:15Z MEMBER

There is some stuff I can do to reduce the memory footprint if skipna=False or not da.isnull().any(). Also, the functions should support dask arrays out of the box.


ideally dot() would support skipna, so you could eliminate the da = da.fillna(0.0) and pass the skipna down the line. But alas it doesn't...

Yes, this would be nice. xr.dot uses np.einsum, which is quite a beast that I don't fully understand. I don't expect it to support NaNs any time soon.

What could be done, though, is to only do da = da.fillna(0.0) if da contains NaNs.

(da * weights).sum(dim=dim, skipna=skipna) would likely make things worse, I think, as it would necessarily create a temporary array at least as large as da, no?

I assume so. I don't know what kind of temporary variables np.einsum creates. Also, np.einsum is wrapped in xr.apply_ufunc, so all kinds of magic is going on.

Either way, this only addresses the da = da.fillna(0.0), not the mask = da.notnull().

Again, this could be avoided if skipna=False or if (and only if) there are no NaNs in da.
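
A minimal sketch of that conditional fill (a hypothetical helper, not the PR's code):

import xarray as xr

def _maybe_fillna(da: xr.DataArray) -> xr.DataArray:
    # only pay for the fillna copy when da actually contains NaNs;
    # note the isnull().any() check itself allocates a boolean mask
    if da.isnull().any():
        return da.fillna(0.0)
    return da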

Also, perhaps the test if weights.isnull().any() in Weighted.__init__() should be optional?

Do you want to leave it out for performance reasons? Because it was a deliberate decision to not support NaNs in the weights and I don't think this is going to change.

Maybe I'm more sensitive to this than others, but I regularly deal with 10-100GB arrays.

No, it's important to make sure this stuff works for large arrays. However, using xr.dot already incurs quite a performance penalty, which I am not super happy about.

have you considered using these functions? [...]

None of your suggested functions support NaNs, so they won't work.

I am all for supporting more functions, but currently I am happy we got a weighted sum and mean into xarray after 5(!) years!

Further libraries that support weighted operations:

  • esmlab (xarray-based, supports NaN)
  • statsmodels (numpy-based, does not support NaN)
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601709733 https://github.com/pydata/xarray/pull/2922#issuecomment-601709733 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTcwOTczMw== seth-p 7441788 2020-03-20T13:47:39Z 2020-03-20T16:31:14Z CONTRIBUTOR

@mathause, have you considered using these functions?

  • np.average() to calculate weighted mean().
  • np.cov() to calculate weighted cov(), var(), and std().
  • sp.stats.cumfreq() to calculate weighted median() (I haven't thought this through).
  • sp.spatial.distance.correlation() to calculate weighted corrcoef(). (Of course one could also calculate this from weighted cov() (see above), but one would first need to mask the two arrays simultaneously.)
  • sklearn.utils.extmath.weighted_mode() to calculate weighted mode().
  • gmisclib.weighted_percentile.{wp,wtd_median}() to calculate weighted quantile() and median().
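
For example, np.average computes a weighted mean directly (illustrative data):

import numpy as np

values = np.array([1.0, 2.0, 4.0])
weights = np.array([3.0, 1.0, 0.0])

# sum(w * x) / sum(w) = (3*1 + 1*2 + 0*4) / 4 = 1.25
print(np.average(values, weights=weights))  # 1.25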

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601708110 https://github.com/pydata/xarray/pull/2922#issuecomment-601708110 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTcwODExMA== seth-p 7441788 2020-03-20T13:44:03Z 2020-03-20T13:52:06Z CONTRIBUTOR

@mathause, ideally dot() would support skipna, so you could eliminate the da = da.fillna(0.0) and pass the skipna down the line. But alas it doesn't...

(da * weights).sum(dim=dim, skipna=skipna) would likely make things worse, I think, as it would necessarily create a temporary array at least as large as da, no?

Either way, this only addresses the da = da.fillna(0.0), not the mask = da.notnull().

Also, perhaps the test if weights.isnull().any() in Weighted.__init__() should be optional?

Maybe I'm more sensitive to this than others, but I regularly deal with 10-100GB arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601699091 https://github.com/pydata/xarray/pull/2922#issuecomment-601699091 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTY5OTA5MQ== seth-p 7441788 2020-03-20T13:25:21Z 2020-03-20T13:25:21Z CONTRIBUTOR

@max-sixty, I wish I could, but I'm afraid that I cannot submit code due to employer limitations.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601514904 https://github.com/pydata/xarray/pull/2922#issuecomment-601514904 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTUxNDkwNA== max-sixty 5635139 2020-03-20T04:01:34Z 2020-03-20T04:01:34Z MEMBER

We do those sorts of operations fairly frequently, so it's not unique here. Generally users should expect to have available ~3x the memory of an array for most operations.

@seth-p it's great you've taken an interest in the project! Is there any chance we could harness that into some contributions? 😄

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601496897 https://github.com/pydata/xarray/pull/2922#issuecomment-601496897 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTQ5Njg5Nw== seth-p 7441788 2020-03-20T02:11:53Z 2020-03-20T02:12:24Z CONTRIBUTOR

I realize this is a bit late, but I'm still concerned about memory usage, specifically in https://github.com/pydata/xarray/blob/master/xarray/core/weighted.py#L130 and https://github.com/pydata/xarray/blob/master/xarray/core/weighted.py#L143. If da.sizes = {'dim_0': 100000, 'dim_1': 100000}, the two lines above will cause da.weighted(weights).mean('dim_0') to create two simultaneous temporary 100000x100000 arrays, which could be problematic.

I would have implemented this using apply_ufunc, so that one creates these temporary variables only on as small an array as absolutely necessary -- in this case just of size sizes['dim_0'] = 100000. (Much as I would like to, I'm afraid I'm not able to contribute code.) Of course this won't help when one is summing over all dimensions, but we might as well minimize memory usage in some cases even if not in all.
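
A minimal sketch of that apply_ufunc approach for a weighted mean reduced over one dimension (an illustrative helper, not the PR's code):

import numpy as np
import xarray as xr

def _weighted_mean_1d(a, w):
    # operates on one 1-D slice at a time, so temporaries are 1-D
    mask = ~np.isnan(a)
    return np.dot(np.where(mask, a, 0.0), w) / np.dot(mask, w)

da = xr.DataArray(np.random.randn(1000, 500), dims=("dim_0", "dim_1"))
weights = xr.DataArray(np.random.rand(1000), dims="dim_0")

result = xr.apply_ufunc(
    _weighted_mean_1d,
    da,
    weights,
    input_core_dims=[["dim_0"], ["dim_0"]],
    vectorize=True,  # loop over dim_1, creating only 1-D temporaries
)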

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601377953 https://github.com/pydata/xarray/pull/2922#issuecomment-601377953 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTM3Nzk1Mw== max-sixty 5635139 2020-03-19T19:34:42Z 2020-03-19T19:34:42Z MEMBER

#422 was opened in June of 2015, amazing.

😂

@mathause props for the persistence...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601298407 https://github.com/pydata/xarray/pull/2922#issuecomment-601298407 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTI5ODQwNw== jhamman 2443309 2020-03-19T16:58:57Z 2020-03-19T16:58:57Z MEMBER

Big time!!!! Thanks @mathause! #422 was opened in June of 2015, amazing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601283025 https://github.com/pydata/xarray/pull/2922#issuecomment-601283025 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTI4MzAyNQ== max-sixty 5635139 2020-03-19T16:37:43Z 2020-03-19T16:37:43Z MEMBER

Thanks @mathause !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601214104 https://github.com/pydata/xarray/pull/2922#issuecomment-601214104 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTIxNDEwNA== mathause 10194086 2020-03-19T14:35:25Z 2020-03-19T14:35:25Z MEMBER

Great! Thanks for all the feedback and support!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601210885 https://github.com/pydata/xarray/pull/2922#issuecomment-601210885 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTIxMDg4NQ== dcherian 2448579 2020-03-19T14:29:42Z 2020-03-19T14:29:42Z MEMBER

This is going in.

Thanks @mathause. This is a major contribution!

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
487130921 https://github.com/pydata/xarray/pull/2922#issuecomment-487130921 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDQ4NzEzMDkyMQ== pep8speaks 24736507 2019-04-26T17:09:07Z 2020-03-18T01:42:06Z NONE

Hello @mathause! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2020-03-18 01:42:05 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
595373665 https://github.com/pydata/xarray/pull/2922#issuecomment-595373665 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDU5NTM3MzY2NQ== mathause 10194086 2020-03-05T18:18:22Z 2020-03-05T18:18:22Z MEMBER

I updated this once more. Mostly moved the example to a notebook.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
562206026 https://github.com/pydata/xarray/pull/2922#issuecomment-562206026 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDU2MjIwNjAyNg== mathause 10194086 2019-12-05T16:29:51Z 2019-12-05T16:29:51Z MEMBER

This is now ready for a full review. I added tests for weighted reductions over several dimensions and docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
545512847 https://github.com/pydata/xarray/pull/2922#issuecomment-545512847 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDU0NTUxMjg0Nw== mathause 10194086 2019-10-23T15:55:35Z 2019-10-23T15:55:35Z MEMBER

I decided to replace all NaN in the weights with 0. Can we raise an error instead? It should be easy for the user to do weights.fillna(0) instead of relying on xarray's magical behaviour.

I agree, requiring valid weights is a sensible choice.

if weights sum to 0 it returns NaN (and not inf). Should we raise an error here?

I'm not sure... Assume I want to do a meridional mean and only have data over land; this would then raise an error, which is not what I want.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
545200082 https://github.com/pydata/xarray/pull/2922#issuecomment-545200082 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDU0NTIwMDA4Mg== dcherian 2448579 2019-10-22T23:35:52Z 2019-10-22T23:35:52Z MEMBER

I decided to replace all NaN in the weights with 0.

Can we raise an error instead? It should be easy for the user to do weights.fillna(0) instead of relying on xarray's magical behaviour.

if weights sum to 0 it returns NaN (and not inf)

Should we raise an error here?

The following returns NaN (could be 1)

I think NaN is fine since that's the result of 1 + 0*np.nan
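
A quick check of both points (a zero-weighted NaN still propagates, and weights summing to 0 give 0/0):

import numpy as np

print(1 + 0 * np.nan)  # nan: 0 * nan is nan, so the NaN propagates

with np.errstate(invalid="ignore"):
    print(np.float64(0.0) / np.float64(0.0))  # nan, not an error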

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
543358453 https://github.com/pydata/xarray/pull/2922#issuecomment-543358453 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDU0MzM1ODQ1Mw== mathause 10194086 2019-10-17T20:56:32Z 2019-10-17T20:59:08Z MEMBER

I finally made some time to work on this - although I feel far from finished...

  • added a DatasetWeighted class
  • for this, I pulled the functionality out of the DataArrayWeighted class into standalone functions taking da and weights as input
  • the tests need more work
  • implemented the functionality using xr.dot -> this makes the logic a bit more complicated
  • I think the failure in Linux py37-upstream-dev is unrelated

Questions:

  • does this implementation look reasonable to you?
  • xr.dot does not have an axis keyword - is it fine if I leave it out in my functions?
  • flake8 fails because I use @overload for typing -> should I remove this?
  • currently I have the functionality three times: once as _weighted_sum, once as da.weighted().sum(), and once as ds.weighted().sum() - how do I best test this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
512243216 https://github.com/pydata/xarray/pull/2922#issuecomment-512243216 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDUxMjI0MzIxNg== mathause 10194086 2019-07-17T12:59:16Z 2019-07-17T12:59:16Z MEMBER

Thanks, I am still very interested to get this in. I don't think I'll manage before my holidays, though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
511002355 https://github.com/pydata/xarray/pull/2922#issuecomment-511002355 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDUxMTAwMjM1NQ== rabernat 1197350 2019-07-12T19:16:16Z 2019-07-12T19:16:16Z MEMBER

Hi @mathause - We really appreciate your contribution. Sorry your PR has stalled!

Do you think you can respond to @fujiisoup's review and add documentation? Then we can get this merged.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
488031173 https://github.com/pydata/xarray/pull/2922#issuecomment-488031173 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDQ4ODAzMTE3Mw== mathause 10194086 2019-04-30T16:57:05Z 2019-04-30T16:57:05Z MEMBER

I updated the PR:

  • added a weighted sum method
  • tried to clean the tests

Before I continue, it would be nice to get some feedback.

  • Currently I only do very simple tests - is that enough? How could these be generalized without re-implementing the functions to obtain the expected values?
  • Eventually it would be cool to be able to do da.groupby(regions).weighted(weights).mean() - do you see a possibility for this with the current implementation?

As mentioned by @aaronspring, esmlab has already implemented weighted statistics functions. Similarly, statsmodels implements them for 1D data, without handling of NaNs (docs / code). Thus it should be feasible to implement further statistics here (e.g. a weighted std).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);