issue_comments


9 rows where author_association = "MEMBER", issue = 437765416 and user = 10194086 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
601612380 https://github.com/pydata/xarray/pull/2922#issuecomment-601612380 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTYxMjM4MA== mathause 10194086 2020-03-20T09:45:23Z 2020-10-27T14:47:22Z MEMBER

tl;dr: if someone knows how to do memory profiling with reasonable effort, this can still be changed.

It's certainly not too late to change the "backend" of the weighting functions. I once tried to profile the memory usage but gave up at some point (I think I would have needed to annotate a ton of functions, including some in numpy).

@fujiisoup suggested using xr.dot(a, b) (instead of (a * b).sum()) to ameliorate part of the memory footprint. ~Which is done, however, this comes at a performance penalty. So an alternative code path might actually be beneficial.~ (edit: I now think xr.dot is actually faster, except for very small arrays).

Also mask is an array of dtype bool, which should help.

I think it should not be very difficult to write something that can be passed to apply_ufunc, probably similar to:

https://github.com/pydata/xarray/blob/e8a284f341645a63a4d83676a6b268394c721bbc/xarray/tests/test_weighted.py#L161

So there would be three possibilities: (1) the current implementation (using xr.dot(a, b)); (2) something similar to expected_weighted (using (a * b).sum()); (3) xr.apply_ufunc(expected_weighted, a, b, ...). I assume (2) is fastest but has the largest memory footprint; I cannot tell about (1) and (3).
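The three possibilities can be sketched with plain numpy as a stand-in for the xarray versions (`expected_weighted` here is just an illustrative reduction, not the test helper linked above):

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)   # stand-in for the data
b = np.full((2, 3), 2.0)           # stand-in for the weights

# (1) einsum, which is what xr.dot uses under the hood; it can avoid
#     materializing the full temporary a * b
res_dot = np.einsum('ij,ij->', a, b)

# (2) elementwise multiply then sum; creates a temporary the size of a
res_sum = (a * b).sum()

# (3) the same reduction wrapped in a function, as one would hand to
#     xr.apply_ufunc (an illustrative sketch, not xarray's actual helper)
def expected_weighted(a, b):
    return (a * b).sum()

res_ufunc = expected_weighted(a, b)

assert np.allclose(res_dot, res_sum) and np.allclose(res_sum, res_ufunc)
```

All three agree numerically; the trade-off under discussion is only about temporaries and speed.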

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601824129 https://github.com/pydata/xarray/pull/2922#issuecomment-601824129 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTgyNDEyOQ== mathause 10194086 2020-03-20T17:31:15Z 2020-03-20T17:31:15Z MEMBER

There is some stuff I can do to reduce the memory footprint if skipna=False or not da.isnull().any(). Also, the functions should support dask arrays out of the box.


ideally dot() would support skipna, so you could eliminate the da = da.fillna(0.0) and pass the skipna down the line. But alas it doesn't...

Yes, this would be nice. xr.dot uses np.einsum which is quite a beast that I don't entirely see through. I don't expect it to support NaNs any time soon.

What could be done, though, is to only do da = da.fillna(0.0) if da actually contains NaNs.

(da * weights).sum(dim=dim, skipna=skipna) would likely make things worse, I think, as it would necessarily create a temporary array at least the size of da, no?

I assume so. I don't know what kind of temporary variables np.einsum creates. Also np.einsum is wrapped in xr.apply_ufunc so all kinds of magic is going on.

Either way, this only addresses the da = da.fillna(0.0), not the mask = da.notnull().

Again this could be avoided if skipna=False or if (and only if) there are no NaNs in da.
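A minimal numpy sketch of that conditional path (the function name and 1-D layout are assumptions, not xarray's internals):

```python
import numpy as np

def masked_weighted_sum(da, weights, skipna=True):
    # hypothetical helper: only pay for the mask / fillna steps when
    # da actually contains NaNs, as suggested above
    if skipna and np.isnan(da).any():
        mask = ~np.isnan(da)            # the mask = da.notnull() step
        da = np.where(mask, da, 0.0)    # the da = da.fillna(0.0) step
    return np.einsum('i,i->', da.ravel(), weights.ravel())
```

For NaN-free input (or skipna=False) no temporary copy of da is created at all.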

Also, perhaps the test if weights.isnull().any() in Weighted.__init__() should be optional?

Do you want to leave it out for performance reasons? Because it was a deliberate decision not to support NaNs in the weights, and I don't think this is going to change.

Maybe I'm more sensitive to this than others, but I regularly deal with 10-100GB arrays.

No, it's important to make sure this stuff works for large arrays. However, using xr.dot already comes with quite a performance penalty, which I am not super happy about.

have you considered using these functions? [...]

None of your suggested functions support NaNs, so they won't work.

I am all for supporting more functions, but for now I am happy we got a weighted sum and mean into xarray after 5(!) years!

Further libraries that support weighted operations:

  • esmlab (xarray-based, supports NaN)
  • statsmodels (numpy-based, does not support NaN)
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
601214104 https://github.com/pydata/xarray/pull/2922#issuecomment-601214104 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTIxNDEwNA== mathause 10194086 2020-03-19T14:35:25Z 2020-03-19T14:35:25Z MEMBER

Great! Thanks for all the feedback and support!

595373665 https://github.com/pydata/xarray/pull/2922#issuecomment-595373665 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDU5NTM3MzY2NQ== mathause 10194086 2020-03-05T18:18:22Z 2020-03-05T18:18:22Z MEMBER

I updated this once more. Mostly moved the example to a notebook.

562206026 https://github.com/pydata/xarray/pull/2922#issuecomment-562206026 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDU2MjIwNjAyNg== mathause 10194086 2019-12-05T16:29:51Z 2019-12-05T16:29:51Z MEMBER

This is now ready for a full review. I added tests for weighted reductions over several dimensions and docs.

545512847 https://github.com/pydata/xarray/pull/2922#issuecomment-545512847 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDU0NTUxMjg0Nw== mathause 10194086 2019-10-23T15:55:35Z 2019-10-23T15:55:35Z MEMBER

I decided to replace all NaN in the weights with 0. Can we raise an error instead? It should be easy for the user to do weights.fillna(0) instead of relying on xarray's magical behaviour.

I agree, requiring valid weights is a sensible choice.

if weights sum to 0 it returns NaN (and not inf) Should we raise an error here?

I'm not sure... Assume I want to compute a meridional mean but only have data over land; this would then raise an error, which is not what I want.
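In numpy terms, the behaviour under discussion might look like this (a sketch under the assumptions above, not xarray's actual implementation):

```python
import numpy as np

def weighted_mean(da, weights):
    # sketch: NaN in the weights is rejected, while a zero sum of weights
    # yields NaN (not inf), so an all-ocean row in a land-only meridional
    # mean simply comes out as NaN instead of raising
    if np.isnan(weights).any():
        raise ValueError("weights cannot contain missing values")
    sum_of_weights = weights.sum(axis=-1)
    sum_of_weights = np.where(sum_of_weights != 0, sum_of_weights, np.nan)
    return (da * weights).sum(axis=-1) / sum_of_weights
```

Replacing a zero denominator by NaN before dividing also avoids the divide-by-zero warning.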

543358453 https://github.com/pydata/xarray/pull/2922#issuecomment-543358453 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDU0MzM1ODQ1Mw== mathause 10194086 2019-10-17T20:56:32Z 2019-10-17T20:59:08Z MEMBER

I finally made some time to work on this, although it feels far from finished...

  • added a DatasetWeighted class
  • for this I pulled the functionality out of the DataArrayWeighted class into standalone functions taking da and weights as input
  • the tests need more work
  • implemented the functionality using xr.dot -> this makes the logic a bit more complicated
  • I think the failure in Linux py37-upstream-dev is unrelated

Questions:

  • Does this implementation look reasonable to you?
  • xr.dot does not have an axis keyword; is it fine if I leave it out of my functions?
  • flake8 fails because I use @overload for typing -> should I remove this?
  • Currently the functionality exists three times: once as _weighted_sum, once as da.weighted().sum() and once as ds.weighted().sum(). How do I best test this?

512243216 https://github.com/pydata/xarray/pull/2922#issuecomment-512243216 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDUxMjI0MzIxNg== mathause 10194086 2019-07-17T12:59:16Z 2019-07-17T12:59:16Z MEMBER

Thanks, I am still very interested to get this in. I don't think I'll manage before my holidays, though.

488031173 https://github.com/pydata/xarray/pull/2922#issuecomment-488031173 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDQ4ODAzMTE3Mw== mathause 10194086 2019-04-30T16:57:05Z 2019-04-30T16:57:05Z MEMBER

I updated the PR:

  • added a weighted sum method
  • tried to clean up the tests

Before I continue, it would be nice to get some feedback.

  • Currently I only do very simple tests. Is that enough? How could these be generalized without re-implementing the functions to obtain the expected results?
  • Eventually it would be cool to be able to do da.groupby(regions).weighted(weights).mean() - do you see a possibility for this with the current implementation?

As mentioned by @aaronspring, esmlab has already implemented weighted statistic functions. Similarly, statsmodels implements them for 1D data, without handling of NaNs (docs / code). Thus it should be feasible to implement further statistics here (e.g. a weighted std).
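The groupby question above can be sketched in plain numpy; grouped_weighted_mean and the integer group labels are illustrative stand-ins for the hoped-for da.groupby(regions).weighted(weights).mean():

```python
import numpy as np

def grouped_weighted_mean(da, weights, groups):
    # np.bincount stands in for xarray's groupby machinery: sum the
    # weighted values and the weights per group label, then divide
    weighted = np.bincount(groups, weights=weights * da)
    sum_of_weights = np.bincount(groups, weights=weights)
    return weighted / sum_of_weights

# two regions: group 0 holds values [1, 3] with weights [1, 3]
da = np.array([1.0, 3.0, 2.0, 4.0])
weights = np.array([1.0, 3.0, 1.0, 1.0])
groups = np.array([0, 0, 1, 1])
means = grouped_weighted_mean(da, weights, groups)  # group means: 2.5 and 3.0
```

This is only the dense 1-D case; the open design question is whether the accessor chaining can reuse the existing implementation.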

