issue_comments

5 rows where issue = 437765416 and user = 7441788 sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
601885539 https://github.com/pydata/xarray/pull/2922#issuecomment-601885539 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTg4NTUzOQ== seth-p 7441788 2020-03-20T19:57:54Z 2020-03-20T20:00:20Z CONTRIBUTOR

All good points:

> What could be done, though, is to only do da = da.fillna(0.0) if da contains NaNs.

Good idea, though I don't know what the performance hit of the extra check would be (in the case where da does contain NaNs, the check is for naught).
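
For illustration, a minimal sketch of the conditional fill being discussed (the example array is invented; note that the check itself is a full pass over da):

import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0], dims="x")  # toy array containing a NaN

# Only pay for fillna when da actually contains NaNs; isnull().any()
# itself scans all of da, which is the performance question above.
if da.isnull().any():
    da = da.fillna(0.0)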

> I assume so. I don't know what kind of temporary variables np.einsum creates. Also np.einsum is wrapped in xr.apply_ufunc, so all kinds of magic is going on.

Well, (da * weights) will be at least as large as da. I'm not certain, but I don't think np.einsum creates huge temporary arrays.
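
A small NumPy-only illustration of the temporary under discussion (shapes invented):

import numpy as np

da = np.random.rand(1000, 1000)
weights = np.random.rand(1000)

# The elementwise product materializes a full (1000, 1000) temporary
# before the reduction...
s1 = (da * weights).sum(axis=1)

# ...whereas einsum performs the contraction in a single pass, without
# building the intermediate product array.
s2 = np.einsum("ij,j->i", da, weights)

assert np.allclose(s1, s2)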

> Do you want to leave it out for performance reasons? Because it was a deliberate decision to not support NaNs in the weights, and I don't think this is going to change.

Yes. You can continue not supporting NaNs in the weights, yet skip the explicit check that there are none (optionally, when the caller assures you there are no NaNs).

> None of your suggested functions support NaNs, so they won't work.

Correct. These have nothing to do with the NaNs issue.

For profiling memory usage, I use psutil.Process(os.getpid()).memory_info().rss for current usage and resource.getrusage(resource.RUSAGE_SELF).ru_maxrss for peak usage (on Linux).
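
As a concrete sketch of those two measurements (note that the units of ru_maxrss are platform-dependent: kilobytes on Linux, bytes on macOS):

import os
import resource

import psutil

# Current resident set size of this process, in bytes.
current_rss = psutil.Process(os.getpid()).memory_info().rss

# Peak resident set size; on Linux ru_maxrss is reported in kilobytes.
peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print(f"current: {current_rss / 2**20:.1f} MiB, peak: {peak_rss_kb / 2**10:.1f} MiB")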

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601709733 https://github.com/pydata/xarray/pull/2922#issuecomment-601709733 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTcwOTczMw== seth-p 7441788 2020-03-20T13:47:39Z 2020-03-20T16:31:14Z CONTRIBUTOR

@mathause, have you considered using these functions?

  • np.average() to calculate weighted mean().
  • np.cov() to calculate weighted cov(), var(), and std().
  • sp.stats.cumfreq() to calculate weighted median() (I haven't thought this through).
  • sp.spatial.distance.correlation() to calculate weighted corrcoef(). (Of course one could also calculate this from the weighted cov() above, but the two arrays first need to be masked simultaneously.)
  • sklearn.utils.extmath.weighted_mode() to calculate weighted mode().
  • gmisclib.weighted_percentile.{wp,wtd_median}() to calculate weighted quantile() and median().
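
For concreteness, a minimal sketch of the first two suggestions (data and weights are invented):

import numpy as np

values = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.3, 0.5])

# np.average computes a weighted mean directly.
wmean = np.average(values, weights=weights)   # 0.2*1 + 0.3*2 + 0.5*3 = 2.3

# np.cov accepts observation weights via aweights (and counts via fweights);
# the weighted variances sit on the diagonal of the result.
data = np.random.rand(2, 100)                 # two variables, 100 observations
wcov = np.cov(data, aweights=np.random.rand(100))
wvar = np.diag(wcov)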

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601708110 https://github.com/pydata/xarray/pull/2922#issuecomment-601708110 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTcwODExMA== seth-p 7441788 2020-03-20T13:44:03Z 2020-03-20T13:52:06Z CONTRIBUTOR

@mathause, ideally dot() would support skipna, so you could eliminate the da = da.fillna(0.0) and pass the skipna down the line. But alas it doesn't...

(da * weights).sum(dim=dim, skipna=skipna) would likely make things worse, I think, as it would necessarily create a temporary array at least as large as da, no?

Either way, this only addresses the da = da.fillna(0.0), not the mask = da.notnull().

Also, perhaps the test if weights.isnull().any() in Weighted.__init__() should be optional?
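
A hypothetical sketch of that opt-out; skip_nan_check is an invented name, not part of xarray's API, and the class body is heavily abridged:

class Weighted:
    def __init__(self, obj, weights, skip_nan_check=False):
        # Callers who can guarantee NaN-free weights could skip this scan,
        # which costs a full pass over a potentially huge weights array.
        if not skip_nan_check and weights.isnull().any():
            raise ValueError("`weights` cannot contain missing values")
        self.obj = obj
        self.weights = weights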

Maybe I'm more sensitive to this than others, but I regularly deal with 10-100GB arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601699091 https://github.com/pydata/xarray/pull/2922#issuecomment-601699091 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTY5OTA5MQ== seth-p 7441788 2020-03-20T13:25:21Z 2020-03-20T13:25:21Z CONTRIBUTOR

@max-sixty, I wish I could, but I'm afraid that I cannot submit code due to employer limitations.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416
601496897 https://github.com/pydata/xarray/pull/2922#issuecomment-601496897 https://api.github.com/repos/pydata/xarray/issues/2922 MDEyOklzc3VlQ29tbWVudDYwMTQ5Njg5Nw== seth-p 7441788 2020-03-20T02:11:53Z 2020-03-20T02:12:24Z CONTRIBUTOR

I realize this is a bit late, but I'm still concerned about memory usage, specifically in https://github.com/pydata/xarray/blob/master/xarray/core/weighted.py#L130 and https://github.com/pydata/xarray/blob/master/xarray/core/weighted.py#L143. If da.sizes = {'dim_0': 100000, 'dim_1': 100000}, the two lines above will cause da.weighted(weights).mean('dim_0') to create two simultaneous temporary 100000x100000 arrays, which could be problematic.

I would have implemented this using apply_ufunc, so that these temporary variables are created only on as small an array as absolutely necessary; in this case just of size sizes['dim_0'] = 100000. (Much as I would like to, I'm afraid I'm not able to contribute code.) Of course this won't help in the case where one is summing over all dimensions, but one might as well minimize memory usage in some cases even if not in all.
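
A minimal sketch of that apply_ufunc approach, with invented, much smaller shapes; np.einsum contracts over the reduced dimension without materializing the full da * weights product:

import numpy as np
import xarray as xr

def _weighted_sum(values, w):
    # apply_ufunc moves the core dim to the last axis; einsum then
    # contracts over it without building the elementwise product.
    return np.einsum("...i,i->...", values, w)

da = xr.DataArray(np.random.rand(200, 300), dims=("dim_0", "dim_1"))
weights = xr.DataArray(np.random.rand(200), dims=("dim_0",))

weighted_sum = xr.apply_ufunc(
    _weighted_sum, da, weights,
    input_core_dims=[["dim_0"], ["dim_0"]],
)
weighted_mean = weighted_sum / weights.sum()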

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/weighted 437765416

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);