
issue_comments


9 rows where author_association = "CONTRIBUTOR" and user = 488992 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1043334868 https://github.com/pydata/xarray/pull/6059#issuecomment-1043334868 https://api.github.com/repos/pydata/xarray/issues/6059 IC_kwDOAMm_X84-MAbU cjauvin 488992 2022-02-17T19:27:23Z 2022-02-17T19:27:23Z CONTRIBUTOR

I have added a test verifying that, with equal weights, the interpolation methods this PR supports yield the same results as np.quantile with the corresponding methods. It is currently skipped, however, because the method argument is not yet exposed in the API (ideally it would be in future work).
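The property that test checks can be sketched in plain NumPy (my own illustration, not the PR's implementation): with integer weights, a weighted quantile equals the quantile of the weight-expanded sample, so equal weights must reduce exactly to np.quantile for every method.

```python
import numpy as np

data = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([1, 1, 1, 1])

# with integer weights, a weighted quantile equals the quantile of the
# sample with each value repeated by its weight; equal weights therefore
# reduce to plain np.quantile (method requires NumPy >= 1.22)
expanded = np.repeat(data, weights)
for method in ["linear", "lower", "higher", "nearest", "midpoint"]:
    assert np.quantile(expanded, 0.75, method=method) == np.quantile(
        data, 0.75, method=method
    )
```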

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Weighted quantile 1076265104
1011136209 https://github.com/pydata/xarray/pull/6059#issuecomment-1011136209 https://api.github.com/repos/pydata/xarray/issues/6059 IC_kwDOAMm_X848RLbR cjauvin 488992 2022-01-12T15:03:31Z 2022-01-12T15:03:31Z CONTRIBUTOR

@huard's latest commit modifies the algorithm so that it uses Kish's effective sample size, as described in the blog post the algorithm comes from (https://aakinshin.net/posts/kish-ess-weighted-quantiles/), which seems to solve the problem mentioned by @mathause.
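For reference, Kish's effective sample size is n_eff = (Σ wᵢ)² / Σ wᵢ². A minimal sketch (my own illustration, not the PR's code):

```python
import numpy as np

def kish_ess(weights):
    """Kish's effective sample size: (sum w)**2 / sum(w**2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

print(kish_ess([1, 1, 1, 1]))  # equal weights recover the sample size: 4.0
print(kish_ess([1, 0, 1, 0]))  # zero weights drop out entirely: 2.0
```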

He also adds support for interpolation types 4 to 9 (those that share a common way of computing Qp), as described here: https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample
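The structure those types share: each defines a (1-based) position h(n, p), and the estimate linearly interpolates between the two order statistics around h. A hedged sketch (h formulas from the Wikipedia page above; the helper names are mine, not the PR's):

```python
import numpy as np

# position h(n, p) for Hyndman & Fan types 4-9
H = {
    4: lambda n, p: n * p,
    5: lambda n, p: n * p + 0.5,
    6: lambda n, p: (n + 1) * p,
    7: lambda n, p: (n - 1) * p + 1,
    8: lambda n, p: (n + 1 / 3) * p + 1 / 3,
    9: lambda n, p: (n + 1 / 4) * p + 3 / 8,
}

def quantile_hf(x, p, qtype):
    """Q_p = x_(k) + (h - k) * (x_(k+1) - x_(k)), the form shared by types 4-9."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    h = min(max(H[qtype](n, p), 1.0), float(n))  # clamp to valid 1-based positions
    k = int(np.floor(h))
    if k >= n:
        return x[-1]
    return x[k - 1] + (h - k) * (x[k] - x[k - 1])

# type 7 is NumPy's default "linear" method
print(quantile_hf([0, 1, 2, 3], 0.75, 7))  # 2.25, same as np.quantile
```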

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Weighted quantile 1076265104
1005179153 https://github.com/pydata/xarray/pull/5870#issuecomment-1005179153 https://api.github.com/repos/pydata/xarray/issues/5870 IC_kwDOAMm_X8476dER cjauvin 488992 2022-01-04T21:20:58Z 2022-01-04T21:20:58Z CONTRIBUTOR

@dgilford I have worked on it: https://github.com/pydata/xarray/pull/6059, but unfortunately the PR is currently stuck in limbo, for technical reasons.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add var and std to weighted computations 1027640127
992713850 https://github.com/pydata/xarray/pull/6059#issuecomment-992713850 https://api.github.com/repos/pydata/xarray/issues/6059 IC_kwDOAMm_X847K5x6 cjauvin 488992 2021-12-13T17:39:03Z 2021-12-13T17:39:03Z CONTRIBUTOR

@mathause About this:

> I did some tries and got an unexpected result:
>
> ```python
> data = xr.DataArray([0, 1, 2, 3])
> weights = xr.DataArray([1, 0, 1, 0])
> data.weighted(weights).quantile([0.75])
>
> np.quantile([0, 2], 0.75)
> ```
>
> Can you double-check? Or do I misunderstand something?

My latest commit should fix (and test) this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Weighted quantile 1076265104
991714854 https://github.com/pydata/xarray/pull/6059#issuecomment-991714854 https://api.github.com/repos/pydata/xarray/issues/6059 IC_kwDOAMm_X847HF4m cjauvin 488992 2021-12-11T17:08:55Z 2021-12-11T17:08:55Z CONTRIBUTOR

@mathause Thanks for the many excellent suggestions! After having removed the for loop the way you suggested, I tried to address this:

> The algorithm is quite clever but it multiplies all elements (except 2) with 0 - this could maybe be sped up by only using the relevant elements.

At first I thought that something like this could work:

```python
w = np.diff(v)
nz = np.nonzero(w)
d2 = np.tile(data, (n, 1))
r = w[nz] * d2[nz]
r = r[::2] + r[1::2]
```

The problem, however, is that w's rows sometimes contain only one element instead of two (which happens when an h coincides exactly with a weight value instead of lying between two). Given that difficulty, my impression is that this is not really solvable, or at least not in a way that would yield a more efficient version.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Weighted quantile 1076265104
952039300 https://github.com/pydata/xarray/pull/5870#issuecomment-952039300 https://api.github.com/repos/pydata/xarray/issues/5870 IC_kwDOAMm_X844vveE cjauvin 488992 2021-10-26T15:11:30Z 2021-10-26T15:11:30Z CONTRIBUTOR

Thanks for the feedback @mathause! If you are ok with postponing the addition of an extra ddof param for a later PR, as you have suggested, then this PR is indeed complete from my perspective. I will try to propose an implementation for weighted quantiles soon.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add var and std to weighted computations 1027640127
927141507 https://github.com/pydata/xarray/issues/3781#issuecomment-927141507 https://api.github.com/repos/pydata/xarray/issues/3781 IC_kwDOAMm_X843Qw6D cjauvin 488992 2021-09-25T16:01:02Z 2021-09-25T16:02:41Z CONTRIBUTOR

I'm currently studying this problem in depth, and I noticed that while the threaded scheduler uses a lock keyed by the file name:

https://github.com/pydata/xarray/blob/8d23032ecf20545cd320cfb552d8febef73cd69c/xarray/backends/locks.py#L24-L32

the process-based scheduler throws away the key:

https://github.com/pydata/xarray/blob/8d23032ecf20545cd320cfb552d8febef73cd69c/xarray/backends/locks.py#L35-L39

I'm not sure yet what the consequences and logical interpretation of that are, but I would like to re-raise @bcbnz's question above: should this scenario simply raise a NotImplementedError because it cannot be supported?
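The contrast between the two code paths can be sketched like this (illustrative names; xarray's actual code lives in the locks.py lines linked above):

```python
import multiprocessing
import threading

_FILE_LOCKS = {}  # filename -> lock (xarray uses a weak-value mapping)

def get_threaded_lock(key):
    # same key -> same lock, so two threads touching the same file serialize
    try:
        return _FILE_LOCKS[key]
    except KeyError:
        lock = _FILE_LOCKS[key] = threading.Lock()
        return lock

def get_multiprocessing_lock(key):
    # the key is discarded: every call hands back a brand-new, unrelated lock,
    # so nothing actually serializes access to a given file across processes
    return multiprocessing.Lock()

assert get_threaded_lock("a.nc") is get_threaded_lock("a.nc")
assert get_threaded_lock("a.nc") is not get_threaded_lock("b.nc")
```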

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() doesn't work with multiprocessing scheduler 567678992
775489828 https://github.com/pydata/xarray/pull/4879#issuecomment-775489828 https://api.github.com/repos/pydata/xarray/issues/4879 MDEyOklzc3VlQ29tbWVudDc3NTQ4OTgyOA== cjauvin 488992 2021-02-08T21:56:21Z 2021-02-08T21:56:21Z CONTRIBUTOR

As my colleague @huard suggested, I have written an additional test which demonstrates the problem (essentially the same idea I proposed in my initial issue):

https://github.com/pydata/xarray/compare/master...cjauvin:add-netcdf-refresh-test

As I explained in the issue I have a potential fix for the problem:

https://github.com/pydata/xarray/compare/master...cjauvin:netcdf-caching-bug

but the problem is that it feels a bit weird to have to do that, so I suspect there's a better way to solve it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
774781361 https://github.com/pydata/xarray/pull/4879#issuecomment-774781361 https://api.github.com/repos/pydata/xarray/issues/4879 MDEyOklzc3VlQ29tbWVudDc3NDc4MTM2MQ== cjauvin 488992 2021-02-07T22:40:44Z 2021-02-07T22:42:11Z CONTRIBUTOR

Thank you for the feedback! I quickly tested your suggested fix against the script I referred to in my original issue, and it still behaves the same, if I'm not mistaken. I only looked quickly, so perhaps I'm wrong, but your fix seems similar to an idea my colleague @huard had: make the cached item more granular by adding a call to Path(..).stat() to the cache key tuple (the idea being that if the file has changed on disk between the two open calls, this will detect it). I think it doesn't work because it doesn't change the fact that the underlying netCDF file is never explicitly closed, that is, this line is never called:

https://github.com/pydata/xarray/blob/a5f53e203c52a7605d5db799864046471115d04f/xarray/backends/file_manager.py#L222

Sorry in advance if something in my analysis is wrong, which is very likely!
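The granularity idea mentioned above can be sketched as follows; this is only my illustration of @huard's suggestion, and, as noted, a fresher cache key does not by itself close the stale file handle:

```python
from pathlib import Path

def make_cache_key(path):
    # include mtime and size so a file modified on disk between two open
    # calls produces a different key (and therefore a cache miss)
    st = Path(path).stat()
    return (str(path), st.st_mtime_ns, st.st_size)
```

Even with a changed key, the previously cached handle keeps serving the old file contents until it is explicitly closed.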

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);