issue_comments


13 rows where issue = 1221848774 sorted by updated_at descending




user 3

  • headtr1ck 8
  • max-sixty 3
  • dcherian 2

author_association 2

  • COLLABORATOR 8
  • MEMBER 5

issue 1

  • polyval: Use Horner's algorithm + support chunked inputs · 13
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1118973967 https://github.com/pydata/xarray/pull/6548#issuecomment-1118973967 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85CsjAP max-sixty 5635139 2022-05-05T19:33:53Z 2022-05-05T19:33:53Z MEMBER

Thanks @headtr1ck !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1118960785 https://github.com/pydata/xarray/pull/6548#issuecomment-1118960785 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85CsfyR dcherian 2448579 2022-05-05T19:17:24Z 2022-05-05T19:17:24Z MEMBER

Forcing the user to be explicit reduces bugs and user support requests :) so we like to do that.

Thanks again @headtr1ck this is a great PR!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1117620357 https://github.com/pydata/xarray/pull/6548#issuecomment-1117620357 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85CnYiF headtr1ck 43316012 2022-05-04T17:33:07Z 2022-05-04T17:33:37Z COLLABORATOR

Personally, I would allow coeffs without an explicit index, since I am a lazy person and would like to do `coeffs = xr.DataArray([1, 2], dims="degree")`. But I guess with the new indexing system you want to encourage people to use them.

But I am happy with this code and look forward to using it in my projects :)
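For illustration, the trade-off can be sketched in plain NumPy (this helper is hypothetical, not xarray API): when each coefficient is paired with an explicit degree, the storage order of the coefficients no longer matters.

```python
import numpy as np

# Hypothetical helper (not xarray API) illustrating what an explicit
# "degree" index buys: each coefficient is paired with its power, so the
# storage order of the coefficients no longer matters.
def eval_poly(x, degrees, coeffs):
    """Evaluate sum(c * x**d) over paired (degree, coefficient) entries."""
    x = np.asarray(x, dtype=float)
    return sum(c * x**d for d, c in zip(degrees, coeffs))

# p(x) = 1 + 2x written with the degrees deliberately out of order:
print(eval_poly(3.0, degrees=[1, 0], coeffs=[2, 1]))  # 7.0
```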

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1115830683 https://github.com/pydata/xarray/pull/6548#issuecomment-1115830683 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85Cgjmb headtr1ck 43316012 2022-05-03T07:57:46Z 2022-05-03T08:06:29Z COLLABORATOR

One minor open point: what to do with a non-integer "degree" index? A float type could be cast to integer (that's what is happening now), but (nonsense) datetime etc. should raise an error?
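The policy described above might look like this (a hypothetical sketch, not the PR code):

```python
import numpy as np

# Hypothetical sketch of the coercion policy discussed above (not the PR
# code): integer degrees pass through, floats are cast to int as the
# current code does, and anything else (e.g. datetime) raises.
def coerce_degrees(degrees):
    arr = np.asarray(degrees)
    if arr.dtype.kind in "iu":
        return arr
    if arr.dtype.kind == "f":
        return arr.astype(int)
    raise ValueError(f"invalid dtype {arr.dtype} for the 'degree' index")

print(coerce_degrees([0.0, 1.0, 2.0]))  # [0 1 2]
```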

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1115117344 https://github.com/pydata/xarray/pull/6548#issuecomment-1115117344 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85Cd1cg dcherian 2448579 2022-05-02T16:52:59Z 2022-05-02T16:52:59Z MEMBER

Nice, this looks like an improvement for everything other than dask arrays with only 100 elements, which is not a good use-case for dask.

I was slightly concerned that the recursive algorithm wouldn't work well with dask, but it does seem to work better.

```python
def other_polyval(coord, coeffs, degree_dim="degree"):
    x = coord.data

    deg_coord = coeffs[degree_dim]
    N = int(deg_coord.max()) + 1

    lhs = xr.DataArray(
        np.stack([x ** (N - 1 - i) for i in range(N)], axis=1),
        dims=(coord.name, degree_dim),
        coords={
            coord.name: coord.data,
            degree_dim: np.arange(deg_coord.max() + 1)[::-1],
        },
    )
    return xr.dot(lhs, coeffs, dims=degree_dim)


coeffs = xr.DataArray(np.random.randn(2), dims="degree")
da = xr.DataArray(dask.array.random.random((10**6), chunks=(10000)), dims=("x"))
print(len(da.data.dask))
print(len(xr.polyval(da, coeffs).data.dask))
print(len(other_polyval(da, coeffs).data.dask))
# 100
# 502
# 1005
```

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1114539954 https://github.com/pydata/xarray/pull/6548#issuecomment-1114539954 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85Cboey headtr1ck 43316012 2022-05-02T06:30:21Z 2022-05-02T09:50:49Z COLLABORATOR

Edit: never mind, it was only confusing output while the benchmark was failing. Now the benchmark looks good :)

First time working with asv... It seems that module-level variables affect all other peakmem tests (I guess the memory usage of the Python process is measured).

We should refactor all DataArrays into the setup functions, otherwise O(n)-memory algorithms will show wrong numbers and adding new tests will show regressions on other tests.
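The refactoring suggested above can be sketched as an asv-style benchmark class (names here are illustrative, not the actual xarray benchmark suite):

```python
import numpy as np

# Hypothetical asv-style benchmark sketch (names are illustrative).
# Building the arrays in setup() instead of at module level keeps their
# memory out of every other benchmark's peakmem measurement.
class Polyval:
    def setup(self):
        # allocated fresh before each benchmark and released afterwards
        self.x = np.random.randn(10**6)
        self.coeffs = np.random.randn(6)  # 5th-order polynomial

    def time_polyval(self):
        np.polyval(self.coeffs, self.x)

    def peakmem_polyval(self):
        np.polyval(self.coeffs, self.x)
```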

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1114366068 https://github.com/pydata/xarray/pull/6548#issuecomment-1114366068 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85Ca-B0 max-sixty 5635139 2022-05-01T23:24:56Z 2022-05-01T23:24:56Z MEMBER

Benchmark did not succeed since the inputs are not compatible with the old algorithm... Do we change it such that it is compatible?

If you're confident this is faster and it's not a trivial amount of work to adjust them, I would leave it...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1114297460 https://github.com/pydata/xarray/pull/6548#issuecomment-1114297460 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85CatR0 headtr1ck 43316012 2022-05-01T18:00:07Z 2022-05-01T18:00:07Z COLLABORATOR

Benchmark did not succeed since the inputs are not compatible with the old algorithm... Do we change it such that it is compatible?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1114002143 https://github.com/pydata/xarray/pull/6548#issuecomment-1114002143 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85CZlLf headtr1ck 43316012 2022-04-30T14:57:58Z 2022-05-01T11:50:12Z COLLABORATOR

Several open points still:

  1. [x] Unit tests for datetime values are failing; I might need some help with that since I have no idea what this means for polynomials.
  2. [x] The algorithm should also work with Datasets (any combination of DataArray and Dataset for the `coord` and `coeffs` inputs). This still needs to be checked and tested. (How does one define typing for such cases, i.e. DataArray + Dataset -> Dataset but DataArray + DataArray -> DataArray?)
  3. [x] It uses Horner's method instead of a Vandermonde matrix, which should be faster and consume less memory (unless the overhead of sorting the index, `isel` etc. is too large). Maybe some performance comparisons should be done.
  4. [ ] Instead of `coord` the input should simply be called `x` or similar; however, this would break backwards compatibility, so maybe we just leave it.
  5. [x] I had to add a `copy(deep=True)` since the broadcast returned a read-only DataArray. Any better ideas?
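The Horner evaluation mentioned in point 3 can be sketched as follows (an illustrative pure-NumPy version, not the actual PR code):

```python
import numpy as np

# Illustrative pure-NumPy sketch of point 3 (not the actual PR code):
# Horner's scheme does one multiply-add per degree, in place, instead of
# materialising a full Vandermonde matrix.
def horner(x, coeffs):
    """Evaluate sum(coeffs[d] * x**d), coefficients indexed by degree."""
    x = np.asarray(x, dtype=float)
    res = np.zeros_like(x) + coeffs[-1]
    for c in coeffs[-2::-1]:  # second-highest degree down to degree 0
        res *= x
        res += c
    return res

print(horner([0.0, 1.0, 2.0], [1, 2, 3]))  # 1 + 2x + 3x^2 -> [1. 6. 17.]
```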
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1114212076 https://github.com/pydata/xarray/pull/6548#issuecomment-1114212076 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85CaYbs headtr1ck 43316012 2022-05-01T11:41:54Z 2022-05-01T11:41:54Z COLLABORATOR

I added rough support for datetime values. Someone with more knowledge of handling them should take a look; the code seems too complicated and I am sure there is a more clever solution (I could not use `get_clean_interp_index` since it is not an index anymore).

I agree I'm not sure whether we need to support them.

I think keeping support is nice, since they are commonly occurring coordinates and we do not want to break anything if possible.
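One possible shape of such datetime handling (an assumed sketch, not the PR's actual code) is to convert timestamps to numeric offsets before evaluating the polynomial:

```python
import numpy as np

# Assumed sketch (not the PR's actual code) of one way to handle datetime
# coordinates: convert them to numeric offsets from the first timestamp,
# then evaluate the polynomial on those numbers.
def datetime_to_numeric(times, unit="s"):
    times = np.asarray(times, dtype="datetime64[ns]")
    return (times - times[0]) / np.timedelta64(1, unit)

t = np.array(["2022-05-01T00:00:00", "2022-05-01T00:01:00"], dtype="datetime64[s]")
print(datetime_to_numeric(t))  # [ 0. 60.]
```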

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1114196391 https://github.com/pydata/xarray/pull/6548#issuecomment-1114196391 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85CaUmn headtr1ck 43316012 2022-05-01T10:27:52Z 2022-05-01T10:27:52Z COLLABORATOR

Some performance comparisons:

With a 5th-order polynomial and 10 x-values:
old: 1.05 ms ± 15.8 µs per loop
new: 1.41 ms ± 11.6 µs per loop

With a 5th-order polynomial and 10,000 x-values:
old: 1.46 ms ± 10.5 µs per loop
new: 1.41 ms ± 14.5 µs per loop

With a 5th-order polynomial and 1 million x-values:
old: 65.1 ms ± 332 µs per loop
new: 6.99 ms ± 168 µs per loop

As expected, for small arrays the new method adds some overhead, but for larger arrays the speedup is quite nice. Also, it uses in-place operations with much lower memory usage.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1114054278 https://github.com/pydata/xarray/pull/6548#issuecomment-1114054278 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85CZx6G max-sixty 5635139 2022-04-30T21:07:10Z 2022-04-30T21:07:10Z MEMBER

This looks like excellent code @headtr1ck , especially so for a first PR. Welcome!

so I used a "hack" of adding a 0-valued DataArray/Dataset.

I think it's fine to do the hack and reference that issue; we can clean it up then (though if others have ideas then great).

I had to add a copy(deep=True) since the broadcast returned a read-only DataArray. Any better ideas?

IIRC we generally leave this up to the caller (and generally discourage mutation).


Someone who knows better should review the algo & opine on the datetime issue; I agree I'm not sure whether we need to support them.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774
1114028796 https://github.com/pydata/xarray/pull/6548#issuecomment-1114028796 https://api.github.com/repos/pydata/xarray/issues/6548 IC_kwDOAMm_X85CZrr8 headtr1ck 43316012 2022-04-30T18:01:10Z 2022-04-30T18:01:10Z COLLABORATOR

I noticed that broadcasting Datasets behaves weirdly, see https://github.com/pydata/xarray/issues/6549, so I used a "hack" of adding a 0-valued DataArray/Dataset. Anyone got a better idea?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  polyval: Use Horner's algorithm + support chunked inputs 1221848774

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
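For reference, the schema above can be exercised locally with Python's stdlib sqlite3 (foreign-key clauses are omitted for brevity, and the inserted row is a sample comment from this page):

```python
import sqlite3

# Sketch of exercising the schema above with Python's stdlib sqlite3.
# Foreign-key clauses are omitted for brevity; the inserted row is a
# sample comment from this page.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
""")
conn.execute(
    "INSERT INTO issue_comments (id, issue, updated_at, body) VALUES (?, ?, ?, ?)",
    (1118973967, 1221848774, "2022-05-05T19:33:53Z", "Thanks @headtr1ck !"),
)
# The page above is essentially this query:
rows = conn.execute(
    "SELECT id, body FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
    (1221848774,),
).fetchall()
print(rows)  # [(1118973967, 'Thanks @headtr1ck !')]
```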
Powered by Datasette · Queries took 85.87ms · About: xarray-datasette