issue_comments

23 rows where author_association = "CONTRIBUTOR" and issue = 638909879 sorted by updated_at descending

user 2

  • pums974 22
  • chrisroat 1

issue 1

  • Implement interp for interpolating between chunks of data (dask) · 23 ✖

author_association 1

  • CONTRIBUTOR · 23 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
674585106 https://github.com/pydata/xarray/pull/4155#issuecomment-674585106 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY3NDU4NTEwNg== pums974 1005109 2020-08-16T22:20:37Z 2020-08-16T22:21:43Z CONTRIBUTOR

And I forgot to take into account that your interpolation only needs 48² points of the input array, so the input array will be reduced at the start of the process (you can replace every 100 with 48 in my previous answers).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
674584614 https://github.com/pydata/xarray/pull/4155#issuecomment-674584614 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY3NDU4NDYxNA== pums974 1005109 2020-08-16T22:14:55Z 2020-08-16T22:16:05Z CONTRIBUTOR

I forgot to take into account that the interpolations are orthogonal. So in sequential mode we do two interpolations: first along x, then along y. In parallel we do the same:

The first interpolation will have 20 000 tasks. Each task receives the entire input array and interpolates 5 points of the output (x), producing a 5x100 array per task, or a 100 000x100 intermediate array in total.

The second interpolation will have 20 000² tasks. Each task receives a 5x100 block of the intermediate array and interpolates 5 points of the output (y), producing a 5² array per task and the full 100 000² result.

So plenty of room for overhead...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
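
The task counts in the comment above can be checked with a short calculation. This is a minimal sketch, not part of the PR: the function name `orthogonal_interp_plan` and its parameters are hypothetical, and it assumes a square output of `n_out` points per dimension, split into chunks of `out_chunk` points, interpolated from a square input of `n_in` points per dimension.

```python
def orthogonal_interp_plan(n_out, out_chunk, n_in):
    """Count tasks and block shapes for a two-pass (x then y) interpolation.

    Hypothetical helper for illustration only; it mirrors the arithmetic
    in the comment, not any xarray or dask internals.
    """
    n_chunks = n_out // out_chunk  # destination chunks per dimension
    return {
        # Pass 1 (along x): one task per destination chunk in x.
        "first_pass_tasks": n_chunks,
        "first_pass_block_shape": (out_chunk, n_in),
        "intermediate_shape": (n_out, n_in),
        # Pass 2 (along y): one task per (x chunk, y chunk) pair.
        "second_pass_tasks": n_chunks ** 2,
        "second_pass_block_shape": (out_chunk, out_chunk),
        "result_shape": (n_out, n_out),
    }


plan = orthogonal_interp_plan(n_out=100_000, out_chunk=5, n_in=100)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Calling the same function with `n_in=48` gives the figures for the reduced input mentioned in the follow-up comment.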
674582930 https://github.com/pydata/xarray/pull/4155#issuecomment-674582930 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY3NDU4MjkzMA== pums974 1005109 2020-08-16T21:56:45Z 2020-08-16T22:05:21Z CONTRIBUTOR

In your case, each task (20 000²) will have the entire input (100²), and interpolate a few points (5²).

Maybe the overhead comes from duplicating the input array 20 000² times, or maybe it comes from doing 20 000² small interpolations instead of one big interpolation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
674579280 https://github.com/pydata/xarray/pull/4155#issuecomment-674579280 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY3NDU3OTI4MA== pums974 1005109 2020-08-16T21:18:36Z 2020-08-16T21:18:36Z CONTRIBUTOR

Does this answer your question?

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
674578943 https://github.com/pydata/xarray/pull/4155#issuecomment-674578943 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY3NDU3ODk0Mw== pums974 1005109 2020-08-16T21:15:42Z 2020-08-16T21:16:41Z CONTRIBUTOR

If the input array is chunked in the interpolated dimension, the chunks will be merged during the interpolation.

This may induce a large memory cost at some point, but I do not know how to avoid it...

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
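
Why the chunks have to be merged along the interpolated dimension can be seen on a tiny example. This is a minimal sketch using plain NumPy rather than dask: the arrays, the two-chunk split, and the target point are made up for illustration, and `np.interp` stands in for whatever interpolator is actually used.

```python
import numpy as np

# Source data, conceptually split into two chunks along x: [0, 1] and [2, 3].
x_src = np.array([0.0, 1.0, 2.0, 3.0])
y_src = np.array([0.0, 10.0, 20.0, 30.0])
chunks = [slice(0, 2), slice(2, 4)]

# The target point lies between the two chunks.
target = 1.5

# Interpolating inside each chunk alone cannot see the neighbouring chunk,
# so np.interp falls back to the nearest edge value of that chunk.
per_chunk = [float(np.interp(target, x_src[c], y_src[c])) for c in chunks]

# With the chunks merged, both neighbours of the target are available.
merged = float(np.interp(target, x_src, y_src))

print(per_chunk, merged)
```

Neither per-chunk value matches the merged result, which is why the data on both sides of a chunk boundary has to end up in the same task.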
674578524 https://github.com/pydata/xarray/pull/4155#issuecomment-674578524 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY3NDU3ODUyNA== pums974 1005109 2020-08-16T21:11:49Z 2020-08-16T21:11:49Z CONTRIBUTOR

@cyhsu I can answer this question.

For best performance, you should chunk the input array along the non-interpolated dimensions and chunk the destination. For example:

```
import dask.array as da
import numpy as np
import xarray as xr

# Source data: chunked along y (not interpolated), unchunked along x.
datax = xr.DataArray(data=np.arange(0, 4), coords={"x": np.linspace(0, 1, 4)}, dims="x")
datay = xr.DataArray(data=da.from_array(np.arange(0, 4), chunks=2), coords={"y": np.linspace(0, 1, 4)}, dims="y")
data = datax * datay

# Destination: chunked along the interpolated dimension x.
x = xr.DataArray(data=da.from_array(np.linspace(0, 1), chunks=2), dims="x")

res = data.interp(x=x)
```

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
672880182 https://github.com/pydata/xarray/pull/4155#issuecomment-672880182 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY3Mjg4MDE4Mg== pums974 1005109 2020-08-12T13:45:40Z 2020-08-12T13:45:40Z CONTRIBUTOR

You're welcome :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
667299944 https://github.com/pydata/xarray/pull/4155#issuecomment-667299944 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2NzI5OTk0NA== pums974 1005109 2020-07-31T18:55:48Z 2020-07-31T18:55:48Z CONTRIBUTOR

Hi.

I agree, part of this work might belong in dask. But I don't know dask internals well enough to go there. In this case, everything was already in place.

Moreover, I do think there is room for optimization. In particular, in this implementation the work is distributed along the chunks of the destination. This means one may end up with big intermediate arrays. For example, interpolating one value in a chunked vector will load the full vector in memory (first localization aside). In my previous (and uglier) implementation, the interpolation was done on the chunks of the source array. This might sometimes be a better choice.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
667255046 https://github.com/pydata/xarray/pull/4155#issuecomment-667255046 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2NzI1NTA0Ng== chrisroat 1053153 2020-07-31T17:56:15Z 2020-07-31T17:56:15Z CONTRIBUTOR

Hi! This work is interesting to me, as I was implementing in dask an image processing algo which needs an intermediate 1-d linear interpolation step. This bottlenecks the calculation through a single node. Your work here on distributed interpolation is intriguing, and I'm wondering if it would be useful in my work and if it could possibly become part of dask itself.

Here is the particular function, which you'll note has a dask.delayed wrapper around np.interp.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
667216168 https://github.com/pydata/xarray/pull/4155#issuecomment-667216168 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2NzIxNjE2OA== pums974 1005109 2020-07-31T16:33:08Z 2020-07-31T16:33:08Z CONTRIBUTOR

OK, I'm happy with the results now (better than my first submission of course).

I did not add many tests, since the result replaces what was done before, so the previous tests still apply.

I'm going on holiday, so I won't work much for the time being. But I'll be able to answer any questions.

Thanks for reviewing and for pushing me into doing a much better job.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
666565278 https://github.com/pydata/xarray/pull/4155#issuecomment-666565278 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2NjU2NTI3OA== pums974 1005109 2020-07-30T17:57:51Z 2020-07-30T17:57:51Z CONTRIBUTOR

FYI, don't merge yet. I fixed a bug today, but did not push it. And there is some work to do on the testing side.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
665751218 https://github.com/pydata/xarray/pull/4155#issuecomment-665751218 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2NTc1MTIxOA== pums974 1005109 2020-07-29T15:59:36Z 2020-07-29T15:59:36Z CONTRIBUTOR

While I was at it, I extended the decomposition of orthogonal interpolation. If you want, I can break this into two PRs.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
665665786 https://github.com/pydata/xarray/pull/4155#issuecomment-665665786 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2NTY2NTc4Ng== pums974 1005109 2020-07-29T13:30:50Z 2020-07-29T13:30:50Z CONTRIBUTOR

Guys, I got it. I managed to use da.blockwise which allows me to overcome all the previous limitations.

The result is much simpler and much more reliable.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
664245609 https://github.com/pydata/xarray/pull/4155#issuecomment-664245609 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2NDI0NTYwOQ== pums974 1005109 2020-07-27T09:45:40Z 2020-07-27T09:45:40Z CONTRIBUTOR

While at it, I added the missing bit to make it work with the cubic or quadratic methods. I'm not touching the code anymore; waiting for review.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
663580990 https://github.com/pydata/xarray/pull/4155#issuecomment-663580990 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2MzU4MDk5MA== pums974 1005109 2020-07-24T14:58:31Z 2020-07-24T15:00:12Z CONTRIBUTOR

@fujiisoup I managed to implement the support of unsorted interpolation.

Also, I reworked the tests; I now test many more situations.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
661958330 https://github.com/pydata/xarray/pull/4155#issuecomment-661958330 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2MTk1ODMzMA== pums974 1005109 2020-07-21T16:17:08Z 2020-07-21T16:17:08Z CONTRIBUTOR

> Thanks, @pums974 for this update. I left some comments.
>
> Can you add some tests for more edge cases? Something we may want to check would be
>
> * scalar interpolation
> * interpolation into an unsorted dimension (e.g., `da.interp(x=[0, 3, 2])`)

You're welcome, thanks for the feedback.

  • Scalar interpolation: you mean a test like test_interpolate_nd_scalar, but between chunks?
  • Unsorted interpolation: as I said, I did not look into it. This needs some work, presumably an argsort at the beginning in order to interpolate in a sorted dimension and then reorder the result into the requested order. I'm trying to implement it, but it seems a bit more challenging than I thought.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
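
The argsort idea mentioned in the comment above (sort the destination, interpolate, then restore the requested order) can be sketched as follows. This is a minimal illustration, not the code from the PR: `interp_unsorted` is a hypothetical name, and `np.interp` stands in for a chunk-aware interpolation routine that requires sorted destination points.

```python
import numpy as np


def interp_unsorted(x_new, x_src, y_src):
    """Interpolate at possibly unsorted points via argsort and un-sort.

    Hypothetical helper for illustration; np.interp is only a stand-in
    for an interpolator that needs its destination points sorted.
    """
    x_new = np.asarray(x_new, dtype=float)
    order = np.argsort(x_new)  # positions that sort x_new

    # Interpolate on the sorted destination points.
    sorted_result = np.interp(x_new[order], x_src, y_src)

    # Scatter the results back into the originally requested order.
    result = np.empty_like(sorted_result)
    result[order] = sorted_result
    return result


x_src = np.array([0.0, 1.0, 2.0, 3.0])
y_src = np.array([0.0, 10.0, 20.0, 30.0])
res = interp_unsorted([0, 3, 2], x_src, y_src)
print(res)
```

The destination `[0, 3, 2]` is the one from the review comment; the source values are made up.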
661030314 https://github.com/pydata/xarray/pull/4155#issuecomment-661030314 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2MTAzMDMxNA== pums974 1005109 2020-07-20T13:11:54Z 2020-07-20T13:11:54Z CONTRIBUTOR

@fujiisoup I managed to solve the issue you raised about `AttributeError: 'memoryview' object has no attribute 'dtype'`. This was due to the datetime index. Using an IndexVariable seems to solve it.

Also, I realized that for 1d interpolation the cubic and quadratic methods are allowed, which may not give the same result with chunked data (or may even crash if there is not enough data in the chunked direction). They are now forbidden.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
651694736 https://github.com/pydata/xarray/pull/4155#issuecomment-651694736 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY1MTY5NDczNg== pums974 1005109 2020-06-30T10:02:41Z 2020-06-30T10:02:41Z CONTRIBUTOR

I mean, in this case you have to interpolate in another direction. You cannot consider having a 1d function.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
650078484 https://github.com/pydata/xarray/pull/4155#issuecomment-650078484 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY1MDA3ODQ4NA== pums974 1005109 2020-06-26T09:15:05Z 2020-06-30T09:38:47Z CONTRIBUTOR

Thanks. That's weird, I have no problem on mine... What are your versions of dask and numpy?

As for implementing this in dask, you may be right, it probably belongs there. But I am even less used to their code base, and have no clue where to put it.

As for unsorted destinations, that's something I didn't think about. Maybe we can add an argsort at the beginning.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
651682646 https://github.com/pydata/xarray/pull/4155#issuecomment-651682646 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY1MTY4MjY0Ng== pums974 1005109 2020-06-30T09:38:32Z 2020-06-30T09:38:32Z CONTRIBUTOR

OK, but what about `res = data.interp(y=0.5)`?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
651581831 https://github.com/pydata/xarray/pull/4155#issuecomment-651581831 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY1MTU4MTgzMQ== pums974 1005109 2020-06-30T06:47:51Z 2020-06-30T06:47:51Z CONTRIBUTOR

Hum, ok, but I don't see how it would work if all points are between chunks (see my second example)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
649009492 https://github.com/pydata/xarray/pull/4155#issuecomment-649009492 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY0OTAwOTQ5Mg== pums974 1005109 2020-06-24T19:05:11Z 2020-06-24T19:05:11Z CONTRIBUTOR

No problem, we are all very busy. But thanks for your message.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
644204355 https://github.com/pydata/xarray/pull/4155#issuecomment-644204355 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY0NDIwNDM1NQ== pums974 1005109 2020-06-15T15:27:55Z 2020-06-15T15:27:55Z CONTRIBUTOR

On my computer it passes pytest:

```
$> pytest .
======================= test session starts =================================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
[...]
===== 3822 passed, 2710 skipped, 77 xfailed, 24 xpassed, 32 warnings in 48.25s ========

$> pip freeze
appdirs==1.4.4
attrs==19.3.0
black==19.10b0
click==7.1.2
dask==2.18.1
flake8==3.8.3
isort==4.3.21
mccabe==0.6.1
more-itertools==8.4.0
numpy==1.18.5
packaging==20.4
pandas==1.0.4
pathspec==0.8.0
pluggy==0.13.1
py==1.8.2
pycodestyle==2.6.0
pyflakes==2.2.0
pyparsing==2.4.7
pytest==5.4.3
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.3.1
regex==2020.6.8
scipy==1.4.1
six==1.15.0
toml==0.10.1
toolz==0.10.0
typed-ast==1.4.1
wcwidth==0.2.4
-e git+git@github.com:pums974/xarray.git@c47a1d5d8fd7ca401a0dddea67574af00c4d8e3b#egg=xarray
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);