issue_comments

8 comments where issue = 1307112340, sorted by updated_at descending

Commenters (4): dcherian (4), slevang (2), pums974 (1), gjoseph92 (1)

Author association (3): MEMBER (4), CONTRIBUTOR (3), NONE (1)

Issue (1): `interp` performance with chunked dimensions (8 comments)

dcherian (MEMBER) · 2022-11-16T18:55:00Z · https://github.com/pydata/xarray/issues/6799#issuecomment-1317516314

Linking the dask issue: https://github.com/dask/dask/issues/6474

dcherian (MEMBER) · 2022-11-16T17:04:23Z · https://github.com/pydata/xarray/issues/6799#issuecomment-1317358777

The challenge is that you could be interpolating to an unordered set of locations.

So perhaps we can sort the input locations, do the interp with map_overlap, then argsort the result back to the expected order.

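A minimal sketch of that sort/unsort bookkeeping (a hypothetical helper; it calls today's interp, which does not yet use map_overlap, just to show how the reordering would wrap it):

import numpy as np

def interp_unordered(da, targets, dim="x"):
    # permutation that sorts the targets, and the permutation that undoes it
    order = np.argsort(targets)
    inverse = np.argsort(order)
    # interpolate at monotonically increasing locations; this is the case
    # an overlap-based implementation could handle chunk-locally
    result = da.interp({dim: targets[order]})
    # reorder the output to match the caller's original target order
    return result.isel({dim: inverse})
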
gjoseph92 (NONE) · 2022-11-16T17:00:04Z · https://github.com/pydata/xarray/issues/6799#issuecomment-1317352980

> The current code also has the unfortunate side-effect of merging all chunks too

Don't really know what I'm talking about here, but it looks to me like the current dask interpolation routine uses blockwise. That is, it simply maps a function over each chunk in the array. To get the chunks into a structure where that is correct, you first have to merge all the chunks along the interpolation axis.

I would have expected interpolation to use map_overlap. You'd add some padding to each chunk, map the interpolation over each chunk (without combining them), then trim off the extra. By using overlap, you don't need to combine all the chunks into one big array first, so the operation can actually be parallel.

FYI, fixing this would probably be a big deal for geospatial people: then you could do array reprojection without GDAL! Unfortunately it's not something I have time to work on right now, but perhaps someone else would be interested?

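A rough dask-only sketch of that pattern, with a size-preserving stencil (values halfway between adjacent samples) standing in for a real interpolation kernel; one element of overlap is enough, and the chunk structure survives intact:

import dask.array as da
import numpy as np

x = da.random.random(1_000_000, chunks=100_000)

def midpoint(block):
    # average each sample with its right neighbour, keeping the length
    # unchanged so map_overlap's automatic trimming still applies
    right = np.append(block[1:], block[-1])
    return 0.5 * (block + right)

# pad each chunk by depth=1, apply the kernel per chunk, trim the padding;
# no chunk merging, so the work stays parallel
mid = x.map_overlap(midpoint, depth=1, boundary="reflect")
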
dcherian (MEMBER) · 2022-07-25T16:07:09Z · https://github.com/pydata/xarray/issues/6799#issuecomment-1194294204

The current code also has the unfortunate side-effect of merging all chunks too.

I think we should instead think of generating a dask array of weights and then using xr.dot.

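A hypothetical sketch of that idea for 1-D linear interpolation, with a dense weight matrix for brevity (a real implementation would presumably build it sparse and/or dask-backed so the contraction stays lazy):

import numpy as np
import xarray as xr

def linear_weights(x, x_new):
    # left-neighbour index and fractional position of each target point
    j = np.clip(np.searchsorted(x, x_new) - 1, 0, len(x) - 2)
    frac = (x_new - x[j]) / (x[j + 1] - x[j])
    # at most two nonzero weights per output point
    w = np.zeros((len(x_new), len(x)))
    w[np.arange(len(x_new)), j] = 1 - frac
    w[np.arange(len(x_new)), j + 1] = frac
    return xr.DataArray(w, dims=("x_new", "x"), coords={"x": x, "x_new": x_new})

src = xr.DataArray(np.sin(np.linspace(0, 10, 50)),
                   dims="x", coords={"x": np.linspace(0, 10, 50)})
weights = linear_weights(src.x.values, np.array([0.5, 2.2, 7.9]))
out = xr.dot(src, weights, dims="x")  # should match src.interp(x=[0.5, 2.2, 7.9])
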
slevang (CONTRIBUTOR) · 2022-07-18T21:43:11Z · https://github.com/pydata/xarray/issues/6799#issuecomment-1188348238

The chunking structure on disk is pretty instrumental to my application, which requires fast retrieval of full slices in the time dimension. The loop option in my first post only takes about 10 seconds with ni=1000, which is fine for my use case, so I'll probably go with that for now. It would be interesting to dig deeper, though, and see if there is a way to handle this better in the interp logic.

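The "loop option" itself isn't shown on this page; presumably it is something along these lines, interpolating one target point at a time so each step only loads the chunks it needs (all names and sizes here are illustrative):

import numpy as np
import xarray as xr

# toy stand-in for a dataset chunked on disk
ds = xr.DataArray(
    np.random.rand(100, 100),
    dims=("x", "y"),
    coords={"x": np.arange(100), "y": np.arange(100)},
).chunk({"x": 10, "y": 10})
xx = np.linspace(0, 99, 50)
yy = np.linspace(0, 99, 50)

# one small interp per target point instead of one big merged graph
points = [ds.interp(x=xi, y=yi) for xi, yi in zip(xx, yy)]
out = xr.concat(points, dim="points")
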
pums974 (CONTRIBUTOR) · 2022-07-18T16:55:42Z · https://github.com/pydata/xarray/issues/6799#issuecomment-1187755023

You are right about the behavior of the code, and I don't see any way to improve it in the general case.

In your case, though, rechunking before interpolating might be a good idea.

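That suggestion in code, reusing the toy `ds`, `xx`, `yy` from the sketch above (the dimension names are illustrative):

# collapse the interpolated dimensions to one chunk each (-1 means a
# single chunk), keeping any other dimensions chunked, then interp as usual
rechunked = ds.chunk({"x": -1, "y": -1})
out = rechunked.interp(x=xx, y=yy)
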
slevang (CONTRIBUTOR) · 2022-07-18T16:44:40Z · https://github.com/pydata/xarray/issues/6799#issuecomment-1187729258

Interpolating on chunked dimensions didn't work at all prior to #4155. The changes in #4069 are also relevant.

dcherian (MEMBER) · 2022-07-18T16:18:09Z · https://github.com/pydata/xarray/issues/6799#issuecomment-1187699270

> Given the performance behavior, I'm guessing we may be doing sequential interpolation for the dimensions: basically an interp1d call for all the xx points, and from there another for the yy points, which even for a small number of points would require nearly all chunks to be loaded in.

Yeah, I think this is right.

You could check whether it was better before https://github.com/pydata/xarray/pull/4155 (if it worked at all, that is).

cc @pums974 @Illviljan

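A small scipy-only illustration of that suspected behaviour (this mirrors the description above, not xarray's actual code path):

import numpy as np
from scipy.interpolate import interp1d

data = np.random.rand(100, 100)           # dims (x, y)
x = np.arange(100)
y = np.arange(100)
xx = np.array([10.5, 50.25])              # target x locations
yy = np.array([3.75, 80.5])               # target y locations

# pass 1: interpolate along x, which needs entire columns of data
step1 = interp1d(x, data, axis=0)(xx)     # shape (len(xx), 100)
# pass 2: interpolate the result along y, which needs entire rows
result = interp1d(y, step1, axis=1)(yy)   # shape (len(xx), len(yy))
# on a chunked array, the two passes together touch nearly every chunk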

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);