issue_comments: 550957737

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/3277#issuecomment-550957737 https://api.github.com/repos/pydata/xarray/issues/3277 550957737 MDEyOklzc3VlQ29tbWVudDU1MDk1NzczNw== 47371188 2019-11-07T07:25:50Z 2019-11-07T07:25:50Z NONE

The error is still present in 0.14.0.

I believe the bug occurs in dask_array_ops.py: rolling_window

My best guess at understanding the code: there appears to be an attempt to "pad" the array so that rolling windows don't miss data across chunk boundaries. I think the padding is supposed to be truncated later, but something is miscalculated and the final array ends up with the wrong chunking.
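To illustrate why padding would matter at all when rolling along a chunked dimension, here is a minimal pure-Python sketch. It is not xarray's or dask's code; the helper names (windows, chunked, chunked_with_overlap) and the sample data are hypothetical and only show the general idea that windows computed chunk by chunk miss the ones spanning a chunk boundary unless each chunk borrows a few elements from its neighbour.

```python
def windows(seq, size):
    """Return every contiguous window of length `size` in `seq`."""
    return [tuple(seq[i:i + size]) for i in range(len(seq) - size + 1)]


def chunked(seq, chunk):
    """Split `seq` into consecutive chunks of length `chunk` (no overlap)."""
    return [seq[i:i + chunk] for i in range(0, len(seq), chunk)]


def chunked_with_overlap(seq, chunk, extra):
    """Like `chunked`, but each chunk also borrows up to `extra`
    elements from the start of the following chunk."""
    return [seq[i:i + chunk + extra] for i in range(0, len(seq), chunk)]


data = list(range(6))  # illustrative data only
window = 3

# Reference result: windows over the whole, unchunked sequence.
expected = windows(data, window)

# Naive per-chunk rolling: windows that straddle the boundary are lost.
naive = [w for c in chunked(data, 3) for w in windows(c, window)]

# Per-chunk rolling with an overlap of (window - 1) elements recovers them.
overlapped = [
    w
    for c in chunked_with_overlap(data, 3, window - 1)
    for w in windows(c, window)
]

print(len(expected), len(naive), len(overlapped))
```

When the whole rolling dimension sits inside a single chunk, there is no boundary to straddle, so the naive and overlapped results coincide and the extra elements buy nothing.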

In the case I presented, the array is chunked along a different dimension from the one being rolled over, so padding is not necessary. Perhaps something goes wrong because the code was written to guard against rolling along a chunked dimension (and missing data across chunk boundaries)? Additionally, since the padding is not needed in this case, it presumably incurs a performance penalty that could be avoided.

A simple fix for my case is to skip the padding whenever the chunk size along the rolling dimension equals the array size along that dimension, i.e. whenever the rolling dimension consists of a single chunk.

For example, in the function rolling_window in dask_array_ops.py, the line

pad_size = max(start, end) + offset - depth[axis]

becomes

if a.shape[axis] == a.chunksize[axis]:
    pad_size = 0
else:
    pad_size = max(start, end) + offset - depth[axis]

This fixes the code for my use case. Perhaps someone could advise whether I have understood the issue correctly?
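For anyone who wants to reason about the proposed condition in isolation, here is a small standalone sketch. The function name choose_pad_size is hypothetical, and shape and chunksize are plain tuples standing in for a.shape and a.chunksize; the values passed for start, end, offset and depth are made-up numbers for illustration, not values taken from xarray.

```python
def choose_pad_size(shape, chunksize, axis, start, end, offset, depth_along_axis):
    """Hypothetical helper mirroring the proposed change.

    `shape` and `chunksize` stand in for `a.shape` and `a.chunksize`;
    `depth_along_axis` stands in for `depth[axis]`.
    """
    if shape[axis] == chunksize[axis]:
        # The rolling axis is a single chunk, so no window can cross
        # a chunk boundary and no padding is requested.
        return 0
    # Otherwise keep the original expression unchanged.
    return max(start, end) + offset - depth_along_axis


# Chunked along axis 0 only, rolling along axis 1 (one chunk on that axis).
single_chunk = choose_pad_size(
    shape=(4, 100), chunksize=(1, 100), axis=1,
    start=2, end=1, offset=1, depth_along_axis=2,
)

# Rolling axis split into several chunks: the original formula applies.
multi_chunk = choose_pad_size(
    shape=(4, 100), chunksize=(1, 25), axis=1,
    start=2, end=1, offset=1, depth_along_axis=2,
)

print(single_chunk, multi_chunk)
```

This only isolates the branch; whether returning 0 is safe for every caller of rolling_window is exactly the question above.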

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  488547784