issue_comments

4 rows where author_association = "NONE", issue = 488547784 and user = 47371188 sorted by updated_at descending

Comment 550957737 · p-d-moore (user 47371188) · created 2019-11-07T07:25:50Z · updated 2019-11-07T07:25:50Z · author_association: NONE
https://github.com/pydata/xarray/issues/3277#issuecomment-550957737

Error still present in 0.14.0

I believe the bug occurs in dask_array_ops.py: rolling_window

My best guess at understanding the code: I believe there is an attempt to "pad" rolling windows to ensure the rolling window doesn't miss data across chunk boundaries. I think the "padding" is supposed to be truncated later, but something is miscalculated and the final array ends up with the wrong chunking.

In the case I presented, the "chunking" happens along a different dimension to the "rolling", so padding is not necessary. Perhaps something goes haywire because the code was written to guard against rolling along a chunked dimension (and missing data across chunk boundaries)? Additionally, since the padding is unnecessary in this case, it incurs a performance penalty that could be avoided.

A simple fix for my case is to skip the "padding" whenever the chunksize along the rolling dimension equals the array size along the rolling dimension.

e.g. in dask_array_ops.py, in the function rolling_window:

```
pad_size = max(start, end) + offset - depth[axis]
```

becomes

```
if a.shape[axis] == a.chunksize[axis]:
    pad_size = 0
else:
    pad_size = max(start, end) + offset - depth[axis]
```

This fixes the code for my use case; perhaps someone could advise whether I have understood the issue correctly?
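As an illustration only, the proposed guard can be sketched as a standalone function. This is not the actual dask_array_ops code: `compute_pad_size` is a hypothetical name, `shape` and `chunksize` stand in for `a.shape` and `a.chunksize`, and `start`, `end`, `offset` and `depth` mirror the local variables in `rolling_window`.

```python
# Hypothetical sketch of the proposed guard, not the actual dask code.
# shape/chunksize stand in for a.shape and a.chunksize; start, end,
# offset and depth mirror the local variables in rolling_window.
def compute_pad_size(shape, chunksize, axis, start, end, offset, depth):
    # A single chunk along the rolling axis means no window can straddle
    # a chunk boundary, so the boundary padding can be skipped entirely.
    if shape[axis] == chunksize[axis]:
        return 0
    return max(start, end) + offset - depth[axis]
```

With the shapes from this issue (a (300, 7653) array chunked as (20, 7653), rolling along the second axis), the guard takes the first branch and no padding is added.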

Reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Issue: xarray, chunking and rolling operation adds chunking along new dimension (previously worked) · 488547784
Comment 539757754 · p-d-moore (user 47371188) · created 2019-10-09T00:18:34Z · updated 2019-10-09T00:21:03Z · author_association: NONE
https://github.com/pydata/xarray/issues/3277#issuecomment-539757754

Using xarray=0.13.0, the chunking behaviour has changed again (but is still incorrect):

```
<xarray.DataArray (item: 300, day: 7653)>
dask.array<xarray-<this-array>, shape=(300, 7653), dtype=float64, chunksize=(20, 7653), chunktype=numpy.ndarray>
Coordinates:
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01

<xarray.DataArray (item: 300, day: 7653)>
dask.array<where, shape=(300, 7653), dtype=float64, chunksize=(20, 7648), chunktype=numpy.ndarray>
Coordinates:
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01

ValueError
...
```

Note the chunksize is now (20, 7648) instead of (20, 7653).
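To make the mismatch concrete, here is a pure-Python sketch of the two chunk layouts. The 7648 + 5 split along the day axis is an inference from the reported chunksize (the dask repr only shows the largest chunk), as is the assumption of fifteen 20-item chunks along item.

```python
# Chunk layouts inferred from the reprs above (assumptions: fifteen 20-item
# chunks along item; day axis split as 7648 + 5, consistent with a reported
# chunksize of 7648 on a 7653-long axis).
input_chunks = ((20,) * 15, (7653,))     # one chunk along day
output_chunks = ((20,) * 15, (7648, 5))  # rolling introduced a new boundary

# Both layouts still cover the full (300, 7653) array...
assert tuple(sum(c) for c in input_chunks) == (300, 7653)
assert tuple(sum(c) for c in output_chunks) == (300, 7653)
# ...but the output now has an extra chunk boundary along the previously
# un-chunked day axis, matching the issue title.
```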

I believe it might be related to https://github.com/pydata/xarray/pull/2942: I think disabling bottleneck for dask arrays in the rolling operation exposed the bug above (so the bug may have been there for a while, but didn't appear while bottleneck was being used).

Doing a quick trace in rolling.py, I think it is the line windows = self.construct(rolling_dim) in the reduce function that creates windows with incorrect chunking, possibly as a consequence of some transpose operations and a dimension mix-up?

It seems strange that other applications aren't having problems with this, unless I am doing something different in my code? Note that I am very specifically chunking along a different dimension to the one I am rolling over.

Comment 538820900 · p-d-moore (user 47371188) · created 2019-10-07T02:44:10Z · updated 2019-10-07T02:44:10Z · author_association: NONE
https://github.com/pydata/xarray/issues/3277#issuecomment-538820900

I confirm the bug is still present in 0.13.0.

Comment 527643364 · p-d-moore (user 47371188) · created 2019-09-03T21:14:44Z · updated 2019-09-03T21:17:28Z · author_association: NONE
https://github.com/pydata/xarray/issues/3277#issuecomment-527643364

Some additional notes:

The bug also appears in xarray=0.12.2 (so I presume it was introduced between 0.12.1 and 0.12.2). Other rolling operations are similarly affected: replacing .mean() in the sample code with .count(), .sum(), .std(), .max() etc. results in the same erroneous chunking behaviour.

Another workaround is to downgrade xarray to 0.12.1.
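As a sketch of that workaround (assuming a pip-managed environment), the downgrade can be expressed as a requirements pin to the last release before the regression:

```
# requirements.txt: pin xarray to the last release unaffected by the
# rolling/chunking regression (introduced between 0.12.1 and 0.12.2)
xarray==0.12.1
```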


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette