issue_comments
6 rows where author_association = "NONE" and user = 47371188 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
550957737 | https://github.com/pydata/xarray/issues/3277#issuecomment-550957737 | https://api.github.com/repos/pydata/xarray/issues/3277 | MDEyOklzc3VlQ29tbWVudDU1MDk1NzczNw== | p-d-moore 47371188 | 2019-11-07T07:25:50Z | 2019-11-07T07:25:50Z | NONE | The error is still present in 0.14.0. I believe the bug occurs in dask_array_ops.py: rolling_window. My best guess at understanding the code: there is an attempt to "pad" rolling windows to ensure the rolling window doesn't miss data across chunk boundaries. I think the padding is supposed to be truncated later, but something is miscalculated and the final array ends up with the wrong chunking. In the case I presented, the chunking happens along a different dimension to the rolling, so padding is not necessary; perhaps something goes haywire because the code was written to guard against rolling along a chunked dimension (and missing data across chunk boundaries)? Additionally, since the padding is unnecessary in this case, it carries a performance penalty that could be avoided. A simple fix for my case is to skip the padding whenever the chunk size along the rolling dimension equals the array size along that dimension, in dask_array_ops.py: rolling_window (a sketch of this guard follows this row). This fixes the code for my usage case; perhaps someone could advise whether I have understood the issue correctly? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray, chunking and rolling operation adds chunking along new dimension (previously worked) 488547784 | |
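A minimal sketch of the guard this comment proposes, using a hypothetical helper (this is not xarray's actual dask_array_ops code): padding across chunk boundaries is only needed when the rolling dimension itself is split into more than one chunk.
```python
import dask.array as da

def needs_boundary_padding(arr: da.Array, axis: int) -> bool:
    # Hypothetical helper: padding is only required when the rolling
    # axis is split across multiple chunks; if the chunk size equals
    # the array size along that axis, no window can straddle a boundary.
    return len(arr.chunks[axis]) > 1

x = da.ones((300, 7653), chunks=(20, 7653))   # chunked along axis 0 only
assert not needs_boundary_padding(x, axis=1)  # rolling along "day": skip padding
assert needs_boundary_padding(x, axis=0)      # rolling along "item": pad as before
```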
539757754 | https://github.com/pydata/xarray/issues/3277#issuecomment-539757754 | https://api.github.com/repos/pydata/xarray/issues/3277 | MDEyOklzc3VlQ29tbWVudDUzOTc1Nzc1NA== | p-d-moore 47371188 | 2019-10-09T00:18:34Z | 2019-10-09T00:21:03Z | NONE | Using xarray=0.13.0, the chunking behaviour has changed again (but is still incorrect):
```
<xarray.DataArray (item: 300, day: 7653)>
dask.array<xarray-<this-array>, shape=(300, 7653), dtype=float64, chunksize=(20, 7653), chunktype=numpy.ndarray>
Coordinates:
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01

<xarray.DataArray (item: 300, day: 7653)>
dask.array<where, shape=(300, 7653), dtype=float64, chunksize=(20, 7648), chunktype=numpy.ndarray>
Coordinates:
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01

ValueError
```
Note the chunksize is now (20, 7648) instead of (20, 7653). I believe it might be related to https://github.com/pydata/xarray/pull/2942: I think disabling bottleneck for dask arrays in the rolling operation caused the bug above to appear (so the bug may have been present for a while, but didn't surface because bottleneck was being used). Doing a quick trace in rolling.py, I think it is the line `windows = self.construct(rolling_dim)` in the reduce function that creates windows with incorrect chunking, possibly as a consequence of some transpose operations and a dimension mix-up. It seems strange that other applications aren't hitting this, unless I am doing something different in my code. Note that I am very specifically chunking along a different dimension to the rolling operation. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray, chunking and rolling operation adds chunking along new dimension (previously worked) 488547784 | |
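A minimal reproduction sketch of the setup this comment describes (an array chunked along item while rolling along day). The shapes mirror the repr above, but the data, date frequency, and window size are assumptions:
```python
import numpy as np
import pandas as pd
import xarray as xr

days = pd.date_range("1990-01-01", "2019-05-01", freq="B")  # ~7,650 business days (frequency assumed)
items = [str(i) for i in range(300)]
arr = xr.DataArray(
    np.random.rand(len(items), len(days)),
    dims=("item", "day"),
    coords={"item": items, "day": days},
).chunk({"item": 20})  # chunk along "item" only; "day" stays a single chunk

rolled = arr.rolling(day=20).mean()  # window size assumed
# Expected: the "day" axis keeps its full-length chunk; the report above
# instead shows a shortened day chunk (7648 rather than 7653).
print(rolled.chunks)
```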
538820900 | https://github.com/pydata/xarray/issues/3277#issuecomment-538820900 | https://api.github.com/repos/pydata/xarray/issues/3277 | MDEyOklzc3VlQ29tbWVudDUzODgyMDkwMA== | p-d-moore 47371188 | 2019-10-07T02:44:10Z | 2019-10-07T02:44:10Z | NONE | Confirmed the bug is still present in 0.13.0. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray, chunking and rolling operation adds chunking along new dimension (previously worked) 488547784 | |
527771975 | https://github.com/pydata/xarray/issues/3213#issuecomment-527771975 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNzc3MTk3NQ== | p-d-moore 47371188 | 2019-09-04T07:05:37Z | 2019-09-04T07:05:37Z | NONE | Thanks @crusaderky, appreciated. Might as well suggest it there. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
527762609 | https://github.com/pydata/xarray/issues/3213#issuecomment-527762609 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNzc2MjYwOQ== | p-d-moore 47371188 | 2019-09-04T06:32:21Z | 2019-09-04T06:32:21Z | NONE | I would like to add a request for sparse xarrays: support ffill and bfill operations along ordered dimensions (such as datetime coordinates) while maintaining the sparse level of data density. The challenge to overcome is that performing ffill operations on sparse data quickly creates data that is no longer "sparse" in practice, which makes dealing with the data challenging.

My suggested implementation (and the way I have previously done this in another programming environment) is to represent the data as rows of contiguous regions with a single (non-sparse) value rather than rows of single points; a sketch follows this row. The contiguous dimensions could be any dimensions that are "ordered", such as datetime coordinates. That is, the data is represented as a list of values + coordinate ranges rather than a list of values + coordinates.

The idea is that you can easily compute operations like ffill without changing the sparsity of the matrix, and thus support typical aggregating functions you might like to apply to the data before you collapse it and convert to a non-sparse form (e.g. take the lag difference of the most recent value against the most recent value 20 days ago, or compute a cross-sectional mean along a certain dimension using the most recent data at each point in time). These operations are more useful when the data is "fuller", such as after a forward fill, but often not useful when the data is very sparsely populated (as cross-sectional operations are unlikely to hit the sparse data across the different dimensions).

Care must be taken to avoid "collisions" between sparse blocks of data, that is, to avoid the list of sparse blocks accidentally overlapping. The implementation can get tricky, but I believe the goal is worthwhile. I am happy to expand on the request if the idea is not well expressed. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
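A rough sketch of the contiguous-region representation this comment proposes (all names here are hypothetical; this is not an existing sparse or xarray API): each stored entry covers a coordinate range along the ordered dimension, so ffill extends ranges instead of densifying the data.
```python
from dataclasses import dataclass

@dataclass
class Run:
    start: int   # first index along the ordered dimension (inclusive)
    stop: int    # last index (inclusive)
    value: float

class RunSparse1D:
    """Hypothetical 1-D sparse vector stored as runs of a constant value
    over an ordered dimension, so forward filling extends runs instead
    of materialising dense points."""

    def __init__(self, length: int, runs: list):
        self.length = length
        self.runs = sorted(runs, key=lambda r: r.start)

    def ffill(self) -> "RunSparse1D":
        # Extend each run up to the start of the next one; the number
        # of stored runs (the sparsity) is unchanged.
        filled = []
        for run, nxt in zip(self.runs, self.runs[1:] + [None]):
            stop = self.length - 1 if nxt is None else nxt.start - 1
            filled.append(Run(run.start, stop, run.value))
        return RunSparse1D(self.length, filled)

# Two isolated observations stay two runs after ffill, rather than
# becoming `length` dense points.
v = RunSparse1D(10, [Run(2, 2, 1.0), Run(6, 6, 5.0)])
print([(r.start, r.stop, r.value) for r in v.ffill().runs])
# [(2, 5, 1.0), (6, 9, 5.0)]
```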
527643364 | https://github.com/pydata/xarray/issues/3277#issuecomment-527643364 | https://api.github.com/repos/pydata/xarray/issues/3277 | MDEyOklzc3VlQ29tbWVudDUyNzY0MzM2NA== | p-d-moore 47371188 | 2019-09-03T21:14:44Z | 2019-09-03T21:17:28Z | NONE | Some additional notes: the bug also appears in xarray=0.12.2 (so I presume it was introduced between 0.12.1 and 0.12.2). Other rolling operations are similarly affected: replacing .mean() in the sample code with .count(), .sum(), .std(), .max(), etc. results in the same erroneous chunking behaviour. Another workaround is to downgrade xarray to 0.12.1. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray, chunking and rolling operation adds chunking along new dimension (previously worked) 488547784 |
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
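For reference, a hedged example of the query this page represents, run with Python's sqlite3 against a github-to-sqlite database (the database file name is an assumption):
```python
import sqlite3

conn = sqlite3.connect("github.db")  # file name assumed
rows = conn.execute(
    """
    SELECT id, issue, updated_at, body
    FROM issue_comments
    WHERE author_association = 'NONE' AND user = 47371188
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 6, matching the row count reported above
```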