issue_comments

8 rows where issue = 333312849 ("why time grouping doesn't preserve chunks", pydata/xarray#2237) and user = 1197350 (rabernat), sorted by updated_at descending.
620961663 · rabernat (MEMBER) · 2020-04-29T02:45:28Z
https://github.com/pydata/xarray/issues/2237#issuecomment-620961663

I'm reviving this classic issue to report another quasi-failure of dask chunking, this time in the opposite direction. Consider this dataset:
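The original code block did not survive the export; what follows is a minimal sketch of the setup being described, assuming ten years of daily data split into two dask chunks. The variable name `foo` and the exact sizes are illustrative.

```python
import dask.array as dsa
import pandas as pd
import xarray as xr

nt = 3650  # roughly ten years of daily data
time = pd.date_range(start="2000-01-01", periods=nt, freq="D")
data = dsa.ones(nt, chunks=nt // 2)  # two big chunks of 1825 elements each
ds = xr.DataArray(data, dims=["time"], coords={"time": time},
                  name="foo").to_dataset()
print(ds.foo.data.chunks)  # ((1825, 1825),)
```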
There are just two big chunks. Now let's try to take an "annual mean" using `resample`:
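Again a sketch rather than the original snippet, continuing from the dataset above (`"AS"` is the annual-start resample frequency; newer pandas spells it `"YS"`):

```python
annual = ds.resample(time="AS").mean()
print(annual.foo.data.chunks)  # ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1),)
```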
Now we have a chunksize of 1 and 10 chunks. That's bad: we should still have just two chunks, since we are aggregating only within chunks. Taken to the limit of very high temporal resolution, this example will blow up in terms of the number of tasks. I wish dask could figure out that it doesn't have to create all those tasks. The graph looks like this:

[dask graph image]

In contrast, …

Reactions: none

482275708 · rabernat (MEMBER) · 2019-04-11T19:37:05Z
https://github.com/pydata/xarray/issues/2237#issuecomment-482275708

We had a long iteration on this in Pangeo, and big progress was made in dask. Definitely closed for now.

Reactions: none

398597356 · rabernat (MEMBER) · 2018-06-20T01:42:55Z
https://github.com/pydata/xarray/issues/2237#issuecomment-398597356

I'm glad to see that this has generated so much serious discussion and thought! I will try to catch up on it in the morning, when I have some hope of understanding.

Reactions: 👍 1

398240724 · rabernat (MEMBER) · 2018-06-19T00:57:44Z
https://github.com/pydata/xarray/issues/2237#issuecomment-398240724

With groupby in xarray, we have two main cases:

[list of the two cases missing]
Case 2 seems similar to @shoyer's example: if the chunk size before reindexing is not 1, then yes, one needs to do something more sophisticated. But I would argue that, if the array is being re-indexed along a dimension in which the chunk size is 1, a sensible default behavior would be to avoid aggregating into a big chunk and instead just pass the original chunks through in a new order.

Reactions: 👍 1

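A sketch of what the two cases presumably are, given the surrounding discussion: a reduction, where each group collapses to one value, and a transformation, where the result is reindexed back onto the original time axis. Names and sizes are illustrative, not from the original comment.

```python
import dask.array as dsa
import pandas as pd
import xarray as xr

# two years of daily data, one element per chunk
time = pd.date_range("2000-01-01", periods=730, freq="D")
da = xr.DataArray(dsa.ones(730, chunks=1), dims=["time"],
                  coords={"time": time}, name="foo")

# Case 1: a reduction -- each group aggregates to a single value
clim = da.groupby("time.month").mean()

# Case 2: a transformation -- subtracting the climatology reindexes the
# result back onto the original (chunk size 1) time axis
anom = da.groupby("time.month") - clim
```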
398158656 · rabernat (MEMBER) · 2018-06-18T18:55:08Z
https://github.com/pydata/xarray/issues/2237#issuecomment-398158656

Thanks for the explanation @shoyer! Yes, that appears to be the root of the issue. After literally years of struggling with this, I am happy to finally get to this level of clarity.

Do we think dask is happy with that behavior? If not, then an upstream fix would be best. Pinging @mrocklin. Otherwise, we can try to work around it in xarray.

Reactions: 🎉 1

398156747 · rabernat (MEMBER) · 2018-06-18T18:48:34Z
https://github.com/pydata/xarray/issues/2237#issuecomment-398156747

And just because it's fun, I will show what the anomaly calculation looks like:
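The code and graph image here were lost; a sketch, reusing `anom` from the two-cases sketch above. `visualize` writes the task graph to a file and requires the optional graphviz dependency.

```python
anom.data.visualize(filename="anomaly-graph.png")  # render the task graph
```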
It looks like everything is really OK up until the very end, where all the tasks aggregate into a single chunk:

[dask graph image]

Reactions: 😄 2

398152064 · rabernat (MEMBER) · 2018-06-18T18:32:42Z
https://github.com/pydata/xarray/issues/2237#issuecomment-398152064

I agree that single-value chunks illustrate the problem more clearly. I think this example is cleanest if you do it like this:
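The snippet itself did not survive the export; a guess at its shape, assuming a time series chunked one element per chunk so the grouping behavior is easy to see (names and sizes are illustrative):

```python
import dask.array as dsa
import pandas as pd
import xarray as xr

# one year of daily data, one element per chunk
time = pd.date_range("2000-01-01", periods=365, freq="D")
da = xr.DataArray(dsa.ones(365, chunks=1), dims=["time"],
                  coords={"time": time}, name="foo")

print(da.data.chunks[0][:3])  # (1, 1, 1) -- single-value chunks
monthly = da.groupby("time.month").mean()
print(monthly.data.chunks)    # inspect how the output is chunked
```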
Reactions: none

398150381 · rabernat (MEMBER) · 2018-06-18T18:27:08Z
https://github.com/pydata/xarray/issues/2237#issuecomment-398150381
One way to answer that is the following. Here is the dask graph for the first computation:

[dask graph image]

And here is the dask graph for the second:

[dask graph image]
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
why time grouping doesn't preserve chunks 333312849 |