issue_comments: 620961663
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/2237#issuecomment-620961663 | https://api.github.com/repos/pydata/xarray/issues/2237 | 620961663 | MDEyOklzc3VlQ29tbWVudDYyMDk2MTY2Mw== | 1197350 | 2020-04-29T02:45:28Z | 2020-04-29T02:45:28Z | MEMBER | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | 333312849 |

body:

I'm reviving this classic issue to report another quasi-failure of dask chunking, this time in the opposite direction. Consider this dataset:
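A minimal sketch of such a dataset, assuming ten years of daily data split into two big dask chunks (the exact sizes and frequency are illustrative assumptions, not the original code):

```python
import dask.array as dsa
import pandas as pd
import xarray as xr

# Assumption: ten years of daily data in one dask-backed variable,
# split into just two big chunks along the time dimension.
nt = 3650
time = pd.date_range(start="2000-01-01", periods=nt, freq="D")
data = dsa.ones(nt, chunks=nt // 2)
ds = xr.Dataset({"foo": (("time",), data)}, coords={"time": time})
print(ds.foo.chunks)  # ((1825, 1825),)
```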
There are just two big chunks. Now let's try to take an "annual mean" using resample:
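A sketch of that step, continuing the assumed dataset above ("A" is the annual frequency alias):

```python
# Resample to annual means; each year ends up as its own tiny output chunk.
resampled = ds.foo.resample(time="A").mean()
print(resampled.chunks)  # ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1),)
```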
Now we have a chunk size of 1 and 10 chunks. That's bad: we should still have just two chunks, since we are aggregating only within chunks. Taken to the limit of very high temporal resolution, this example will blow up in terms of the number of tasks. I wish dask could figure out that it doesn't have to create all those tasks. The graph looks like this:
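One way to render that graph from the sketch above (assuming the optional graphviz dependency is installed):

```python
# Write the dask task graph of the resampled array to a PNG;
# every per-year slice and reduction shows up as its own task.
resampled.data.visualize(filename="resample_graph.png")
```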
In contrast,
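a reduction that works blockwise keeps the original two chunks. Below is a hypothetical sketch (assuming `coarsen` over 365-sample windows as an illustrative stand-in for the contrasting case):

```python
# coarsen reduces over fixed-size windows and, for dask data, works
# block-by-block, so the two input chunks map onto two output chunks.
coarse = ds.foo.coarsen(time=365).mean()
print(coarse.chunks)  # ((5, 5),)
```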