html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1308#issuecomment-286516988,https://api.github.com/repos/pydata/xarray/issues/1308,286516988,MDEyOklzc3VlQ29tbWVudDI4NjUxNjk4OA==,1217238,2017-03-14T18:29:55Z,2017-03-14T18:29:55Z,MEMBER,"> I wonder if the fact that the data is highly compressed (short types converted to float64 with the scale and offset attributes) can have an influence on dask performance and memory consumption? (especially the latter)
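For reference, scale/offset decoding is just a per-element multiply-add; a minimal pure-Python sketch, with `scale_factor` and `add_offset` values made up for illustration:

```python
# CF-style packing: short integers on disk, floats in memory.
# The scale_factor / add_offset values here are invented for illustration.
scale_factor = 0.01
add_offset = 273.15

packed = [0, 1500, 3000]  # int16 values as stored on disk

# Decoding is one multiply-add per element, so it runs at close to
# memory bandwidth; zlib decompression does far more work per byte.
unpacked = [p * scale_factor + add_offset for p in packed]
```

Each float64 result occupies four times the memory of the int16 it was decoded from, which is where the memory-consumption cost comes from.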
Memory consumption, yes; performance, not so much. Scale/offset (de)compression can be applied super fast, unlike zlib compression, which can be 10x slower than reading from disk.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,214088387
https://github.com/pydata/xarray/issues/1308#issuecomment-286502400,https://api.github.com/repos/pydata/xarray/issues/1308,286502400,MDEyOklzc3VlQ29tbWVudDI4NjUwMjQwMA==,1217238,2017-03-14T17:43:13Z,2017-03-14T17:43:13Z,MEMBER,"We currently do all the groupby handling ourselves, which means that when you group over smaller units the dask graph gets bigger and each task gets smaller. Given that each chunk in the grouped data is only ~250,000 elements, it's not surprising that things get a bit slower -- that's near the point where Python overhead starts to become significant.
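To make the graph-size point concrete, here is a toy sketch (plain Python, with a fabricated 360-day calendar standing in for a real time coordinate) counting how many groups, and hence roughly how many tasks, each grouping produces:

```python
# Fabricated (month, day) labels for a 360-day year: 12 months x 30 days.
days = [(month, day) for month in range(1, 13) for day in range(1, 31)]

def n_groups(keyfunc):
    # xarray currently emits (at least) one set of tasks per group,
    # so finer grouping means more tasks, each over fewer elements.
    return len({keyfunc(d) for d in days})

groups_by_month = n_groups(lambda d: d[0])  # 12 groups
groups_by_day = n_groups(lambda d: d)       # 360 groups
```

With the same total data split over 30x more groups, each task touches 1/30th as many elements while the fixed per-task Python overhead stays the same.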
It would be useful to benchmark graph creation and execution separately (especially using dask-distributed's profiling tools) to understand where the slow-down is.
One thing that might help quite a bit in cases like this where the individual groups are small is to rewrite xarray's groupby to do some groupby operations *inside* dask, rather than in a loop outside of dask. That would allow executing tasks on bigger chunks of arrays at once, which could significantly reduce scheduler overhead.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,214088387
https://github.com/pydata/xarray/issues/1308#issuecomment-286482853,https://api.github.com/repos/pydata/xarray/issues/1308,286482853,MDEyOklzc3VlQ29tbWVudDI4NjQ4Mjg1Mw==,1217238,2017-03-14T16:43:27Z,2017-03-14T16:43:27Z,MEMBER,"Can you share the shape and dask chunking for `data`, and also describe how the data is stored? That can make a big difference for performance.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,214088387