issue_comments: 620961663
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/2237#issuecomment-620961663 | https://api.github.com/repos/pydata/xarray/issues/2237 | 620961663 | MDEyOklzc3VlQ29tbWVudDYyMDk2MTY2Mw== | 1197350 | 2020-04-29T02:45:28Z | 2020-04-29T02:45:28Z | MEMBER | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | 333312849 |

body:

I'm reviving this classic issue to report another quasi-failure of dask chunking, this time in the opposite direction. Consider this dataset:
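A minimal sketch of such a dataset, assuming ten years of daily data split into two big dask chunks (the exact sizes and frequency are illustrative assumptions, not the original code):

```python
import dask.array as dsa
import pandas as pd
import xarray as xr

# Assumption: ten years of daily data in one dask-backed variable,
# split into just two big chunks along the time dimension.
nt = 3650
time = pd.date_range(start="2000-01-01", periods=nt, freq="D")
data = dsa.ones(nt, chunks=nt // 2)
ds = xr.Dataset({"foo": (("time",), data)}, coords={"time": time})
print(ds.foo.chunks)  # ((1825, 1825),)
```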
There are just two big chunks. Now let's try to take an "annual mean" using resample:
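A sketch of that step, continuing the assumed dataset above ("A" is the annual frequency alias):

```python
# Resample to annual means; each year ends up as its own tiny output chunk.
resampled = ds.foo.resample(time="A").mean()
print(resampled.chunks)  # ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1),)
```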
Now we have a chunk size of 1 and 10 chunks. That's bad: we should still have just two chunks, since we are aggregating only within chunks. Taken to the limit of very high temporal resolution, this example will blow up in terms of the number of tasks. I wish dask could figure out that it doesn't have to create all those tasks. The graph looks like this:
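One way to render that graph from the sketch above (assuming the optional graphviz dependency is installed):

```python
# Write the dask task graph of the resampled array to a PNG;
# every per-year slice and reduction shows up as its own task.
resampled.data.visualize(filename="resample_graph.png")
```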
In contrast,
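a reduction that works blockwise keeps the original two chunks. Below is a hypothetical sketch (assuming `coarsen` over 365-sample windows as an illustrative stand-in for the contrasting case):

```python
# coarsen reduces over fixed-size windows and, for dask data, works
# block-by-block, so the two input chunks map onto two output chunks.
coarse = ds.foo.coarsen(time=365).mean()
print(coarse.chunks)  # ((5, 5),)
```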