
issue_comments: 620961663


html_url: https://github.com/pydata/xarray/issues/2237#issuecomment-620961663
issue_url: https://api.github.com/repos/pydata/xarray/issues/2237
id: 620961663
node_id: MDEyOklzc3VlQ29tbWVudDYyMDk2MTY2Mw==
user: 1197350
created_at: 2020-04-29T02:45:28Z
updated_at: 2020-04-29T02:45:28Z
author_association: MEMBER
body:

I'm reviving this classic issue to report another quasi-failure of dask chunking, this time in the opposite direction.

Consider this dataset:

```python
import numpy as np
import dask.array as dsa
import xarray as xr

ds = xr.Dataset({'foo': (['time'], dsa.ones(120, chunks=60))},
                coords={'year': (['time'], np.repeat(np.arange(10), 12))})
```

```
<xarray.Dataset>
Dimensions:  (time: 120)
Coordinates:
    year     (time) int64 0 0 0 0 0 0 0 0 0 0 0 0 1 ... 9 9 9 9 9 9 9 9 9 9 9 9
Dimensions without coordinates: time
Data variables:
    foo      (time) float64 dask.array<chunksize=(60,), meta=np.ndarray>
```

There are just two big chunks.

Now let's try to take an "annual mean" using `groupby`:

```python
ds.foo.groupby('year').mean(dim='time')
```

```
<xarray.DataArray 'foo' (year: 10)>
dask.array<stack, shape=(10,), dtype=float64, chunksize=(1,), chunktype=numpy.ndarray>
Coordinates:
  * year     (year) int64 0 1 2 3 4 5 6 7 8 9
```

Now we have a chunksize of 1 and 10 chunks. That's bad: we should still have just two chunks, since each year falls entirely within one chunk, so we are aggregating only within chunks. Taken to the limit of very high temporal resolution, this example will blow up the number of tasks. I wish dask could figure out that it doesn't have to create all those tasks.
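The "aggregating only within chunks" condition can be checked mechanically: no group may straddle a chunk boundary. Below is a minimal NumPy sketch, assuming every group label forms one contiguous run along `time` (true for `year` here). The helpers `chunk_edges` and `groups_align_with_chunks` are hypothetical illustrations, not part of xarray or dask.

```python
import numpy as np


def chunk_edges(chunks):
    """Interior boundaries between consecutive chunks, e.g. (60, 60) -> [60]."""
    return np.cumsum(chunks)[:-1]


def groups_align_with_chunks(labels, chunks):
    """Return True if no run of equal labels straddles a chunk boundary.

    Assumes each label occupies a single contiguous run (as `year` does here).
    """
    labels = np.asarray(labels)
    # A group straddles a boundary if the label is unchanged across it.
    return all(labels[e - 1] != labels[e] for e in chunk_edges(chunks))


year = np.repeat(np.arange(10), 12)
print(groups_align_with_chunks(year, (60, 60)))  # True: each chunk holds 5 whole years
print(groups_align_with_chunks(year, (50, 70)))  # False: year 4 is split across chunks
```

When the check passes, a reduction that emits one output block per input block is possible in principle, which is the behaviour the comment is asking dask to discover on its own.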

The task graph looks like this:

[image: task graph of the groupby mean; not preserved]

In contrast, `coarsen` is smart enough, probably because it relies on dask's underlying `coarsen` function:

```python
ds.foo.coarsen(time=12).mean()
```

```
<xarray.DataArray (time: 10)>
dask.array<mean_agg-aggregate, shape=(10,), dtype=float64, chunksize=(5,), chunktype=numpy.ndarray>
Coordinates:
    year     (time) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
Dimensions without coordinates: time
```
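To see the chunk-preserving behaviour directly, the same reduction can be run on the bare dask array with `dask.array.coarsen`, which reduces each block independently, so every 60-element input chunk becomes a 5-element output chunk. This is a sketch on the raw array only; whether xarray's `coarsen` dispatches to this exact function is the comment's guess and is not verified here.

```python
import numpy as np
import dask.array as dsa

foo = dsa.ones(120, chunks=60)  # same layout as ds.foo: two chunks of 60

# Reduce every 12 consecutive samples along axis 0 with np.mean.
# Each input chunk (60) is divisible by 12, so the reduction stays blockwise.
annual = dsa.coarsen(np.mean, foo, {0: 12})

print(annual.chunks)     # ((5, 5),)
print(annual.compute())  # ten values, all 1.0
```

Note that `dask.array.coarsen` raises an error if the chunk sizes are not multiples of the coarsening factor (unless `trim_excess=True`), which is the array-level analogue of the alignment condition above.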

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 333312849