home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 172291585

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
172291585 MDU6SXNzdWUxNzIyOTE1ODU= 979 align() should align chunks 6213168 closed 0     4 2016-08-20T21:25:01Z 2019-01-24T17:19:30Z 2019-01-24T17:19:30Z MEMBER      

In the xarray docs I read

With the current version of dask, there is no automatic alignment of chunks when performing operations between dask arrays with different chunk sizes. If your computation involves multiple dask arrays with different chunks, you may need to explicitly rechunk each array to ensure compatibility.

While chunk auto-alignment could be done within the dask library, that would be limited to arrays with the same dimensionality and same dims order. For example it would not be possible to have a dask library call to align the chunks on xarrays with the following dims: - (time, latitude, longitude) - (time) - (longitude, latitude)

even if it makes perfect sense in xarray.

I think xarray.align() should take care of it automatically.

A safe algorithm would be to always scale down the chunksize when in conflict. This would prevent having chunks larger than expected, and should minimise (in a greedy way) the number of operations. It's also a good idea on dask.distributed, where merging two chunks could cause one of them to travel on the network - which is very expensive.

e.g. to reconcile chunksizes a: (5, 10, 6) b: (5, 7, 9) the algorithm would rechunk both arrays to (5, 7, 3, 6).

Finally, when served with a numpy-based array and a dask-based array, align() should convert the numpy array to dask. The critical use case that would benefit from this behaviour is when align() is invoked inside a broadcast() between a tiny constant you just loaded from csv/pandas/pure python list/whatever - e.g. dims=(time, ) shape=(100, ) - and a huge dask-backed array e.g. dims=(time, scenario) shape=(100, 2**30) chunks=(25, 2**20).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/979/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 4 rows from issues_id in issues_labels
  • 4 rows from issue in issue_comments
Powered by Datasette · Queries took 80.55ms · About: xarray-datasette