
issues: 688640232


id: 688640232
node_id: MDU6SXNzdWU2ODg2NDAyMzI=
number: 4389
title: Stack: avoid re-chunking (dask) and insert new coordinates arbitrarily
user: 1053153
state: open
locked: 0
comments: 3
created_at: 2020-08-30T02:35:48Z
updated_at: 2022-04-28T01:39:06Z
author_association: CONTRIBUTOR

The behavior of stack was not quite intuitive to me, and I'd like to understand if this was an explicit technical decision or if it can be changed.

First, with regard to chunking:

    import dask.array as da
    import xarray as xr

    # dtype=int replaces the np.int alias, which newer NumPy releases no longer provide
    arr = xr.DataArray(da.zeros((2, 3, 4), dtype=int, chunks=(1, 1, 1)), dims=['z', 'y', 'x'])
    stacked = arr.stack(v=('y', 'x'))
    print(stacked)
    --
    <xarray.DataArray 'zeros-6eb2edd0fca7ec97141e1310bd303988' (z: 2, v: 12)>
    dask.array<reshape, shape=(2, 12), dtype=int64, chunksize=(1, 4), chunktype=numpy.ndarray>
    Coordinates:
      * v        (v) MultiIndex
      - y        (v) int64 0 0 0 0 1 1 1 1 2 2 2 2
      - x        (v) int64 0 1 2 3 0 1 2 3 0 1 2 3
    Dimensions without coordinates: z

Why did the number of chunks change in this case? Couldn't the chunksize be (1,1)?
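
To make the change in chunking concrete, here is a minimal sketch that prints the block sizes along each dimension before and after the stack. It uses only `.chunks` and `npartitions`; the exact chunk layout after stacking may depend on the installed dask version, so no particular output is claimed here.

```python
import dask.array as da
import xarray as xr

# Same array as above: every element is its own chunk.
arr = xr.DataArray(
    da.zeros((2, 3, 4), dtype=int, chunks=(1, 1, 1)),
    dims=["z", "y", "x"],
)
stacked = arr.stack(v=("y", "x"))

# .chunks lists the block sizes along each dimension, in dimension order.
print("before:", dict(zip(arr.dims, arr.chunks)))
print("after: ", dict(zip(stacked.dims, stacked.chunks)))

# Total number of blocks before and after stacking.
print("blocks:", arr.data.npartitions, "->", stacked.data.npartitions)
```

In the output quoted above, the chunk size along `v` is 4, which is the full length of `x`. That is consistent with the reshape merging `x` into a single chunk before combining it with `y`, though this reading is an inference from the repr, not something confirmed in the issue.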

Next, why is it necessary to put the new dimension at the end? It seems there are often more natural (perhaps just to my naive thought process) placements. One example would be that same array above, but stacked on the first two dimensions. I would want the new dimension to be the first dimension (again without the rechunking above). To accomplish this, I do:

    import dask.array as da
    import xarray as xr

    # dtype=int replaces the np.int alias, which newer NumPy releases no longer provide
    arr = xr.DataArray(da.zeros((2, 3, 4), dtype=int, chunks=(1, 1, 1)), dims=['z', 'y', 'x'])
    stacked = arr.stack(v=('z', 'y')).transpose('v', ...).chunk({'v': 1})
    print(stacked)
    --
    <xarray.DataArray 'zeros-6eb2edd0fca7ec97141e1310bd303988' (v: 6, x: 4)>
    dask.array<rechunk-merge, shape=(6, 4), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>
    Coordinates:
      * v        (v) MultiIndex
      - z        (v) int64 0 0 0 1 1 1
      - y        (v) int64 0 1 2 0 1 2
    Dimensions without coordinates: x
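
One way to check which operations this chain adds to the dask graph, without rendering it, is to count task keys by operation name. The sketch below is only an illustration: it relies on `dask.utils.key_split` to strip the hash and block index from each key, and the names and counts it prints will vary with the dask and xarray versions in use.

```python
from collections import Counter

import dask.array as da
import xarray as xr
from dask.utils import key_split

arr = xr.DataArray(
    da.zeros((2, 3, 4), dtype=int, chunks=(1, 1, 1)),
    dims=["z", "y", "x"],
)
stacked = arr.stack(v=("z", "y")).transpose("v", ...).chunk({"v": 1})

# Materialize the (unoptimized) task graph as a plain dict of key -> task.
graph = dict(stacked.data.__dask_graph__())

# key_split reduces a key such as ('transpose-<hash>', 0, 1) to 'transpose'.
op_counts = Counter(key_split(key) for key in graph)

for op, n in sorted(op_counts.items()):
    print(f"{op}: {n} tasks")
```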

The dask graph for this last example inserts a rechunk and two transposes, but my intent was for none of the underlying chunks to change at all. Here is 1 of 8 pieces of the graph (with optimization off; optimization combines operations, but doesn't change the topology or the operations):

Is it technically feasible for stack to avoid rechunking, and for the user to determine where the new dimensions should go?
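
For reference, the target layout can be reached at the dask level without a rechunk by slicing and concatenating. This is a sketch under stated assumptions, not a proposal for how `stack` should work: it creates one slice per stacked index (so it scales poorly), it drops the MultiIndex coordinate, and the names `slabs` and `manual` are made up for illustration. Non-constant data is used so that the comparison with `stack` is meaningful.

```python
import dask.array as da
import numpy as np
import xarray as xr

values = np.arange(24).reshape(2, 3, 4)
data = da.from_array(values, chunks=(1, 1, 1))
arr = xr.DataArray(data, dims=["z", "y", "x"])

# Take each (z, y) row as a (1, 4) slab and concatenate along a new leading axis.
# Slicing and concatenating reuse the existing blocks; no rechunk is requested.
nz, ny, _ = data.shape
slabs = [data[z, y][np.newaxis, :] for z in range(nz) for y in range(ny)]
manual = da.concatenate(slabs, axis=0)

print(manual.chunks)

# Compare against stack + transpose (no .chunk call) for the same element order.
reference = arr.stack(v=("z", "y")).transpose("v", ...)
print(np.array_equal(manual.compute(), reference.values))
```

The result has `v` as the leading dimension with one chunk per element, matching the layout requested above.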

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4389/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
