issue_comments: 682335024

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4372#issuecomment-682335024 https://api.github.com/repos/pydata/xarray/issues/4372 682335024 MDEyOklzc3VlQ29tbWVudDY4MjMzNTAyNA== 5821660 2020-08-28T05:35:28Z 2020-08-28T05:35:28Z MEMBER

From the dask apply_gufunc docstring:

```python
"""
allow_rechunk: Optional, bool, keyword only
    Allows rechunking, otherwise chunk sizes need to match and core
    dimensions are to consist only of one chunk.
    Warning: enabling this can increase memory usage significantly.
    Defaults to False
"""
```

Current code handling in dask:

https://github.com/dask/dask/blob/42873f27ce11ce35652dda344dae5c47b742bef2/dask/array/gufunc.py#L398-L417

```python
if not allow_rechunk:
    chunksizes = chunksizess[dim]
    #### Check if core dimensions consist of only one chunk
    if (dim in core_shapes) and (chunksizes[0][0] < core_shapes[dim]):
        raise ValueError(
            "Core dimension `'{}'` consists of multiple chunks. To fix, rechunk into a single \
chunk along this dimension or set `allow_rechunk=True`, but beware that this may increase memory usage \
significantly.".format(
                dim
            )
        )
    #### Check if loop dimensions consist of same chunksizes, when they have sizes > 1
    relevant_chunksizes = list(
        unique(c for s, c in zip(sizes, chunksizes) if s > 1)
    )
    if len(relevant_chunksizes) > 1:
        raise ValueError(
            "Dimension `'{}'` with different chunksize present".format(dim)
        )
```

IIUC, `allow_rechunk=True` not only rechunks non-core dimensions but also fixes core dimensions with more than one chunk. Is this intended on the xarray side? Before #4060, a core dimension with more than one chunk was caught and raised an error:

```python
# core dimensions cannot span multiple chunks
for axis, dim in enumerate(core_dims, start=-len(core_dims)):
    if len(data.chunks[axis]) != 1:
        raise ValueError(
            "dimension {!r} on {}th function argument to "
            "apply_ufunc with dask='parallelized' consists of "
            "multiple chunks, but is also a core dimension. To "
            "fix, rechunk into a single dask array chunk along "
            "this dimension, i.e., ``.chunk({})``, but beware "
            "that this may significantly increase memory usage.".format(
                dim, n, {dim: -1}
            )
        )
```

Explicit rechunk was recommended to the user, though.

That means that setting `allow_rechunk=True` by default will not, on its own, give us the same behaviour as before #4060. I'm unsure how to proceed.
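To make the difference concrete, here is a minimal, dependency-free sketch of the two checks. It is not the dask or xarray code: `dask_style_check` and `xarray_style_check` are hypothetical names, the chunk tuples only mimic the layout of `Array.chunks`, and the dask side is simplified to "more than one chunk along a core dimension" (the real check compares the first chunk size against the core shape and also validates loop dimensions).

```python
# Simplified stand-ins for the two checks quoted above (hypothetical helpers,
# not the actual dask/xarray implementations).


def dask_style_check(chunks, dims, core_dims, allow_rechunk):
    """Mimic dask: the core-dimension check is skipped when allow_rechunk=True."""
    if allow_rechunk:
        return  # nothing is validated; dask would rechunk silently
    for dim, dim_chunks in zip(dims, chunks):
        if dim in core_dims and len(dim_chunks) > 1:
            raise ValueError(
                "Core dimension {!r} consists of multiple chunks.".format(dim)
            )


def xarray_style_check(chunks, core_dims):
    """Mimic the pre-#4060 xarray check: core dims are the trailing axes."""
    for axis, dim in enumerate(core_dims, start=-len(core_dims)):
        if len(chunks[axis]) != 1:
            raise ValueError(
                "dimension {!r} consists of multiple chunks, "
                "but is also a core dimension.".format(dim)
            )


def raises_value_error(func, *args, **kwargs):
    """Return True if calling func raises ValueError, else False."""
    try:
        func(*args, **kwargs)
    except ValueError:
        return True
    return False


# Two dimensions, both split into two chunks; "time" is the core dimension.
dims = ("x", "time")
chunks = ((5, 5), (3, 3))
core_dims = ("time",)

# With allow_rechunk=True the dask-style check passes silently ...
print(raises_value_error(dask_style_check, chunks, dims, core_dims, allow_rechunk=True))
# ... while the pre-#4060 xarray-style check would still have raised.
print(raises_value_error(xarray_style_check, chunks, core_dims))
```

Under these assumptions, the multi-chunk core dimension is only rejected if xarray keeps (or reinstates) its own check in addition to passing `allow_rechunk=True`.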

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  684930038