Comments on https://github.com/pydata/xarray/issues/4428 (most recent first):

---

**user 1312546 (MEMBER)**, 2020-10-19 ([comment](https://github.com/pydata/xarray/issues/4428#issuecomment-712066302), 2 👍):

Sorry, my comment in https://github.com/pydata/xarray/issues/4428#issuecomment-711034128 was incorrect in a couple of ways:

1. We still do the splitting, even when slicing with an out-of-order indexer. I'm checking whether that's appropriate.
2. I'm looking into a logic bug when computing the number of chunks. I don't think we properly handle non-uniform chunking on the other axes.

---

**user 1312546 (MEMBER)**, 2020-10-17 ([comment](https://github.com/pydata/xarray/issues/4428#issuecomment-711034128)):

I assume that the indices `[np.argsort(da.x.data)]` are not going to be monotonically increasing. That induces a different slicing pattern. The docs at https://docs.dask.org/en/latest/array-slicing.html#efficiency describe the case where the indices are sorted, but don't discuss the non-sorted case (yet).

---

**user 2448579 (MEMBER)**, 2020-10-16 ([comment](https://github.com/pydata/xarray/issues/4428#issuecomment-710683863), 1 👍):

@TomAugspurger, @jbusecke is seeing some funny behaviour in https://github.com/jbusecke/cmip6_preprocessing/issues/58. Here's a reproducer:

```python
import dask
import numpy as np
import xarray as xr

dask.config.set(
    **{
        "array.slicing.split_large_chunks": True,
        "array.chunk-size": "24 MiB",
    }
)

da = xr.DataArray(
    dask.array.random.random((10, 1000, 2000), chunks=(-1, -1, 200)),
    dims=["x", "y", "time"],
    coords={"x": [3, 4, 5, 6, 7, 9, 8, 0, 2, 1]},
)
da
```

![image](https://user-images.githubusercontent.com/2448579/96319766-d15a4b00-0fcd-11eb-9f9d-0f7116933367.png)
![image](https://user-images.githubusercontent.com/2448579/96319786-e0d99400-0fcd-11eb-9eaf-074e92ffc941.png)

Which is basically:

```python
da.data[np.argsort(da.x.data), ...]
```

![image](https://user-images.githubusercontent.com/2448579/96319876-141c2300-0fce-11eb-92ec-935645c6dffc.png)

I don't understand why it's rechunking when we are indexing with a list along a dimension with a single chunk...

---

**user 1312546 (MEMBER)**, 2020-10-15 ([comment](https://github.com/pydata/xarray/issues/4428#issuecomment-709539887)):

Closing the loop here: with https://github.com/dask/dask/pull/6665 the behavior of Dask 2.25.0 should be restored (possibly with a warning about creating large chunks). So this can probably be closed, though there *may* be parts of xarray that should be updated to avoid creating large chunks, or we could rely on the user to do that through the dask config system.

---

**user 8587080**, 2020-09-22 ([comment](https://github.com/pydata/xarray/issues/4428#issuecomment-696475388)):

Hi. This change of behaviour broke an interpolation for me. The interpolation function does a `sortby` along the interpolated dimension, but then you can't interpolate along a chunked dimension. I would argue the interpolation function needs to rechunk back to the original chunking after the `sortby`, or stop people from interpolating a dask array without `assume_sorted=True`.

---

**user 6582745**, 2020-09-16 ([comment](https://github.com/pydata/xarray/issues/4428#issuecomment-693552440)):

Thanks! I will definitely give that a go when I am back at my work PC. My personal take is that this level of automated rechunking is dangerous. I have constructed the chunking in my code with great care and for a reason. Having it changed "invisibly" by operations which didn't previously have this behaviour seems problematic to me.

---

**user 2448579 (MEMBER)**, 2020-09-16 ([comment](https://github.com/pydata/xarray/issues/4428#issuecomment-693475844)):

This looks like a consequence of https://github.com/dask/dask/pull/6514. That change helps with cases like https://github.com/pydata/xarray/issues/4112. `sortby` is basically an `isel` indexing operation, so dask is automatically rechunking to keep chunks below the default size. You could fix this by setting an appropriate value for `array.chunk-size`, either temporarily or permanently:

```python
with dask.config.set({"array.chunk-size": "256MiB"}):  # or an appropriate value
    ...
```

---

**user 6582745**, 2020-09-16 ([comment](https://github.com/pydata/xarray/issues/4428#issuecomment-693385409)):

Finally managed to reproduce. Here it is:

```python
import xarray
import dask.array as da
import numpy as np

if __name__ == "__main__":
    data = da.random.random([10000, 16, 4], chunks=(10000, 16, 4))
    dtype = np.float32

    xds = xarray.Dataset(
        data_vars={"DATA1": (("x", "y", "z"), data.astype(dtype))}
    )

    # Create a selection which will upsample the y axis.
    upsample_factor = 1024 // xds.dims["y"]
    selection = np.repeat(np.arange(xds.dims["y"]), upsample_factor)

    print("xarray.Dataset prior to resampling:\n", xds)
    xds = xds.sel({"y": selection})
    print("xarray.Dataset post resampling:\n", xds)
```

With `dask==2.25.0` this gives:

```
xarray.Dataset prior to resampling:
 Dimensions:  (x: 10000, y: 16, z: 4)
Dimensions without coordinates: x, y, z
Data variables:
    DATA1    (x, y, z) float32 dask.array
xarray.Dataset post resampling:
 Dimensions:  (x: 10000, y: 1024, z: 4)
Dimensions without coordinates: x, y, z
Data variables:
    DATA1    (x, y, z) float32 dask.array
```

With `dask==2.26.0` this gives:

```
xarray.Dataset prior to resampling:
 Dimensions:  (x: 10000, y: 16, z: 4)
Dimensions without coordinates: x, y, z
Data variables:
    DATA1    (x, y, z) float32 dask.array
xarray.Dataset post resampling:
 Dimensions:  (x: 10000, y: 1024, z: 4)
Dimensions without coordinates: x, y, z
Data variables:
    DATA1    (x, y, z) float32 dask.array
```

And finally, the most distressing part: changing the dtype changes the chunking! With `dtype = np.complex64`, `dask==2.26.0` gives:

```
xarray.Dataset prior to resampling:
 Dimensions:  (x: 10000, y: 16, z: 4)
Dimensions without coordinates: x, y, z
Data variables:
    DATA1    (x, y, z) complex64 dask.array
xarray.Dataset post resampling:
 Dimensions:  (x: 10000, y: 1024, z: 4)
Dimensions without coordinates: x, y, z
Data variables:
    DATA1    (x, y, z) complex64 dask.array
```
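The mechanism discussed throughout the thread — dask splitting the output chunks of a fancy-indexing operation based on the `array.slicing.split_large_chunks` and `array.chunk-size` config options — can be observed directly on a plain dask array, without xarray. A minimal sketch (the array shape, indexer, and chunk-size values here are illustrative, not taken from the issue; the exact resulting chunking varies by dask version):

```python
import dask
import dask.array as da
import numpy as np

# A single chunk along the axis we index, mirroring the reproducers above.
x = da.random.random((16, 1000), chunks=(16, 1000))

# An out-of-order (non-monotonic) integer indexer.
idx = np.arange(16)[::-1]

# With splitting disabled, the single input chunk is preserved.
with dask.config.set({"array.slicing.split_large_chunks": False}):
    kept = x[idx]

# With splitting enabled and a deliberately tiny target chunk size,
# dask splits the result into smaller chunks along the indexed axis.
with dask.config.set(
    {"array.slicing.split_large_chunks": True, "array.chunk-size": "16 kiB"}
):
    split = x[idx]

print("kept: ", kept.chunks[0])   # a single chunk of 16 rows
print("split:", split.chunks[0])  # several smaller chunks
```

This mirrors the workaround suggested in the thread: raising `array.chunk-size` (or setting `array.slicing.split_large_chunks` to `False`) restores the pre-2.26.0 behaviour of keeping the indexed axis in one chunk.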