id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
732910109,MDU6SXNzdWU3MzI5MTAxMDk=,4554,Unexpected chunking of 3d DataArray in `polyfit()`,26591824,open,0,,,3,2020-10-30T06:07:34Z,2021-04-19T15:44:07Z,,CONTRIBUTOR,,,,"**What happened**:

When running `polyfit()` on a 3d chunked xarray DataArray, the output is chunked differently than the input array.

**What you expected to happen**:

I expect the output to have the same chunking as the input.

**Minimal Complete Verifiable Example**:

(from @rabernat in [https://github.com/xgcm/xrft/issues/116](https://github.com/xgcm/xrft/issues/116))

Example: number of chunks decreases

```python
import dask.array as dsa
import xarray as xr

nz, ny, nx = (10, 20, 30)
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))
da = xr.DataArray(data, dims=['z', 'y', 'x'])
da.chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))
pf = da.polyfit('x', 1)
pf.polyfit_coefficients.chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (20,), (30,))
# chunks on the y dimension have been consolidated!

pv = xr.polyval(da.x, pf.polyfit_coefficients).transpose('z', 'y', 'x')
pv.chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (20,), (30,))
# and this propagates to polyval

# align back against the original data
(da - pv).chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))
# hides the fact that we have chunk consolidation happening upstream
```

Example: number of chunks increases

```python
nz, ny, nx = (6, 10, 4)
data = dsa.ones((nz, ny, nx), chunks=(2, 10, 2))
da = xr.DataArray(data, dims=['z', 'y', 'x'])
da.chunks
# -> ((2, 2, 2), (10,), (2, 2))
pf = da.polyfit('y', 1)
pf.polyfit_coefficients.chunks
# -> ((2,), (1, 1, 1, 1, 1, 1), (4,))
pv = xr.polyval(da.y, pf.polyfit_coefficients).transpose('z', 'y', 'x')
pv.chunks
# -> ((1, 1, 1, 1, 1, 1), (10,), (4,))
(da - pv).chunks
# -> ((1, 1, 1, 1, 1, 1), (10,), (2, 2))
```

(This discussion started in [https://github.com/xgcm/xrft/issues/116](https://github.com/xgcm/xrft/issues/116) with @rabernat and @navidcy.)

**Environment**:

Running on Pangeo Cloud
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.112+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.2
scipy: 1.5.2
netCDF4: 1.5.4
pydap: installed
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.7
cfgrib: 0.9.8.4
iris: None
bottleneck: 1.3.2
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20201009
pip: 20.2.3
conda: None
pytest: 6.1.1
IPython: 7.18.1
sphinx: 3.2.1
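
The chunk changes reported in the two examples can be hard to spot by eye. The following is only a sketch in plain Python, with no xarray or dask calls: `chunk_changes` is a hypothetical helper, not part of either library, and the tuples are copied by hand from the `da.chunks` and `pv.chunks` outputs quoted in the examples. It compares only the number of chunks per dimension, so a rechunking that keeps the count but changes the sizes would go unnoticed.

```python
# Hypothetical helper (not part of xarray or dask): compare two .chunks tuples
# dimension by dimension and report where the number of chunks changed.
def chunk_changes(dims, before, after):
    changes = {}
    for dim, old, new in zip(dims, before, after):
        if len(new) < len(old):
            changes[dim] = 'consolidated'
        elif len(new) > len(old):
            changes[dim] = 'split'
    return changes


dims = ('z', 'y', 'x')

# First example: da.chunks vs. pv.chunks as quoted in the report.
ex1 = chunk_changes(
    dims,
    ((1,) * 10, (5, 5, 5, 5), (30,)),
    ((1,) * 10, (20,), (30,)),
)

# Second example: da.chunks vs. pv.chunks as quoted in the report.
ex2 = chunk_changes(
    dims,
    ((2, 2, 2), (10,), (2, 2)),
    ((1,) * 6, (10,), (4,)),
)

print(ex1)
print(ex2)
```

For the first example this flags only the `y` dimension as consolidated. For the second it flags `z` as split and `x` as consolidated, which matches the two behaviours described in the report.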
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4554/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue