id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
984555353,MDU6SXNzdWU5ODQ1NTUzNTM=,5754,Variable.stack constructs extremely large chunks,2448579,closed,0,,,6,2021-09-01T03:08:02Z,2023-03-22T14:51:44Z,2021-12-14T17:31:45Z,MEMBER,,,,"**Minimal Complete Verifiable Example**:

Here's a small array with too-small chunk sizes, just as an example:

```python
# Put your MCVE code here
import dask.array
import xarray as xr

var = xr.Variable((""x"", ""y"", ""z""), dask.array.random.random((4, 18483, 1000), chunks=(1, 183, -1)))
```

Now stack two dimensions. This is a 100x increase in chunk size (in my actual code, 85MB chunks become 8.5GB chunks =) ):

```
var.stack(new=(""x"", ""y""))
```

But calling `reshape` on the dask array preserves the original chunk size:

```
var.data.reshape((4*18483, -1))
```

## Solution

Ah, found it: we transpose then reshape in `Variable._stack_once`.
https://github.com/pydata/xarray/blob/f915515d610b4471888fa44dfb00dbae3fd22349/xarray/core/variable.py#L1521-L1527

Writing those steps with pure dask yields the same 100x increase in chunk size:

```python
var.data.transpose([2, 0, 1]).reshape((-1, 4*18483))
```

**Anything else we need to know?**:

**Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1127.18.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.19.0
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.5.3
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: 3.0.4
bottleneck: 1.3.2
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.2.2
conda: 4.10.3
pytest: 6.2.4
IPython: 7.26.0
sphinx: 4.1.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5754/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
985376146,MDU6SXNzdWU5ODUzNzYxNDY=,5758,RTD build failing,2448579,closed,0,,,6,2021-09-01T16:50:58Z,2021-09-08T09:47:17Z,2021-09-08T09:47:16Z,MEMBER,,,,"The current RTD build is failing in `plotting.rst`:

```
sphinx.errors.SphinxParallelError: RuntimeError: Non Expected exception in `/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/doc/user-guide/plotting.rst` line None

Sphinx parallel build error:
RuntimeError: Non Expected exception in `/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/doc/user-guide/plotting.rst` line None
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5758/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue