## PR #7472: Avoid in-memory broadcasting when converting to_dask_dataframe

pydata/xarray · pull request · closed · opened 2023-01-24 by user 14371165 (MEMBER) · closed 2023-01-26 · 1 comment

It turns out there's a call to `.set_dims` that forces an eager broadcast of the numpy coordinates.

- [x] Closes #6811
- [x] Tests added, see #7474.
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

Debugging script:
```python
import dask.array as da
import xarray as xr
import numpy as np

chunks = 5000

# I have to restart the pc if running with this:
# dim1_sz = 100_000
# dim2_sz = 100_000

# Does not crash when using the following constants, >5 gig RAM increase though:
dim1_sz = 40_000
dim2_sz = 40_000

x = da.random.random((dim1_sz, dim2_sz), chunks=chunks)
ds = xr.Dataset(
    {
        "x": xr.DataArray(
            data=x,
            dims=["dim1", "dim2"],
            coords={"dim1": np.arange(0, dim1_sz), "dim2": np.arange(0, dim2_sz)},
        )
    }
)
# with dask.config.set(**{"array.slicing.split_large_chunks": True}):
df = ds.to_dask_dataframe()
print(df)
```
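For context, here is a minimal sketch of the failure mode the title describes. The sizes, the `coord` variable, and the chunk-first workaround below are illustrative assumptions, not the code actually changed by the PR: `set_dims` broadcasts a numpy-backed coordinate against the other dimension, and flattening that result into a dataframe column materializes the full product of the dimension sizes, while a dask-backed variable keeps the same broadcast lazy.

```python
import numpy as np
import xarray as xr

# Illustrative sizes only -- the script above uses 40_000 x 40_000,
# where a single flattened int64 coordinate column is ~12.8 GB.
dim1_sz, dim2_sz = 4, 3

# A 1-D numpy-backed coordinate, like the np.arange coords above.
coord = xr.Variable(("dim1",), np.arange(dim1_sz))

# set_dims broadcasts the coordinate against the other dimension;
# flattening the result into a column then materializes the whole
# dim1 x dim2 product in memory.
eager = coord.set_dims({"dim1": dim1_sz, "dim2": dim2_sz})
print(np.ravel(np.asarray(eager.data)).nbytes)  # full product of sizes

# Chunking first keeps the broadcast lazy: dask only records a task
# graph instead of allocating the expanded array up front.
lazy = coord.chunk({"dim1": 2}).set_dims({"dim1": dim1_sz, "dim2": dim2_sz})
print(type(lazy.data))  # dask array; nothing computed yet
```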
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7472/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1555497796,PR_kwDOAMm_X85Ic_wm,7474,Add benchmarks for to_dataframe and to_dask_dataframe,14371165,closed,0,,,1,2023-01-24T18:48:26Z,2023-01-24T21:00:39Z,2023-01-24T20:13:30Z,MEMBER,,0,pydata/xarray/pulls/7474,Related to #7472.,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7474/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull