issues: 1554036799
field | value
---|---
id | 1554036799
node_id | PR_kwDOAMm_X85IYHUz
number | 7472
title | Avoid in-memory broadcasting when converting to_dask_dataframe
user | 14371165
state | closed
locked | 0
assignee |
milestone |
comments | 1
created_at | 2023-01-24T00:15:01Z
updated_at | 2023-01-26T17:00:24Z
closed_at | 2023-01-26T17:00:23Z
author_association | MEMBER
active_lock_reason |
draft | 0
pull_request | pydata/xarray/pulls/7472
performed_via_github_app |
state_reason |
repo | 13221727
type | pull

body:

Turns out that there's a call to
Debugging script to reproduce the memory blow-up:
```python
import dask  # only needed if the config experiment below is uncommented
import dask.array as da
import numpy as np
import xarray as xr

chunks = 5000
# Running with these sizes exhausts memory so badly that the machine
# has to be restarted:
# dim1_sz = 100_000
# dim2_sz = 100_000
# These sizes do not crash, but RAM usage still grows by more than 5 GB:
dim1_sz = 40_000
dim2_sz = 40_000

x = da.random.random((dim1_sz, dim2_sz), chunks=chunks)
ds = xr.Dataset(
    {
        "x": xr.DataArray(
            data=x,
            dims=["dim1", "dim2"],
            coords={"dim1": np.arange(0, dim1_sz), "dim2": np.arange(0, dim2_sz)},
        )
    }
)
# with dask.config.set(**{"array.slicing.split_large_chunks": True}):
df = ds.to_dask_dataframe()
print(df)
```
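For context, here is a minimal sketch (an illustration under assumed sizes, not necessarily the code path this PR changed) of why materializing a broadcast coordinate eagerly is so expensive, and how the same broadcast can stay lazy at the dask level:

```python
import dask.array as da
import numpy as np

dim1_sz, dim2_sz = 40_000, 40_000
chunks = 5000

# Eager route: np.broadcast_to itself returns a cheap read-only view,
# but flattening it into a tabular column forces a dense copy of shape
# (dim1_sz, dim2_sz): 40_000 * 40_000 * 8 bytes ~= 12.8 GB per coordinate.
# dim1_eager = np.broadcast_to(np.arange(dim1_sz)[:, None], (dim1_sz, dim2_sz))

# Lazy route: broadcast at the dask level instead, so the expansion
# stays chunked and each (5000, 5000) block is realized only on compute.
dim1_lazy = da.broadcast_to(
    da.from_array(np.arange(dim1_sz), chunks=chunks)[:, None],
    (dim1_sz, dim2_sz),
    chunks=(chunks, chunks),
)
print(dim1_lazy)  # still lazy; no 12.8 GB allocation has happened yet
```

If `to_dask_dataframe` performs its coordinate broadcasting this way, the returned dataframe stays fully lazy and peak memory is bounded by the chunk size rather than the full broadcast shape.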
reactions:

```json
{
  "url": "https://api.github.com/repos/pydata/xarray/issues/7472/reactions",
  "total_count": 0,
  "+1": 0,
  "-1": 0,
  "laugh": 0,
  "hooray": 0,
  "confused": 0,
  "heart": 0,
  "rocket": 0,
  "eyes": 0
}
```