pull_requests: 1214281011
field | value
---|---
id | 1214281011
node_id | PR_kwDOAMm_X85IYHUz
number | 7472
state | closed
locked | 0
title | Avoid in-memory broadcasting when converting to_dask_dataframe
user | 14371165
created_at | 2023-01-24T00:15:01Z
updated_at | 2023-01-26T17:00:24Z
closed_at | 2023-01-26T17:00:23Z
merged_at | 2023-01-26T17:00:23Z
merge_commit_sha | d385e2063a6b5919e1fe9dd3e27a24bc7117137e
draft | 0
head | 04173c26ca533c2a67a8522ec3ddba596076dcd9
base | 3ee7b5a63bb65ce62eff3dafe4a2e90bca7a9eeb
author_association | MEMBER
repo | 13221727
url | https://github.com/pydata/xarray/pull/7472

body:

It turns out there's a call to `.set_dims` that forces an in-memory broadcast of the numpy coordinates.

- [x] Closes #6811
- [x] Tests added, see #7474.
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

Debugging script:

```python
import dask.array as da
import numpy as np
import xarray as xr

chunks = 5000

# I have to restart the PC when running with these:
# dim1_sz = 100_000
# dim2_sz = 100_000
# Does not crash with the following sizes, but RAM still grows by >5 GB:
dim1_sz = 40_000
dim2_sz = 40_000

x = da.random.random((dim1_sz, dim2_sz), chunks=chunks)
ds = xr.Dataset(
    {
        "x": xr.DataArray(
            data=x,
            dims=["dim1", "dim2"],
            coords={"dim1": np.arange(0, dim1_sz), "dim2": np.arange(0, dim2_sz)},
        )
    }
)
# with dask.config.set(**{"array.slicing.split_large_chunks": True}):
df = ds.to_dask_dataframe()
print(df)
```
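To make the failure mode concrete, here is a minimal sketch of the mechanism, not the xarray internals themselves: `Variable.set_dims` expands a 1-D numpy coordinate to the dataset's full 2-D shape, and flattening that expansion into dataframe columns forces a quadratic copy. The sizes below are deliberately tiny so the snippet is safe to run; only the role of `set_dims` and the broadcast behavior come from the PR description above.

```python
# Minimal sketch, assuming only public xarray behavior (not the PR's code).
import dask.array as da
import numpy as np
import xarray as xr

n = 4  # tiny on purpose; the report above used 40_000

# `set_dims` expands a 1-D coordinate to the full 2-D shape.
coord = xr.Variable("dim1", np.arange(n))
expanded = coord.set_dims({"dim1": n, "dim2": n})
print(expanded.shape)  # (4, 4)

# The expansion itself is a cheap read-only broadcast view, but flattening it
# into dataframe columns forces a real copy. Memory grows as n**2; at
# n = 40_000 that is roughly 12.8 GB of int64 per coordinate column.
flat = np.ravel(expanded.values)
print(flat.nbytes)  # n * n * 8 bytes

# With dask-backed data the same call stays lazy: the broadcast is dispatched
# to dask and nothing is materialized until compute time.
lazy = xr.Variable("dim1", da.arange(n, chunks=2)).set_dims({"dim1": n, "dim2": n})
print(type(lazy.data))  # a dask array; no 2-D buffer allocated yet
```

The dask branch at the end suggests why a lazy broadcast side-steps the quadratic copy, which is consistent with the PR's stated goal of avoiding in-memory broadcasting in `to_dask_dataframe`.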
Links from other tables
- 2 rows in labels_pull_requests reference this row via pull_requests_id