
issues: 1554036799

id: 1554036799
node_id: PR_kwDOAMm_X85IYHUz
number: 7472
title: Avoid in-memory broadcasting when converting to_dask_dataframe
user: 14371165
state: closed
locked: 0
comments: 1
created_at: 2023-01-24T00:15:01Z
updated_at: 2023-01-26T17:00:24Z
closed_at: 2023-01-26T17:00:23Z
author_association: MEMBER
draft: 0
pull_request: pydata/xarray/pulls/7472
body:

It turns out that there is a call to .set_dims that forces a full in-memory broadcast of the numpy coordinates.
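The cost of that broadcast can be sketched in plain numpy (an illustration of the broadcasting mechanics, not the actual xarray code path): expanding a 1-D coordinate against a second dimension is free as a stride-0 view, but flattening it into a dataframe column materializes every element.

```python
import numpy as np

dim1_sz, dim2_sz = 4, 3
coord = np.arange(dim1_sz, dtype=np.int64)  # a small 1-D "coordinate"

# Broadcasting alone is free: a read-only, stride-0 view, no new memory.
expanded = np.broadcast_to(coord[:, None], (dim1_sz, dim2_sz))
print(expanded.strides)  # (8, 0) -- the second axis repeats the same memory

# Flattening the view into a column copies every element:
# dim1_sz * dim2_sz values per coordinate, which is ~80 GB of int64
# for two 100_000-sized dimensions.
flat = np.ravel(expanded)
print(flat.nbytes)  # 96 bytes at this toy size
```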

  • [x] Closes #6811
  • [x] Tests added, see #7474.
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

Debugging script:

```python
import dask.array as da
import numpy as np
import xarray as xr

chunks = 5000
# I have to restart the pc if running with this:
# dim1_sz = 100_000
# dim2_sz = 100_000
# Does not crash when using the following constants, >5 gig RAM increase though:
dim1_sz = 40_000
dim2_sz = 40_000

x = da.random.random((dim1_sz, dim2_sz), chunks=chunks)
ds = xr.Dataset(
    {
        "x": xr.DataArray(
            data=x,
            dims=["dim1", "dim2"],
            coords={"dim1": np.arange(0, dim1_sz), "dim2": np.arange(0, dim2_sz)},
        )
    }
)
# with dask.config.set(**{"array.slicing.split_large_chunks": True}):
df = ds.to_dask_dataframe()
print(df)
```
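A hedged sketch of the idea behind the fix (assumed names, not the PR's actual diff): if the coordinate is wrapped in a chunked dask array before it is broadcast, the expansion stays lazy and no full-size numpy array is ever allocated.

```python
import dask.array as da
import numpy as np

dim1_sz, dim2_sz, chunks = 10_000, 10_000, 5_000
coord = np.arange(dim1_sz)

# Chunk the 1-D coordinate first, then broadcast lazily:
lazy = da.broadcast_to(
    da.from_array(coord[:, None], chunks=(chunks, 1)),
    (dim1_sz, dim2_sz),
)
print(lazy.shape)  # nothing is materialized until .compute()
```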
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7472/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: pull
