issues: 406812274

id: 406812274
node_id: MDU6SXNzdWU0MDY4MTIyNzQ=
number: 2745
title: reindex doesn't preserve chunks
user: 4711805
state: open
locked: 0
comments: 1
created_at: 2019-02-05T14:37:24Z
updated_at: 2023-12-04T20:46:36Z
author_association: CONTRIBUTOR

The following code creates a small (100x100) chunked DataArray, and then re-indexes it into a huge one (100000x100000):

```python
import xarray as xr
import numpy as np

n = 100
x = np.arange(n)
y = np.arange(n)
da = xr.DataArray(np.zeros(n*n).reshape(n, n), coords=[x, y], dims=['x', 'y']).chunk(n, n)

n2 = 100000
x2 = np.arange(n2)
y2 = np.arange(n2)
da2 = da.reindex({'x': x2, 'y': y2})
da2
```

But the re-indexed DataArray has chunksize=(100000, 100000) instead of chunksize=(100, 100):

```
<xarray.DataArray (x: 100000, y: 100000)>
dask.array<shape=(100000, 100000), dtype=float64, chunksize=(100000, 100000)>
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
  * y        (y) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
```

This immediately leads to a memory error when trying, for example, to store it in a zarr archive:

```python
ds2 = da2.to_dataset(name='foo')
ds2.to_zarr(store='foo', mode='w')
```
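For a sense of scale, here is a rough back-of-the-envelope sketch of what a single dense float64 chunk costs in memory at the reported chunk size versus the expected one. The helper `chunk_nbytes` is a hypothetical name used only for illustration; it is not part of xarray or dask.

```python
import math

def chunk_nbytes(chunk_shape, itemsize=8):
    """Bytes needed to hold one dense chunk (float64 uses 8 bytes per item)."""
    return math.prod(chunk_shape) * itemsize

# Chunk size reported after reindex vs. the chunk size of the original array.
single_chunk = chunk_nbytes((100_000, 100_000))
expected_chunk = chunk_nbytes((100, 100))

print(f"{single_chunk / 1e9:.0f} GB")    # 80 GB
print(f"{expected_chunk / 1e3:.0f} kB")  # 80 kB
```

A chunk of about 80 GB has to fit in memory as one block, which would be consistent with the immediate memory error.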

Re-chunking to 100x100 before storing it doesn't help either; it just takes much longer before crashing with a memory error:

```python
da3 = da2.chunk(n, n)
ds3 = da3.to_dataset(name='foo')
ds3.to_zarr(store='foo', mode='w')
```
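One plausible reading of the slower failure, offered as an assumption rather than a confirmed diagnosis: re-chunking after the fact still has to build the single 100000x100000 source chunk before it can be split, and on top of that it schedules one piece of work per target chunk. The sketch below only counts the target chunks; `n_chunks` is a hypothetical helper, not an xarray or dask function.

```python
import math

def n_chunks(shape, chunk_shape):
    """Number of blocks when `shape` is split into blocks of `chunk_shape`."""
    # -(-s // c) is integer ceiling division, so partial edge blocks are counted.
    return math.prod(-(-s // c) for s, c in zip(shape, chunk_shape))

total = n_chunks((100_000, 100_000), (100, 100))
print(total)  # 1000000
```

Under that assumption, the extra time would go into setting up a million small pieces that all depend on the same oversized chunk.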

repo: 13221727
type: issue
