issues: 309100522
id | node_id | number | title | user | state | locked | comments | created_at | updated_at | closed_at | author_association | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
309100522 | MDU6SXNzdWUzMDkxMDA1MjI= | 2018 | MemoryError when using save_mfdataset() | 1117224 | closed | 0 | 1 | 2018-03-27T19:22:28Z | 2020-03-28T07:51:17Z | 2020-03-28T07:51:17Z | NONE | completed | 13221727 | issue |

body:

#### Code Sample, a copy-pastable example if possible

```python
import xarray as xr
import dask

# Dummy data that on disk is about ~200GB
da = xr.DataArray(dask.array.random.normal(0, 1, size=(12, 408, 1367, 304, 448),
                                           chunks=(1, 1, 1, 304, 448)),
                  dims=('ensemble', 'init_time', 'fore_time', 'x', 'y'))

# Perform some calculation on the dask data
da_sum = da.sum(dim='x').sum(dim='y') * (25 * 25) / (10**6)

# Write to multiple files
c_e, datasets = zip(*da_sum.to_dataset(name='sic').groupby('ensemble'))
paths = ['file_%s.nc' % e for e in c_e]
xr.save_mfdataset(datasets, paths)
```

#### Problem description

This results in a MemoryError, when dask should be able to handle writing this out-of-memory DataArray to multiple netCDF files that each fit within memory. Related SO post here.

#### Expected Output

12 netCDF files (grouped by the ensemble dim).

#### Output of `xr.show_versions()`

reactions:

{ "url": "https://api.github.com/repos/pydata/xarray/issues/2018/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
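
As a point of reference for the deferred-write behaviour the report expects, `save_mfdataset` also accepts `compute=False`, which returns a `dask.delayed` object so all files are written in one scheduled pass rather than eagerly. The sketch below is not taken from the issue: the array is shrunk so the example runs quickly, the variable names simply mirror the report, and whether this pattern avoids the reported MemoryError on the full ~200GB array is not established here.

```python
import dask.array
import xarray as xr

# Small stand-in for the reporter's ~200GB array (dimensions shrunk so this runs quickly).
da = xr.DataArray(
    dask.array.random.normal(0, 1, size=(3, 4, 5, 8, 8), chunks=(1, 1, 1, 8, 8)),
    dims=('ensemble', 'init_time', 'fore_time', 'x', 'y'),
)
da_sum = da.sum(dim='x').sum(dim='y') * (25 * 25) / (10**6)

# One dataset and one output path per ensemble member, as in the report.
c_e, datasets = zip(*da_sum.to_dataset(name='sic').groupby('ensemble'))
paths = ['file_%s.nc' % e for e in c_e]

# compute=False defers the writes and returns a dask.delayed object;
# calling .compute() then performs all writes in a single scheduled pass.
delayed = xr.save_mfdataset(datasets, paths, compute=False)
delayed.compute()
```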