issue_comments: 408894639

html_url: https://github.com/pydata/xarray/issues/2329#issuecomment-408894639
issue_url: https://api.github.com/repos/pydata/xarray/issues/2329
id: 408894639
node_id: MDEyOklzc3VlQ29tbWVudDQwODg5NDYzOQ==
user: 12278765
created_at: 2018-07-30T15:01:27Z
updated_at: 2018-07-30T15:10:43Z
author_association: NONE

@rabernat Thanks for your answer.

I have one big NetCDF of ~500 GB. What I have changed (a sketch of this setup follows the list):

- Run in a Jupyter notebook with distributed to get the dashboard.
- Change the chunks to {'lat': 90, 'lon': 90}, which should be around 1 GB per chunk.
- Chunk from the beginning with ds = xr.open_dataset('my_netcdf.nc', chunks=chunks).
- About the LZ4 compression: I did some tests with a 1.5 GB extract and the writing time was just 2% slower than uncompressed.
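For reference, a minimal sketch of that workflow. The file name, chunk sizes, and LZ4 choice come from the list above; the output path 'my_store.zarr' and applying the compressor to every variable via ds.data_vars are assumptions.

```python
# Sketch of the setup described above (paths and encoding loop are assumptions).
import xarray as xr
from dask.distributed import Client
from numcodecs import Blosc

client = Client()  # local distributed cluster; provides the dashboard

chunks = {'lat': 90, 'lon': 90}  # roughly 1 GB per chunk for this dataset
ds = xr.open_dataset('my_netcdf.nc', chunks=chunks)  # chunk from the beginning

# LZ4 compression via Blosc; in the 1.5 GB test this was ~2% slower than uncompressed
compressor = Blosc(cname='lz4')
encoding = {var: {'compressor': compressor} for var in ds.data_vars}

ds.to_zarr('my_store.zarr', encoding=encoding)
```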

Now when I run to_zarr(), it creates a zarr store (~40 kB) and all the workers start reading from disk, but they don't write anything.

The Dask dashboard looks like this: [dashboard screenshot omitted]

After a while I get warnings:

distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 1.55 GB -- Worker memory limit: 2.08 GB
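The 2.08 GB in the warning is the per-worker memory limit. The comment does not show how the cluster was started, but a purely hypothetical configuration like the one below would produce a limit of that order.

```python
# Hypothetical cluster setup; n_workers, threads_per_worker, and memory_limit
# are assumptions chosen only to match the ~2 GB limit quoted in the warning.
from dask.distributed import Client

client = Client(n_workers=4, threads_per_worker=1, memory_limit='2GB')
```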

Is this the expected behaviour? I was somehow expecting that each worker would read a chunk and then write it to zarr in a streaming fashion, but this does not seem to be the case.
