issue_comments 449151325 (pydata/xarray#2624, 2018-12-20, MEMBER):
https://github.com/pydata/xarray/issues/2624#issuecomment-449151325

So the key information is this: `dask.array<shape=(2920, 32, 361, 720), chunksize=(1460, 32, 361, 720)>`

This says that each dask chunk is 1460 × 32 × 361 × 720 elements (× 4 bytes for float32 data) = 48,573,849,600 bytes ≈ 49 GB. Chunks that large make the dataset effectively unusable for any purpose, including serialization (to zarr, netCDF, or any other format supported by xarray).
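As a quick back-of-the-envelope check (plain Python, just reproducing the arithmetic above):

```python
import math

chunk_shape = (1460, 32, 361, 720)  # chunksize reported in the dask repr
itemsize = 4                        # bytes per float32 element

chunk_bytes = math.prod(chunk_shape) * itemsize
print(f"{chunk_bytes:,} bytes ≈ {chunk_bytes / 1e9:.1f} GB")
# 48,573,849,600 bytes ≈ 48.6 GB per chunk
```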

Furthermore, xarray automatically maps the dask chunks onto zarr chunks, and zarr chunks this size would be much too big to be useful. (The Zarr docs recommend chunks of at least 1 MB; in my example notebook I recommended 10-100 MB.)

For both zarr and dask, you can think of a chunk as an amount of data that can be comfortably held in memory and passed over the network (that's where the 10-100 MB estimate comes from). It is also the minimum amount of data that can be read from the dataset at once: even if you only need a single value, the whole chunk has to be read into memory and decompressed.
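To make that target concrete with the shape from the repr above (assuming the leading dimension is `time`): a single time step is 32 × 361 × 720 × 4 bytes ≈ 33 MB, so even one time step per chunk already lands inside the recommended range. A rough sketch of the arithmetic:

```python
# Dimensions taken from the dask repr above, assumed to be (time, level, lat, lon)
time_len, nlev, nlat, nlon = 2920, 32, 361, 720
itemsize = 4          # bytes per float32 element
target = 100e6        # upper end of the 10-100 MB guideline

per_timestep = nlev * nlat * nlon * itemsize    # ~33 MB per time slice
steps = max(1, int(target // per_timestep))     # time steps that fit in ~100 MB
print(f"{per_timestep / 1e6:.0f} MB per time step -> up to {steps} steps per chunk")
```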

I would recommend you chunk along the time dimension. You can accomplish this by adding the `chunks` keyword when opening the dataset:

```python
ds = xr.open_mfdataset([f1, f2], chunks={'time': 1})
```

I imagine that will fix most of your issues.
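For completeness, a minimal sketch of the end-to-end flow, assuming the same two files as above (`f1` and `f2` are placeholder paths, and `'air'` is a placeholder variable name, not taken from your dataset):

```python
import xarray as xr

f1, f2 = "part1.nc", "part2.nc"  # placeholder paths; use your actual files

# Open with one time step per dask chunk (~33 MB for float32 at this shape)
ds = xr.open_mfdataset([f1, f2], chunks={"time": 1})

# Inspect the resulting chunking of a variable ('air' is just a placeholder)
print(ds["air"].data.chunksize)   # e.g. (1, 32, 361, 720)

# The dask chunks are carried over as the zarr chunks when serializing
ds.to_zarr("output.zarr")
```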
