issue_comments: 406705740 — pydata/xarray#2300, user 1530840, 2018-07-20
https://github.com/pydata/xarray/issues/2300#issuecomment-406705740

Ah, that's great. I do see some improvement. Specifically, I can now set chunks using xarray, successfully write to zarr, and reopen the store. However, on reopening I find that the chunks have been applied inconsistently: some fields have the expected chunk size, whereas some small fields hold the entire variable in one chunk. Furthermore, trying to write a second time with `to_zarr` leads to: `NotImplementedError: Specified zarr chunks (100,) would overlap multiple dask chunks ((100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 4),). This is not implemented in xarray yet. Consider rechunking the data using chunk() or specifying different chunks in encoding.` Reapplying the original chunks with `xr.Dataset.chunk` succeeds, and `ds.chunks` no longer reports "inconsistent chunks", but trying to write still produces the same error.
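For reference, a minimal sketch of the round trip that triggers this for me. The variable name, sizes, and store paths are all illustrative, but the shape (3004 elements chunked at 100, i.e. thirty full chunks plus a remainder of 4) matches the tuple in the error above:

```python
import numpy as np
import xarray as xr

# Illustrative dataset: 3004 elements chunked at 100 gives dask chunks
# (100, 100, ..., 100, 4), matching the error message above.
ds = xr.Dataset({"a": ("x", np.arange(3004))}).chunk({"x": 100})

ds.to_zarr("example.zarr", mode="w")  # first write succeeds
ds2 = xr.open_zarr("example.zarr")    # reopen; zarr chunks (100,) land in encoding

# Second write fails: the stored encoding says zarr chunks (100,), but the
# trailing dask chunk of 4 does not line up with that, so xarray raises
# NotImplementedError rather than writing across zarr chunk boundaries.
ds2.to_zarr("example2.zarr", mode="w")
```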

I also tried loading my entire dataset into memory, allowing the initial `to_zarr` to default to zarr's chunking heuristics. Reading and writing a second time again results in the same error: `NotImplementedError: Specified zarr chunks (63170,) would overlap multiple dask chunks ((63170, 63170, 63170, 63170, 63170, 63170, 63170, 63169),). This is not implemented in xarray yet. Consider rechunking the data using chunk() or specifying different chunks in encoding.` I tried this round-tripping experiment with my monkey patches, and it works for a sequence of read/write/read/write... without any intervention in between. This only works for zarr's default chunking, however, since the patch to `xr.backends.zarr._determine_zarr_chunks` overrides whatever chunks are on the originating dataset.
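The in-memory variant is the same story. A sketch, with the dimension sized (505359 = 7 × 63170 + 63169) to mirror the chunk counts in the error above, though whether zarr's heuristic actually picks 63170 here depends on the dtype and total array size:

```python
import numpy as np
import xarray as xr

# Illustrative in-memory dataset (no dask chunks), so the first to_zarr
# falls through to zarr's own chunking heuristic.
ds_mem = xr.Dataset({"a": ("x", np.arange(505_359))})
ds_mem.to_zarr("roundtrip.zarr", mode="w")

ds3 = xr.open_zarr("roundtrip.zarr")      # dask chunks now mirror zarr's choice,
                                          # with a smaller chunk at the end
ds3.to_zarr("roundtrip2.zarr", mode="w")  # same NotImplementedError as above
```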

Curious: is there any downside in xarray to using datasets with inconsistent chunks? I take it that this is a supported configuration, since xarray allows it to happen and only raises that error when you call `ds.chunks`, which seems to be a convenience property for viewing chunks across a whole dataset, provided the dataset happens to have consistent chunks. A minimal sketch of what I mean is below.
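Here the variable names and sizes are made up; the point is just that per-variable chunks are fine while the dataset-level property errors out:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "big": ("x", np.arange(1000)),
        "small": ("x", np.arange(1000)),
    }
)
# Chunk the two variables differently along the shared dimension "x".
ds["big"] = ds["big"].chunk({"x": 100})
ds["small"] = ds["small"].chunk({"x": 1000})  # whole variable in one chunk

ds["big"].chunks    # fine: ((100, 100, ..., 100),)
ds["small"].chunks  # fine: ((1000,),)
ds.chunks           # raises ValueError: inconsistent chunks
```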

One other thing to add: it might be nice to have an option that allows zarr auto-chunking even when `chunks != {}`. I don't know how sensitive zarr performance is to chunk sizes, but it would be nice to have some form of sane auto-chunking available when you don't want to bother choosing chunks manually.
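For what it's worth, zarr itself exposes such a heuristic when you create arrays directly (zarr-python 2.x API; the array here is just a placeholder): passing `chunks=True` asks zarr to guess a chunk shape.

```python
import zarr

# chunks=True tells zarr to pick a chunk shape itself, based on the
# array's shape and dtype (zarr-python 2.x behaviour).
z = zarr.zeros((1_000_000,), chunks=True, dtype="f8")
print(z.chunks)  # whatever the heuristic chose
```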
