
issue_comments: 751489633


html_url: https://github.com/pydata/xarray/issues/4380#issuecomment-751489633
issue_url: https://api.github.com/repos/pydata/xarray/issues/4380
id: 751489633
node_id: MDEyOklzc3VlQ29tbWVudDc1MTQ4OTYzMw==
user: 35919497
created_at: 2020-12-27T16:43:56Z
updated_at: 2021-01-08T07:20:24Z
author_association: COLLABORATOR

> Does encoding['chunks'] serve any purpose after you've loaded a Zarr store and all the variables are defined as dask arrays?

> No. I run into this frequently and it is annoying. @rabernat do you remember why you chose to keep chunks around in encoding?

encoding["chunks"] is used by to_zarr. That seems reasonable: I expect to be able to read and re-write a Zarr store without modifying the chunking on disk. As far as I can tell, dask chunks are used when writing only if encoding["chunks"] is not defined or is no longer compatible with the variable shapes. In all other cases encoding["chunks"] is used. So if you want to use the encoded chunks, you have to be sure that they are still compatible with the variable shapes and that each Zarr chunk is contained in only one dask chunk.

If you want to use the dask chunks, you can:

- Delete the encoded chunking, as done by @eric-czech.
- Pass encoding when you write: ds.to_zarr('/tmp/ds3.zarr', mode='w', encoding={'x': {}}).
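To make the "each Zarr chunk is contained in only one dask chunk" condition concrete, here is a minimal sketch of that check along a single dimension. It is plain Python and is not xarray's actual validation code; the function name and arguments are hypothetical, chosen only for illustration.

```python
from itertools import accumulate


def zarr_chunks_fit_dask_chunks(dim_size, zarr_chunk, dask_chunks):
    """Return True if every Zarr chunk along one dimension lies inside a single dask chunk.

    Hypothetical helper for illustration only, not part of xarray or zarr.
    dim_size: length of the dimension
    zarr_chunk: on-disk chunk size along that dimension (as in encoding["chunks"])
    dask_chunks: tuple of dask chunk sizes along that dimension
    """
    if sum(dask_chunks) != dim_size:
        raise ValueError("dask chunks must add up to the dimension size")
    # Interior dask chunk boundaries; the last boundary is the end of the array.
    boundaries = list(accumulate(dask_chunks))[:-1]
    # A Zarr chunk is split between two dask chunks exactly when an interior
    # dask boundary is not a multiple of the Zarr chunk size.
    return all(b % zarr_chunk == 0 for b in boundaries)


aligned = zarr_chunks_fit_dask_chunks(10, 5, (5, 5))     # boundary at 5
misaligned = zarr_chunks_fit_dask_chunks(10, 5, (4, 6))  # boundary at 4
```

With Zarr chunks of size 5, dask chunks of (5, 5) or (10,) keep every Zarr chunk inside one dask chunk, while (4, 6) splits the first Zarr chunk across two dask chunks, so the encoded chunks could not be reused safely for that layout.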

Maybe this interface is a little confusing. It would probably be better to move overwrite_encoded_chunks from open_dataset to to_zarr: the open_dataset interface would be cleaner, and it would be clear how to use dask chunks when writing.

Concerning the different chunking per variable, see this related issue: https://github.com/pydata/xarray/issues/4623

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 686608969