html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4380#issuecomment-795114188,https://api.github.com/repos/pydata/xarray/issues/4380,795114188,MDEyOklzc3VlQ29tbWVudDc5NTExNDE4OA==,743508,2021-03-10T09:00:48Z,2021-03-10T09:00:48Z,CONTRIBUTOR,"Running into the same issue when I:

1. Load input from a Zarr data source
2. Queue some processing (delayed dask ufuncs)
3. Re-chunk using `chunk()` to get the dask task size I want
4. Use `to_zarr` to trigger the calculation (dask distributed backend) and save to a new file on disk

I get the chunk size mismatch error, which I solve by manually overwriting the `encoding['chunks']` value. That seems unintuitive to me. Since I'm going from one Zarr store to another, I assumed that calling `chunk()` would set the chunk size for both the dask arrays and the Zarr output, since calling `to_zarr` on a dask array will only work if the dask and Zarr encoding chunk sizes match. I didn't realize the `overwrite_encoded_chunks` option existed, but it's also a bit confusing that to get the right chunk size on the *output* I need to set the overwrite option on the *input*.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,686608969
https://github.com/pydata/xarray/issues/4380#issuecomment-751489633,https://api.github.com/repos/pydata/xarray/issues/4380,751489633,MDEyOklzc3VlQ29tbWVudDc1MTQ4OTYzMw==,35919497,2020-12-27T16:43:56Z,2021-01-08T07:20:24Z,COLLABORATOR,"> > Does encoding['chunks'] serve any purpose after you've loaded a Zarr store and all the variables are defined as dask arrays?
>
> No. I run into this frequently and it is annoying. @rabernat do you remember why you chose to keep `chunks` around in `encoding`

`encoding[""chunks""]` is used in `to_zarr`.
That seems reasonable: I expect to be able to read and rewrite a Zarr store without modifying the chunking on disk. It seems to me that dask chunks are used in writing only when `encoding[""chunks""]` is not defined or is no longer compatible with the variable shapes. In all other cases `encoding[""chunks""]` is used.

So if you want to use the encoded chunks, you have to be sure that they are still compatible with the variable shapes and that each Zarr chunk is contained in only one dask chunk.

If you want to use the dask chunks you can:
- Delete the encoded chunking, as done by @eric-czech.
- Pass `encoding` when you write: `ds.to_zarr('/tmp/ds3.zarr', mode='w', encoding={'x': {}})`.

Maybe this interface is a little confusing. It would probably be better to move `overwrite_encoded_chunks` from `open_dataset` to `to_zarr`. The `open_dataset` interface would be cleaner, and it would be clear how to use dask chunks when writing.

Concerning different chunking per variable, here is a related issue: https://github.com/pydata/xarray/issues/4623","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,686608969
https://github.com/pydata/xarray/issues/4380#issuecomment-751481163,https://api.github.com/repos/pydata/xarray/issues/4380,751481163,MDEyOklzc3VlQ29tbWVudDc1MTQ4MTE2Mw==,35919497,2020-12-27T15:32:10Z,2020-12-27T15:32:29Z,COLLABORATOR,"I'm not sure, but this error seems to be a bug. The check on the final chunk seems to have the inequality in the wrong direction. The code that decides which chunking to use when both dask chunking and encoded chunking are defined is the following: https://github.com/pydata/xarray/blob/ac234619d5471e789b0670a673084dbb01df4f9e/xarray/backends/zarr.py#L141-L173
The aim of these checks, as described in the comment, is to avoid having multiple dask chunks in one Zarr chunk.
According to this logic, the inequality at line 163: https://github.com/pydata/xarray/blob/ac234619d5471e789b0670a673084dbb01df4f9e/xarray/backends/zarr.py#L163 has the wrong direction. It should be `if dchunks[-1] < zchunk`, but it seems to me that this condition is always true.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,686608969
https://github.com/pydata/xarray/issues/4380#issuecomment-683301156,https://api.github.com/repos/pydata/xarray/issues/4380,683301156,MDEyOklzc3VlQ29tbWVudDY4MzMwMTE1Ng==,2448579,2020-08-29T14:53:28Z,2020-08-29T14:53:28Z,MEMBER,"> Does encoding['chunks'] serve any purpose after you've loaded a zarr store and all the variables are defined as dask arrays?

No. I run into this frequently and it is annoying. @rabernat do you remember why you chose to keep `chunks` around in `encoding`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,686608969