Comments on [pydata/xarray#2624](https://github.com/pydata/xarray/issues/2624)

---

**user 2443309** (MEMBER) commented on 2019-01-13 ([link](https://github.com/pydata/xarray/issues/2624#issuecomment-453799948)):

I'm going to close this, as the original issue (an error in compression/codecs) has been resolved. @ktyle, I'd be happy to continue this discussion on the Pangeo issue tracker if you'd like to discuss optimal chunk layout further.

---

**user 2443309** (MEMBER) commented on 2019-01-03 ([link](https://github.com/pydata/xarray/issues/2624#issuecomment-451206728)):

@ktyle, glad to hear things are moving for you. I'm pretty sure the last chunk in each of your datasets is smaller than the rest, so after concatenation you end up with a small chunk in the middle and at the end of the time dimension. I bet that if you used a chunk size of 172 (which divides evenly into 2924), you wouldn't need to rechunk.

---

**user 1197350** (MEMBER) commented on 2018-12-21 ([link](https://github.com/pydata/xarray/issues/2624#issuecomment-449184669)):

> You can also rechunk your dataset after the fact using the `chunk` method:

Not a good idea in this case. The original 49 GB chunks will still exist in the task graph and will have to be computed before the rechunking step.

---

**user 2443309** (MEMBER) commented on 2018-12-21 ([link](https://github.com/pydata/xarray/issues/2624#issuecomment-449184291)):

You can also rechunk your dataset after the fact using the `chunk` method:

```python
ds = ds.chunk({'time': 1})
```

---

**user 1197350** (MEMBER) commented on 2018-12-20 ([link](https://github.com/pydata/xarray/issues/2624#issuecomment-449151325)):

So the key information is this:

```
dask.array<chunksize=(1460, 32, 361, 720)>
```

This says that your dask chunk size is 1460 x 32 x 361 x 720 (x 4 bytes for `float32` data) = 48573849600 bytes = ~49 GB. A dataset chunked this way is probably unusable for any purpose, including serialization (to zarr, netCDF, or any other format supported by xarray).

Furthermore, the dask chunks will be automatically mapped to zarr chunks by xarray, and zarr chunks that large would be much too big to be useful. The [Zarr docs](https://zarr.readthedocs.io/en/stable/tutorial.html#chunk-optimizations) say "at least 1MB"; in my [example notebook](https://gist.github.com/rabernat/4ae286464f55a75e30b55f529195aff0) I recommended 10-100 MB.

For both zarr and dask, you can think of a chunk as an amount of data that can be comfortably held in memory and passed around the network. (That's where the 10-100 MB estimate comes from.) It is also the _minimum_ amount of data that can be read from the dataset at once: even if you only need a single value, the whole chunk has to be read into memory and decompressed.

I would recommend you chunk along the time dimension. You can accomplish this by adding the `chunks` keyword when opening the dataset:

```python
ds = xr.open_mfdataset([f1, f2], chunks={'time': 1})
```

I imagine that will fix most of your issues.

---

**user 1197350** (MEMBER) commented on 2018-12-20 ([link](https://github.com/pydata/xarray/issues/2624#issuecomment-449145011)):

> I thought I might try specifying no compression, as supported in Zarr, by adding "compressor = None" as a kwarg in the to_zarr call in xarray, but that is not supported.

The syntax and an example for specifying a compressor are given in the docs here: http://xarray.pydata.org/en/latest/io.html#zarr-compressors-and-filters. It needs to be part of the `encoding` keyword. But I don't think this will solve your problem.
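
To make the `encoding` approach concrete, here is a minimal sketch based on the docs linked above. The toy dataset and the variable name `air` are stand-ins for the real data, and the `zarr.Blosc` settings are just one reasonable choice:

```python
import numpy as np
import xarray as xr
import zarr

# Toy dataset standing in for the real one; 'air' is a hypothetical variable name.
ds = xr.Dataset({'air': (('time', 'lat'), np.zeros((8, 4), dtype='float32'))})

# The compressor is set per variable via the `encoding` keyword,
# not as a kwarg to `to_zarr` itself.
compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2)
ds.to_zarr('compressed.zarr', encoding={'air': {'compressor': compressor}})

# Setting the compressor to None disables compression for that variable.
ds.to_zarr('uncompressed.zarr', encoding={'air': {'compressor': None}})
```

Either way, the setting travels with the variable's encoding, so it is applied when xarray creates the underlying zarr arrays.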
---

**user 1197350** (MEMBER) commented on 2018-12-20 ([link](https://github.com/pydata/xarray/issues/2624#issuecomment-449144275)):

@ktyle, it sounds like your chunks are too big. Can you post xarray's representation of your dataset before writing it to zarr? Call `print(ds)` and paste the output here.

P.S. I edited your comment a bit to put the code into code blocks.
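
As a supplement to the `print(ds)` suggestion, a minimal sketch of checking dask chunk sizes before writing to zarr; the file names are hypothetical, and the loop assumes the variables are dask-backed (i.e. the dataset was opened with `chunks`):

```python
import numpy as np
import xarray as xr

# Hypothetical input files; substitute the real NetCDF paths.
ds = xr.open_mfdataset(['f1.nc', 'f2.nc'], chunks={'time': 1})
print(ds)  # the repr shows dimensions, dtypes, and dask chunk shapes

for name, da in ds.data_vars.items():
    # bytes per dask chunk = product of the chunk shape x itemsize;
    # aim for roughly 10-100 MB per chunk
    chunk_bytes = np.prod(da.data.chunksize) * da.dtype.itemsize
    print(f"{name}: chunksize={da.data.chunksize} (~{chunk_bytes / 1e6:.1f} MB)")
```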