html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6456#issuecomment-1099643203,https://api.github.com/repos/pydata/xarray/issues/6456,1099643203,IC_kwDOAMm_X85BizlD,5635139,2022-04-14T21:31:37Z,2022-04-14T21:31:37Z,MEMBER,"> @max-sixty could you explain which bit isn't working for you? The initial example I shared works fine in colab for me, so that might be a you problem. The second one required specifying the chunks when making the datasets (I've edited above).

Right, you changed the example after I responded.

> But this bug report was more about the fact that overwriting was converting data to NaNs (in two different ways depending on the code apparently).
>
> In my case there is no longer any need to do the overwriting, but this doesn't seem like the expected behaviour of overwriting, and I'm sure there are some valid reasons to overwrite data - hence me opening the bug report.

Something surprising is indeed going on here. To focus on the surprising part:

```python
print(ds3.low_dim.values)
ds3.to_zarr('zarr_bug.zarr', mode='w')
print(ds3.low_dim.values)
```

returns:

```
[[2. 3. 2. ... 8. 0. 9.]
 [6. 2. 6. ... 2. 4. 3.]
 [0. 8. 8. ... 6. 5. 4.]
 ...
 [1. 0. 5. ... 2. 0. 3.]
 [5. 5. 7. ... 9. 6. 2.]
 [5. 7. 8. ... 4. 8. 9.]]
[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [ 1.  0.  5. ...  2.  0.  3.]
 [ 5.  5.  7. ...  9.  6.  2.]
 [ 5.  7.  8. ...  4.  8.  9.]]
```

Similarly:

```python
In [50]: ds3.low_dim.count().compute()
Out[50]: array(1000000)

In [51]: ds3.to_zarr('zarr_bug.zarr', mode='w')
Out[51]:

In [55]: ds3.low_dim.count().compute()
Out[55]: array(500000)
```

So it's changing the result in memory just from writing to the Zarr store. I'm not sure what the cause is.

We can still massively reduce the size of this example — it's currently doing pickling, has a bunch of repeated code, etc. Does it work without the pickling? What if `ds3 = xr.concat([ds1, ds1.copy(deep=True)])`, etc.?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1197117301
https://github.com/pydata/xarray/issues/6456#issuecomment-1095585081,https://api.github.com/repos/pydata/xarray/issues/6456,1095585081,IC_kwDOAMm_X85BTU05,5635139,2022-04-11T21:29:27Z,2022-04-11T21:29:27Z,MEMBER,"@tbloch1 it doesn't run as-is when copied into someone else's Python atm — that's the ""C"" part of MCVE...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1197117301
https://github.com/pydata/xarray/issues/6456#issuecomment-1094412198,https://api.github.com/repos/pydata/xarray/issues/6456,1094412198,IC_kwDOAMm_X85BO2em,5635139,2022-04-10T23:46:53Z,2022-04-10T23:46:53Z,MEMBER,"> Have you tried asking on stackoverflow with the xarray tag?

Or GH Discussions! But it would need a smaller MCVE.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1197117301
https://github.com/pydata/xarray/issues/6456#issuecomment-1093253883,https://api.github.com/repos/pydata/xarray/issues/6456,1093253883,IC_kwDOAMm_X85BKbr7,5635139,2022-04-08T19:05:12Z,2022-04-08T19:05:12Z,MEMBER,"Hi @tbloch1 — thanks for the issue!

So I understand — is this loading the existing dataset, adding a slice, and then writing the whole result?
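Something like this rough, untested sketch of that pattern? (The store path and the names `example.zarr`, `low_dim`, `time`, and `y` below are made up for illustration, not taken from your example.)

```python
import numpy as np
import xarray as xr

# Hypothetical setup: create a small store to stand in for the existing data.
first = xr.Dataset({"low_dim": (("time", "y"), np.random.rand(5, 4))})
first.to_zarr("example.zarr", mode="w")

# The pattern in question:
existing = xr.open_zarr("example.zarr").load()  # load the existing dataset into memory
new_slice = xr.Dataset({"low_dim": (("time", "y"), np.random.rand(1, 4))})
combined = xr.concat([existing, new_slice], dim="time")  # add a slice
combined.to_zarr("example.zarr", mode="w")  # overwrite the store with the whole result
```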
Have you considered using `mode='a'` if you want to write from different processes?

For the example — would it be possible to slim that down a bit further? Does it happen with one read & write after the initial one?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1197117301