html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6456#issuecomment-1099643203,https://api.github.com/repos/pydata/xarray/issues/6456,1099643203,IC_kwDOAMm_X85BizlD,5635139,2022-04-14T21:31:37Z,2022-04-14T21:31:37Z,MEMBER,"> @max-sixty could you explain which bit isn't working for you? The initial example I shared works fine in colab for me, so that might be a you problem. The second one required specifying the chunks when making the datasets (I've edited above).
Right, you changed the example after I responded.
> But this bug report was more about the fact that overwriting was converting data to NaNs (in two different ways depending on the code apparently).
>
> In my case there is no longer any need to do the overwriting, but this doesn't seem like the expected behaviour of overwriting, and I'm sure there are some valid reasons to overwrite data - hence me opening the bug report.
Something surprising is indeed going on here. To focus on the surprising part:
```python
print(ds3.low_dim.values)               # values before writing
ds3.to_zarr('zarr_bug.zarr', mode='w')  # overwrite the store on disk
print(ds3.low_dim.values)               # the same in-memory values after the write
```
prints:
```
[[2. 3. 2. ... 8. 0. 9.]
[6. 2. 6. ... 2. 4. 3.]
[0. 8. 8. ... 6. 5. 4.]
...
[1. 0. 5. ... 2. 0. 3.]
[5. 5. 7. ... 9. 6. 2.]
[5. 7. 8. ... 4. 8. 9.]]
[[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
...
[ 1. 0. 5. ... 2. 0. 3.]
[ 5. 5. 7. ... 9. 6. 2.]
[ 5. 7. 8. ... 4. 8. 9.]]
```
Similarly:
```python
In [50]: ds3.low_dim.count().compute()
Out[50]:
array(1000000)
In [51]: ds3.to_zarr('zarr_bug.zarr', mode='w')
Out[51]:
In [55]: ds3.low_dim.count().compute()
Out[55]:
array(500000)
```
So it's changing the result in memory just from writing to the Zarr store. I'm not sure what the cause is.
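If it helps, a fully self-contained repro might look roughly like this (a sketch only; the name `low_dim` and the 1000000/500000 counts come from the output above, the shapes, dims, and chunking are guesses, and I haven't verified that this variant actually triggers the bug):
```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the original datasets: random single-digit floats,
# chunked with dask since the original example specifies chunks.
ds1 = xr.Dataset(
    {'low_dim': (('x', 'y'), np.random.randint(0, 10, (500, 1000)).astype(float))}
).chunk({'x': 100})
ds3 = xr.concat([ds1, ds1.copy(deep=True)], dim='x')

print(int(ds3.low_dim.count()))         # 1000000 before writing
ds3.to_zarr('zarr_bug.zarr', mode='w')  # overwrite the store
print(int(ds3.low_dim.count()))         # the bug would show up as 500000 here
```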
We can still massively reduce the size of the original example along those lines; it currently does pickling, has a bunch of repeated code, etc. Does it work without the pickling? What if `ds3 = xr.concat([ds1, ds1.copy(deep=True)])`, etc.?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1197117301
https://github.com/pydata/xarray/issues/6456#issuecomment-1095585081,https://api.github.com/repos/pydata/xarray/issues/6456,1095585081,IC_kwDOAMm_X85BTU05,5635139,2022-04-11T21:29:27Z,2022-04-11T21:29:27Z,MEMBER,"@tbloch1 it doesn't copy-and-run in someone else's Python atm — that's the ""C"" (Complete) part of MCVE...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1197117301
https://github.com/pydata/xarray/issues/6456#issuecomment-1094412198,https://api.github.com/repos/pydata/xarray/issues/6456,1094412198,IC_kwDOAMm_X85BO2em,5635139,2022-04-10T23:46:53Z,2022-04-10T23:46:53Z,MEMBER,"> Have you tried asking on stackoverflow with the xarray tag?
Or GH Discussions! But it would need a smaller MCVE.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1197117301
https://github.com/pydata/xarray/issues/6456#issuecomment-1093253883,https://api.github.com/repos/pydata/xarray/issues/6456,1093253883,IC_kwDOAMm_X85BKbr7,5635139,2022-04-08T19:05:12Z,2022-04-08T19:05:12Z,MEMBER,"Hi @tbloch1 — thanks for the issue
So I understand — is this loading the existing dataset, appending a slice, and then writing the whole result? Have you considered using `mode='a'` if you want to write from different processes?
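Something along these lines, for instance (a toy sketch with made-up data and names; `append_dim` is the standard `to_zarr` argument for extending a store along one dimension):
```python
import numpy as np
import xarray as xr

# Toy datasets standing in for the real ones.
first = xr.Dataset({'v': ('time', np.arange(3.0))}, coords={'time': [0, 1, 2]})
later = xr.Dataset({'v': ('time', np.arange(3.0))}, coords={'time': [3, 4, 5]})

first.to_zarr('example.zarr', mode='w')                     # initial write
later.to_zarr('example.zarr', mode='a', append_dim='time')  # append instead of overwriting
```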
For the example — would it be possible to slim that down a bit further? Does it happen with one read & write after the initial one?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1197117301