html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6329#issuecomment-1064973518,https://api.github.com/repos/pydata/xarray/issues/6329,1064973518,IC_kwDOAMm_X84_ejTO,9576982,2022-03-11T10:19:03Z,2022-03-11T10:20:09Z,NONE,"> If you find out more about the cloud case, please post a note, otherwise, we can assume that the original bug report is fine?

I think so, except that it affects both the append and region methods, not just append. Yes, for the above case it should work. I need to test all this more thoroughly.

Thanks","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690
https://github.com/pydata/xarray/issues/6329#issuecomment-1063949669,https://api.github.com/repos/pydata/xarray/issues/6329,1063949669,IC_kwDOAMm_X84_apVl,9576982,2022-03-10T11:21:18Z,2022-03-10T11:21:18Z,NONE,"Ok, changing to `'r+'` leads to an error suggesting to use `'a'`:

`ValueError: dataset contains non-pre-existing variables ['air'], which is not allowed in ``xarray.Dataset.to_zarr()`` with mode='r+'. To allow writing new variables, set mode='a'.`

I have found something that gives me satisfactory results. I still don't know why I have issues in the cloud; I am still investigating. Maybe it is unrelated. The following script more or less keeps the important parts, but it is still not very clean, as some of the parameters are not included in the final file. I ended up with the same kind of convoluted approach I was using before, but hopefully it is helpful to someone looking for a real-case example. It definitely clarified things in my head.
``` python
import xarray as xr
import rioxarray  # registers the .rio accessor used below
from rasterio.enums import Resampling
import numpy as np
import dask.array as da

def init_coord(ds, X,Y):
    ''' To have the geometry right'''
    arr_r=some_processing(ds.isel(time=slice(0,1)), X,Y)
    return arr_r.x.values, arr_r.y.values

def some_processing(arr, X,Y):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(Y,X), resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename='processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
ds.air.encoding['dtype']=np.dtype('float32')
X,Y=250, 250 #size of each final timestep
x,y=init_coord(ds, X,Y)
# lazy placeholder so that 'air' already exists in the store before the region writes
dummy=da.zeros((len(ds.time.values), Y, X))
ds_to_write=xr.Dataset({'air':(('time','y','x'), dummy)}, coords={'time':('time',ds.time.values),'x':('x', x),'y':('y',y)})
ds_to_write.to_zarr(filename, compute=False, encoding={""time"": {""chunks"": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r=some_processing(ds.isel(time=slice(i,i+1)),X,Y)
    buff= arr_r.drop(['spatial_ref','x','y']).chunk({'time':1,'x':X,'y':Y})
    del buff.air.attrs[""_FillValue""]
    buff.to_zarr(filename, mode='r+', region={'time':slice(i,i+1)})
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690
https://github.com/pydata/xarray/issues/6329#issuecomment-1063851972,https://api.github.com/repos/pydata/xarray/issues/6329,1063851972,IC_kwDOAMm_X84_aRfE,9576982,2022-03-10T09:36:00Z,2022-03-10T09:36:18Z,NONE,"Sorry, that's a mistake. I think append was suggested at some point by one of the error messages. I cannot remember `'r+'` being described in the xarray docs. Would you mind explaining what it does?
Cheers","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690
https://github.com/pydata/xarray/issues/6329#issuecomment-1062724755,https://api.github.com/repos/pydata/xarray/issues/6329,1062724755,IC_kwDOAMm_X84_V-ST,9576982,2022-03-09T09:30:42Z,2022-03-09T09:30:42Z,NONE,"OK, that is easy to change; now you get exactly the same error message as when appending. I have tried a lot of different ways and I am not getting anywhere with writing the data correctly to a store.
``` python
import xarray as xr
import rioxarray  # registers the .rio accessor used below
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    ''' To have the geometry right'''
    arr_r=some_processing(ds.isel(time=slice(0,1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250), resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename='processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x,y=init_coord(ds)
ds_to_write=xr.Dataset(coords={'time':('time',ds.time.values),'x':('x', x),'y':('y',y)})
ds_to_write.to_zarr(filename, compute=False, encoding={""time"": {""chunks"": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r=some_processing(ds.isel(time=slice(i,i+1)))
    buff= arr_r.drop(['spatial_ref','x','y']).chunk({'time':1,'x':250,'y':250})
    buff.air.encoding['dtype']=np.dtype('float32')
    buff.to_zarr(filename, mode='a', region={'time':slice(i,i+1)})
```
`ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690
https://github.com/pydata/xarray/issues/6329#issuecomment-1061651626,https://api.github.com/repos/pydata/xarray/issues/6329,1061651626,IC_kwDOAMm_X84_R4Sq,9576982,2022-03-08T10:55:50Z,2022-03-08T10:55:50Z,NONE,"Ok, sorry for the various mistakes; I wrote that in a hurry. Strangely enough, this behaves differently, but it crashes too.
``` python
import xarray as xr
import rioxarray  # registers the .rio accessor used below
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    ''' To have the geometry right'''
    arr_r=some_processing(ds.isel(time=slice(0,1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250), resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename='processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x,y=init_coord(ds)
ds_to_write=xr.Dataset(coords={'time':('time',ds.time.values),'x':('x', x),'y':('y',y)})
ds_to_write.to_zarr(filename, compute=False, encoding={""time"": {""chunks"": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r=some_processing(ds.isel(time=slice(i,i+1)))
    buff= arr_r.drop(['spatial_ref','x','y']).chunk({'time':1,'x':250,'y':250})
    buff.to_zarr(filename, mode='a', region={'time':slice(i,i+1)})
```
With error: `ValueError: fill_value nan is not valid for dtype int16; nested exception: cannot convert float NaN to integer`, but the output of buff is:
![image](https://user-images.githubusercontent.com/9576982/157224772-473d53ef-b5ad-43e0-a3ed-aec89ea2ca8d.png)
i.e.
it contains only floats.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690
https://github.com/pydata/xarray/issues/6329#issuecomment-1060493852,https://api.github.com/repos/pydata/xarray/issues/6329,1060493852,IC_kwDOAMm_X84_Ndoc,9576982,2022-03-07T10:48:21Z,2022-03-07T10:48:21Z,NONE,"This will fail like append. I just tried to make a somewhat realistic example, such as reprojecting from a geographic to a projected coordinate system. If you look at all the stages you need to go through... and I am still not sure this is working as it should.
``` python
import xarray as xr
import rioxarray  # registers the .rio accessor used below
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    ''' To have the geometry right'''
    arr_r=some_processing(ds.isel(time=slice(0,1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250), resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename='processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x,y=init_coord(ds)
ds_to_write=xr.Dataset(coords={'time':('time',ds.time.values),'x':('x', x),'y':('y',y)})
ds_to_write.to_zarr(filename, compute=False, encoding={""time"": {""chunks"": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r=some_processing(ds.isel(time=slice(i,i+1)))
    agg_r_t= arr_r.drop(['spatial_ref']).expand_dims({'time':[ds.time.values[i]]})
    buff= xr.Dataset({'air':agg_r_t}).chunk({'time':1,'x':250,'y':250})
    buff.drop(['x','y']).to_zarr(filename, region={'time':slice(i,i+1)})
```
You would need to change the processing function to something like:
``` python
def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250), resampling=Resampling.bilinear, nodata=np.nan)
    del arr_r.attrs[""_FillValue""]
    return arr_r
```
Sorry if I am being repetitive, but I want to be sure that it is clearly illustrated. I have run another test in the cloud and am checking the values at the moment.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690
https://github.com/pydata/xarray/issues/6329#issuecomment-1059760613,https://api.github.com/repos/pydata/xarray/issues/6329,1059760613,IC_kwDOAMm_X84_Kqnl,9576982,2022-03-05T13:01:40Z,2022-03-05T14:51:07Z,NONE,"Just to make clear what is weird: this is just a test to see whether the regions were written to file, and it seems they were written randomly, most likely with regions overwriting other regions. I have no idea how that is possible. In theory, everything from i = 95 to 954 should have been written. It could be my code, so I am checking again, but that seems unlikely without any error being raised. I am just showing this so that you better understand what I am observing:
![image](https://user-images.githubusercontent.com/9576982/156884177-06b033da-eab5-4a55-b17e-15d6da3951c6.png)
In theory all the timesteps were written, as I print a confirmation message at each iteration.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690
https://github.com/pydata/xarray/issues/6329#issuecomment-1059777476,https://api.github.com/repos/pydata/xarray/issues/6329,1059777476,IC_kwDOAMm_X84_KuvE,9576982,2022-03-05T14:48:58Z,2022-03-05T14:48:58Z,NONE,"I can confirm that it also fails when precomputing a dataset and filling regions, with the same error: `ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690
https://github.com/pydata/xarray/issues/6329#issuecomment-1059754523,https://api.github.com/repos/pydata/xarray/issues/6329,1059754523,IC_kwDOAMm_X84_KpIb,9576982,2022-03-05T12:26:52Z,2022-03-05T12:26:52Z,NONE,Sorry to add to the confusion: I have actually seen another kind of strange behaviour when deleting the fill_value with the `region` method. I thought the run worked but it didn't. I am investigating...,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690
https://github.com/pydata/xarray/issues/6329#issuecomment-1059423718,https://api.github.com/repos/pydata/xarray/issues/6329,1059423718,IC_kwDOAMm_X84_JYXm,9576982,2022-03-04T18:44:01Z,2022-03-04T18:44:01Z,NONE,"I will try to reproduce the strange behaviour, but it happened in a cloud environment (Google), where the time steps were overwriting each other and the number of ""preserved"" time steps varied over time. I suggest we use something closer to the original problem, such as the tutorial dataset?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1159923690