issue_comments


20 rows where author_association = "NONE" and user = 9576982 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1064973518 https://github.com/pydata/xarray/issues/6329#issuecomment-1064973518 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_ejTO Boorhin 9576982 2022-03-11T10:19:03Z 2022-03-11T10:20:09Z NONE

> If you find out more about the cloud case, please post a note; otherwise, we can assume that the original bug report is fine?

I think so, except that it affects both the append and region methods, not just append. Yes, for the above case it should work. I need to test all this more thoroughly. Thanks

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1063949669 https://github.com/pydata/xarray/issues/6329#issuecomment-1063949669 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_apVl Boorhin 9576982 2022-03-10T11:21:18Z 2022-03-10T11:21:18Z NONE

OK, changing to 'r+' leads to an error suggesting to use 'a':

`ValueError: dataset contains non-pre-existing variables ['air'], which is not allowed in xarray.Dataset.to_zarr() with mode='r+'. To allow writing new variables, set mode='a'.`

I have found something that gives me satisfactory results. Why I have issues in the cloud I still don't know; I am still investigating, and maybe it is unrelated. The following script kind of keeps the important stuff, but it is still not very clean, as some of the parameters are not included in the final file. I ended up with the same kind of convoluted approach I was using before. But hopefully it is helpful to someone looking for a real-world example. It definitely clarified things in my head.

```python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np
import dask.array as da

def init_coord(ds, X, Y):
    '''To have the geometry right'''
    arr_r = some_processing(ds.isel(time=slice(0, 1)), X, Y)
    return arr_r.x.values, arr_r.y.values

def some_processing(arr, X, Y):
    '''A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(Y, X),
                              resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
ds.air.encoding['dtype'] = np.dtype('float32')
X, Y = 250, 250  # size of each final timestep
x, y = init_coord(ds, X, Y)
dummy = da.zeros((len(ds.time.values), Y, X))
ds_to_write = xr.Dataset({'air': (('time', 'y', 'x'), dummy)},
                         coords={'time': ('time', ds.time.values),
                                 'x': ('x', x),
                                 'y': ('y', y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)), X, Y)
    buff = arr_r.drop(['spatial_ref', 'x', 'y']).chunk({'time': 1, 'x': X, 'y': Y})
    del buff.air.attrs["_FillValue"]
    buff.to_zarr(filename, mode='r+', region={'time': slice(i, i + 1)})
```

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1063851972 https://github.com/pydata/xarray/issues/6329#issuecomment-1063851972 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_aRfE Boorhin 9576982 2022-03-10T09:36:00Z 2022-03-10T09:36:18Z NONE

Sorry, that's a mistake. I think append was suggested at some point by one of the error messages. I cannot remember 'r+' being described in the xarray docs. Would you mind detailing what it does? Cheers
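A hedged summary of the to_zarr write modes, pieced together from the error messages quoted elsewhere in this thread (`ds` and `store` are illustrative names, not from the original comment):

```python
ds.to_zarr(store, mode='w')   # create a new store, overwriting anything there
ds.to_zarr(store, mode='a')   # append/add: writing new variables is allowed
ds.to_zarr(store, mode='r+')  # modify values of *existing* variables only;
                              # writing a non-pre-existing variable raises the
                              # ValueError quoted in the comment above
```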

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1062724755 https://github.com/pydata/xarray/issues/6329#issuecomment-1062724755 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_V-ST Boorhin 9576982 2022-03-09T09:30:42Z 2022-03-09T09:30:42Z NONE

OK, that is easy to change; now you get the exact same error message as for appending. I have tried a lot of different ways and I am not getting anywhere with writing the data correctly into the store.

```python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    '''To have the geometry right'''
    arr_r = some_processing(ds.isel(time=slice(0, 1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    '''A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250),
                              resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x, y = init_coord(ds)
ds_to_write = xr.Dataset(coords={'time': ('time', ds.time.values),
                                 'x': ('x', x), 'y': ('y', y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)))
    buff = arr_r.drop(['spatial_ref', 'x', 'y']).chunk({'time': 1, 'x': 250, 'y': 250})
    buff.air.encoding['dtype'] = np.dtype('float32')
    buff.to_zarr(filename, mode='a', region={'time': slice(i, i + 1)})
```

`ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.`

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1061651626 https://github.com/pydata/xarray/issues/6329#issuecomment-1061651626 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_R4Sq Boorhin 9576982 2022-03-08T10:55:50Z 2022-03-08T10:55:50Z NONE

OK, sorry for the various mistakes; I wrote that in a hurry. Strangely enough, this has a different behaviour, but it crashes too.

```python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    '''To have the geometry right'''
    arr_r = some_processing(ds.isel(time=slice(0, 1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    '''A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250),
                              resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x, y = init_coord(ds)
ds_to_write = xr.Dataset(coords={'time': ('time', ds.time.values),
                                 'x': ('x', x), 'y': ('y', y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)))
    buff = arr_r.drop(['spatial_ref', 'x', 'y']).chunk({'time': 1, 'x': 250, 'y': 250})
    buff.to_zarr(filename, mode='a', region={'time': slice(i, i + 1)})
```

With error:

ValueError: fill_value nan is not valid for dtype int16; nested exception: cannot convert float NaN to integer

but the output of `buff` contains only floats.
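The fix adopted later in this thread (the comments above, which are later chronologically) is to force a float dtype in the Zarr encoding, so the NaN fill value no longer has to be squeezed into the on-disk int16 dtype. A minimal sketch, reusing `buff`, `filename`, and `i` from the snippet above:

```python
import numpy as np

# The tutorial data is stored as int16 with scale/offset encoding, so a NaN
# fill value cannot be represented. Writing float32 instead avoids the int16
# fill_value error (the thread then hits the separate _FillValue-attribute
# error, addressed by the del workaround shown in the later comments).
buff.air.encoding['dtype'] = np.dtype('float32')
buff.to_zarr(filename, mode='a', region={'time': slice(i, i + 1)})
```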

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1060493852 https://github.com/pydata/xarray/issues/6329#issuecomment-1060493852 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_Ndoc Boorhin 9576982 2022-03-07T10:48:21Z 2022-03-07T10:48:21Z NONE

This will fail like append. I just tried to make a somewhat realistic example, like reprojecting from a geographic to an orthogonal system. Look at all the stages you need to go through... and I am still not sure this works as it should.

```python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    '''To have the geometry right'''
    arr_r = some_processing(ds.isel(time=slice(0, 1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    '''A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250),
                              resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x, y = init_coord(ds)
ds_to_write = xr.Dataset(coords={'time': ('time', ds.time.values),
                                 'x': ('x', x), 'y': ('y', y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)))
    arr_r_t = arr_r.drop(['spatial_ref']).expand_dims({'time': [ds.time.values[i]]})
    buff = xr.Dataset({'air': arr_r_t}).chunk({'time': 1, 'x': 250, 'y': 250})
    buff.drop(['x', 'y']).to_zarr(filename, region={'time': slice(i, i + 1)})
```

You would need to change the processing function to something like:

```python
def some_processing(arr):
    '''A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250),
                              resampling=Resampling.bilinear, nodata=np.nan)
    del arr_r.attrs["_FillValue"]
    return arr_r
```

Sorry, maybe I am being repetitive, but I want to be sure that it is clearly illustrated. I have done another test on the cloud; I am checking the values at the moment.

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1059760613 https://github.com/pydata/xarray/issues/6329#issuecomment-1059760613 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_Kqnl Boorhin 9576982 2022-03-05T13:01:40Z 2022-03-05T14:51:07Z NONE

Just to make clear what is weird: this was just a test to see if the regions were written to file, and it seems they were written randomly, most likely overprinting regions on regions. I have no idea how that is possible. In theory everything should be written from i = 95 to 954. It could be in my code, so I am checking again, but that seems unlikely without raising any error. I am just showing this so that you better understand what I am observing. Note that in theory all the timesteps were written, as I print a confirmation message at each iteration.

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1059777476 https://github.com/pydata/xarray/issues/6329#issuecomment-1059777476 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_KuvE Boorhin 9576982 2022-03-05T14:48:58Z 2022-03-05T14:48:58Z NONE

I can confirm that it also fails when precomputing a dataset and filling regions, with the same error:

ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1059754523 https://github.com/pydata/xarray/issues/6329#issuecomment-1059754523 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_KpIb Boorhin 9576982 2022-03-05T12:26:52Z 2022-03-05T12:26:52Z NONE

Sorry to add to the confusion: I actually had another kind of strange behaviour when deleting the fill value with the region method. I thought the run worked, but it didn't. I am investigating...

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1059423718 https://github.com/pydata/xarray/issues/6329#issuecomment-1059423718 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_JYXm Boorhin 9576982 2022-03-04T18:44:01Z 2022-03-04T18:44:01Z NONE

I will try to reproduce the strange behaviour, but it was in a cloud environment (Google): the time steps were writing over each other, and the number of "preserved" time steps varied with time. I suggest we use something closer to the original problem, such as the tutorial dataset?

  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1059400265 https://github.com/pydata/xarray/issues/6069#issuecomment-1059400265 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X84_JSpJ Boorhin 9576982 2022-03-04T18:09:44Z 2022-03-04T18:10:49Z NONE

@d70-t we can try to branch it to the CF-related issue, yes. The del method is the one I tried, and when doing it on my files I had very weird things happen, so I would not recommend it as a proper workaround; as I wrote before, it was not appending to the file as it should have. I now have a run functioning with the region method, but I had to simulate my whole file, which was a bit challenging and is actually pretty easy to break, as I need to use the geometry of a single variable to generate the temporal and spatial coordinates for the whole archive. Going through all the variables is a bit of a no-go. I find the initialisation with both methods a real challenge.

  to_zarr: region not recognised as dataset dimensions 1077079208
1059274384 https://github.com/pydata/xarray/issues/6069#issuecomment-1059274384 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X84_Iz6Q Boorhin 9576982 2022-03-04T15:42:36Z 2022-03-04T15:42:36Z NONE

I have tried specifying the chunks before writing the dataset, and I got some really strange behaviour, with data written into the same chunks; the time dimension never went over 5, growing and shrinking through the processing...

  to_zarr: region not recognised as dataset dimensions 1077079208
1059121536 https://github.com/pydata/xarray/issues/6069#issuecomment-1059121536 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X84_IOmA Boorhin 9576982 2022-03-04T12:30:01Z 2022-03-04T12:30:01Z NONE

Effectively I have unstable results, sometimes with errors where timesteps refuse to write. I systematically get this warning:

```python
/opt/conda/lib/python3.7/site-packages/xarray/core/dataset.py:2050: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  safe_chunks=safe_chunks,
```

The crashes are related to the time dimension itself, but time is always of size 1, so it is hard to understand.

```python
/tmp/ipykernel_1629/1269180709.py in aggregate_with_time(farm_name, resolution_M, canvas, W, H, master_raster_coordinates)
     39     raster.drop(
     40         ['x','y']).to_zarr(
---> 41             uri, mode='a', append_dim='time')
     42     #except:
     43     #print('something went wrong')

/opt/conda/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2048         append_dim=append_dim,
   2049         region=region,
-> 2050         safe_chunks=safe_chunks,
   2051     )
   2052

/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1406     _validate_datatypes_for_zarr_append(dataset)
   1407     if append_dim is not None:
-> 1408         existing_dims = zstore.get_dimensions()
   1409         if append_dim not in existing_dims:
   1410             raise ValueError(

/opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in get_dimensions(self)
    450     if d in dimensions and dimensions[d] != s:
    451         raise ValueError(
--> 452             f"found conflicting lengths for dimension {d} "
    453             f"({s} != {dimensions[d]})"
    454         )

ValueError: found conflicting lengths for dimension time (2 != 1)
```

  to_zarr: region not recognised as dataset dimensions 1077079208
1059078276 https://github.com/pydata/xarray/issues/6069#issuecomment-1059078276 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X84_IECE Boorhin 9576982 2022-03-04T11:26:04Z 2022-03-04T11:26:04Z NONE

In my case I specify the _FillValue in the reprojection, so I would not think overwriting it would be an issue. I just don't know how to do it.

  to_zarr: region not recognised as dataset dimensions 1077079208
1059052257 https://github.com/pydata/xarray/issues/6069#issuecomment-1059052257 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X84_H9rh Boorhin 9576982 2022-03-04T10:50:09Z 2022-03-04T10:50:09Z NONE

OK, that's not exactly the same error message; I could not even start the appending. But that's basically one example that could be tested. A model would want to compute each of these variables step by step, variable by variable, and save them at each single iteration. There is no need for concurrent writing, as most of the resources are focused on the modelling.

```python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np

ds = xr.tutorial.open_dataset('air_temperature').isel(time=0)
ds = ds.rio.write_crs('EPSG:4326')
dst = ds.rio.reproject('EPSG:3857', shape=(250, 250),
                       resampling=Resampling.bilinear, nodata=np.nan)
dst.to_zarr('test.zarr')
```

Returns:


```python
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/zarr/util.py in normalize_fill_value(fill_value, dtype)
    277     else:
--> 278         fill_value = np.array(fill_value, dtype=dtype)[()]
    279

ValueError: cannot convert float NaN to integer

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_2604/3259577033.py in <module>
----> 1 dst.to_zarr('test.zarr')

/opt/conda/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2048         append_dim=append_dim,
   2049         region=region,
-> 2050         safe_chunks=safe_chunks,
   2051     )
   2052

/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1429     writer = ArrayWriter()
   1430     # TODO: figure out how to properly handle unlimited_dims
-> 1431     dump_to_store(dataset, zstore, writer, encoding=encoding)
   1432     writes = writer.sync(compute=compute)
   1433

/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1117         variables, attrs = encoder(variables, attrs)
   1118
-> 1119     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1120
   1121

/opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    549
    550         self.set_variables(
--> 551             variables_encoded, check_encoding_set, writer, unlimited_dims=unlimited_dims
    552         )
    553         if self._consolidate_on_close:

/opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    607                 dtype = str
    608             zarr_array = self.zarr_group.create(
--> 609                 name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding
    610             )
    611             zarr_array.attrs.put(encoded_attrs)

/opt/conda/lib/python3.7/site-packages/zarr/hierarchy.py in create(self, name, **kwargs)
    889         """Create an array. Keyword arguments as per
    890         :func:`zarr.creation.create`."""
--> 891         return self._write_op(self._create_nosync, name, **kwargs)
    892
    893     def _create_nosync(self, name, **kwargs):

/opt/conda/lib/python3.7/site-packages/zarr/hierarchy.py in _write_op(self, f, *args, **kwargs)
    659
    660         with lock:
--> 661             return f(*args, **kwargs)
    662
    663     def create_group(self, name, overwrite=False):

/opt/conda/lib/python3.7/site-packages/zarr/hierarchy.py in _create_nosync(self, name, **kwargs)
    896         kwargs.setdefault('cache_attrs', self.attrs.cache)
    897         return create(store=self._store, path=path, chunk_store=self._chunk_store,
--> 898                       **kwargs)
    899
    900     def empty(self, name, **kwargs):

/opt/conda/lib/python3.7/site-packages/zarr/creation.py in create(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, cache_attrs, read_only, object_codec, dimension_separator, **kwargs)
    139                fill_value=fill_value, order=order, overwrite=overwrite, path=path,
    140                chunk_store=chunk_store, filters=filters, object_codec=object_codec,
--> 141                dimension_separator=dimension_separator)
    142
    143     # instantiate array

/opt/conda/lib/python3.7/site-packages/zarr/storage.py in init_array(store, shape, chunks, dtype, compressor, fill_value, order, overwrite, path, chunk_store, filters, object_codec, dimension_separator)
    356                          chunk_store=chunk_store, filters=filters,
    357                          object_codec=object_codec,
--> 358                          dimension_separator=dimension_separator)
    359
    360

/opt/conda/lib/python3.7/site-packages/zarr/storage.py in _init_array_metadata(store, shape, chunks, dtype, compressor, fill_value, order, overwrite, path, chunk_store, filters, object_codec, dimension_separator)
    392     chunks = normalize_chunks(chunks, shape, dtype.itemsize)
    393     order = normalize_order(order)
--> 394     fill_value = normalize_fill_value(fill_value, dtype)
    395
    396     # optional array metadata

/opt/conda/lib/python3.7/site-packages/zarr/util.py in normalize_fill_value(fill_value, dtype)
    281         # re-raise with our own error message to be helpful
    282         raise ValueError('fill_value {!r} is not valid for dtype {}; nested '
--> 283                          'exception: {}'.format(fill_value, dtype, e))
    284
    285     return fill_value

ValueError: fill_value nan is not valid for dtype int16; nested exception: cannot convert float NaN to integer
```

  to_zarr: region not recognised as dataset dimensions 1077079208
1059022639 https://github.com/pydata/xarray/issues/6069#issuecomment-1059022639 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X84_H2cv Boorhin 9576982 2022-03-04T10:10:08Z 2022-03-04T10:10:08Z NONE

The _FillValue is always the same (np.nan) and is specified when I reproject with rioxarray, so I don't understand the first error. The thing is that the _FillValue is attached to a variable, not to the whole dataset. But it never changes. Not too sure what to do.

  to_zarr: region not recognised as dataset dimensions 1077079208
1058323632 https://github.com/pydata/xarray/issues/6069#issuecomment-1058323632 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X84_FLyw Boorhin 9576982 2022-03-03T17:54:27Z 2022-03-03T17:54:27Z NONE

I did set ds.attrs = {}, but at each append I get a warning:

```python
/opt/conda/lib/python3.7/site-packages/xarray/core/dataset.py:2050: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  safe_chunks=safe_chunks,
```

  to_zarr: region not recognised as dataset dimensions 1077079208
1058315108 https://github.com/pydata/xarray/issues/6069#issuecomment-1058315108 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X84_FJtk Boorhin 9576982 2022-03-03T17:45:15Z 2022-03-03T17:45:15Z NONE

I have looked at these examples and I still don't manage to make it work in the real world. I find append the most logical, but I have attributes attached to the dataset that I don't seem to be able to drop before appending. This generates this error:

`ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.`

However, I cannot find a way to get rid of this attribute.
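As the error hints, _FillValue lives on each variable, not on the dataset, so clearing ds.attrs does not touch it. A minimal sketch of stripping it per variable before appending, mirroring the del workaround discussed elsewhere in this thread (the store name and final call are illustrative):

```python
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')
# _FillValue is stored in each variable's attrs/encoding, not in ds.attrs,
# so ds.attrs = {} does not remove it.
for var in ds.variables.values():
    var.attrs.pop('_FillValue', None)      # drop from the attributes
    var.encoding.pop('_FillValue', None)   # and from the serialization encoding
# ...then append as before, e.g.:
# ds.to_zarr('store.zarr', mode='a', append_dim='time')
```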

  to_zarr: region not recognised as dataset dimensions 1077079208
1034678675 https://github.com/pydata/xarray/issues/6069#issuecomment-1034678675 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X849q_GT Boorhin 9576982 2022-02-10T09:18:47Z 2022-02-10T09:18:47Z NONE

If xarray/Zarr is to replace netCDF, appending by time step is really an important feature. Most (all?) numerical models output results per time step onto a multidimensional grid with different variables. That grid also carries other parameters that help rebuild the geometry or follow standards, like CF and UGRID (the things that you are supposed to drop). The geometry of the grid is computed at the initialisation of the model, so it is a bit counter-intuitive to get rid of it for incremental backups, especially since each write does not touch this part of the file. What I do at the moment is create a first dataset at the final dimensions, based on dummy dask arrays, and export it with to_zarr and compute=False.

With a buffer system, I then create a new dataset for each buffer with the right data at the right place, meaning only the time interval concerned, and write it with to_zarr and the region argument. I flush the buffer dataset after it has been written.

At the end I write all the parameters before closing the main dataset.

To my knowledge, that's the only method which works.
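A minimal sketch of that pattern, with made-up sizes and random data standing in for the model output (names such as model_output.zarr are illustrative):

```python
import dask.array as da
import numpy as np
import xarray as xr

nt, ny, nx = 100, 250, 250

# 1) Build a dataset with the final dimensions, backed by dummy dask arrays,
#    and write only its metadata/layout: compute=False stores no array data.
dummy = da.zeros((nt, ny, nx), chunks=(1, ny, nx))
ds = xr.Dataset(
    {'air': (('time', 'y', 'x'), dummy)},
    coords={'time': ('time', np.arange(nt)),
            'y': ('y', np.arange(ny)),
            'x': ('x', np.arange(nx))},
)
ds.to_zarr('model_output.zarr', compute=False)

# 2) For each buffer, write only the time interval concerned via region=...
for i in range(nt):
    step = xr.Dataset({'air': (('time', 'y', 'x'),
                               np.random.rand(1, ny, nx))})
    step.to_zarr('model_output.zarr', mode='r+',
                 region={'time': slice(i, i + 1)})
```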

  to_zarr: region not recognised as dataset dimensions 1077079208
1032480933 https://github.com/pydata/xarray/issues/6069#issuecomment-1032480933 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X849imil Boorhin 9576982 2022-02-08T11:01:21Z 2022-02-08T11:01:21Z NONE

I don't get the second crash. It is not true that these variables have nothing in common: they are the coordinates of each of the variables, and they are all made the same. This is a typical example of an unstructured-grid backup. Meanwhile I found an alternative solution, which is also better for memory management. I think the documentation example doesn't actually work. I will try to write up my trick, but it does not use this particular region method, which in my opinion is not functioning as it should.

  to_zarr: region not recognised as dataset dimensions 1077079208


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);