issue_comments
20 rows where author_association = "NONE" and user = 9576982 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1064973518 | https://github.com/pydata/xarray/issues/6329#issuecomment-1064973518 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_ejTO | Boorhin 9576982 | 2022-03-11T10:19:03Z | 2022-03-11T10:20:09Z | NONE |
I think so, except that it affects both the append and region methods, not just append. Yes, for the above case it should work. I need to test all this more thoroughly. Thanks |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1063949669 | https://github.com/pydata/xarray/issues/6329#issuecomment-1063949669 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_apVl | Boorhin 9576982 | 2022-03-10T11:21:18Z | 2022-03-10T11:21:18Z | NONE | Ok, changing to `mode='r+'`, I have found something that gives me satisfactory results. The reason why I have issues in the cloud I still don't know; I am still investigating. Maybe it is unrelated. The following script kind of keeps the important stuff, but it is still not very clean, as some of the parameters are not included in the final file. I ended up doing the same kind of convoluted approach as I was making before. But hopefully that's helpful to someone looking for some sort of real-case example. It definitely clarified stuff in my head.
``` python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np
import dask.array as da

def init_coord(ds, X, Y):
    ''' To have the geometry right'''
    arr_r = some_processing(ds.isel(time=slice(0, 1)), X, Y)
    return arr_r.x.values, arr_r.y.values

def some_processing(arr, X, Y):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(Y, X),
                              resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
ds.air.encoding['dtype'] = np.dtype('float32')
X, Y = 250, 250  # size of each final timestep
x, y = init_coord(ds, X, Y)
dummy = da.zeros((len(ds.time.values), Y, X))
ds_to_write = xr.Dataset({'air': (('time', 'y', 'x'), dummy)},
                         coords={'time': ('time', ds.time.values),
                                 'x': ('x', x), 'y': ('y', y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)), X, Y)
    buff = arr_r.drop(['spatial_ref', 'x', 'y']).chunk({'time': 1, 'x': X, 'y': Y})
    del buff.air.attrs["_FillValue"]
    buff.to_zarr(filename, mode='r+', region={'time': slice(i, i + 1)})
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1063851972 | https://github.com/pydata/xarray/issues/6329#issuecomment-1063851972 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_aRfE | Boorhin 9576982 | 2022-03-10T09:36:00Z | 2022-03-10T09:36:18Z | NONE | Sorry, that's a mistake. I think append was suggested at some point by one of the error messages; I cannot remember.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1062724755 | https://github.com/pydata/xarray/issues/6329#issuecomment-1062724755 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_V-ST | Boorhin 9576982 | 2022-03-09T09:30:42Z | 2022-03-09T09:30:42Z | NONE | OK, that is easy to change, now you have the exact same error message as for the appending. I have tried a lot of different ways and I am not getting anywhere with writing the data correctly in a store.
``` python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    ''' To have the geometry right'''
    arr_r = some_processing(ds.isel(time=slice(0, 1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250),
                              resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x, y = init_coord(ds)
ds_to_write = xr.Dataset(coords={'time': ('time', ds.time.values),
                                 'x': ('x', x), 'y': ('y', y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)))
    buff = arr_r.drop(['spatial_ref', 'x', 'y']).chunk({'time': 1, 'x': 250, 'y': 250})
    buff.air.encoding['dtype'] = np.dtype('float32')
    buff.to_zarr(filename, mode='a', region={'time': slice(i, i + 1)})
```
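For cross-reference: the 2022-03-10 comment above reports getting past this stage by pre-creating the `air` variable from dummy dask arrays, deleting the `_FillValue` attribute on each buffer, and writing with `mode='r+'`. A minimal sketch of that change to the loop body, reusing the names from the script above (and assuming the store was initialised with a dask-backed `air` variable via `compute=False`):
``` python
# Sketch of the fix reported later in this thread (not part of the original comment):
for i in range(len(ds.time)):
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)))
    buff = arr_r.drop(['spatial_ref', 'x', 'y']).chunk({'time': 1, 'x': 250, 'y': 250})
    del buff.air.attrs["_FillValue"]  # avoid the _FillValue conflict on region writes
    buff.to_zarr(filename, mode='r+', region={'time': slice(i, i + 1)})
```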
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1061651626 | https://github.com/pydata/xarray/issues/6329#issuecomment-1061651626 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_R4Sq | Boorhin 9576982 | 2022-03-08T10:55:50Z | 2022-03-08T10:55:50Z | NONE | Ok, sorry for the different mistakes, I wrote that in a hurry. Strangely enough this has a different behaviour, but it crashes too.
``` python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    ''' To have the geometry right'''
    arr_r = some_processing(ds.isel(time=slice(0, 1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250),
                              resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x, y = init_coord(ds)
ds_to_write = xr.Dataset(coords={'time': ('time', ds.time.values),
                                 'x': ('x', x), 'y': ('y', y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)))
    buff = arr_r.drop(['spatial_ref', 'x', 'y']).chunk({'time': 1, 'x': 250, 'y': 250})
    buff.to_zarr(filename, mode='a', region={'time': slice(i, i + 1)})
```
With error:
but the output of `buff` is as follows, i.e. it contains only floats |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1060493852 | https://github.com/pydata/xarray/issues/6329#issuecomment-1060493852 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_Ndoc | Boorhin 9576982 | 2022-03-07T10:48:21Z | 2022-03-07T10:48:21Z | NONE | This will fail like append. I just tried to make some kind of realistic example, like reprojecting from a geographic to an orthogonal system. If you look at all the stages you need to go through... and I am still not sure this is working as it should.
``` python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    ''' To have the geometry right'''
    arr_r = some_processing(ds.isel(time=slice(0, 1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250),
                              resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x, y = init_coord(ds)
ds_to_write = xr.Dataset(coords={'time': ('time', ds.time.values),
                                 'x': ('x', x), 'y': ('y', y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)))
    arr_r_t = arr_r.drop(['spatial_ref']).expand_dims({'time': [ds.time.values[i]]})
    buff = xr.Dataset({'air': arr_r_t}).chunk({'time': 1, 'x': 250, 'y': 250})
    buff.drop(['x', 'y']).to_zarr(filename, region={'time': slice(i, i + 1)})
```
You would need to change the processing function to something like:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059760613 | https://github.com/pydata/xarray/issues/6329#issuecomment-1059760613 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_Kqnl | Boorhin 9576982 | 2022-03-05T13:01:40Z | 2022-03-05T14:51:07Z | NONE | Just to make clear what is weird:
This is just a test to see whether the regions were written to file, and it seems they were written randomly, most likely overprinting regions on regions. I have no idea how that is possible. In theory everything should be written from i = 95 to 954. It could be in my code, so I am checking again, but that seems unlikely without raising any error. I am just showing this so that you better understand what I am observing.
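For context, that kind of check can be sketched as follows (hypothetical code, not the original test; it assumes the `processed_dataset.zarr` store with an `air` variable from the scripts above): timesteps whose region write never landed read back as the store's fill value.
``` python
import numpy as np
import xarray as xr

# Hypothetical check of which region writes landed: unwritten timesteps
# read back as the store's fill value (NaN here).
out = xr.open_zarr('processed_dataset.zarr')
written = ~np.isnan(out.air).all(dim=('x', 'y'))
print(int(written.sum()), 'of', out.sizes['time'], 'timesteps written')
```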
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059777476 | https://github.com/pydata/xarray/issues/6329#issuecomment-1059777476 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_KuvE | Boorhin 9576982 | 2022-03-05T14:48:58Z | 2022-03-05T14:48:58Z | NONE | I can confirm that it also fails, with the same error, when precomputing a dataset and filling regions.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059754523 | https://github.com/pydata/xarray/issues/6329#issuecomment-1059754523 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_KpIb | Boorhin 9576982 | 2022-03-05T12:26:52Z | 2022-03-05T12:26:52Z | NONE | Sorry to add to the confusion: I have actually had another kind of strange behaviour when deleting the fill_value with the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059423718 | https://github.com/pydata/xarray/issues/6329#issuecomment-1059423718 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_JYXm | Boorhin 9576982 | 2022-03-04T18:44:01Z | 2022-03-04T18:44:01Z | NONE | I will try to reproduce the strange behaviour, but it was in a cloud environment (Google): the time steps were overwriting each other, and the number of "preserved" time-steps varied over time. I suggest we use something closer to the original problem, such as the tutorial dataset? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059400265 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059400265 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_JSpJ | Boorhin 9576982 | 2022-03-04T18:09:44Z | 2022-03-04T18:10:49Z | NONE | @d70-t we can try to branch it to the CF-related issue, yes.
The |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059274384 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059274384 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_Iz6Q | Boorhin 9576982 | 2022-03-04T15:42:36Z | 2022-03-04T15:42:36Z | NONE | I have tried specifying the chunks before writing the dataset, and I have had some really strange behaviour, with data written into the same chunks; the time dimension never went over 5, growing and shrinking through the processing... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059121536 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059121536 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_IOmA | Boorhin 9576982 | 2022-03-04T12:30:01Z | 2022-03-04T12:30:01Z | NONE | In effect I get unstable results, sometimes with errors where timesteps refuse to write.
I systematically get this warning:
``` python
/tmp/ipykernel_1629/1269180709.py in aggregate_with_time(farm_name, resolution_M, canvas, W, H, master_raster_coordinates)
     39     raster.drop(
     40         ['x','y']).to_zarr(
---> 41             uri, mode='a', append_dim='time')
     42     #except:
     43     #print('something went wrong')

/opt/conda/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2048             append_dim=append_dim,
   2049             region=region,
-> 2050             safe_chunks=safe_chunks,
   2051         )
   2052

/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1406     _validate_datatypes_for_zarr_append(dataset)
   1407     if append_dim is not None:
-> 1408         existing_dims = zstore.get_dimensions()
   1409         if append_dim not in existing_dims:
   1410             raise ValueError(

/opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in get_dimensions(self)
    450             if d in dimensions and dimensions[d] != s:
    451                 raise ValueError(
--> 452                     f"found conflicting lengths for dimension {d} "
    453                     f"({s} != {dimensions[d]})"
    454                 )

ValueError: found conflicting lengths for dimension time (2 != 1)
```
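A hypothetical minimal reproduction of this class of error (not the author's code): if one append along `time` grows some arrays in the store but not others, the next append finds conflicting lengths.
``` python
import numpy as np
import xarray as xr

ds = xr.Dataset({'a': ('time', np.zeros(1)), 'b': ('time', np.zeros(1))})
ds.to_zarr('store.zarr', mode='w')
# Appending only 'a' grows its time axis to 2 while 'b' stays at 1...
ds[['a']].to_zarr('store.zarr', mode='a', append_dim='time')
# ...so the next full append fails with:
# ValueError: found conflicting lengths for dimension time (2 != 1)
ds.to_zarr('store.zarr', mode='a', append_dim='time')
```
|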
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059078276 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059078276 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_IECE | Boorhin 9576982 | 2022-03-04T11:26:04Z | 2022-03-04T11:26:04Z | NONE | In my case I specify the `_FillValue` in the reprojection, so I would not think it is an issue to overwrite it. I just don't know how to do it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059052257 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059052257 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_H9rh | Boorhin 9576982 | 2022-03-04T10:50:09Z | 2022-03-04T10:50:09Z | NONE | OK, that's not exactly the same error message; I could not even start the appending. But that's basically one example that could be tested. A model would want to compute each of these variables step by step, variable by variable, and save them at each single iteration. There is no need for concurrent writing, as most of the resources are focused on the modelling.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059022639 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059022639 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_H2cv | Boorhin 9576982 | 2022-03-04T10:10:08Z | 2022-03-04T10:10:08Z | NONE | The `_FillValue` is always the same (`np.nan`) and is specified when I reproject with rioxarray, so I don't understand the first error. The thing is that the `_FillValue` is attached to a variable, not the whole dataset, but it never changes. Not too sure what to do.
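A small sketch of that point (hypothetical names, not from the comment): the attribute lives on the variable, not on the dataset.
``` python
import numpy as np
import xarray as xr

ds = xr.Dataset({'air': ('time', np.zeros(3))})
ds.air.attrs['_FillValue'] = np.nan   # per-variable, e.g. set by rioxarray
print('_FillValue' in ds.attrs)       # False: not a dataset-level attribute
print('_FillValue' in ds.air.attrs)   # True
```
|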
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1058323632 | https://github.com/pydata/xarray/issues/6069#issuecomment-1058323632 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_FLyw | Boorhin 9576982 | 2022-03-03T17:54:27Z | 2022-03-03T17:54:27Z | NONE | I did set `ds.attrs = {}`, but at each append I get a warning
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1058315108 | https://github.com/pydata/xarray/issues/6069#issuecomment-1058315108 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_FJtk | Boorhin 9576982 | 2022-03-03T17:45:15Z | 2022-03-03T17:45:15Z | NONE | I have looked at these examples and I still can't make it work in the real world.
I find append the most logical, but I have attributes attached to the dataset that I don't seem to be able to drop before appending. This generates this error:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1034678675 | https://github.com/pydata/xarray/issues/6069#issuecomment-1034678675 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X849q_GT | Boorhin 9576982 | 2022-02-10T09:18:47Z | 2022-02-10T09:18:47Z | NONE | If Xarray/zarr is to replace netCDF, appending by time step is a really important feature.
Most (all?) numerical models output results per time step onto a multidimensional grid with different variables.
Said grid will also have other parameters that help rebuild the geometry or follow standards, like CF and UGRID (the things that you are supposed to drop). The geometry of the grid is computed at the initialisation of the model. It is a bit counter-intuitive to get rid of it for incremental backups, especially since each write will not touch this part of the file.
What I do at the moment is create a first dataset at the final dimensions, based on dummy dask arrays, and export it.
With a buffer system, I create a new dataset for each buffer with the right data at the right place, meaning only the time interval concerned, and I write it. At the end I write all the parameters before closing the main dataset. To my knowledge, that's the only method which works.
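A minimal sketch of the workflow described above (store name, variable name, and sizes are illustrative, not from the comment): initialise a full-size store lazily, then write each time interval into its region.
``` python
import dask.array as da
import numpy as np
import xarray as xr

filename = 'results.zarr'        # hypothetical store
nt, ny, nx = 100, 250, 250       # final dimensions, known at model initialisation

# 1. Full-size dataset from dummy dask arrays; compute=False writes only
#    the metadata and coordinates, not the dask-backed values.
init = xr.Dataset(
    {'var': (('time', 'y', 'x'), da.zeros((nt, ny, nx), chunks=(1, ny, nx)))},
    coords={'time': np.arange(nt), 'y': np.arange(ny), 'x': np.arange(nx)},
)
init.to_zarr(filename, mode='w', compute=False)

# 2. Write each buffer/timestep into only the time interval it concerns.
for i in range(nt):
    step = xr.Dataset({'var': (('time', 'y', 'x'), np.random.rand(1, ny, nx))})
    step.to_zarr(filename, mode='r+', region={'time': slice(i, i + 1)})
```
|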
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1032480933 | https://github.com/pydata/xarray/issues/6069#issuecomment-1032480933 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X849imil | Boorhin 9576982 | 2022-02-08T11:01:21Z | 2022-02-08T11:01:21Z | NONE | I don't get the second crash. It is not true that these variables have nothing in common: they are the coordinates of each of the variables, and they are all made the same. This is a typical example of an unstructured-grid backup. Meanwhile I have found an alternate solution, which is also better for memory management. I think the documentation example doesn't actually work. I will try to write up my trick, but it does not use this particular region method, which in my opinion is not functioning as it should. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 |
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);