issue_comments
17 rows where issue = 1159923690 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1064981526 | https://github.com/pydata/xarray/issues/6329#issuecomment-1064981526 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_elQW | d70-t 6574622 | 2022-03-11T10:28:35Z | 2022-03-11T10:28:35Z | CONTRIBUTOR | Thanks for pointing out |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1064973518 | https://github.com/pydata/xarray/issues/6329#issuecomment-1064973518 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_ejTO | Boorhin 9576982 | 2022-03-11T10:19:03Z | 2022-03-11T10:20:09Z | NONE |
I think so, except that it affects both append and region writes, not just append. Yes, for the above case it should work. I need to test all this better. Thanks |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1063977656 | https://github.com/pydata/xarray/issues/6329#issuecomment-1063977656 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_awK4 | d70-t 6574622 | 2022-03-10T11:56:44Z | 2022-03-10T11:56:44Z | CONTRIBUTOR | Yes, this is roughly the behaviour I'd expect, and great that it helped clarify things. Still, building up the metadata nicely upfront (which is required for region writes) is quite convoluted... That's what I meant with
in the previous comment. I think establishing and documenting good practices for this would help, but we probably also want better tools. In any case, that would probably be yet another issue. Note that if you care about this particular example (e.g. appending in a single thread in increasing order of time steps), it should also be possible to do this much more simply using append:

```python
import os

import numpy as np
import xarray as xr

# some_processing(arr, X, Y) is the reprojection routine from the previous comment
filename='processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
ds.air.encoding['dtype']=np.dtype('float32')
X,Y=250, 250 #size of each final timestep
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r=some_processing(ds.isel(time=slice(i,i+1)),X,Y)
    del arr_r.air.attrs["_FillValue"]
    if os.path.exists(filename):
        arr_r.to_zarr(filename, append_dim='time')
    else:
        arr_r.to_zarr(filename)
```

If you find out more about the cloud case, please post a note; otherwise, can we assume that the original bug report is fine? |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1063949669 | https://github.com/pydata/xarray/issues/6329#issuecomment-1063949669 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_apVl | Boorhin 9576982 | 2022-03-10T11:21:18Z | 2022-03-10T11:21:18Z | NONE | Ok, changing to I have found something that gives me satisfactory results. I still don't know why I have issues in the cloud; I am still investigating, and maybe it is unrelated. The following script keeps the important parts, but it is still not very clean, as some of the parameters are not included in the final file. I ended up with the same kind of convoluted approach I was using before, but hopefully it is helpful to someone looking for a real-world example. It definitely clarified things in my head.

``` python
import xarray as xr
import rioxarray  # registers the .rio accessor used in some_processing
from rasterio.enums import Resampling
import numpy as np
import dask.array as da

def init_coord(ds, X,Y):
    ''' To have the geometry right'''
    arr_r=some_processing(ds.isel(time=slice(0,1)), X,Y)
    return arr_r.x.values, arr_r.y.values

def some_processing(arr, X,Y):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    # rio.reproject expects shape=(height, width), i.e. (Y, X)
    arr_r = arr.rio.reproject('EPSG:3857', shape=(Y, X), resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename='processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
ds.air.encoding['dtype']=np.dtype('float32')
X,Y=250, 250 #size of each final timestep
x,y=init_coord(ds, X,Y)
# one chunk per time step, so that each region write below maps onto its own zarr chunk
dummy=da.zeros((len(ds.time.values), Y, X), chunks=(1, Y, X))
ds_to_write=xr.Dataset({'air':(('time','y','x'), dummy)}, coords={'time':('time',ds.time.values),'x':('x', x),'y':('y',y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r=some_processing(ds.isel(time=slice(i,i+1)),X,Y)
    buff= arr_r.drop(['spatial_ref','x','y']).chunk({'time':1,'x':X,'y':Y})
    del buff.air.attrs["_FillValue"]
    buff.to_zarr(filename, mode='r+', region={'time':slice(i,i+1)})
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1063859715 | https://github.com/pydata/xarray/issues/6329#issuecomment-1063859715 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_aTYD | d70-t 6574622 | 2022-03-10T09:44:59Z | 2022-03-10T09:44:59Z | CONTRIBUTOR | Sure, no problem. I believe this page has a good summary:
So the difference between "a" and "r+" roughly codifies the intended behaviour for sequential access (it's ok to modify everything) and parallel access to independent chunks (where modifying metadata would be bad). So probably that message was suggesting that you have to use "a" if you want to modify metadata (e.g. by expanding the shape), which is true. But to me, it's unclear how one would do that safely with (potentially) parallel region writes, so it's kind of reasonable that region writes don't like to modify metadata. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1063851972 | https://github.com/pydata/xarray/issues/6329#issuecomment-1063851972 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_aRfE | Boorhin 9576982 | 2022-03-10T09:36:00Z | 2022-03-10T09:36:18Z | NONE | Sorry, that's a mistake. I think append mode ("a") was suggested at some point by one of the error messages.
I cannot remember. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1062755678 | https://github.com/pydata/xarray/issues/6329#issuecomment-1062755678 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_WF1e | d70-t 6574622 | 2022-03-09T10:06:22Z | 2022-03-09T10:06:22Z | CONTRIBUTOR | Yes, that looks like the error as described in the initial post.
Adding the described workaround (i.e.
Which is due to a mix of append-mode (
Currently, I can't really imagine how a mix of both should behave. If you can't prepare the dataset for the final shape upfront (to use |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1062724755 | https://github.com/pydata/xarray/issues/6329#issuecomment-1062724755 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_V-ST | Boorhin 9576982 | 2022-03-09T09:30:42Z | 2022-03-09T09:30:42Z | NONE | OK, that is easy to change; now you get exactly the same error message as when appending. I have tried a lot of different ways, and I am not getting anywhere with writing the data correctly to a store.

``` python
import xarray as xr
import rioxarray  # registers the .rio accessor used in some_processing
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    ''' To have the geometry right'''
    arr_r=some_processing(ds.isel(time=slice(0,1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250), resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename='processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x,y=init_coord(ds)
ds_to_write=xr.Dataset(coords={'time':('time',ds.time.values),'x':('x', x),'y':('y',y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r=some_processing(ds.isel(time=slice(i,i+1)))
    buff= arr_r.drop(['spatial_ref','x','y']).chunk({'time':1,'x':250,'y':250})
    buff.air.encoding['dtype']=np.dtype('float32')
    buff.to_zarr(filename, mode='a', region={'time':slice(i,i+1)})
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1061711069 | https://github.com/pydata/xarray/issues/6329#issuecomment-1061711069 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_SGzd | d70-t 6574622 | 2022-03-08T12:09:38Z | 2022-03-08T12:09:38Z | CONTRIBUTOR | You've got the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1061651626 | https://github.com/pydata/xarray/issues/6329#issuecomment-1061651626 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_R4Sq | Boorhin 9576982 | 2022-03-08T10:55:50Z | 2022-03-08T10:55:50Z | NONE | OK, sorry for the various mistakes; I wrote that in a hurry. Strangely enough, this behaves differently, but it crashes too.

``` python
import xarray as xr
import rioxarray  # registers the .rio accessor used in some_processing
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    ''' To have the geometry right'''
    arr_r=some_processing(ds.isel(time=slice(0,1)))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250), resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename='processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x,y=init_coord(ds)
ds_to_write=xr.Dataset(coords={'time':('time',ds.time.values),'x':('x', x),'y':('y',y)})
ds_to_write.to_zarr(filename, compute=False, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r=some_processing(ds.isel(time=slice(i,i+1)))
    buff= arr_r.drop(['spatial_ref','x','y']).chunk({'time':1,'x':250,'y':250})
    buff.to_zarr(filename, mode='a', region={'time':slice(i,i+1)})
```

With error:
but the output of buff is:
i.e. it contains only floats. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1061081884 | https://github.com/pydata/xarray/issues/6329#issuecomment-1061081884 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_PtMc | d70-t 6574622 | 2022-03-07T20:03:18Z | 2022-03-07T20:03:18Z | CONTRIBUTOR | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | ||
1060493852 | https://github.com/pydata/xarray/issues/6329#issuecomment-1060493852 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_Ndoc | Boorhin 9576982 | 2022-03-07T10:48:21Z | 2022-03-07T10:48:21Z | NONE | This will fail like append. I just tried to make a somewhat realistic example, such as reprojecting from a geographic to a projected coordinate system. Look at all the stages you need to go through... and I am still not sure this is working as it should.

``` python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np

def init_coord(ds):
    ''' To have the geometry right'''
    arr_r=some_processing(ds.isel(time=slice(0,1))
    return arr_r.x.values, arr_r.y.values

def some_processing(arr):
    ''' A reprojection routine'''
    arr = arr.rio.write_crs('EPSG:4326')
    arr_r = arr.rio.reproject('EPSG:3857', shape=(250, 250), resampling=Resampling.bilinear, nodata=np.nan)
    return arr_r

filename='processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
x,y=init_coord(ds)
ds_to_write=xr.Dataset({'coords':{'time':('time',ds.time.values),'x':('x', x),'y':('y',y)}})
ds_to_write.to_zarr(filename, compute =false, encoding={"time": {"chunks": [1]}})
for i in range(len(ds.time)):
    # some kind of heavy processing
    arr_r=some_processing(ds.isel(time=slice(i,i+1))
    agg_r_t= agg_r.drop(['spatial_ref']).expand_dims({'time':[ds.time.values[i]]})
    buff= xr.Dataset(({'air':agg_r_t}).chunk({'time':1,'x':250,'y':250})
    buff.drop(['x','y']).to_zarr(filename, , region={'time':slice(i,i+1)})
```

You would need to change the processing function to something like:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059760613 | https://github.com/pydata/xarray/issues/6329#issuecomment-1059760613 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_Kqnl | Boorhin 9576982 | 2022-03-05T13:01:40Z | 2022-03-05T14:51:07Z | NONE | Just to make clear what is weird:
This is just a test to see whether the regions were written to file, and it seems they were written at random, most likely with regions overwriting other regions. I have no idea how that is possible: in theory, everything from i = 95 to 954 should have been written. It could be my code, so I am checking again, but that seems unlikely without any error being raised. I am just showing this so that you can better understand what I am observing.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059777476 | https://github.com/pydata/xarray/issues/6329#issuecomment-1059777476 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_KuvE | Boorhin 9576982 | 2022-03-05T14:48:58Z | 2022-03-05T14:48:58Z | NONE | I can confirm that it also fails, with the same error, when precomputing a dataset and filling regions
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059754523 | https://github.com/pydata/xarray/issues/6329#issuecomment-1059754523 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_KpIb | Boorhin 9576982 | 2022-03-05T12:26:52Z | 2022-03-05T12:26:52Z | NONE | Sorry to add to the confusion, but I actually got another kind of strange behaviour by deleting the fill_value with the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059426353 | https://github.com/pydata/xarray/issues/6329#issuecomment-1059426353 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_JZAx | d70-t 6574622 | 2022-03-04T18:48:13Z | 2022-03-04T18:48:13Z | CONTRIBUTOR | If that's necessary to reproduce the problem, then yes. If it's possible to show the same thing with less "noise", then it's better not to use the tutorial dataset or something like a cloud backend. But we can also iterate on this, progressively getting down to a smaller example. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 | |
1059423718 | https://github.com/pydata/xarray/issues/6329#issuecomment-1059423718 | https://api.github.com/repos/pydata/xarray/issues/6329 | IC_kwDOAMm_X84_JYXm | Boorhin 9576982 | 2022-03-04T18:44:01Z | 2022-03-04T18:44:01Z | NONE | I will try to reproduce the strange behaviour, but it happened in a cloud environment (Google), where the time steps were overwriting each other and the number of "preserved" time steps varied over time. Should we use something closer to the original problem, such as the tutorial dataset? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
`to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690 |