issue_comments
23 rows where issue = 1077079208 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1059405550 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059405550 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_JT7u | d70-t 6574622 | 2022-03-04T18:16:57Z | 2022-03-04T18:16:57Z | CONTRIBUTOR | I'll set up a new issue. @Boorhin, I couldn't confirm the weirdness with the small example, but will put in a note to your comment. If you can reproduce the weirdness on the minimal example, would you make a comment to the new issue? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059403646 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059403646 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_JTd- | dcherian 2448579 | 2022-03-04T18:14:18Z | 2022-03-04T18:14:18Z | MEMBER | :+1: to creating a new issue with your minimal example (I think we're just missing a check whether the Dataset and on-disk fill values are equal). It did seem like there were two issues mixed up here. Thanks for confirming that. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
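The check suggested here (comparing in-memory and on-disk fill values before appending) could be sketched like this; `fill_values_match` is a hypothetical helper, not part of xarray — NaN needs special handling because `NaN == NaN` is False:

```python
import numpy as np

def fill_values_match(mem_fill, disk_fill):
    """Return True if an in-memory _FillValue agrees with the on-disk one.

    NaN == NaN evaluates to False, so two NaN fills are compared explicitly;
    a missing fill value only matches another missing fill value.
    """
    if mem_fill is None or disk_fill is None:
        return mem_fill is None and disk_fill is None
    try:
        if np.isnan(mem_fill) and np.isnan(disk_fill):
            return True
    except TypeError:
        pass  # non-float fills (ints, strings) fall through to ==
    return mem_fill == disk_fill
```

With a check like this, an append could fail early with a clear message instead of the confusing dimension-length error seen in the thread.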
1059400265 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059400265 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_JSpJ | Boorhin 9576982 | 2022-03-04T18:09:44Z | 2022-03-04T18:10:49Z | NONE | @d70-t we can try to branch it to the CF related issue yes.
The |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059378287 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059378287 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_JNRv | d70-t 6574622 | 2022-03-04T17:39:24Z | 2022-03-04T17:39:24Z | CONTRIBUTOR | I've made a simpler example of the
The workaround:
@dcherian, @Boorhin should we make a new (CF-related) issue out of this and try to keep focussing on append and region use-cases here, which seemed to be the initial problem in this thread (probably by going further through your example @Boorhin?). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059274384 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059274384 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_Iz6Q | Boorhin 9576982 | 2022-03-04T15:42:36Z | 2022-03-04T15:42:36Z | NONE | I have tried to specify the chunk before writing the dataset and I have had some really strange behaviour with data written into the same chunks, the time dimension never went over 5, growing and reducing through the processing... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059121536 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059121536 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_IOmA | Boorhin 9576982 | 2022-03-04T12:30:01Z | 2022-03-04T12:30:01Z | NONE | Effectively I get unstable results, with occasional errors where timesteps refuse to write.
I systematically have this warning:
```python
/tmp/ipykernel_1629/1269180709.py in aggregate_with_time(farm_name, resolution_M, canvas, W, H, master_raster_coordinates)
     39             raster.drop(
     40                 ['x','y']).to_zarr(
---> 41                     uri, mode='a', append_dim='time')
     42     #except:
     43         #print('something went wrong')

/opt/conda/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2048             append_dim=append_dim,
   2049             region=region,
-> 2050             safe_chunks=safe_chunks,
   2051         )
   2052

/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1406     _validate_datatypes_for_zarr_append(dataset)
   1407     if append_dim is not None:
-> 1408         existing_dims = zstore.get_dimensions()
   1409         if append_dim not in existing_dims:
   1410             raise ValueError(

/opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in get_dimensions(self)
    450                 if d in dimensions and dimensions[d] != s:
    451                     raise ValueError(
--> 452                         f"found conflicting lengths for dimension {d} "
    453                         f"({s} != {dimensions[d]})"
    454                     )

ValueError: found conflicting lengths for dimension time (2 != 1)
```
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059078961 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059078961 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_IEMx | d70-t 6574622 | 2022-03-04T11:27:12Z | 2022-03-04T11:27:44Z | CONTRIBUTOR | btw, as a work-around it works when removing the
But still, this might call for another issue to solve. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059078276 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059078276 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_IECE | Boorhin 9576982 | 2022-03-04T11:26:04Z | 2022-03-04T11:26:04Z | NONE | In my case I specify _FillValue in the reprojection, so I would not think overwriting it is an issue. I just don't know how to do it |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059076885 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059076885 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_IDsV | d70-t 6574622 | 2022-03-04T11:23:56Z | 2022-03-04T11:23:56Z | CONTRIBUTOR | Ok, I believe I've now reproduced your error:
```python
import xarray as xr
from rasterio.enums import Resampling
import numpy as np

ds = xr.tutorial.open_dataset('air_temperature').isel(time=0)
ds = ds.rio.write_crs('EPSG:4326')
dst = ds.rio.reproject('EPSG:3857', shape=(250, 250), resampling=Resampling.bilinear, nodata=np.nan)
dst.air.encoding = {}
dst = dst.assign(air=dst.air.expand_dims("time"), time=dst.time.expand_dims("time"))

m = {}
dst.to_zarr(m)
dst.to_zarr(m, append_dim="time")
```
This seems to be due to handling of CF-Conventions, which might go wrong in the append case: the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059063397 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059063397 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_IAZl | d70-t 6574622 | 2022-03-04T11:05:07Z | 2022-03-04T11:05:07Z | CONTRIBUTOR | This error is unrelated to region or append writes. The dataset
but still carries encoding information from
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059052257 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059052257 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_H9rh | Boorhin 9576982 | 2022-03-04T10:50:09Z | 2022-03-04T10:50:09Z | NONE | OK that's not exactly the same error message, I could not even start the appending. But that's basically one example that could be tested. A model would want to compute each of these variables step by step and variable by variable and save them for each single iteration. There is no need of concurrent writing as most of the resources are focused on the modelling.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059025444 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059025444 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_H3Ik | d70-t 6574622 | 2022-03-04T10:13:40Z | 2022-03-04T10:13:40Z | CONTRIBUTOR | 🤷 can't help any further without a minimal reproducible example here... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1059022639 | https://github.com/pydata/xarray/issues/6069#issuecomment-1059022639 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_H2cv | Boorhin 9576982 | 2022-03-04T10:10:08Z | 2022-03-04T10:10:08Z | NONE | The _FillValue is always the same (np.nan) and specified when I reproject with rioxarray, so I don't understand the first error then. The thing is that the _FillValue is attached to a variable, not the whole dataset. But it never changes. Not too sure what to do |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1058381922 | https://github.com/pydata/xarray/issues/6069#issuecomment-1058381922 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_FaBi | d70-t 6574622 | 2022-03-03T18:56:13Z | 2022-03-03T18:56:13Z | CONTRIBUTOR | I don't yet know a proper answer, but there'd be three observations I have:
* The |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1058323632 | https://github.com/pydata/xarray/issues/6069#issuecomment-1058323632 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_FLyw | Boorhin 9576982 | 2022-03-03T17:54:27Z | 2022-03-03T17:54:27Z | NONE | I did make `ds.attrs = {}`,
but at each append I get a warning
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1058315108 | https://github.com/pydata/xarray/issues/6069#issuecomment-1058315108 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84_FJtk | Boorhin 9576982 | 2022-03-03T17:45:15Z | 2022-03-03T17:45:15Z | NONE | I have looked at these examples and I still don't manage to make it work in the real world.
I find append the most logical but I have attributes attached to a dataset that I don't seem to be able to drop before appending. This generates this error:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1052252098 | https://github.com/pydata/xarray/issues/6069#issuecomment-1052252098 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84-uBfC | d70-t 6574622 | 2022-02-26T16:07:56Z | 2022-02-26T16:07:56Z | CONTRIBUTOR | While testing a bit further, I found another case which might potentially be dangerous:
```python
# ds is the same as above, but chunksize is {"time": 1, "x": 1}

# once on the coordinator
ds.to_zarr("test.zarr", compute=False, encoding={"time": {"chunks": [1]}, "x": {"chunks": [1]}})

# in parallel
ds.isel(time=slice(0,1), x=slice(0,1)).to_zarr("test.zarr", mode="r+", region={"time": slice(0,1), "x": slice(0,1)})
ds.isel(time=slice(0,1), x=slice(1,2)).to_zarr("test.zarr", mode="r+", region={"time": slice(0,1), "x": slice(1,2)})
ds.isel(time=slice(0,1), x=slice(2,3)).to_zarr("test.zarr", mode="r+", region={"time": slice(0,1), "x": slice(2,3)})
ds.isel(time=slice(1,2), x=slice(0,1)).to_zarr("test.zarr", mode="r+", region={"time": slice(1,2), "x": slice(0,1)})
ds.isel(time=slice(1,2), x=slice(1,2)).to_zarr("test.zarr", mode="r+", region={"time": slice(1,2), "x": slice(1,2)})
ds.isel(time=slice(1,2), x=slice(2,3)).to_zarr("test.zarr", mode="r+", region={"time": slice(1,2), "x": slice(2,3)})
```
This example doesn't produce any error, but the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1052240616 | https://github.com/pydata/xarray/issues/6069#issuecomment-1052240616 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X84-t-ro | d70-t 6574622 | 2022-02-26T15:58:48Z | 2022-02-26T15:58:48Z | CONTRIBUTOR | I'm trying to picture some usage scenarios based on incrementally adding timesteps to data on store. I hope these might help to answer questions from above. In particular, I think that I'll use the following dataset for demonstration code:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
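The demonstration dataset itself is cut off in this export; a hypothetical stand-in consistent with the region-write example above (small `time` and `x` dimensions matching the slices used there) could be:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the truncated demonstration dataset: dims
# "time" and "x" sized to match the region slices shown in the thread.
ds = xr.Dataset(
    {"v": (("time", "x"), np.arange(6.0).reshape(2, 3))},
    coords={"time": np.arange(2), "x": np.arange(3)},
)
```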
1034678675 | https://github.com/pydata/xarray/issues/6069#issuecomment-1034678675 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X849q_GT | Boorhin 9576982 | 2022-02-10T09:18:47Z | 2022-02-10T09:18:47Z | NONE | If Xarray/zarr is to replace netcdf, appending by time step is really an important feature
Most (all?) numerical models will output results per time step onto a multidimensional grid with different variables
Said grid will also have other parameters that help rebuild the geometry or follow standards, like CF and UGRID (the things that you are supposed to drop). The geometry of the grid is computed at the initialisation of the model. It is a bit counter-intuitive to get rid of it for incremental backups, especially as each write will not touch this part of the file.
What I do at the moment:
- I create a first dataset at the final dimension based on dummy dask arrays, and export it.
- With a buffer system, I create a new dataset for each buffer with the right data at the right place, meaning only the time interval concerned, and I write it.
- At the end I write all the parameters before closing the main dataset.

To my knowledge, that's the only method which works. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1034196986 | https://github.com/pydata/xarray/issues/6069#issuecomment-1034196986 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X849pJf6 | shoyer 1217238 | 2022-02-09T21:12:31Z | 2022-02-09T21:12:31Z | MEMBER | The reason why this isn't allowed is because it's ambiguous what to do with the other variables that are not restricted to the region (['cell', 'face', 'layer', 'max_cell_node', 'max_face_nodes', 'node', 'siglay'] in this case). I can imagine quite a few different ways this behavior could be implemented:
I believe your proposal here (removing these checks from (4) seems like perhaps the most user-friendly option, but checking existing variables can add significant overhead. When experimenting adding The current solution is not to do any of these, and to force the user to make an explicit choice by dropping new variables, or write them in a separate call to |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1033814820 | https://github.com/pydata/xarray/issues/6069#issuecomment-1033814820 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X849nsMk | observingClouds 43613877 | 2022-02-09T14:23:54Z | 2022-02-09T14:36:48Z | CONTRIBUTOR | You are right, the coordinates should not be dropped. I think the function _validate_region has a bug. Currently it checks for all
Changing the function to
seems to work. |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 1, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1032480933 | https://github.com/pydata/xarray/issues/6069#issuecomment-1032480933 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X849imil | Boorhin 9576982 | 2022-02-08T11:01:21Z | 2022-02-08T11:01:21Z | NONE | I don't get the second crash. It is not true that these variables are not in common, they are the coordinates of each of the variables. They are all made the same. This is a typical example of an unstructured grid backup. Meanwhile I found an alternate solution which is also better for memory management. I think the documentation example doesn't actually work. I will try to formulate my trick but that's not using this particular method of region that is not functioning as it should in my opinion. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 | |
1031773761 | https://github.com/pydata/xarray/issues/6069#issuecomment-1031773761 | https://api.github.com/repos/pydata/xarray/issues/6069 | IC_kwDOAMm_X849f55B | observingClouds 43613877 | 2022-02-07T18:19:08Z | 2022-02-07T18:19:08Z | CONTRIBUTOR | Hi @Boorhin,
I just ran into the same issue. The
This leads however to another issue:
```python
ValueError                                Traceback (most recent call last)
<ipython-input-52-bb3d2c1adc12> in <module>
     18     for var in varnames:
     19         ds[var].isel(time=slice(t)).values += np.random.random((len(layers),len(nodesx)))
---> 20     ds.isel(time=slice(t)).to_zarr(outfile, region={"time": slice(t)})

~/.local/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks)
   2029             encoding = {}
   2030
-> 2031         return to_zarr(
   2032             self,
   2033             store=store,

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks)
   1359
   1360     if region is not None:
-> 1361         _validate_region(dataset, region)
   1362     if append_dim is not None and append_dim in region:
   1363         raise ValueError(

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in _validate_region(ds, region)
   1272     ]
   1273     if non_matching_vars:
-> 1274         raise ValueError(
   1275         f"when setting

ValueError: when setting
```
Here, the solution is however provided with the error message. Following the instructions, the snippet below finally works (as far as I can tell):
```python
import xarray as xr
from datetime import datetime, timedelta
import numpy as np

dt = datetime.now()
times = np.arange(dt, dt + timedelta(days=6), timedelta(hours=1))
nodesx, nodesy, layers = np.arange(10,50), np.arange(10,50)+15, np.arange(10)

ds = xr.Dataset()
ds.coords['time'] = ('time', times)
ds.coords['node_x'] = ('node', nodesx)
ds.coords['node_y'] = ('node', nodesy)
ds.coords['layer'] = ('layer', layers)

outfile = 'my_zarr'
varnames = ['potato', 'banana', 'apple']
for var in varnames:
    ds[var] = (('time', 'layer', 'node'), np.zeros((len(times), len(layers), len(nodesx))))
ds.to_zarr(outfile, mode='a')

for t in range(len(times)):
    for var in varnames:
        ds[var].isel(time=slice(t)).values += np.random.random((len(layers), len(nodesx)))
    ds.isel(time=slice(t)).to_zarr(outfile, region={"time": slice(t)})
```
Maybe one would like to generalise
Cheers |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
to_zarr: region not recognised as dataset dimensions 1077079208 |