issue_comments

7 rows where author_association = "CONTRIBUTOR" and issue = 1159923690 sorted by updated_at descending

user 1

  • d70-t 7

issue 1

  • `to_zarr` with append or region mode and `_FillValue` doesnt work · 7 ✖

author_association 1

  • CONTRIBUTOR · 7 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1064981526 https://github.com/pydata/xarray/issues/6329#issuecomment-1064981526 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_elQW d70-t 6574622 2022-03-11T10:28:35Z 2022-03-11T10:28:35Z CONTRIBUTOR

Thanks for pointing out region again. I've updated the header and the initial comment.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1063977656 https://github.com/pydata/xarray/issues/6329#issuecomment-1063977656 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_awK4 d70-t 6574622 2022-03-10T11:56:44Z 2022-03-10T11:56:44Z CONTRIBUTOR

Yes, this is kind of the behaviour I'd expect. And great that it helped clarify things. Still, building up the metadata nicely upfront (which is required for region writes) is quite convoluted... That's what I meant with

some better tooling for writing and updating zarr dataset metadata (I don't know if that would fit in the realm of xarray though, as it looks like handling Datasets without content. For "appending" metadata, I really don't know how I'd picture this properly in xarray world.)

in the previous comment. I think establishing and documenting good practices for this would help, but we probably also want better tools. In any case, this would probably be yet another issue.

Note that if you care about this particular example (e.g. appending in a single thread in increasing order of timesteps), then it should also be possible to do this much more simply using append:

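```python
import os

import numpy as np
import xarray as xr

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
ds.air.encoding['dtype'] = np.dtype('float32')
X, Y = 250, 250  # size of each final timestep

for i in range(len(ds.time)):
    # some kind of heavy processing (some_processing is not defined here)
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)), X, Y)
    # workaround: drop _FillValue from the attributes before writing
    del arr_r.air.attrs["_FillValue"]
    if os.path.exists(filename):
        # the store already exists: append along the time dimension
        arr_r.to_zarr(filename, append_dim='time')
    else:
        # first timestep: create the store
        arr_r.to_zarr(filename)
```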

If you find out more about the cloud case, please post a note. Otherwise, can we assume that the original bug report is fine?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1063859715 https://github.com/pydata/xarray/issues/6329#issuecomment-1063859715 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_aTYD d70-t 6574622 2022-03-10T09:44:59Z 2022-03-10T09:44:59Z CONTRIBUTOR

Sure, no problem. I believe this page has a good summary:

mode ({"w", "w-", "a", "r+", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if does not exist); “r+” means modify existing array values only (raise an error if any metadata or shapes would change). The default mode is “a” if append_dim is set. Otherwise, it is “r+” if region is set and w- otherwise.

So the difference between "a" and "r+" roughly codifies the intended behaviour for sequential access (it's ok to modify everything) and parallel access to independent chunks (where modifying metadata would be bad).

So probably that message was suggesting that you have to use "a" if you want to modify metadata (e.g. by expanding the shape), which is true. But to me, it's unclear how one would do that safely with (potentially) parallel region writes, so it's kind of reasonable that region writes don't like to modify metadata.
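
The default-mode rule from the quoted documentation can be written down as a small decision function. This is only a sketch of the documented rule: `default_to_zarr_mode` is a hypothetical helper for illustration and is not part of xarray.

```python
from typing import Hashable, Mapping, Optional


def default_to_zarr_mode(
    append_dim: Optional[Hashable] = None,
    region: Optional[Mapping[str, slice]] = None,
) -> str:
    """Return the default mode as described in the quoted documentation.

    Hypothetical helper for illustration only; not an xarray function.
    """
    if append_dim is not None:
        return "a"   # appending may change shapes and metadata
    if region is not None:
        return "r+"  # region writes only modify existing array values
    return "w-"      # otherwise: create the store, fail if it exists


print(default_to_zarr_mode(append_dim="time"))
print(default_to_zarr_mode(region={"time": slice(0, 1)}))
print(default_to_zarr_mode())
```

Passing `mode="a"` together with `region` overrides this default, which is the mix of sequential and parallel semantics discussed in this thread.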

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1062755678 https://github.com/pydata/xarray/issues/6329#issuecomment-1062755678 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_WF1e d70-t 6574622 2022-03-09T10:06:22Z 2022-03-09T10:06:22Z CONTRIBUTOR

Yes, that looks like the error as described in the initial post. Adding the described workaround (i.e. del buff.air.attrs["_FillValue"] in this case) leads to the next error message:

ValueError: variable 'air' already exists with different dimension sizes: {'time': 0, 'y': 250, 'x': 250} != {'time': 1, 'y': 250, 'x': 250}. to_zarr() only supports changing dimension sizes when explicitly appending, but append_dim=None.

This is due to mixing append mode (mode='a') with a region write (region={'time': slice(i, i+1)}), a combination that is out of scope, as outlined in this comment. It may or may not be possible or intended to support this, but I'm not deep enough into the design of xarray to give a definitive answer here. For me, it's unclear how this should behave. My current point of view is:

  • append: may change structure-defining metadata, must be sequential, mode='a'
  • region: may not change structure-defining metadata, can be parallel, mode='r+'

Currently, I can't really imagine how a mix of both should behave. If you can't prepare the dataset for the final shape upfront (to use region) and you also can't use append_dim, then what's probably needed is a separate method of expanding the dataset (i.e. reshape) without filling in the data. If such a thing were available, one could (as a user) ensure that all reshaping operations are properly sequenced with region operations, while the region operations themselves could run in parallel. (I think this is possible with plain zarr, but I'm not aware of a corresponding xarray API.)
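
To make the proposed sequencing concrete, here is a toy model in plain Python. It is a sketch under loud assumptions: `ToyArrayStore` and its methods are hypothetical and are neither xarray's nor zarr's API. The model only shows that a reshape step changes metadata and must be sequenced, while region-like writes leave the shape alone.

```python
class ToyArrayStore:
    """Toy stand-in for one array in a store: a shape plus written slots.

    Purely illustrative; this is neither xarray's nor zarr's API.
    """

    def __init__(self, n_time=0):
        self.n_time = n_time  # structure-defining metadata
        self.data = {}        # time index -> payload

    def resize(self, n_time):
        """Change the shape without writing data (must be sequenced)."""
        self.n_time = n_time

    def append(self, payload):
        """Append-like write: grows the shape, so it must be sequential."""
        self.data[self.n_time] = payload
        self.n_time += 1

    def write_region(self, index, payload):
        """Region-like write: never touches the shape, so writes to
        independent indices could run in parallel."""
        if not 0 <= index < self.n_time:
            raise ValueError(f"index {index} is outside the existing shape {self.n_time}")
        self.data[index] = payload


store = ToyArrayStore()
try:
    store.write_region(0, "t0")  # refused: the shape was not prepared upfront
except ValueError as err:
    print("region write refused:", err)

store.resize(3)                  # sequenced reshape, no data written
for i in (2, 0, 1):              # the order of region writes does not matter
    store.write_region(i, f"t{i}")
print(sorted(store.data))
```

In this model the user is responsible for running `resize` before any region write that depends on the new shape, which mirrors the division of responsibilities suggested above.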

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1061711069 https://github.com/pydata/xarray/issues/6329#issuecomment-1061711069 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_SGzd d70-t 6574622 2022-03-08T12:09:38Z 2022-03-08T12:09:38Z CONTRIBUTOR

You've got the encoding of air set to int16:

    print(buff.air.encoding)
    {'source': '.../xarray_tutorial_data/69c68be1605878a6c8efdd34d85b4ca1-air_temperature.nc', 'original_shape': (2920, 25, 53), 'dtype': dtype('int16'), 'scale_factor': 0.01, 'grid_mapping': 'spatial_ref'}
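
For readers unfamiliar with what such an encoding implies, here is a rough sketch of CF-style packing with the values shown above (int16, scale_factor 0.01). It assumes an add_offset of 0, since none appears in the encoding. The `pack` and `unpack` helpers are hypothetical and only illustrate the arithmetic; they are not xarray's internal implementation.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def pack(value, scale_factor=0.01, add_offset=0.0):
    """Pack a float into the int16 range using scale/offset packing."""
    packed = round((value - add_offset) / scale_factor)
    if not INT16_MIN <= packed <= INT16_MAX:
        raise OverflowError(f"{value} does not fit into int16 with this scaling")
    return packed


def unpack(packed, scale_factor=0.01, add_offset=0.0):
    """Recover the (approximate) float value from its packed integer."""
    return packed * scale_factor + add_offset


packed = pack(283.15)      # a plausible air temperature in kelvin
print(packed, unpack(packed))
```

With these settings only values between roughly -327.68 and 327.67 are representable, at a resolution of 0.01, so data that leaves this range after processing would not survive the encoding.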

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1061081884 https://github.com/pydata/xarray/issues/6329#issuecomment-1061081884 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_PtMc d70-t 6574622 2022-03-07T20:03:18Z 2022-03-07T20:03:18Z CONTRIBUTOR

Sorry, @Boorhin. But the code example you showed has many syntax errors:

    $ python3 test.py
      File "test.py", line 8
        return arr_r.x.values, arr_r.y.values
        ^
    SyntaxError: invalid syntax

(There are more, and I wasn't sure how to fix them in all places to match what you likely wanted to express.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690
1059426353 https://github.com/pydata/xarray/issues/6329#issuecomment-1059426353 https://api.github.com/repos/pydata/xarray/issues/6329 IC_kwDOAMm_X84_JZAx d70-t 6574622 2022-03-04T18:48:13Z 2022-03-04T18:48:13Z CONTRIBUTOR

If that's necessary to reproduce the problem, then yes. If it's possible to show the same thing with less "noise", then it's better to not use the tutorial dataset and to not use something like a cloud backend. But we can also try to iterate on this again, to progressively get down to a smaller example.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` with append or region mode and `_FillValue` doesnt work 1159923690

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.392ms · About: xarray-datasette