
issue_comments


9 rows where issue = 868352536 sorted by updated_at descending



Issue: Zarr encoding attributes persist after slicing data, raising error on `to_zarr` · 9 comments
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort key, descending), author_association, body, reactions, performed_via_github_app, issue
1339620908 · rizziemma (32960943) · NONE · created 2022-12-06T16:16:20Z · updated 2022-12-06T16:16:20Z
https://github.com/pydata/xarray/issues/5219#issuecomment-1339620908

@vedal Hi, I'm still facing this issue using xarray 2022.11.0 and Python 3.10. Is this PR included in the latest xarray version?

1312691114 · vedal (22004000) · NONE · created 2022-11-13T10:01:37Z · updated 2022-11-13T10:02:07Z
https://github.com/pydata/xarray/issues/5219#issuecomment-1312691114

Is this still an issue after merging PR #5065?

839106491 · rabernat (1197350) · MEMBER · created 2021-05-11T20:08:27Z · updated 2021-05-11T20:08:27Z
https://github.com/pydata/xarray/issues/5219#issuecomment-839106491

Instead we could require explicitly supplying chunks via the encoding parameter in the to_zarr() call.

This could also break existing workflows though. For example, pangeo-forge is using the encoding.chunks attribute to specify target dataset chunks.

839090347 · shoyer (1217238) · MEMBER · created 2021-05-11T20:02:58Z · updated 2021-05-11T20:02:58Z
https://github.com/pydata/xarray/issues/5219#issuecomment-839090347

It occurs to me that another possible fix would be to ignore chunks from pre-existing encoding attributes in to_zarr(). Instead we could require explicitly supplying chunks via the encoding parameter in the to_zarr() call.

Probably better to remove encoding more aggressively overall, but this might be a good intermediate step...
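The explicit alternative described here would mean building the per-variable encoding dict yourself and passing it to to_zarr(). A minimal sketch, with a hypothetical helper name (not part of xarray's API):

```python
def explicit_chunk_encoding(chunk_map):
    """Build a to_zarr-style encoding dict from {variable: chunk_sizes}.

    The result can be passed as ds.to_zarr(store, encoding=...), so the
    target chunks come from the caller rather than from any stale
    .encoding['chunks'] left over from an earlier read.
    """
    return {name: {"chunks": tuple(chunks)} for name, chunks in chunk_map.items()}

# explicit_chunk_encoding({"temperature": [365, 100]})
# -> {"temperature": {"chunks": (365, 100)}}
```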

830974512 · shoyer (1217238) · MEMBER · created 2021-05-03T00:52:21Z · updated 2021-05-03T00:52:21Z
https://github.com/pydata/xarray/issues/5219#issuecomment-830974512

Somewhat inevitably, I finally hit this issue this week, too :)

I am increasingly coming to the conclusion that there are no "safe" manipulations in xarray that should preserve encoding. Perhaps we should just drop encoding after every operation in Xarray.

828072654 · bolliger32 (4801430) · CONTRIBUTOR · created 2021-04-28T01:31:17Z · updated 2021-04-28T01:31:17Z
https://github.com/pydata/xarray/issues/5219#issuecomment-828072654

Yup, this all makes sense; thanks for the explanation @rabernat. It does seem like it would be good to drop encoding["chunks"] at some point, but I can see how the timing is tricky. I'm assuming it's necessary metadata to keep around when the zarr has been "opened" but the data has not yet been read, because it is used by xarray to read the zarr?

Anyway, we'll continue with the manual deletion for now, but I'm inclined to keep this issue open, as I do think it would be helpful to eventually figure out how to do this automatically.

828071017 · rabernat (1197350) · MEMBER · created 2021-04-28T01:26:34Z · updated 2021-04-28T01:26:34Z
https://github.com/pydata/xarray/issues/5219#issuecomment-828071017

we probably would NOT want to use safe_chunks=False, correct?

Correct.

The problem in this issue is that the dataset is carrying around its original chunks in .encoding, and xarray then tries to use these values to set the chunk encoding on the second write op. The solution is to manually delete the chunk encoding from all your data variables. Something like:

for var in ds:
    del ds[var].encoding['chunks']

Originally, part of #5056 was a change that would have xarray automatically do this deletion after some operations (such as calling .chunk()); however, we could not reach a consensus on the best way to implement that change. Your example is interesting because it is a slightly different scenario, calling sel() instead of chunk(), but the root cause appears to be the same: encoding['chunks'] is being kept around too conservatively.

Reactions: 1 (+1 × 1)
828004004 · bolliger32 (4801430) · CONTRIBUTOR · created 2021-04-27T23:05:02Z · updated 2021-04-27T23:05:28Z
https://github.com/pydata/xarray/issues/5219#issuecomment-828004004

Thanks for the pointer @mathause, that is super helpful. And thanks for #5065 @rabernat. If I'm understanding the PR correctly (looks like it evolved a lot!), in most cases matching the example above we probably would NOT want to use safe_chunks=False, correct? Because if we're writing in parallel, this could lead to data corruption. Instead, we'd want to manually delete the chunks item from each variable's encoding attribute after loading/persisting the data into memory. That way, to_zarr would use the dask chunks as the zarr chunks, rather than relying on whatever chunks were used in the "original" zarr store (the source of the in-memory Dataset).

Does that sound right? I feel like, if I'm reading through the PR comments correctly, this was one of the controversial parts that didn't end up in the merged PR.
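The workflow described here (load or persist the data, drop the stale chunk encoding, then let to_zarr derive chunks from the current dask chunks) can be sketched with a small helper. This is only a sketch: the function name is hypothetical, and it works on anything that maps variable names to objects carrying a dict-like .encoding, such as an xarray.Dataset.

```python
def drop_chunk_encoding(ds):
    """Remove any stale 'chunks' entry from each variable's encoding.

    `ds` can be an xarray.Dataset or any mapping of names to objects
    with a dict-like `.encoding`. After this, to_zarr() would fall back
    to the current dask chunks instead of the chunks recorded when the
    data was originally read.
    """
    for name in ds:
        ds[name].encoding.pop("chunks", None)  # no-op if already absent
    return ds
```

With xarray this would be used as drop_chunk_encoding(ds).to_zarr(store), leaving safe_chunks at its default so parallel writes stay protected.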

827990207 · mathause (10194086) · MEMBER · created 2021-04-27T22:36:42Z · updated 2021-04-27T22:36:42Z
https://github.com/pydata/xarray/issues/5219#issuecomment-827990207

Thanks for the clear error report. On master you should be able to do ds2.to_zarr("test2.zarr", consolidated=True, mode="w", safe_chunks=False) - see #5065


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 16.873ms · About: xarray-datasette