issue_comments: 766973931

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4826#issuecomment-766973931 https://api.github.com/repos/pydata/xarray/issues/4826 766973931 MDEyOklzc3VlQ29tbWVudDc2Njk3MzkzMQ== 463809 2021-01-25T17:19:19Z 2021-01-25T17:19:19Z CONTRIBUTOR

Tagging a few maintainers: @dcherian @shoyer.

Sorry to tag you directly, hope that's ok. I think I've found the issue here and would like to provide a PR to fix, but need some input on what you think would be best.

To summarize, the current behavior leading to the bug is:

  1. When a bool dtype is initially written, the maybe_encode_bool function is used to convert the bool to an i1, with a var.attrs entry that says it is actually a bool. It would appear from #2937 that it ends up an i8 somehow anyway. https://github.com/pydata/xarray/blob/cc53a77ff0c8aaf8686f0b0bd7f75985b74e2054/xarray/conventions.py#L119
  2. When this is loaded the first time, the i8 is correctly identified as actually being a bool using the attributes. https://github.com/pydata/xarray/blob/cc53a77ff0c8aaf8686f0b0bd7f75985b74e2054/xarray/conventions.py#L352
  3. However, 2 lines above that, there is an encoding.setdefault("dtype", original_dtype), so this Variable object now has .encoding["dtype"], which is i8.
  4. When I try to save this again, it tries maybe_encode_bool from step 1 again, but this time the function is bypassed because of step 3 above.
  5. The dataset I write from step 4 now does not have the attribute identifying it as a bool, and so it's an i8 when I load it back.
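
The five steps above can be modelled with a small toy, which may help when reasoning about the fixes. This is only a sketch and does not use xarray's code: `ToyVar`, `toy_encode_bool`, `toy_write` and `toy_load` are hypothetical names, dtypes are plain strings, and the on-disk integer type is written as "i1" for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVar:
    """Hypothetical stand-in for a variable: a dtype name plus attrs/encoding dicts."""
    dtype: str
    attrs: dict = field(default_factory=dict)
    encoding: dict = field(default_factory=dict)


def toy_encode_bool(var: ToyVar) -> ToyVar:
    # Step 1: store the bool as an integer and record the real dtype in attrs,
    # but only when encoding does not already carry a "dtype" entry.
    if var.dtype == "bool" and "dtype" not in var.encoding:
        return ToyVar("i1", {**var.attrs, "dtype": "bool"}, dict(var.encoding))
    return var


def toy_write(var: ToyVar) -> ToyVar:
    # What lands on disk: the dtype requested in encoding (if any), plus attrs.
    encoded = toy_encode_bool(var)
    on_disk_dtype = encoded.encoding.get("dtype", encoded.dtype)
    return ToyVar(on_disk_dtype, dict(encoded.attrs))


def toy_load(stored: ToyVar) -> ToyVar:
    attrs = dict(stored.attrs)
    encoding = {}
    encoding.setdefault("dtype", stored.dtype)  # step 3: remember the on-disk dtype
    dtype = stored.dtype
    if attrs.get("dtype") == "bool":  # step 2: the attribute marks a real bool
        del attrs["dtype"]
        dtype = "bool"
    return ToyVar(dtype, attrs, encoding)


first = toy_load(toy_write(ToyVar("bool")))   # steps 1-3
second = toy_load(toy_write(first))           # steps 4-5
print(first.dtype, first.encoding)
print(second.dtype, second.attrs)
```

In this toy, `first` comes back as a bool carrying `encoding["dtype"]`, and `second` comes back as an integer with no identifying attribute, which mirrors the behaviour described in steps 4 and 5.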

I can think of a few fixes:

  - Drop the .encoding dict on load for bools. Presumably these .encodings are kept so that a dataset attempts to preserve the compressor, chunk size, etc. of its source. However, given that .encoding seems to be dropped when I, e.g., do a .astype("bool"), maybe this is OK for bools.
  - Set var.attrs["dtype"] = np.dtype("bool") on load. This would preserve what we'd get out of maybe_encode_bool. The challenge with this is that .attrs are not always preserved?
  - Change maybe_encode_bool to not skip encoding when .encoding["dtype"] exists. I suppose there would be a need to check that the compressor is still compatible, etc.?

As a local fix while we consider these options, can you confirm that, as the docs state, the .encoding is only used for serializing and not deserializing arrays, and that if I drop the .encoding on my DataArrays as a temporary fix I wouldn't break anything (provided I am OK with xarray not preserving my compressors, etc.)?
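
For concreteness, the temporary workaround I have in mind looks roughly like the sketch below. `drop_all_encoding` is a hypothetical helper, not an xarray function. It assumes only that the object exposes a `.variables` mapping whose values have an assignable `.encoding` dict, as an xarray Dataset and its Variables do. The stand-in object is there so the snippet runs without xarray installed.

```python
from types import SimpleNamespace


def drop_all_encoding(dataset):
    """Clear .encoding on every variable of `dataset` in place and return it.

    Assumption: `dataset.variables` is a mapping of objects with an
    assignable `.encoding` attribute (as on xarray.Dataset / xarray.Variable).
    """
    for var in dataset.variables.values():
        var.encoding = {}
    return dataset


# Stand-in for a loaded dataset; with xarray this would be the Dataset itself.
fake = SimpleNamespace(
    variables={
        "flag": SimpleNamespace(dtype=bool, encoding={"dtype": "i1", "chunks": (10,)}),
    }
)
drop_all_encoding(fake)
print(fake.variables["flag"].encoding)
```

For a single DataArray the same idea would be `da.encoding = {}`. This discards the compressor and chunk settings along with the dtype entry, which is the trade-off mentioned above. Whether it is otherwise safe is exactly what I'd like confirmed.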

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  789410367