issue_comments

10 rows where issue = 789410367 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1498799474 https://github.com/pydata/xarray/issues/4826#issuecomment-1498799474 https://api.github.com/repos/pydata/xarray/issues/4826 IC_kwDOAMm_X85ZVd1y kmuehlbauer 5821660 2023-04-06T09:59:42Z 2023-04-06T09:59:42Z MEMBER

@JoerivanEngelen Thanks for taking the time. Much appreciated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
1498796759 https://github.com/pydata/xarray/issues/4826#issuecomment-1498796759 https://api.github.com/repos/pydata/xarray/issues/4826 IC_kwDOAMm_X85ZVdLX JoerivanEngelen 9744750 2023-04-06T09:57:38Z 2023-04-06T09:57:38Z NONE

@kmuehlbauer your fix fixes both the problems set up by @slevang and @amatsukawa on my laptop.

I also tested the double round trip with load_dataset and to_netcdf using the three possible engines; they all worked.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
1497971459 https://github.com/pydata/xarray/issues/4826#issuecomment-1497971459 https://api.github.com/repos/pydata/xarray/issues/4826 IC_kwDOAMm_X85ZSTsD kmuehlbauer 5821660 2023-04-05T18:56:23Z 2023-04-05T18:56:23Z MEMBER

Please check #7720 if that fixes the conversion problems. Thanks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
822981416 https://github.com/pydata/xarray/issues/4826#issuecomment-822981416 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDgyMjk4MTQxNg== shaunc 193170 2021-04-20T05:21:40Z 2021-04-20T05:22:29Z NONE

Closed https://github.com/pydata/xarray/issues/5192 in favor of this as I think it's a duplicate. Note that it can occur with h5netcdf as well as netcdf4. (Thanks, @andersy005.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
822893203 https://github.com/pydata/xarray/issues/4826#issuecomment-822893203 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDgyMjg5MzIwMw== andersy005 13301940 2021-04-20T01:01:46Z 2021-04-20T01:03:23Z MEMBER

I think this issue is related to #2937.

This is the same problem reported in https://github.com/pydata/xarray/issues/4386. All these issues (#2937, #4826, #4386, #5192) should be closable once there's a fix.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
780912328 https://github.com/pydata/xarray/issues/4826#issuecomment-780912328 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc4MDkxMjMyOA== shoyer 1217238 2021-02-17T23:04:03Z 2021-02-17T23:04:03Z MEMBER

There are at least two issues here:

  • For both netCDF and zarr, we should definitely preserve dtypes on a "round trip". There are a few ways this might be done. I would have to look at a PR to say for sure, but from the looks of it "Change maybe_encode_bool to not ignore when .encoding["dtype"] exists" is the right fix.
  • Zarr supports bool data, so we don't need conversion from bool -> int8. We should just save data as bool.
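
The first bullet can be illustrated with a toy model. This is only a sketch in plain Python, not xarray's code: `encode_bool`, `decode_dtype`, and the `encode_even_if_dtype_set` flag are hypothetical names, and dtypes are represented as plain strings. It compares the behavior described in this thread with the proposed change to the bool encoder.

```python
def encode_bool(dtype, attrs, encoding, encode_even_if_dtype_set):
    """Toy bool encoder; returns (on_disk_dtype, on_disk_attrs).

    encode_even_if_dtype_set=False models the behavior described in this
    thread (the bool branch is skipped when encoding already has a dtype).
    True models the proposed change.
    """
    skip = "dtype" in encoding and not encode_even_if_dtype_set
    if dtype == "bool" and not skip:
        # Store as a 1-byte integer and tag it so it can be decoded as bool.
        return "i1", {**attrs, "dtype": "bool"}
    # Bool branch skipped: written with the requested dtype and no bool tag.
    return encoding.get("dtype", dtype), dict(attrs)


def decode_dtype(on_disk_dtype, on_disk_attrs):
    """Toy decoder: the attribute, not the stored dtype, marks a bool."""
    return "bool" if on_disk_attrs.get("dtype") == "bool" else on_disk_dtype


# A bool variable that was loaded once already carries encoding["dtype"].
loaded_encoding = {"dtype": "i1"}

before = decode_dtype(*encode_bool("bool", {}, loaded_encoding, False))
after = decode_dtype(*encode_bool("bool", {}, loaded_encoding, True))
print(before, after)  # i1 bool
```

In this model the bool tag survives the second write only when the encoder stops skipping variables that already have a dtype in their encoding.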
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
780900086 https://github.com/pydata/xarray/issues/4826#issuecomment-780900086 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc4MDkwMDA4Ng== slevang 39069044 2021-02-17T22:36:08Z 2021-02-17T22:36:08Z CONTRIBUTOR

I ran into this as well with the basic netcdf backends:

```python
import xarray as xr

ds = xr.Dataset(
    data_vars={"foo": (["x"], [False, True, False])},
    coords={"x": [1, 2, 3]},
)
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
```

Gives:

```
bool
int8
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
766973931 https://github.com/pydata/xarray/issues/4826#issuecomment-766973931 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc2Njk3MzkzMQ== amatsukawa 463809 2021-01-25T17:19:19Z 2021-01-25T17:19:19Z CONTRIBUTOR

Tagging a few maintainers: @dcherian @shoyer.

Sorry to tag you directly, hope that's ok. I think I've found the issue here and would like to provide a PR to fix, but need some input on what you think would be best.

To summarize, the current behavior leading to the bug is:

  1. When a bool dtype is initially written, the maybe_encode_bool function is used to convert the bool to an i1, with an entry in var.attrs that says it is actually a bool. It would appear from #2937 that it ends up an i8 somehow anyway. https://github.com/pydata/xarray/blob/cc53a77ff0c8aaf8686f0b0bd7f75985b74e2054/xarray/conventions.py#L119
  2. When this is loaded the first time, the i8 is correctly identified as actually being a bool using the attributes. https://github.com/pydata/xarray/blob/cc53a77ff0c8aaf8686f0b0bd7f75985b74e2054/xarray/conventions.py#L352
  3. However, 2 lines above that, there is an encoding.setdefault("dtype", original_dtype), so this Variable object now has .encoding["dtype"], which is i8.
  4. When I try to save this again, it tries maybe_encode_bool from step 1 again, but this time the function is bypassed because of step 3 above.
  5. The dataset I write from step 4 now does not have the attribute identifying it as a bool, and so it's an i8 when I load it back.
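
The five steps above can be reproduced with a toy model. This is a minimal sketch in plain Python, not xarray's actual code: `ToyVar`, `encode`, and `decode` are hypothetical stand-ins, dtypes are plain strings, and the stored integer type is written as "i1" throughout. Only the bool handling described in the steps is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVar:
    """Hypothetical stand-in for a variable (dtypes are plain strings)."""
    dtype: str
    attrs: dict = field(default_factory=dict)
    encoding: dict = field(default_factory=dict)


def encode(var: ToyVar) -> ToyVar:
    """Steps 1 and 4: a bool is only tagged if encoding has no dtype yet."""
    if var.dtype == "bool" and "dtype" not in var.encoding:
        return ToyVar("i1", {**var.attrs, "dtype": "bool"})
    # Bypassed: written with the dtype requested in .encoding, and no
    # attribute marks the data as bool.
    return ToyVar(var.encoding.get("dtype", var.dtype), dict(var.attrs))


def decode(stored: ToyVar) -> ToyVar:
    """Steps 2 and 3: restore bool from attrs, remember the on-disk dtype."""
    attrs = dict(stored.attrs)
    encoding = {"dtype": stored.dtype}      # step 3: setdefault on load
    if attrs.pop("dtype", None) == "bool":  # step 2: attribute says bool
        return ToyVar("bool", attrs, encoding)
    return ToyVar(stored.dtype, attrs, encoding)


first = decode(encode(ToyVar("bool")))  # first write and load
second = decode(encode(first))          # second write and load (step 5)
print(first.dtype, second.dtype)  # bool i1
```

The second round trip loses the bool because `first.encoding` already contains a dtype, which is the condition that makes the encoder skip the bool branch.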

I can think of a few fixes:

  • Drop the .encoding dict on load for bools. Presumably these .encodings are kept so that a dataset attempts to preserve the compressor, chunk size, etc. of its source. However, given that .encoding seems to be dropped when I e.g. do a .astype("bool"), maybe this is OK for bools.
  • Set var.attrs["dtype"] = np.dtype("bool") on load. This would preserve what we'd get out of maybe_encode_bool. The challenge with this is that .attrs are not always preserved?
  • Change maybe_encode_bool to not ignore when .encoding["dtype"] exists. I suppose there would be a need to check that the compressor is still compatible, etc.?

As a local fix while we consider these options, can you confirm that, as the docs state, the .encoding is only used for serializing and not deserializing arrays, and therefore that if I drop the .encoding on my DataArrays as a temporary fix I wouldn't break anything (if I am OK with xarray not preserving my compressors, etc.)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
765747736 https://github.com/pydata/xarray/issues/4826#issuecomment-765747736 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc2NTc0NzczNg== amatsukawa 463809 2021-01-22T23:37:47Z 2021-01-22T23:38:22Z CONTRIBUTOR

OK here's the other side of the problem. The original dtype (which is i8) is set in the encoding:

https://github.com/pydata/xarray/blob/v0.16.2/xarray/conventions.py#L350

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
765741719 https://github.com/pydata/xarray/issues/4826#issuecomment-765741719 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc2NTc0MTcxOQ== amatsukawa 463809 2021-01-22T23:27:57Z 2021-01-22T23:27:57Z CONTRIBUTOR

Apparently my proposed fix broke a bunch of other things, eg. some writing of timedeltas with units and such.

Deleting the "dtype" key in the .encoding of the boolean variable also seems to do the trick. The issue is that bools are not encoded correctly if the .encoding field already has a dtype:

https://github.com/pydata/xarray/blob/v0.16.2/xarray/conventions.py#L119
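
A workaround along these lines might look like the sketch below. The helper name `drop_bool_dtype_encoding` is hypothetical, and the sketch assumes only that each data variable exposes `.dtype` and a mutable `.encoding` dict, as xarray DataArrays do. Stand-in objects are used so the example runs without xarray installed.

```python
from types import SimpleNamespace


def drop_bool_dtype_encoding(ds):
    """Remove encoding["dtype"] from every boolean data variable, in place."""
    for var in ds.data_vars.values():
        if var.dtype == bool:
            # Only the dtype key is dropped; compressor, chunks, etc. stay.
            var.encoding.pop("dtype", None)
    return ds


# Stand-in objects (not real xarray types) to exercise the helper.
flag = SimpleNamespace(dtype=bool, encoding={"dtype": "int8", "chunks": (3,)})
temp = SimpleNamespace(dtype=float, encoding={"dtype": "float32"})
fake_ds = SimpleNamespace(data_vars={"flag": flag, "temp": temp})

drop_bool_dtype_encoding(fake_ds)
print(flag.encoding)  # {'chunks': (3,)}
print(temp.encoding)  # {'dtype': 'float32'}
```

With a real dataset, the helper would presumably be called on the loaded dataset before writing it again; non-bool variables keep their encoding untouched.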

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 2957.474ms · About: xarray-datasette