issue_comments

6 rows where user = 463809 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
871611079 https://github.com/pydata/xarray/issues/4591#issuecomment-871611079 https://api.github.com/repos/pydata/xarray/issues/4591 MDEyOklzc3VlQ29tbWVudDg3MTYxMTA3OQ== amatsukawa 463809 2021-06-30T17:53:54Z 2021-06-30T17:53:54Z CONTRIBUTOR

I am using worker_client in a task that opens xarray datasets, submits further computation, and then saves xarray datasets. Perhaps the failure is related to that?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Serialization issue with distributed, h5netcdf, and fsspec (ImplicitToExplicitIndexingAdapter) 745801652
870152019 https://github.com/pydata/xarray/issues/4591#issuecomment-870152019 https://api.github.com/repos/pydata/xarray/issues/4591 MDEyOklzc3VlQ29tbWVudDg3MDE1MjAxOQ== amatsukawa 463809 2021-06-29T01:10:30Z 2021-06-29T01:14:58Z CONTRIBUTOR

This issue appears to be back in some form, with engine=zarr.

The code looks like this, using fsspec's mapper API to access Azure blob store:

    fs = fsspec.filesystem("az://...")
    ds = xr.open_dataset(fs.get_mapper(path), engine="zarr", chunks="auto")
    ...

I have not tracked down a self-contained reproducer, as it only fails for one call but not others of a similar form. Reporting it while I dig into it further, in case you have any suggestions.

[2021-06-29 00:44:47] [2021-06-29 00:44:47 core.py:74 CRITICAL] Failed to Serialize
[2021-06-29 00:44:47] Traceback (most recent call last):
[2021-06-29 00:44:47]   File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/core.py", line 70, in dumps
[2021-06-29 00:44:47]     frames[0] = msgpack.dumps(msg, default=_encode_default, use_bin_type=True)
[2021-06-29 00:44:47]   File "/deps/envs/deps/lib/python3.7/site-packages/msgpack/__init__.py", line 35, in packb
[2021-06-29 00:44:47]     return Packer(**kwargs).pack(o)
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 286, in msgpack._cmsgpack.Packer.pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 292, in msgpack._cmsgpack.Packer.pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 289, in msgpack._cmsgpack.Packer.pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 258, in msgpack._cmsgpack.Packer._pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 258, in msgpack._cmsgpack.Packer._pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
[2021-06-29 00:44:47]   File "msgpack/_packer.pyx", line 279, in msgpack._cmsgpack.Packer._pack
[2021-06-29 00:44:47]   File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/core.py", line 56, in _encode_default
[2021-06-29 00:44:47]     obj, serializers=serializers, on_error=on_error, context=context
[2021-06-29 00:44:47]   File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 422, in serialize_and_split
[2021-06-29 00:44:47]     header, frames = serialize(x, serializers, on_error, context)
[2021-06-29 00:44:47]   File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 256, in serialize
[2021-06-29 00:44:47]     iterate_collection=True,
[2021-06-29 00:44:47]   File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 348, in serialize
[2021-06-29 00:44:47]     raise TypeError(msg, str(x)[:10000])
[2021-06-29 00:44:47] TypeError: ('Could not serialize object of type ImplicitToExplicitIndexingAdapter.', 'ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=<xarray.backends.zarr.ZarrArrayWrapper object at 0x7f52dedbb690>, key=BasicIndexer((slice(None, None, None), slice(None, None, None))))))')
[2021-06-29 00:44:47] [2021-06-29 00:44:47 utils.py:37 ERROR] ('Could not serialize object of type ImplicitToExplicitIndexingAdapter.', 'ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=<xarray.backends.zarr.ZarrArrayWrapper object at 0x7f52dedbb690>, key=BasicIndexer((slice(None, None, None), slice(None, None, None))))))')

pip list | grep 'dask\|distributed\|xarray\|zarr\|msgpack\|adlfs'
adlfs 0.7.7
dask 2021.6.2
distributed 2021.6.2
msgpack 1.0.0
xarray 0.18.2
zarr 2.8.3
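
While hunting for a self-contained reproducer, one rough way to narrow things down is to test candidate objects for picklability one at a time. The sketch below uses only the standard-library pickle module, on the assumption that an object distributed cannot serialize also fails under plain pickle, which may not hold in every case. `pickle_error` and `find_unpicklable` are hypothetical helper names, not part of distributed or xarray.

```python
import pickle
import threading


def pickle_error(obj):
    """Return None if obj survives a pickle round trip, else the exception raised."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception as exc:  # pickling failures surface as several exception types
        return exc
    return None


def find_unpicklable(named_objects):
    """Map each name whose object fails to pickle to the exception it raised."""
    failures = {}
    for name, obj in named_objects.items():
        err = pickle_error(obj)
        if err is not None:
            failures[name] = err
    return failures


# Stand-in candidates: a plain dict pickles fine, a thread lock does not.
candidates = {"plain_dict": {"a": 1}, "lock": threading.Lock()}
failures = find_unpicklable(candidates)
print(sorted(failures))
```

In practice the candidates would be the datasets or arrays captured by the failing task, rather than the stand-ins used here.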

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Serialization issue with distributed, h5netcdf, and fsspec (ImplicitToExplicitIndexingAdapter) 745801652
766973931 https://github.com/pydata/xarray/issues/4826#issuecomment-766973931 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc2Njk3MzkzMQ== amatsukawa 463809 2021-01-25T17:19:19Z 2021-01-25T17:19:19Z CONTRIBUTOR

Tagging a few maintainers: @dcherian @shoyer.

Sorry to tag you directly, hope that's ok. I think I've found the issue here and would like to provide a PR to fix, but need some input on what you think would be best.

To summarize, the current behavior leading to the bug is:

  1. When a bool dtype is initially written, the maybe_encode_bool function is used to convert the bool to an i1, with a var.attrs entry that says it is actually a bool. It would appear from #2937 that it ends up an i8 somehow anyway. https://github.com/pydata/xarray/blob/cc53a77ff0c8aaf8686f0b0bd7f75985b74e2054/xarray/conventions.py#L119
  2. When this is loaded the first time, the i8 is correctly identified as actually being a bool using the attributes. https://github.com/pydata/xarray/blob/cc53a77ff0c8aaf8686f0b0bd7f75985b74e2054/xarray/conventions.py#L352
  3. However, 2 lines above that, there is an encoding.setdefault("dtype", original_dtype), so this Variable object now has .encoding["dtype"], which is i8.
  4. When I try to save this again, it tries maybe_encode_bool from step 1 again, but this time the function is bypassed because of step 3 above.
  5. The dataset I write from step 4 now does not have the attribute identifying it as a bool, and so it's an i8 when I load it back.
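
The five steps above can be sketched with a toy model. This is not xarray's code: `ToyVariable`, `toy_write`, and `toy_read` are hypothetical stand-ins that track only a dtype name plus the attrs and encoding dicts, and they assume the behavior is as described in the list. The dtype is spelled "int8" here to match the issue title.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVariable:
    """Hypothetical stand-in for a variable: a dtype name plus two dicts."""
    dtype: str
    attrs: dict = field(default_factory=dict)
    encoding: dict = field(default_factory=dict)


def toy_write(var):
    """Return the (dtype, attrs) pair that would land in the store."""
    # Steps 1 and 4: a bool is only tagged when .encoding has no "dtype" yet.
    if var.dtype == "bool" and "dtype" not in var.encoding:
        return "int8", {**var.attrs, "dtype": "bool"}
    # Otherwise the dtype recorded in .encoding wins and no tag is written.
    return var.encoding.get("dtype", var.dtype), dict(var.attrs)


def toy_read(stored_dtype, stored_attrs):
    """Rebuild a variable from what toy_write produced."""
    attrs = dict(stored_attrs)
    # Step 3: the on-disk dtype is remembered in .encoding.
    encoding = {"dtype": stored_dtype}
    # Step 2: the attribute tag turns the integer back into a bool.
    if attrs.pop("dtype", None) == "bool":
        return ToyVariable("bool", attrs, encoding)
    return ToyVariable(stored_dtype, attrs, encoding)


first = toy_read(*toy_write(ToyVariable("bool")))   # first round trip
second = toy_read(*toy_write(first))                # second round trip
print(first.dtype, first.encoding, second.dtype)
```

In this model the first round trip returns a bool whose encoding already carries "int8", and the second round trip loses the bool tag, which is the behavior reported in step 5.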

I can think of a few fixes:

  - Drop the .encoding dict on load for bools. Presumably these .encodings are kept so that a dataset attempts to preserve the compressor, chunk size, etc. of its source. However, given that .encoding seems to be dropped when I e.g. do a .astype("bool"), maybe this is OK for bools.
  - Set var.attrs["dtype"] = np.dtype("bool") on load. This would preserve what we'd get out of maybe_encode_bool. The challenge with this is that .attrs are not always preserved?
  - Change maybe_encode_bool to not bail out when .encoding["dtype"] exists. I suppose there would be a need to check that the compressor is still compatible, etc.?

As a local fix while we consider these options, can you confirm that, as the docs state, the .encoding is only used for serializing and not deserializing arrays, and that dropping the .encoding on my DataArrays as a temporary fix therefore wouldn't break anything (if I am OK with xarray not preserving my compressors, etc.)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
765747736 https://github.com/pydata/xarray/issues/4826#issuecomment-765747736 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc2NTc0NzczNg== amatsukawa 463809 2021-01-22T23:37:47Z 2021-01-22T23:38:22Z CONTRIBUTOR

OK here's the other side of the problem. The original dtype (which is i8) is set in the encoding:

https://github.com/pydata/xarray/blob/v0.16.2/xarray/conventions.py#L350

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
765741719 https://github.com/pydata/xarray/issues/4826#issuecomment-765741719 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc2NTc0MTcxOQ== amatsukawa 463809 2021-01-22T23:27:57Z 2021-01-22T23:27:57Z CONTRIBUTOR

Apparently my proposed fix broke a bunch of other things, e.g. some writing of timedeltas with units and such.

Deleting the "dtype" key in the .encoding of the boolean variable also seems to do the trick. The issue is that bools are not encoded correctly if the .encoding field already has a dtype:

https://github.com/pydata/xarray/blob/v0.16.2/xarray/conventions.py#L119
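
A minimal sketch of that workaround, written against any objects that expose a .dtype and a mutable .encoding dict. `drop_bool_encoded_dtype` is a hypothetical helper name, and the stand-in objects below are not xarray variables; the only assumption about xarray is that its variables carry those two attributes.

```python
from types import SimpleNamespace


def drop_bool_encoded_dtype(variables):
    """Remove "dtype" from .encoding of every boolean variable, in place.

    Returns the number of variables that were changed.
    """
    changed = 0
    for var in variables:
        if var.dtype == bool and "dtype" in var.encoding:
            del var.encoding["dtype"]
            changed += 1
    return changed


# Stand-in objects; only .dtype and .encoding are assumed to exist.
flag = SimpleNamespace(dtype=bool, encoding={"dtype": "int8", "chunks": (10,)})
count = SimpleNamespace(dtype=int, encoding={"dtype": "int64"})
changed = drop_bool_encoded_dtype([flag, count])
print(changed, flag.encoding, count.encoding)
```

With a real dataset ds, the call would presumably be drop_bool_encoded_dtype(ds.variables.values()) before writing. Other encoding keys, such as chunks, are left untouched.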

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
552160158 https://github.com/pydata/xarray/pull/3504#issuecomment-552160158 https://api.github.com/repos/pydata/xarray/issues/3504 MDEyOklzc3VlQ29tbWVudDU1MjE2MDE1OA== amatsukawa 463809 2019-11-10T03:58:53Z 2019-11-10T03:58:53Z CONTRIBUTOR

Thanks @max-sixty, I made a small update to the error message. I had added a line to doc/whats-new.rst in 2e91693. Is that what you were referring to?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow appending datetime & boolean variables to zarr stores 520507183

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
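
The filter used for this page (user = 463809, sorted by updated_at descending) can be tried against the schema with Python's built-in sqlite3 module. This is a sketch only: it uses an in-memory database and inserts just three of the rows listed above, with most columns left NULL. The referenced users and issues tables are not created, which SQLite tolerates as long as foreign-key enforcement is left at its default (off).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
    """
)

# (id, user, updated_at, issue) for three of the comments shown on this page.
sample_rows = [
    (552160158, 463809, "2019-11-10T03:58:53Z", 520507183),
    (871611079, 463809, "2021-06-30T17:53:54Z", 745801652),
    (870152019, 463809, "2021-06-29T01:14:58Z", 745801652),
]
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [updated_at], [issue]) VALUES (?, ?, ?, ?)",
    sample_rows,
)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
ordered_ids = [
    row[0]
    for row in conn.execute(
        "SELECT [id] FROM [issue_comments] WHERE [user] = ? ORDER BY [updated_at] DESC",
        (463809,),
    )
]
print(ordered_ids)
```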
Powered by Datasette · Queries took 13.949ms · About: xarray-datasette