
issue_comments


6 rows where author_association = "CONTRIBUTOR" and issue = 745801652 sorted by updated_at descending


Issue: Serialization issue with distributed, h5netcdf, and fsspec (ImplicitToExplicitIndexingAdapter) · 6 comments (4 by martindurant, 2 by amatsukawa)
amatsukawa · 2021-06-30T17:53:54Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/4591#issuecomment-871611079

I am trying to use worker_client that is opening xarrays, submitting further compute, and then saving xarrays. Perhaps somehow related to that?

martindurant · 2021-06-29T17:20:43Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/4591#issuecomment-870777725

I only have vague thoughts.

To be sure: you can pickle the file-system, any mapper (.get_mapper()) and any open file (.open()), right?

The question here is why msgpack is being invoked. Those items, as well as any internal xarray objects, should appear only in tasks, and so be pickled. Is there a high-level-graph layer encapsulating things that were previously pickled? The only things that should appear in any HLG layer are the paths and storage options needed to open a file-system, not the file-system itself.
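The pickle checklist above can be run directly. A minimal sketch, assuming fsspec is installed, using its built-in "memory" filesystem so it runs without credentials (substitute your real protocol, e.g. "az" or "http", and real paths to match the issue's setup):

```python
import pickle

import fsspec

# The in-process "memory" filesystem ships with fsspec, so this check
# needs no credentials; swap in your real protocol to test your setup.
fs = fsspec.filesystem("memory")
fs.pipe_file("/demo.bin", b"hello")

# 1. The file-system itself should survive a pickle round-trip.
fs2 = pickle.loads(pickle.dumps(fs))
print(fs2.cat_file("/demo.bin"))  # expected: b'hello'

# 2. So should any mapper built from it.
mapper = pickle.loads(pickle.dumps(fs.get_mapper("/")))
print(type(mapper).__name__)  # expected: FSMap
```

A failure at either step would point the finger at fsspec; the rest of the thread suggests the failing object is instead an xarray wrapper captured in the graph.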

amatsukawa · 2021-06-29T01:10:30Z (edited 2021-06-29T01:14:58Z) · CONTRIBUTOR
https://github.com/pydata/xarray/issues/4591#issuecomment-870152019

This issue appears to be back in some form, with engine=zarr.

The code looks like this, using fsspec's mapper API to access Azure blob store:

```
fs = fsspec.filesystem("az://...")
ds = xr.open_dataset(fs.get_mapper(path), engine="zarr", chunks="auto")
...
```

I have not tracked down a self-contained reproducer, as it only fails for one call but not others of a similar form. Reporting it while I dig into it further, in case you have any suggestions.

```
[2021-06-29 00:44:47 core.py:74 CRITICAL] Failed to Serialize
Traceback (most recent call last):
  File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/core.py", line 70, in dumps
    frames[0] = msgpack.dumps(msg, default=_encode_default, use_bin_type=True)
  File "/deps/envs/deps/lib/python3.7/site-packages/msgpack/__init__.py", line 35, in packb
    return Packer(**kwargs).pack(o)
  File "msgpack/_packer.pyx", line 286, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 292, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 289, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 258, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 258, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 279, in msgpack._cmsgpack.Packer._pack
  File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/core.py", line 56, in _encode_default
    obj, serializers=serializers, on_error=on_error, context=context
  File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 422, in serialize_and_split
    header, frames = serialize(x, serializers, on_error, context)
  File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 256, in serialize
    iterate_collection=True,
  File "/deps/envs/deps/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 348, in serialize
    raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type ImplicitToExplicitIndexingAdapter.', 'ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=<xarray.backends.zarr.ZarrArrayWrapper object at 0x7f52dedbb690>, key=BasicIndexer((slice(None, None, None), slice(None, None, None))))))')
[2021-06-29 00:44:47 utils.py:37 ERROR] ('Could not serialize object of type ImplicitToExplicitIndexingAdapter.', 'ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=<xarray.backends.zarr.ZarrArrayWrapper object at 0x7f52dedbb690>, key=BasicIndexer((slice(None, None, None), slice(None, None, None))))))')
```

```
$ pip list | grep 'dask\|distributed\|xarray\|zarr\|msgpack\|adlfs'
adlfs        0.7.7
dask         2021.6.2
distributed  2021.6.2
msgpack      1.0.0
xarray       0.18.2
zarr         2.8.3
```

martindurant · 2020-11-18T18:14:28Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/4591#issuecomment-729863434

The xarray.backends.h5netcdf_.H5NetCDFArrayWrapper seems to keep a reference to the open file, which for HTTP contains the open session. The linked PR fixes the serialization of those files for the HTTP case.
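The usual fix pattern for a wrapper that holds an open file is to pickle the information needed to reopen it rather than the live handle or session. A sketch only (`LazyFileWrapper` and its fields are hypothetical, not the linked PR's code):

```python
import pickle
import tempfile


class LazyFileWrapper:
    """Wraps a file but pickles only the path, reopening on first use."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # live handle (or HTTP session): not picklable

    def _open(self):
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def read(self, n=-1):
        return self._open().read(n)

    def __getstate__(self):
        # Drop the unpicklable handle; keep just enough state to reopen.
        return {"path": self.path}

    def __setstate__(self, state):
        self.path = state["path"]
        self._handle = None


# Round-trip demo, with a local temp file standing in for the remote one.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"payload")

w = LazyFileWrapper(f.name)
w.read()  # handle is now open; pickling a raw file object would fail here
w2 = pickle.loads(pickle.dumps(w))
print(w2.read())  # expected: b'payload' (reopened on the deserialized side)
```

The same idea applies to an HTTP session: serialize the URL and storage options, and rebuild the session lazily after deserialization.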

Reactions: 👍 2
martindurant · 2020-11-18T16:42:30Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/4591#issuecomment-729803257

OK, I can see a thing after all... please stand by

martindurant · 2020-11-18T16:29:18Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/4591#issuecomment-729795030

I don't think it's fsspec; the HTTPFileSystem and file objects are known to serialise.

However:

```
>>> distributed.protocol.serialize(dsc.surface.mean().data.dask['open_dataset-27832a1f850736a8d9a11a882ad06230surface-3b6f5b6a90c2cfa65379d3bfae22126f'])
({'serializer': 'error'}, ...)
```

(That's one of the keys I picked from the graph at random; your keys may differ.) I can't say why this object is in the graph where perhaps it wasn't before, but it has a reference to a CopyOnWriteArray, which sounds like a buffer owned by something else and is probably the non-serializable part. Digging finds a contained <xarray.backends.h5netcdf_.H5NetCDFArrayWrapper at 0x17e669ad0>, which is not serializable, so maybe xarray can do something about this.


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);