issue_comments: 502481584
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/pull/2706#issuecomment-502481584 | https://api.github.com/repos/pydata/xarray/issues/2706 | 502481584 | MDEyOklzc3VlQ29tbWVudDUwMjQ4MTU4NA== | 9658781 | 2019-06-16T20:05:04Z | 2019-06-16T20:23:54Z | CONTRIBUTOR | Hey everyone, sorry for not working on this for so long. I just picked it up again and realised that, the way the encoding works, all the datatypes and maximum string lengths in the first xarray dataset have to be representative of all the others. Otherwise the following cuts away every char after the second:
It is solvable when explicitly setting the type before writing:
It gets worse, however, with non-ASCII characters: they get encoded in zarr.py l:218, but when the next chunk comes in, the check in conventions.py l:86 fails. So I think we actually have to resolve the TODO in zarr.py l:215 before this can be merged. Otherwise, the following leads to multiple issues:
The only way to work around this issue is to explicitly encode the data to UTF-8 beforehand:
Even though this is doable if it is known in advance, we should definitely either mention it in the documentation or fix the encoding itself. What do you think? Cheers, Jendrik |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
402908148 |
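
The truncation described in the comment can be reproduced with plain NumPy fixed-width strings. This is only a minimal sketch of the underlying effect, not the xarray or zarr code path from the PR; the names `first`, `later`, `store` and `wide` are made up for illustration, and the assumption is that the on-disk dtype is fixed by the first chunk written.

```python
import numpy as np

# First chunk: the longest string has 2 characters, so NumPy infers '<U2'.
first = np.array(["ab", "c"])
# A later chunk that contains longer strings.
later = np.array(["abcdef", "gh"])

# Assumption: storage is typed after the first chunk only.
store = np.empty(4, dtype=first.dtype)
store[:2] = first
store[2:] = later  # silently cut to 2 characters per element

# Setting a wide enough type before writing avoids the truncation.
wide = np.empty(4, dtype="<U6")
wide[:2] = first
wide[2:] = later

# Non-ASCII text needs more bytes than characters once encoded to UTF-8,
# so a length measured in characters is not a safe byte width.
text = "äöü"
char_len = len(text)
byte_len = len(text.encode("utf-8"))

print(store.tolist(), wide.tolist(), char_len, byte_len)
```

In this sketch `store` loses everything after the second character of `"abcdef"`, while `wide` keeps it, which matches the behaviour reported above for appended chunks.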