issue_comments

5 rows where issue = 314444743 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
730267306 https://github.com/pydata/xarray/issues/2059#issuecomment-730267306 https://api.github.com/repos/pydata/xarray/issues/2059 MDEyOklzc3VlQ29tbWVudDczMDI2NzMwNg== kmuehlbauer 5821660 2020-11-19T10:08:16Z 2020-11-19T10:08:16Z MEMBER

@NowanIlfideme The h5py 3 changes with regard to strings are also tracked in #4570.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray serialize bytes/unicode strings across Python/netCDF versions? 314444743
730263703 https://github.com/pydata/xarray/issues/2059#issuecomment-730263703 https://api.github.com/repos/pydata/xarray/issues/2059 MDEyOklzc3VlQ29tbWVudDczMDI2MzcwMw== NowanIlfideme 2067093 2020-11-19T10:02:35Z 2020-11-19T10:02:35Z NONE

This may be relevant here, maybe not, but it appears the HDF5 backend is also at odds with all the above serialization.

Our internal project's dependencies changed, and that moved the h5py version from 2.10 to 3.1; apparently there was a breaking change that meant unicode strings were either encoded or decoded as bytes. Thankfully we had a test for that, but figuring out what was wrong was difficult.

Essentially, netCDF4 files that were round-tripped to a BytesIO (via an HDF5 backend) had unicode strings converted to bytes. I'm not sure whether it was the encoding or decoding part, likely decoding, judging by the docs:

https://docs.h5py.org/en/stable/strings.html
https://docs.h5py.org/en/stable/whatsnew/3.0.html#breaking-changes-deprecations

This might require even more special-casing to achieve consistent behavior for xarray users who don't really want to go into backend details (like me 😋).
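
One way to keep calling code independent of which side the conversion happens on is to normalize strings at the boundary. The following is only a minimal sketch: `ensure_text` and `ensure_text_all` are hypothetical helpers written for illustration, not part of xarray or h5py, and UTF-8 is assumed as the encoding.

```python
def ensure_text(value, encoding="utf-8"):
    """Return `value` as str, decoding it first if it arrived as bytes."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


def ensure_text_all(values, encoding="utf-8"):
    """Apply ensure_text to every element of an iterable of strings."""
    return [ensure_text(v, encoding) for v in values]


# The same logical data, once as text and once as UTF-8 encoded bytes,
# standing in for what two different backend versions might hand back.
as_text = ["temperature", "café"]
as_bytes = [s.encode("utf-8") for s in as_text]

# Both inputs normalize to the same list of str objects.
normalized = ensure_text_all(as_bytes)
```

A test that compares `normalized` against the expected text values would pass regardless of whether the backend returned bytes or str.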

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray serialize bytes/unicode strings across Python/netCDF versions? 314444743
412319738 https://github.com/pydata/xarray/issues/2059#issuecomment-412319738 https://api.github.com/repos/pydata/xarray/issues/2059 MDEyOklzc3VlQ29tbWVudDQxMjMxOTczOA== shoyer 1217238 2018-08-12T05:27:10Z 2018-08-12T05:27:10Z MEMBER

> Is it possible to preserve dtype when persisting xarray Datasets/DataArrays to disk?

Unfortunately, there is a frustrating disconnect between string data types in NumPy and netCDF.

This could be done in principle, but it would require adding our own xarray-specific convention on top of netCDF. I'm not sure this would be worth it: we already end up converting np.unicode_ to object dtype in many operations because we need a string dtype that can support missing values.

For reading data from disk, we use object dtype because we don't know the length of the longest string until we actually read the data, so this would be incompatible with lazy loading.
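
The difference between the two dtypes can be seen with NumPy alone. This is a small illustrative sketch, not xarray code; the example values are made up, and it uses plain Python strings rather than the `np.unicode_` alias.

```python
import numpy as np

# Fixed-width unicode: the item size is derived from the longest string,
# so every value must be known before the dtype can be chosen.
fixed = np.array(["a", "abc"])
width = fixed.dtype.itemsize // 4  # NumPy stores 4 bytes per character

# Assigning a longer string into the fixed-width array silently truncates it.
fixed[0] = "abcdef"
truncated = str(fixed[0])

# Object dtype holds arbitrary Python objects, so string length is
# unconstrained and None can stand in for a missing value.
flexible = np.array(["a", None, "abcdef"], dtype=object)
```

Here `width` depends on the data, which is why a fixed-width dtype cannot be decided before the strings have been read.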

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray serialize bytes/unicode strings across Python/netCDF versions? 314444743
412095066 https://github.com/pydata/xarray/issues/2059#issuecomment-412095066 https://api.github.com/repos/pydata/xarray/issues/2059 MDEyOklzc3VlQ29tbWVudDQxMjA5NTA2Ng== rhkleijn 32801740 2018-08-10T14:14:12Z 2018-08-10T14:14:12Z CONTRIBUTOR

Currently, the dtype does not seem to roundtrip faithfully. When I write np.unicode_ / str to file, it gets transformed to object when I subsequently read it from disk. I am using xarray 0.10.8 with Python 3 on Windows.

This can be reproduced by inserting the following line in the script above (and adjusting the print statement accordingly)

    with xr.open_dataset(filename) as ds:
        read_dtype = ds['data'].dtype

which gives:

| Python version | NetCDF version | NumPy datatype | NetCDF datatype | NumPy datatype (read) |
| -- | -- | -- | -- | -- |
| Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding | object |
| Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING | object |
| Python 3 | NETCDF3 | object bytes/bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF4 | object bytes/bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding | object |
| Python 3 | NETCDF4 | object unicode/str | NC_STRING | object |

Also, object bytes/bytes does not seem to round-trip cleanly: it appears to be converted to np.string_ / bytes.

Is it possible to preserve dtype when persisting xarray Datasets/DataArrays to disk?
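
To make the "NC_CHAR with UTF-8 encoding" rows more concrete, here is a conceptual sketch in plain Python of storing text as fixed-width character rows and reading it back. It is an assumption-laden illustration, not xarray's or netCDF's actual implementation: the helper names and the NUL padding byte are hypothetical choices.

```python
def encode_fixed_width(strings, encoding="utf-8", pad=b"\x00"):
    """Encode each string and pad it to the width of the longest one."""
    encoded = [s.encode(encoding) for s in strings]
    width = max((len(b) for b in encoded), default=0)
    return [b.ljust(width, pad) for b in encoded], width


def decode_fixed_width(rows, encoding="utf-8", pad=b"\x00"):
    """Strip the padding from each row and decode it back to str."""
    return [row.rstrip(pad).decode(encoding) for row in rows]


original = ["abc", "é", ""]
rows, width = encode_fixed_width(original)
restored = decode_fixed_width(rows)
```

The values survive the round trip, but nothing in `rows` records whether the input was a fixed-width array or an object array, so a reader has to pick a dtype by convention.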

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray serialize bytes/unicode strings across Python/netCDF versions? 314444743
381620236 https://github.com/pydata/xarray/issues/2059#issuecomment-381620236 https://api.github.com/repos/pydata/xarray/issues/2059 MDEyOklzc3VlQ29tbWVudDM4MTYyMDIzNg== fmaussion 10050469 2018-04-16T14:33:20Z 2018-04-16T14:33:20Z MEMBER

Thanks a lot Stephan for writing that up!

> The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end.

This would be my personal opinion here. If you feel like this is something you'd like to provide before the last py2-compatible xarray comes out, then I'm fine with it, but it shouldn't have top priority...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray serialize bytes/unicode strings across Python/netCDF versions? 314444743

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
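
The row selection described at the top of this page (rows where issue = 314444743, sorted by updated_at descending) can be sketched against a cut-down copy of this schema using Python's built-in `sqlite3` module. The exact SQL behind the page is not shown here, so the query below is an assumed equivalent; only four of the columns are recreated, and the ids, user ids, and timestamps are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the issue_comments table: only the columns the query needs.
conn.execute(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [user] INTEGER,
       [updated_at] TEXT,
       [issue] INTEGER
    )
    """
)
conn.execute(
    "CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])"
)

comments = [
    (381620236, 10050469, "2018-04-16T14:33:20Z", 314444743),
    (412095066, 32801740, "2018-08-10T14:14:12Z", 314444743),
    (412319738, 1217238, "2018-08-12T05:27:10Z", 314444743),
    (730263703, 2067093, "2020-11-19T10:02:35Z", 314444743),
    (730267306, 5821660, "2020-11-19T10:08:16Z", 314444743),
]
conn.executemany("INSERT INTO [issue_comments] VALUES (?, ?, ?, ?)", comments)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
cursor = conn.execute(
    "SELECT [id] FROM [issue_comments] WHERE [issue] = ? "
    "ORDER BY [updated_at] DESC",
    (314444743,),
)
ordered_ids = [row[0] for row in cursor]
conn.close()
```

The index on `[issue]` mirrors `idx_issue_comments_issue` from the schema and is what makes the `WHERE [issue] = ?` filter cheap on a larger table.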
Powered by Datasette · Queries took 13.854ms · About: xarray-datasette