issue_comments

7 rows where issue = 1722417436 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1561584592 https://github.com/pydata/xarray/issues/7868#issuecomment-1561584592 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85dE-PQ kmuehlbauer 5821660 2023-05-24T16:50:34Z 2023-05-24T16:50:34Z MEMBER

Thanks @ghiggi for your comment.

The problem is we have at least two contradicting user requests here, see #7328 and #7862.

I'm sure there is a solution to accommodate both sides.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1561358915 https://github.com/pydata/xarray/issues/7868#issuecomment-1561358915 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85dEHJD ghiggi 19285200 2023-05-24T15:20:00Z 2023-05-24T15:20:00Z NONE

A dask array with object dtype can contain any Python object (e.g. I have seen geometries and matplotlib collections stored in dask arrays with object dtype). As a consequence, dask does not attempt a conversion to e.g. str to estimate the array size, since AFAIK there is no clean way to attach an attribute to the dtype indicating that the objects are actually strings.

With your PR, the dtype is no longer object when the dask.array is created, and I guess this solves the issue. Did I overlook something?
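
To illustrate the point about object dtype, here is a minimal NumPy-only sketch (not taken from the thread, and not dask's actual size-estimation code). It only shows why a fixed-width Unicode dtype gives a computable byte size while an object dtype does not; the variable names are made up for the example.

```python
import numpy as np

# Fixed-width Unicode dtype: every element occupies the same number of bytes,
# so the total size follows from the shape and itemsize alone.
fixed = np.array(["a", "bcd"])  # inferred as a 3-character Unicode dtype

# Object dtype: the array stores only pointers. The Python objects they
# reference can be of any type and size, so itemsize says nothing about them.
boxed = np.array(["a", "bcd"], dtype=object)

print(fixed.dtype.kind, fixed.itemsize, fixed.nbytes)
print(boxed.dtype.kind, boxed.itemsize)  # itemsize is just the pointer width
```

For the object array, `nbytes` counts only the pointers, which is why a size-based chunking heuristic has nothing reliable to work with.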

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1561214028 https://github.com/pydata/xarray/issues/7868#issuecomment-1561214028 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85dDjxM kmuehlbauer 5821660 2023-05-24T13:58:16Z 2023-05-24T13:58:16Z MEMBER

My main question here is: why is dask not trying to retrieve the object types from dtype.metadata? Or does it try and fail for some reason?
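
For readers unfamiliar with `dtype.metadata`, the following is a small sketch of how an object dtype can carry a tag describing its elements. The `"element_type"` key and the `element_type` helper are assumptions chosen for illustration; this is not a statement about what dask or xarray actually inspect.

```python
import numpy as np

# Assumption: "element_type" is used as the metadata key purely for illustration.
tagged = np.dtype(object, metadata={"element_type": str})
plain = np.dtype(object)


def element_type(dtype):
    """Return the tagged element type, or None if the dtype carries no tag."""
    meta = dtype.metadata  # None when no metadata was attached
    return None if meta is None else meta.get("element_type")


print(element_type(tagged))
print(element_type(plain))
```

A consumer that does not look at `dtype.metadata` sees both dtypes simply as kind `"O"`, which is the situation the question above is asking about.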

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1560674198 https://github.com/pydata/xarray/issues/7868#issuecomment-1560674198 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85dBf-W kmuehlbauer 5821660 2023-05-24T08:27:11Z 2023-05-24T08:27:11Z MEMBER

@ghiggi Glad it works, but we still have to check if that is the correct location for the fix, as it's not CF specific.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1560651807 https://github.com/pydata/xarray/issues/7868#issuecomment-1560651807 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85dBagf ghiggi 19285200 2023-05-24T08:12:18Z 2023-05-24T08:12:18Z NONE

Thanks @kmuehlbauer! https://github.com/pydata/xarray/pull/7869 solves the issue!

Summarizing:

- With #7869, netCDF4 NC_STRING variable arrays are now read into xarray with a Unicode dtype (instead of object).
- As a consequence, dask can estimate the array's size, and xr.open_dataset(fpath, chunks="auto") no longer raises the NotImplementedError.
- NC_CHAR variable arrays continue to be read into xarray with a fixed-length byte-string dtype. Something more could perhaps be done to also deserialize NC_CHAR to Unicode. However, this might cause backward incompatibilities and might be better addressed in a separate PR.

Thanks again @kmuehlbauer for having resolved the problem in less than 2 hours :1st_place_medal:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1559959581 https://github.com/pydata/xarray/issues/7868#issuecomment-1559959581 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85c-xgd kmuehlbauer 5821660 2023-05-23T18:42:55Z 2023-05-23T19:01:00Z MEMBER

@ghiggi Thanks for getting this back into action. I got dragged away from the one string object issue in #7654. I'll split this out and add a PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1559973194 https://github.com/pydata/xarray/issues/7868#issuecomment-1559973194 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85c-01K kmuehlbauer 5821660 2023-05-23T18:55:46Z 2023-05-23T18:55:46Z MEMBER

@ghiggi I'd appreciate it if you could test your workflows against #7869. Your example and the one over in #7652 are working AFAICT.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);