issues: 2060490766

id: 2060490766
node_id: PR_kwDOAMm_X85i9R7z
number: 8575
title: Add chunk-friendly code path to `encode_cf_datetime` and `encode_cf_timedelta`
user: 6628425
state: closed
locked: 0
comments: 6
created_at: 2023-12-30T01:25:17Z
updated_at: 2024-01-30T02:17:58Z
closed_at: 2024-01-29T19:12:30Z
author_association: MEMBER
draft: 0
pull_request: pydata/xarray/pulls/8575

I finally had a moment to think about this some more following the discussion in https://github.com/pydata/xarray/pull/8253. This PR adds a chunk-friendly code path to `encode_cf_datetime` and `encode_cf_timedelta`, which enables lazy encoding of time-like values and, by extension, preservation of chunks when writing time-like values to zarr. With these changes, the test added by @malmans2 in #8253 passes.
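As a minimal usage sketch of what this enables (not code from the PR; the variable name `timestamps` and the store path `example.zarr` are arbitrary), a dask-backed time-like variable can now be written to zarr without being computed eagerly:

```python
import numpy as np
import pandas as pd
import xarray as xr

# A dask-backed data variable holding datetime64[ns] values.
ds = xr.Dataset(
    {"timestamps": ("x", pd.date_range("2000-01-01", periods=100, freq="h"))},
    coords={"x": np.arange(100)},
).chunk({"x": 10})

# With lazy encoding, writing to zarr no longer forces "timestamps"
# into memory just to encode it, so its chunk structure carries over
# to the on-disk store.
ds.to_zarr("example.zarr", mode="w")
```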

Though it largely reuses existing code, the lazy encoding implemented in this PR is stricter than eager encoding in a couple of ways (both rules are illustrated in the sketch after this list):

1. It requires that either both the encoding units and dtype be prescribed, or that neither be prescribed; prescribing only one of the two is not supported, since the other would then have to be inferred from the data. If neither is specified, the dtype is set to `np.int64` and the units are either `"nanoseconds since 1970-01-01"` or `"microseconds since 1970-01-01"`, depending on whether we are encoding `np.datetime64[ns]` values or `cftime.datetime` objects. For `timedelta64[ns]` values, the units are set to `"nanoseconds"`.
2. If an integer dtype is prescribed, but the units are set such that floating point values would be required, it raises instead of modifying the units to enable integer encoding. This is a requirement because the data units may differ between chunks, so overriding them could result in inconsistent units across the encoded variable.
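A sketch of how these rules surface to a user, reusing the hypothetical `ds` and store names from the earlier example (none of these names come from the PR):

```python
# Case 1: neither units nor dtype prescribed -> defaults are chosen
# (np.int64 with "nanoseconds since 1970-01-01" for datetime64[ns] data).
ds.to_zarr("defaults.zarr", mode="w")

# Case 2: both units and dtype prescribed explicitly -> supported.
ds["timestamps"].encoding.update(
    {"units": "seconds since 2000-01-01", "dtype": "int64"}
)
ds.to_zarr("explicit.zarr", mode="w")

# Not supported for chunked data: prescribing only one of units / dtype,
# since the other would have to be inferred from the lazy values, e.g.
#     ds["timestamps"].encoding = {"units": "seconds since 2000-01-01"}
```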

As part of this PR, since dask requires that we know the dtype of the array returned by the function passed to `map_blocks`, I also added logic to handle casting to the specified encoding dtype in an overflow-and-integer-safe manner. This means an informative error message is raised in the situation described in #8542:

```
OverflowError: Not possible to cast encoded times from dtype('int64') to dtype('int16') without overflow. Consider removing the dtype encoding, at which point xarray will make an appropriate choice, or explicitly switching to a larger integer dtype.
```
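To give a sense of the kind of check involved, here is a rough, self-contained sketch of an overflow-safe integer cast in this spirit; it is not the PR's implementation, and the helper name `_safe_cast_example` is made up for illustration:

```python
import numpy as np


def _safe_cast_example(num: np.ndarray, dtype: np.dtype) -> np.ndarray:
    """Cast encoded time values to ``dtype``, raising instead of silently wrapping."""
    cast = num.astype(dtype)
    # A round trip back to the original dtype exposes any values that
    # overflowed or were truncated by the cast.
    if not np.array_equal(cast.astype(num.dtype), num):
        raise OverflowError(
            f"Not possible to cast encoded times from {num.dtype!r} to {dtype!r} "
            "without overflow. Consider removing the dtype encoding, at which "
            "point xarray will make an appropriate choice, or explicitly "
            "switching to a larger integer dtype."
        )
    return cast


# Example: int64 nanosecond offsets do not fit in int16, so this raises.
_safe_cast_example(np.array([10**9], dtype="int64"), np.dtype("int16"))
```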

I eventually want to think about this on the decoding side as well, but that can wait for another PR.

  • [x] Closes #7132
  • [x] Closes #8230
  • [x] Closes #8432
  • [x] Closes #8253
  • [x] Addresses #8542
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
reactions: total_count 0 (https://api.github.com/repos/pydata/xarray/issues/8575/reactions)
repo: 13221727
type: pull
