pull_requests: 1660231411

id: 1660231411
node_id: PR_kwDOAMm_X85i9R7z
number: 8575
state: closed
locked: 0
title: Add chunk-friendly code path to `encode_cf_datetime` and `encode_cf_timedelta`
user: 6628425
created_at: 2023-12-30T01:25:17Z
updated_at: 2024-01-30T02:17:58Z
closed_at: 2024-01-29T19:12:30Z
merged_at: 2024-01-29T19:12:30Z
merge_commit_sha: d8c3b1ac591914998ce608159a15b4b41cc53c73
assignee:
milestone:
draft: 0
head: d9d9701545c330075184e9bf30fb54fb2db46aee
base: e22b47511f4188e2203c5753de4a0a36094c2e83
author_association: MEMBER
auto_merge:
repo: 13221727
url: https://github.com/pydata/xarray/pull/8575
merged_by:

body:

I finally had a moment to think about this some more following the discussion in https://github.com/pydata/xarray/pull/8253. This PR adds a chunk-friendly code path to `encode_cf_datetime` and `encode_cf_timedelta`, which enables lazy encoding of time-like values and, by extension, preservation of chunks when writing time-like values to zarr. With these changes, the test added by @malmans2 in #8253 passes.

Though it largely reuses existing code, the lazy encoding implemented in this PR is stricter than eager encoding in a couple of ways:

1. It requires that either both the encoding units and dtype be prescribed, or neither; prescribing only one of them is not supported, since that would require inferring the other from the data. If neither is specified, the dtype is set to `np.int64` and the units are either `"nanoseconds since 1970-01-01"` or `"microseconds since 1970-01-01"`, depending on whether we are encoding `np.datetime64[ns]` values or `cftime.datetime` objects. For `timedelta64[ns]` values, the units are set to `"nanoseconds"`.
2. If an integer dtype is prescribed but the units are set such that floating point values would be required, it raises instead of modifying the units to enable integer encoding. This is a requirement because the data units may differ between chunks, so overriding them could result in inconsistent units.

Since dask requires that we know the dtype of the array returned by the function passed to `map_blocks`, this PR also adds logic to cast to the specified encoding dtype in an overflow-and-integer-safe manner (see the sketches below). This means an informative error message is raised in the situation described in #8542:

```
OverflowError: Not possible to cast encoded times from dtype('int64') to dtype('int16') without overflow. Consider removing the dtype encoding, at which point xarray will make an appropriate choice, or explicitly switching to a larger integer dtype.
```

I eventually want to think about this on the decoding side as well, but that can wait for another PR.

- [x] Closes #7132
- [x] Closes #8230
- [x] Closes #8432
- [x] Closes #8253
- [x] Addresses #8542
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
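As a rough, user-level illustration of the behavior described in the PR body, the sketch below writes a dask-chunked time-like variable to a zarr store with both the encoding units and dtype prescribed, so the encoding can stay lazy and the dask chunks are preserved on disk. This is a minimal sketch assuming dask and zarr are installed; the variable name, chunk size, and store path are made up and not taken from the PR.

```python
import pandas as pd
import xarray as xr

# Hypothetical dataset with a dask-chunked datetime64[ns] data variable.
ds = xr.Dataset(
    {"event_time": ("event", pd.date_range("2000-01-01", periods=1000).values)}
).chunk({"event": 100})

# Prescribe both the units and the dtype (or neither), as the lazy code
# path requires; daily values encode exactly as integers in these units.
ds.to_zarr(
    "example.zarr",  # hypothetical store path
    mode="w",
    encoding={"event_time": {"units": "days since 2000-01-01", "dtype": "int64"}},
)
```

Per point 1 above, prescribing only one of units or dtype would not be supported on this lazy path.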
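The requirement that the encoding dtype be known up front follows from the `map_blocks` constraint mentioned in the PR body: the output dtype has to be declared when the dask graph is built, before any chunk is computed. The following sketch only illustrates that constraint and is not xarray's internal implementation; `_encode_chunk` is a hypothetical stand-in for the per-chunk encoding step.

```python
import dask.array as da
import numpy as np


def _encode_chunk(chunk: np.ndarray, dtype: np.dtype) -> np.ndarray:
    # Hypothetical per-chunk step: reinterpret datetime64[ns] values as
    # integer nanoseconds since 1970-01-01, then cast to the target dtype.
    return chunk.view("int64").astype(dtype)


times = np.array(
    ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04"],
    dtype="datetime64[ns]",
)
lazy_times = da.from_array(times, chunks=2)

encoded = da.map_blocks(
    _encode_chunk,
    lazy_times,
    np.dtype("int64"),
    dtype=np.int64,  # dask needs the result dtype declared to stay lazy
)
print(encoded.compute())
```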

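The `OverflowError` quoted above comes from an explicit safety check before the encoded integers are cast to a smaller prescribed dtype. The helper below is only a sketch of that idea, assuming a round-trip comparison; the function name and structure are illustrative rather than xarray's actual internals, though the error message mirrors the one in the PR body.

```python
import numpy as np


def _cast_encoded_times_safely(num: np.ndarray, dtype: np.dtype) -> np.ndarray:
    # Hypothetical sketch: cast, then verify the values round-trip unchanged,
    # so silent integer overflow cannot corrupt the encoded times.
    cast = num.astype(dtype)
    if not np.array_equal(num, cast.astype(num.dtype)):
        raise OverflowError(
            f"Not possible to cast encoded times from {num.dtype} to {dtype} "
            "without overflow. Consider removing the dtype encoding, at which "
            "point xarray will make an appropriate choice, or explicitly "
            "switching to a larger integer dtype."
        )
    return cast


# One day in nanoseconds far exceeds the int16 range, so this raises the
# OverflowError shown in the PR body instead of silently wrapping.
nanoseconds = np.array([0, 86_400_000_000_000], dtype=np.int64)
_cast_encoded_times_safely(nanoseconds, np.dtype(np.int16))
```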