pull_requests: 1660231411
id: 1660231411
node_id: PR_kwDOAMm_X85i9R7z
number: 8575
state: closed
locked: 0
title: Add chunk-friendly code path to `encode_cf_datetime` and `encode_cf_timedelta`
user: 6628425
body:

I finally had a moment to think about this some more following the discussion in https://github.com/pydata/xarray/pull/8253. This PR adds a chunk-friendly code path to `encode_cf_datetime` and `encode_cf_timedelta`, which enables lazy encoding of time-like values and, by extension, preservation of chunks when writing time-like values to zarr. With these changes, the test added by @malmans2 in #8253 passes.

Though it largely reuses existing code, the lazy encoding implemented in this PR is stricter than eager encoding in a couple of ways:

1. It requires that either both the encoding units and dtype be prescribed, or neither be prescribed; prescribing only one of the two is not supported, since the other would then need to be inferred from the data. If neither is specified, the dtype is set to `np.int64` and the units are either `"nanoseconds since 1970-01-01"` or `"microseconds since 1970-01-01"`, depending on whether we are encoding `np.datetime64[ns]` values or `cftime.datetime` objects. For `timedelta64[ns]` values, the units are set to `"nanoseconds"`.
2. If an integer dtype is prescribed but the units are set such that floating point values would be required, it raises instead of modifying the units to enable integer encoding. This is necessary because the units inferred from the data may differ between chunks, so overriding them could result in inconsistent units.

As part of this PR, since dask requires that we know the dtype of the array returned by the function passed to `map_blocks`, I also added logic to handle casting to the specified encoding dtype in an overflow- and integer-safe manner. This means an informative error message is raised in the situation described in #8542:

```
OverflowError: Not possible to cast encoded times from dtype('int64') to dtype('int16') without overflow. Consider removing the dtype encoding, at which point xarray will make an appropriate choice, or explicitly switching to a larger integer dtype.
```

I eventually want to think about this on the decoding side as well, but that can wait for another PR.

- [x] Closes #7132
- [x] Closes #8230
- [x] Closes #8432
- [x] Closes #8253
- [x] Addresses #8542
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

created_at: 2023-12-30T01:25:17Z
updated_at: 2024-01-30T02:17:58Z
closed_at: 2024-01-29T19:12:30Z
merged_at: 2024-01-29T19:12:30Z
merge_commit_sha: d8c3b1ac591914998ce608159a15b4b41cc53c73
assignee:
milestone:
draft: 0
head: d9d9701545c330075184e9bf30fb54fb2db46aee
base: e22b47511f4188e2203c5753de4a0a36094c2e83
author_association: MEMBER
auto_merge:
repo: 13221727
url: https://github.com/pydata/xarray/pull/8575
merged_by:
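As a rough illustration of the chunk-preserving write described in the PR body above, here is a minimal sketch assuming an xarray version that includes this change, with dask and zarr installed; the store path and variable names are illustrative, not taken from the PR:

```python
import numpy as np
import pandas as pd
import xarray as xr

# A dask-backed datetime64 variable. A non-dimension variable is used here
# because dimension coordinates are loaded eagerly and would not exercise
# the lazy encoding path.
times = xr.DataArray(
    pd.date_range("2000-01-01", periods=120, freq="D"),
    dims="x",
    name="times",
).chunk({"x": 30})

# Per point 1 in the PR body: prescribe both the encoding units and dtype,
# or neither.
times.encoding = {"units": "days since 2000-01-01", "dtype": np.int64}

# With the chunk-friendly code path, encoding happens lazily per chunk, so
# the chunks of the encoded variable written to zarr match the dask chunks.
times.to_dataset().to_zarr("lazy-times.zarr")
```

If the prescribed dtype were too small to hold the encoded values (for example `np.int16` here), the casting logic added in this PR would raise the `OverflowError` quoted above rather than silently overflowing.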
Links from other tables
- 0 rows from pull_requests_id in labels_pull_requests