home / github / pull_requests

Menu
  • Search all tables
  • GraphQL API

pull_requests: 1660523912

This data as json

id node_id number state locked title user body created_at updated_at closed_at merged_at merge_commit_sha assignee milestone draft head base author_association auto_merge repo url merged_by
1660523912 PR_kwDOAMm_X85i-ZWI 8577 open 0 Interpolate na: Fix #7665 and introduce arguments similar to pandas 42680748 - [x] Closes #7665 - [x] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` This is an attempt to close #7665 and combine the current possibilities from xarray (max_gap) and pandas (limit_direction, limit_area) regarding interpolation of nan values. Please see also my comments in #7665 for the motivation. This PR already involves a full implementation, documentation and corresponding tests, but before any final polishing, I want to hear your thoughts. Specifically, I think the API and default options need to be discussed. (See the proposed documentation of DataArray.interpolate_na() / Dataset.interpolate_na() for the current state) Implementation: Basically, I use ffill and bfill to calculate the coordinate of the left/right edge for every gap in the data. Based on edge coordinates, all masks (limit, limit_area, max_gap) are created. On the long term, it might be interesting to provide those arguments to other na-filling methods as well (ffill, bfill, fillna). # Things to consider ## limit_direction=forward Pros: - Backward compatible: If limit is not None, this is the current behaviour (see #7665) - Pandas compatible: Forward is the pandas default. Cons: - `limit_direction=both` feels more natural as default. If the user does `interpolate_na('x', fill_value='extrapolate')`, in my opinion they will expect all nans to be filled, including both boundaries. In contrast to pandas, this was the case in xarray before, but not anymore now if we follow pandas and set `limit_direction=forward`. `both` would also increase performance, since no restrictions need to be applied. ## limit_use_coordinates=False Pros: - Backward compatible - Pandas compatible -> Both xarray and pandas have no support for coordinate based limits so far. Cons: - Inconsistent with the current default of `use_coordinates=True` Generally, one might discuss if this separate argument is necessary or only one argument `use_coordinates` is sufficient. Imo, if the grid is irregular and `use_coordinates=True`, there is not a lot of sense in specifying the limit as a fixed number of grid cells. Alternatively, we could allow a three-tuple like `use_coordinates=(True, True, False)` to specify the index for interpolation, limit and max_gap separately (or something similar). ## use_coordinates=True So far, if there is no coordinate for `dim`, interpolation will succeed, falling silently back to a linearly increasing index. I feel, for `use_coordinate=True`, we should fail and inform the user to set use_coordinate=False if they really want a linear index. However, this is a breaking change. Maybe we can keep this behaviour with `use_coordinate=None` as new default option (= True if coord existent, else linear). ## Performance On my machine, the new limit implementation based on ffill/bfill seems to be a little less performant (10%) than the old one (based on rolling). There might be potential for improvements. 2023-12-30T23:28:47Z 2023-12-30T23:28:47Z     0a587b58ce21db4f758fe6756864e5ad92ad318f     0 6b811ba9d0ab9222bd444ad6206945b33a2850b3 c47f35db4e73507b4ac729ec23b0faf8ceb5ace1 CONTRIBUTOR   13221727 https://github.com/pydata/xarray/pull/8577  

Links from other tables

  • 0 rows from pull_requests_id in labels_pull_requests
Powered by Datasette ยท Queries took 240.585ms