id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1637898633,I_kwDOAMm_X85hoFmJ,7665,Interpolate_na: Rework 'limit' argument documentation/implementation,42680748,open,0,,,6,2023-03-23T16:46:39Z,2024-03-13T17:53:58Z,,CONTRIBUTOR,,,,"### What is your issue?
Currently, the 'limit' argument of `interpolate_na` shows some counterintuitive/undocumented behaviour.
Take the following example:
```python
import xarray as xr
import numpy as np
n=np.nan
da=xr.DataArray([n, n, n, 4, 5, n ,n ,n], dims=[""y""])
da.interpolate_na('y', limit=1, fill_value='extrapolate')
```
This will produce the following result:
```
array([ 1., nan, nan, 4., 5., 6., nan, nan])
```
Two things are surprising, in my opinion:
1. The interpolated value `1` at the beginning is far from any of the given values
2. The filling is done only towards the 'right'. This asymmetric behaviour is not mentioned in the documentation.
## Comparison to pandas
Similar behaviour can be created using pandas with the following arguments:
```python
da=xr.DataArray([n, n, n, 4, 5, n ,n ,n], dims=[""y""])
dap=da.to_pandas()
dap.interpolate(method='slinear', limit=1, limit_direction='forward', fill_value='extrapolate')
```
Output
```
y
0 NaN
1 NaN
2 NaN
3 4.0
4 5.0
5 6.0
6 NaN
7 NaN
dtype: float64
```
This is equivalent to the current xarray behaviour, except there is no `1` at the beginning.
## Cause
Currently, the fill mask in xarray is implemented using a rolling window operation, where values outside the array are assumed to be valid (therefore the `1`). See `xarray.core.missing._get_valid_fill_mask`
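The effect can be reproduced outside xarray with a small NumPy sketch. It assumes the mask is a rolling count of missing values over `limit + 1` cells, with the window padded on the left. The function name `valid_fill_mask` and the `pad_is_missing` switch are made up for this illustration and are not part of xarray.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def valid_fill_mask(values, limit, pad_is_missing=False):
    # Hypothetical helper: True where a cell is valid or may be filled.
    is_missing = np.isnan(values)
    # Pad the left edge so every cell gets a full window of limit + 1 cells.
    pad = np.full(limit, pad_is_missing)
    padded = np.concatenate([pad, is_missing])
    # Count the missing cells in each trailing window.
    n_missing = sliding_window_view(padded, limit + 1).sum(axis=-1)
    return n_missing <= limit


n = np.nan
values = np.array([n, n, n, 4, 5, n, n, n])

# Padding treated as valid: index 0 is marked as fillable.
print(valid_fill_mask(values, limit=1))
# Padding treated as missing: index 0 is rejected, as in pandas.
print(valid_fill_mask(values, limit=1, pad_is_missing=True))
```

With the padding treated as valid, the first cell passes the check even though no real value is nearby, which matches the unexpected `1` in the output above.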
## Possible Solutions
### Boundary Issue
Concerning the `1` at the beginning: I think this should be considered a bug. It is likely not what you would expect if you specify a limit. As stated, pandas does not create it either.
### Asymmetric Filling
Concerning the asymmetric filling, I see two options:
1. No changes to the code, but mention in the documentation that (effectively), a forward-fill is done.
2. Do something similar to [what pandas is doing](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html). In pandas, two additional arguments control the limit behaviour: `limit_direction` controls the fill direction (`forward`, `backward` or `both`), and `limit_area` effectively controls whether only interpolation is done or extrapolation is allowed as well.
What do you think?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7665/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2174011115,I_kwDOAMm_X86BlMbr,8811,Rolling operations with numbagg produce invalid values after numpy.inf,42680748,open,0,,,7,2024-03-07T14:35:24Z,2024-03-12T17:42:33Z,,CONTRIBUTOR,,,,"### What is your issue?
If an array contains `np.inf` and a rolling operation is applied, all values after this one are `nan` if numbagg is used. Take the following example:
```python
import xarray as xr
import numpy as np
xr.set_options(use_numbagg=False)
da=xr.DataArray([1,2,3,np.inf,4,5,6,7,8,9,10], dims=['x'])
da.rolling(x=2).sum()
```
Output
```
Size: 88B
array([nan, 3., 5., inf, inf, 9., 11., 13., 15., 17., 19.])
Dimensions without coordinates: x
```
With Numbagg:
```python
xr.set_options(use_numbagg=True)
da=xr.DataArray([1,2,3,np.inf,4,5,6,7,8,9,10], dims=['x'])
print(da.rolling(x=2).sum())
```
Output
```
Size: 88B
array([nan, 3., 5., inf, inf, nan, nan, nan, nan, nan, nan])
Dimensions without coordinates: x
```
### What did I expect?
I expected no user-visible changes in the output values if numbagg is activated.
Maybe this is not a bug but expected behaviour for numbagg. The following warning was raised by the second call:
```
.../Local/virtual_environments/xarray_performance/lib/python3.10/site-packages/numbagg/decorators.py:247: RuntimeWarning: invalid value encountered in move_sum
return gufunc(*arr, window, min_count, axis=axis, **kwargs)
```
If this is expected, I think it would be good to have a page in the documentation which lists the downsides and limitations of the various tools to accelerate xarray. From the current [installation docs](https://docs.xarray.dev/en/v2024.02.0/getting-started-guide/installing.html#for-accelerating-xarray), I assumed that I only need to install numbagg/bottleneck to make xarray faster, without any changes in output values.
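One possible explanation, stated here as an assumption rather than something confirmed from the numbagg source: moving sums are often computed with a running accumulator that adds the incoming value and subtracts the outgoing one. Once `inf` leaves the window, `inf - inf` yields `nan`, and the accumulator never recovers. The sketch below uses plain NumPy, and both function names are invented for this illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def move_sum_running(values, window):
    # Running accumulator: add the incoming value, subtract the outgoing one.
    out = np.full(len(values), np.nan)
    acc = 0.0
    with np.errstate(invalid='ignore'):  # silence the inf - inf warning
        for i in range(len(values)):
            acc = acc + values[i]
            if i >= window:
                acc = acc - values[i - window]
            if i >= window - 1:
                out[i] = acc
    return out


def move_sum_direct(values, window):
    # Sum every window from scratch, so an earlier inf cannot leak forward.
    out = np.full(len(values), np.nan)
    out[window - 1:] = sliding_window_view(values, window).sum(axis=-1)
    return out


values = np.array([1, 2, 3, np.inf, 4, 5, 6, 7, 8, 9, 10], dtype=float)
print(move_sum_running(values, 2))  # stays nan once inf has left the window
print(move_sum_direct(values, 2))   # finite again once inf has left the window
```

The running variant reproduces the pattern of the numbagg output above, while the direct variant reproduces the output obtained with `use_numbagg=False`.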
### Environment
```
xarray==2024.2.0
numbagg==0.8.0
```
Package Versions
```txt
anyio==4.3.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
bleach==6.1.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
comm==0.2.1
contourpy==1.2.0
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
exceptiongroup==1.2.0
executing==2.0.1
fastjsonschema==2.19.1
fonttools==4.49.0
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
idna==3.6
ipykernel==6.29.3
ipython==8.22.2
ipywidgets==8.1.2
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
json5==0.9.22
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.4
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.13.0
jupyter_server_terminals==0.5.2
jupyterlab==4.1.4
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.3
jupyterlab_widgets==3.0.10
kiwisolver==1.4.5
llvmlite==0.42.0
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mistune==3.0.2
nbclient==0.9.0
nbconvert==7.16.2
nbformat==5.9.2
nest-asyncio==1.6.0
notebook==7.1.1
notebook_shim==0.2.4
numba==0.59.0
numbagg==0.8.0
numpy==1.26.4
overrides==7.7.0
packaging==23.2
pandas==2.2.1
pandocfilters==1.5.1
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
platformdirs==4.2.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
Pygments==2.17.2
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
qtconsole==5.5.1
QtPy==2.4.1
referencing==0.33.0
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.18.0
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
stack-data==0.6.3
terminado==0.18.0
tinycss2==1.2.1
tomli==2.0.1
tornado==6.4
traitlets==5.14.1
types-python-dateutil==2.8.19.20240106
typing_extensions==4.10.0
tzdata==2024.1
uri-template==1.3.0
urllib3==2.2.1
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
widgetsnbextension==4.0.10
xarray==2024.2.0
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8811/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2060883540,PR_kwDOAMm_X85i-ZWI,8577,Interpolate na: Fix #7665 and introduce arguments similar to pandas,42680748,open,0,,,0,2023-12-30T23:28:47Z,2023-12-30T23:28:47Z,,CONTRIBUTOR,,0,pydata/xarray/pulls/8577,"- [x] Closes #7665
- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
This is an attempt to close #7665 and combine the current possibilities from xarray (max_gap) and pandas (limit_direction, limit_area) regarding interpolation of nan values. Please see also my comments in #7665 for the motivation.
This PR already involves a full implementation, documentation and corresponding tests, but before any final polishing, I want to hear your thoughts. Specifically, I think the API and default options need to be discussed. (See the proposed documentation of DataArray.interpolate_na() / Dataset.interpolate_na() for the current state)
Implementation: Basically, I use ffill and bfill to calculate the coordinate of the left/right edge for every gap in the data. Based on edge coordinates, all masks (limit, limit_area, max_gap) are created.
In the long term, it might be interesting to provide these arguments to other na-filling methods as well (ffill, bfill, fillna).
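The ffill/bfill idea described above can be sketched in plain NumPy for a 1-D array with an integer index. This is an illustration under simplifying assumptions, not the code from this PR; the helper name `gap_edges` and the mask variables are hypothetical.

```python
import numpy as np


def gap_edges(values):
    # Hypothetical helper: index of the nearest valid cell to the left and to
    # the right of every position (nan where no such cell exists).
    idx = np.arange(len(values), dtype=float)
    missing = np.isnan(values)
    # Forward fill: carry the last valid index to the right.
    last = np.maximum.accumulate(np.where(missing, -np.inf, idx))
    # Backward fill: carry the next valid index to the left.
    nxt = np.minimum.accumulate(np.where(missing, np.inf, idx)[::-1])[::-1]
    left = np.where(np.isinf(last), np.nan, last)
    right = np.where(np.isinf(nxt), np.nan, nxt)
    return idx, left, right


n = np.nan
values = np.array([n, n, n, 4, 5, n, n, n])
idx, left, right = gap_edges(values)

limit = 1
with np.errstate(invalid='ignore'):
    forward_ok = (idx - left) <= limit    # at most `limit` cells after a valid value
    backward_ok = (right - idx) <= limit  # at most `limit` cells before a valid value

print(forward_ok)
print(backward_ok)
print(forward_ok | backward_ok)  # what a 'both' direction would allow
```

Because the distances are measured from real valid cells, positions before the first valid value never pass the forward check, which avoids the boundary issue of the rolling-window mask.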
# Things to consider
## limit_direction=forward
Pros:
- Backward compatible: If limit is not None, this is the current behaviour (see #7665)
- Pandas compatible: Forward is the pandas default.
Cons:
- `limit_direction=both` feels more natural as the default. If the user calls `interpolate_na('x', fill_value='extrapolate')`, in my opinion they will expect all nans to be filled, including those at both boundaries. Unlike in pandas, this has been the case in xarray so far, but it would no longer hold if we follow pandas and set `limit_direction=forward`. `both` would also improve performance, since no restrictions need to be applied.
## limit_use_coordinates=False
Pros:
- Backward compatible
- Pandas compatible
-> Both xarray and pandas have no support for coordinate based limits so far.
Cons:
- Inconsistent with the current default of `use_coordinate=True`
Generally, one might discuss whether this separate argument is necessary or whether a single argument `use_coordinate` is sufficient. Imo, if the grid is irregular and `use_coordinate=True`, it makes little sense to specify the limit as a fixed number of grid cells. Alternatively, we could allow a three-tuple like `use_coordinate=(True, True, False)` to specify the index for interpolation, limit and max_gap separately (or something similar).
## use_coordinate=True
So far, if there is no coordinate for `dim`, interpolation succeeds and silently falls back to a linearly increasing index. I feel that, for `use_coordinate=True`, we should fail and tell the user to set `use_coordinate=False` if they really want a linear index. However, this is a breaking change.
Maybe we can keep this behaviour with `use_coordinate=None` as the new default option (= True if the coordinate exists, else a linear index).
## Performance
On my machine, the new limit implementation based on ffill/bfill seems to be slightly slower (about 10%) than the old one (based on rolling). There might be potential for improvements.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8577/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1615645463,PR_kwDOAMm_X85Ll0US,7598,Fix missing 'dim' argument in _get_nan_block_lengths,42680748,closed,0,,,3,2023-03-08T17:28:56Z,2023-03-23T16:04:32Z,2023-03-16T18:55:56Z,CONTRIBUTOR,,0,pydata/xarray/pulls/7598,"* Add missing dim argument (GH7597)
* Append a nan gap at the end of existing test cases
- [x] Closes #7597
- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7598/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1615599224,I_kwDOAMm_X85gTBZ4,7597,Interpolate_na: max_map argument not working at array boundaries,42680748,closed,0,,,6,2023-03-08T16:56:36Z,2023-03-16T18:55:58Z,2023-03-16T18:55:58Z,CONTRIBUTOR,,,,"### What happened?
In the case of multidimensional arrays, the `max_gap` argument of `interpolate_na` is currently not working correctly at the array boundaries. I think this is due to a missing ""dim"" argument in the max() aggregation in `xarray.core.missing._get_nan_block_lengths`.
### What did you expect to happen?
In the following code example, due to `max_gap=2`, no extrapolation should be performed for the second row. Currently, this is not the case; the second row is extrapolated anyway, and the output is:
```
array([[1., 2., 3., 4., 5.],
[1., 2., 3., 4., 5.]])
Coordinates:
* x (x) int64 0 1
* y (y) int64 0 1 2 3 4
```
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
da=xr.DataArray([[1, 2,3,4, np.nan],[1,2, np.nan, np.nan, np.nan]], coords=[('x', [0,1]), ('y', [0,1,2,3,4])])
da_interp=da.interpolate_na(dim='y', max_gap=2, fill_value='extrapolate')
print(da_interp)
```
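The suspected cause can be illustrated with plain NumPy on the same data. The sketch only mimics the length of the trailing gap per row and is not the code of `_get_nan_block_lengths`; all variable names are chosen for this illustration.

```python
import numpy as np

data = np.array([[1, 2, 3, 4, np.nan],
                 [1, 2, np.nan, np.nan, np.nan]])
index = np.arange(data.shape[1], dtype=float)

# Index of every valid cell along y, nan elsewhere.
valid_index = np.where(np.isnan(data), np.nan, index)

# Without an axis, the maximum is taken over the whole array.
trailing_gap_global = index[-1] - np.nanmax(valid_index)
# With the axis given, every row gets its own last valid index.
trailing_gap_per_row = index[-1] - np.nanmax(valid_index, axis=1)

print(trailing_gap_global)   # 1.0, which passes max_gap=2 for both rows
print(trailing_gap_per_row)  # [1. 3.], so the second row exceeds max_gap=2
```

With the global maximum, the long trailing gap in the second row inherits the short gap length of the first row and is therefore filled.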
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
I added the missing dim argument and adapted the test cases (so far, there was no test case for fully multidimensional arrays with a gap at the end).
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-135-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0
xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.8.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.10.2
distributed: None
matplotlib: 3.6.3
cartopy: None
seaborn: None
numbagg: 0.2.1
fsspec: 2022.10.0
cupy: None
pint: 0.20.1
sparse: None
flox: 0.6.8
numpy_groupies: 0.9.20
setuptools: 58.1.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.6.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7597/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue