id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1637898633,I_kwDOAMm_X85hoFmJ,7665,Interpolate_na: Rework 'limit' argument documentation/implementation,42680748,open,0,,,6,2023-03-23T16:46:39Z,2024-03-13T17:53:58Z,,CONTRIBUTOR,,,,"### What is your issue?
Currently, the 'limit' argument of `interpolate_na` shows some counterintuitive/undocumented behaviour.
Take the following example:
```python
import xarray as xr
import numpy as np
n = np.nan
da = xr.DataArray([n, n, n, 4, 5, n, n, n], dims=[""y""])
da.interpolate_na('y', limit=1, fill_value='extrapolate')
```
This will produce the following result:
```
array([ 1., nan, nan, 4., 5., 6., nan, nan])
```
Two things are surprising, in my opinion:
1. The interpolated value `1` at the beginning is far from any of the given values
2. The filling is done only towards the 'right'. This asymmetric behaviour is not mentioned in the documentation.
## Comparison to pandas
Similar behaviour can be reproduced in pandas with the following arguments:
```python
da = xr.DataArray([n, n, n, 4, 5, n, n, n], dims=[""y""])
dap = da.to_pandas()
dap.interpolate(method='slinear', limit=1, limit_direction='forward', fill_value='extrapolate')
```
Output
```
y
0 NaN
1 NaN
2 NaN
3 4.0
4 5.0
5 6.0
6 NaN
7 NaN
dtype: float64
```
This is equivalent to the current xarray behaviour, except there is no `1` at the beginning.
## Cause
Currently, the fill mask in xarray is implemented using a rolling window operation, where values outside the array are assumed to be valid (hence the `1`). See `xarray.core.missing._get_valid_fill_mask`.
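The effect of the boundary assumption can be illustrated with a stand-alone sketch. This is not the xarray implementation: the function name `valid_fill_mask` and the `pad_is_missing` switch are hypothetical, and the sketch only assumes a trailing window of `limit + 1` entries in which at most `limit` may be missing.

```python
import math


# Hypothetical helper, not xarray code: True where a position may hold a
# value after a forward fill of at most `limit` steps (valid inputs included).
def valid_fill_mask(values, limit, pad_is_missing):
    is_nan = [math.isnan(v) for v in values]
    # Pad the left edge so every position has a full window of limit + 1 entries.
    padded = [pad_is_missing] * limit + is_nan
    window = limit + 1
    # A position passes if its trailing window contains at most `limit` NaNs.
    return [sum(padded[i:i + window]) <= limit for i in range(len(values))]


n = math.nan
data = [n, n, n, 4, 5, n, n, n]

# Padding treated as valid: the first position passes, which matches the stray `1`.
print(valid_fill_mask(data, limit=1, pad_is_missing=False))
# Padding treated as missing: the leading NaNs are all rejected.
print(valid_fill_mask(data, limit=1, pad_is_missing=True))
```

Under these assumptions, only the treatment of the padding decides whether the first element gets filled.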
## Possible Solutions
### Boundary Issue
Concerning the `1` at the beginning: I think this should be considered a bug. It is likely not what you would expect if you specify a limit. As shown above, pandas does not produce it either.
### Asymmetric Filling
Concerning the asymmetric filling, I see two options:
1. No changes to the code, but mention in the documentation that, effectively, a forward fill is done.
2. Do something similar to [what pandas is doing](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html). In pandas, two additional arguments control the limit behaviour: `limit_direction` controls the fill direction (`forward`, `backward` or `both`), and `limit_area` effectively controls whether we only interpolate (`inside`), only extrapolate (`outside`), or allow both.
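To make the second option more concrete, here is a rough sketch of a direction-aware limit mask. The function `gap_fill_mask` is hypothetical and only borrows the option names `forward`, `backward` and `both` from pandas. It counts steps to the nearest valid value on each side, so NaNs at the array boundary without a valid neighbour on the relevant side are never marked.

```python
import math


# Hypothetical helper: mark the NaN positions that lie within `limit` steps
# of a valid value in the requested direction.
def gap_fill_mask(values, limit, limit_direction='forward'):
    if limit_direction not in ('forward', 'backward', 'both'):
        raise ValueError(f'unknown limit_direction: {limit_direction!r}')
    valid = [not math.isnan(v) for v in values]

    def steps_since_valid(flags):
        # Number of steps since the last valid entry; inf if none was seen yet.
        steps, current = [], math.inf
        for flag in flags:
            current = 0 if flag else current + 1
            steps.append(current)
        return steps

    from_left = steps_since_valid(valid)
    from_right = steps_since_valid(valid[::-1])[::-1]

    mask = []
    for is_valid, left, right in zip(valid, from_left, from_right):
        forward_ok = limit_direction in ('forward', 'both') and left <= limit
        backward_ok = limit_direction in ('backward', 'both') and right <= limit
        mask.append(not is_valid and (forward_ok or backward_ok))
    return mask


n = math.nan
data = [n, n, n, 4, 5, n, n, n]
for direction in ('forward', 'backward', 'both'):
    print(direction, gap_fill_mask(data, limit=1, limit_direction=direction))
```

With `limit=1`, the forward mask marks only index 5, the backward mask only index 2, and `both` marks the two of them.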
What do you think?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7665/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2174011115,I_kwDOAMm_X86BlMbr,8811,Rolling operations with numbagg produce invalid values after numpy.inf,42680748,open,0,,,7,2024-03-07T14:35:24Z,2024-03-12T17:42:33Z,,CONTRIBUTOR,,,,"### What is your issue?
If an array contains `np.inf` and a rolling operation is applied with numbagg enabled, all results after the windows containing `np.inf` are `nan`. Take the following example, first without numbagg:
```python
import xarray as xr
import numpy as np
xr.set_options(use_numbagg=False)
da=xr.DataArray([1,2,3,np.inf,4,5,6,7,8,9,10], dims=['x'])
da.rolling(x=2).sum()
```
Output
```
Size: 88B
array([nan, 3., 5., inf, inf, 9., 11., 13., 15., 17., 19.])
Dimensions without coordinates: x
```
With numbagg:
```python
xr.set_options(use_numbagg=True)
da=xr.DataArray([1,2,3,np.inf,4,5,6,7,8,9,10], dims=['x'])
print(da.rolling(x=2).sum())
```
Output
```
Size: 88B
array([nan, 3., 5., inf, inf, nan, nan, nan, nan, nan, nan])
Dimensions without coordinates: x
```
### What did I expect?
I expected no user-visible changes in the output values if numbagg is activated.
Maybe this is not a bug but expected behaviour for numbagg. The second call raised the following warning:
```
.../Local/virtual_environments/xarray_performance/lib/python3.10/site-packages/numbagg/decorators.py:247: RuntimeWarning: invalid value encountered in move_sum
return gufunc(*arr, window, min_count, axis=axis, **kwargs)
```
If this is expected, I think it would be good to have a page in the documentation which lists the downsides and limitations of the various tools to accelerate xarray. From the current [installation docs](https://docs.xarray.dev/en/v2024.02.0/getting-started-guide/installing.html#for-accelerating-xarray), I assumed I only needed to install numbagg/bottleneck to make xarray faster without any changes in output values.
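One possible explanation, stated here purely as an assumption and not confirmed anywhere in this issue, is that the accelerated path keeps a running sum (add the entering value, subtract the leaving one). In that scheme `inf - inf` yields `nan` once the `inf` leaves the window, and the `nan` then stays in the accumulator. The stdlib-only sketch below uses hypothetical function names and does not call numbagg; it contrasts that scheme with recomputing each window from scratch.

```python
import math


# Sum every window from scratch; positions without a full window are NaN.
def rolling_sum_direct(values, window):
    return [
        math.nan if i + 1 < window else sum(values[i + 1 - window:i + 1])
        for i in range(len(values))
    ]


# Keep a single accumulator: add the entering value, subtract the leaving one.
def rolling_sum_running(values, window):
    out, acc = [], 0.0
    for i, value in enumerate(values):
        acc += value
        if i >= window:
            # When the leaving value is inf, acc is inf too: inf - inf is NaN.
            acc -= values[i - window]
        out.append(acc if i + 1 >= window else math.nan)
    return out


data = [1, 2, 3, math.inf, 4, 5, 6, 7, 8, 9, 10]
print(rolling_sum_direct(data, 2))
print(rolling_sum_running(data, 2))
```

The direct version recovers after the `inf` has left the window, while the running version returns `nan` from that point on, which is the same pattern as in the two outputs above.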
### Environment
```
xarray==2024.2.0
numbagg==0.8.0
```
Package Versions
```txt
anyio==4.3.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
bleach==6.1.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
comm==0.2.1
contourpy==1.2.0
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
exceptiongroup==1.2.0
executing==2.0.1
fastjsonschema==2.19.1
fonttools==4.49.0
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
idna==3.6
ipykernel==6.29.3
ipython==8.22.2
ipywidgets==8.1.2
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
json5==0.9.22
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.4
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.13.0
jupyter_server_terminals==0.5.2
jupyterlab==4.1.4
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.3
jupyterlab_widgets==3.0.10
kiwisolver==1.4.5
llvmlite==0.42.0
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mistune==3.0.2
nbclient==0.9.0
nbconvert==7.16.2
nbformat==5.9.2
nest-asyncio==1.6.0
notebook==7.1.1
notebook_shim==0.2.4
numba==0.59.0
numbagg==0.8.0
numpy==1.26.4
overrides==7.7.0
packaging==23.2
pandas==2.2.1
pandocfilters==1.5.1
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
platformdirs==4.2.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
Pygments==2.17.2
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
qtconsole==5.5.1
QtPy==2.4.1
referencing==0.33.0
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.18.0
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
stack-data==0.6.3
terminado==0.18.0
tinycss2==1.2.1
tomli==2.0.1
tornado==6.4
traitlets==5.14.1
types-python-dateutil==2.8.19.20240106
typing_extensions==4.10.0
tzdata==2024.1
uri-template==1.3.0
urllib3==2.2.1
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
widgetsnbextension==4.0.10
xarray==2024.2.0
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8811/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1615599224,I_kwDOAMm_X85gTBZ4,7597,Interpolate_na: max_gap argument not working at array boundaries,42680748,closed,0,,,6,2023-03-08T16:56:36Z,2023-03-16T18:55:58Z,2023-03-16T18:55:58Z,CONTRIBUTOR,,,,"### What happened?
In the case of multidimensional arrays, the `max_gap` argument of `interpolate_na` is currently not working correctly at the array boundaries. I think this is due to a missing `dim` argument in the `max()` aggregation in `xarray.core.missing._get_nan_block_lengths`.
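Why a maximum taken over the whole array instead of along one dimension matters can be shown with a simplified sketch. It is not the xarray code: `trailing_gap_lengths` and its `per_row` switch are hypothetical, positions are used as coordinates, and every row is assumed to contain at least one valid value.

```python
import math


# Hypothetical helper: length of the NaN block at the end of each row,
# measured as last position minus position of the last valid value.
def trailing_gap_lengths(rows, per_row):
    last_index = len(rows[0]) - 1
    last_valid = [
        max(i for i, v in enumerate(row) if not math.isnan(v)) for row in rows
    ]
    if not per_row:
        # Mimics a max() over the whole array: all rows share one value.
        last_valid = [max(last_valid)] * len(rows)
    return [last_index - position for position in last_valid]


n = math.nan
data = [[1, 2, 3, 4, n], [1, 2, n, n, n]]

# Global maximum: the second row appears to have a gap of 1, so max_gap=2 lets it through.
print(trailing_gap_lengths(data, per_row=False))
# Per-row maximum: the second row has a gap of 3, which exceeds max_gap=2.
print(trailing_gap_lengths(data, per_row=True))
```

In this sketch the global maximum underestimates the trailing gap of any row whose last valid value comes earlier than in some other row.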
### What did you expect to happen?
In the following code example, due to `max_gap=2`, no extrapolation should be performed for the second row. Currently, this is not the case; the output is:
```
array([[1., 2., 3., 4., 5.],
[1., 2., 3., 4., 5.]])
Coordinates:
* x (x) int64 0 1
* y (y) int64 0 1 2 3 4
```
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
da = xr.DataArray([[1, 2, 3, 4, np.nan], [1, 2, np.nan, np.nan, np.nan]], coords=[('x', [0, 1]), ('y', [0, 1, 2, 3, 4])])
da_interp = da.interpolate_na(dim='y', max_gap=2, fill_value='extrapolate')
print(da_interp)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
I added the missing `dim` argument and adapted the test cases (previously, there was no test case for fully multidimensional arrays with a gap at the end).
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-135-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0
xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.8.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.10.2
distributed: None
matplotlib: 3.6.3
cartopy: None
seaborn: None
numbagg: 0.2.1
fsspec: 2022.10.0
cupy: None
pint: 0.20.1
sparse: None
flox: 0.6.8
numpy_groupies: 0.9.20
setuptools: 58.1.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.6.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7597/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue