id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1637898633,I_kwDOAMm_X85hoFmJ,7665,Interpolate_na: Rework 'limit' argument documentation/implementation,42680748,open,0,,,6,2023-03-23T16:46:39Z,2024-03-13T17:53:58Z,,CONTRIBUTOR,,,,"### What is your issue?

Currently, the 'limit' argument of `interpolate_na` shows some counterintuitive/undocumented behaviour. Take the following example:

```python
import xarray as xr
import numpy as np

n=np.nan
da=xr.DataArray([n, n, n, 4, 5, n, n, n], dims=[""y""])
da.interpolate_na('y', limit=1, fill_value='extrapolate')
```

This will produce the following result:

```
array([ 1., nan, nan, 4., 5., 6., nan, nan])
```

Two things are surprising, in my opinion:

1. The interpolated value `1` at the beginning is far from any of the given values
2. The filling is done only towards the 'right'. This asymmetric behaviour is not mentioned in the documentation.

## Comparison to pandas

Similar behaviour can be created using pandas with the following arguments:

```python
da=xr.DataArray([n, n, n, 4, 5, n, n, n], dims=[""y""])
dap=da.to_pandas()
dap.interpolate(method='slinear', limit=1, limit_direction='forward', fill_value='extrapolate')
```
Output

```
y
0    NaN
1    NaN
2    NaN
3    4.0
4    5.0
5    6.0
6    NaN
7    NaN
dtype: float64
```
This is equivalent to the current xarray behaviour, except there is no `1` at the beginning.

## Cause

Currently, the fill mask in xarray is implemented using a rolling window operation, where values outside the array are assumed to be valid (therefore the `1`). See `xarray.core.missing._get_valid_fill_mask`

## Possible Solutions

### Boundary Issue

Concerning the `1` at the beginning: I think this should be considered a bug. It is likely not what you would expect if you specify a limit. As stated, pandas does not create it either.

### Asymmetric Filling

Concerning the asymmetric filling, I see two options:

1. No changes to the code, but mention in the documentation that (effectively) a forward fill is done.
2. Make something similar to [what pandas is doing](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html). In pandas, there are two additional arguments controlling the limit behaviour: `limit_direction` controls the fill direction (left, right or both). `limit_area` effectively controls whether we only interpolate or allow extrapolation as well.

What do you think?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7665/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2174011115,I_kwDOAMm_X86BlMbr,8811,Rolling operations with numbagg produce invalid values after numpy.inf,42680748,open,0,,,7,2024-03-07T14:35:24Z,2024-03-12T17:42:33Z,,CONTRIBUTOR,,,,"### What is your issue?

If an array contains `np.inf` and a rolling operation is applied, all values after this one are `nan` if numbagg is used.

Take the following example:

```python
import xarray as xr
import numpy as np

xr.set_options(use_numbagg=False)

da=xr.DataArray([1,2,3,np.inf,4,5,6,7,8,9,10], dims=['x'])
da.rolling(x=2).sum()
```

Output

```
Size: 88B
array([nan, 3., 5., inf, inf, 9., 11., 13., 15., 17., 19.])
Dimensions without coordinates: x
```

With Numbagg:

```python
xr.set_options(use_numbagg=True)

da=xr.DataArray([1,2,3,np.inf,4,5,6,7,8,9,10], dims=['x'])
print(da.rolling(x=2).sum())
```

Output

```
Size: 88B
array([nan, 3., 5., inf, inf, nan, nan, nan, nan, nan, nan])
Dimensions without coordinates: x
```

### What did I expect?

I expected no user-visible changes in the output values if numbagg is activated.

Maybe this is not a bug but expected behaviour for numbagg. The following warning was raised from the second call:

```
.../Local/virtual_environments/xarray_performance/lib/python3.10/site-packages/numbagg/decorators.py:247: RuntimeWarning: invalid value encountered in move_sum
  return gufunc(*arr, window, min_count, axis=axis, **kwargs)
```

If this is expected, I think it would be good to have a page in the documentation which lists the downsides and limitations of the various tools to accelerate xarray. From the current [installation docs](https://docs.xarray.dev/en/v2024.02.0/getting-started-guide/installing.html#for-accelerating-xarray), I assumed I just need to install numbagg/bottleneck to make xarray faster without any changes in output values.

### Environment

```
xarray==2024.2.0
numbagg==0.8.0
```
Package Versions ```txt anyio==4.3.0 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asttokens==2.4.1 async-lru==2.0.4 attrs==23.2.0 Babel==2.14.0 beautifulsoup4==4.12.3 bleach==6.1.0 certifi==2024.2.2 cffi==1.16.0 charset-normalizer==3.3.2 comm==0.2.1 contourpy==1.2.0 cycler==0.12.1 debugpy==1.8.1 decorator==5.1.1 defusedxml==0.7.1 exceptiongroup==1.2.0 executing==2.0.1 fastjsonschema==2.19.1 fonttools==4.49.0 fqdn==1.5.1 h11==0.14.0 httpcore==1.0.4 httpx==0.27.0 idna==3.6 ipykernel==6.29.3 ipython==8.22.2 ipywidgets==8.1.2 isoduration==20.11.0 jedi==0.19.1 Jinja2==3.1.3 json5==0.9.22 jsonpointer==2.4 jsonschema==4.21.1 jsonschema-specifications==2023.12.1 jupyter==1.0.0 jupyter-console==6.6.3 jupyter-events==0.9.0 jupyter-lsp==2.2.4 jupyter_client==8.6.0 jupyter_core==5.7.1 jupyter_server==2.13.0 jupyter_server_terminals==0.5.2 jupyterlab==4.1.4 jupyterlab_pygments==0.3.0 jupyterlab_server==2.25.3 jupyterlab_widgets==3.0.10 kiwisolver==1.4.5 llvmlite==0.42.0 MarkupSafe==2.1.5 matplotlib==3.8.3 matplotlib-inline==0.1.6 mistune==3.0.2 nbclient==0.9.0 nbconvert==7.16.2 nbformat==5.9.2 nest-asyncio==1.6.0 notebook==7.1.1 notebook_shim==0.2.4 numba==0.59.0 numbagg==0.8.0 numpy==1.26.4 overrides==7.7.0 packaging==23.2 pandas==2.2.1 pandocfilters==1.5.1 parso==0.8.3 pexpect==4.9.0 pillow==10.2.0 platformdirs==4.2.0 prometheus_client==0.20.0 prompt-toolkit==3.0.43 psutil==5.9.8 ptyprocess==0.7.0 pure-eval==0.2.2 pycparser==2.21 Pygments==2.17.2 pyparsing==3.1.2 python-dateutil==2.9.0.post0 python-json-logger==2.0.7 pytz==2024.1 PyYAML==6.0.1 pyzmq==25.1.2 qtconsole==5.5.1 QtPy==2.4.1 referencing==0.33.0 requests==2.31.0 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rpds-py==0.18.0 Send2Trash==1.8.2 six==1.16.0 sniffio==1.3.1 soupsieve==2.5 stack-data==0.6.3 terminado==0.18.0 tinycss2==1.2.1 tomli==2.0.1 tornado==6.4 traitlets==5.14.1 types-python-dateutil==2.8.19.20240106 typing_extensions==4.10.0 tzdata==2024.1 uri-template==1.3.0 urllib3==2.2.1 
wcwidth==0.2.13 webcolors==1.13 webencodings==0.5.1 websocket-client==1.7.0 widgetsnbextension==4.0.10 xarray==2024.2.0 ```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8811/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1615599224,I_kwDOAMm_X85gTBZ4,7597,Interpolate_na: max_map argument not working at array boundaries,42680748,closed,0,,,6,2023-03-08T16:56:36Z,2023-03-16T18:55:58Z,2023-03-16T18:55:58Z,CONTRIBUTOR,,,,"### What happened?

In the case of multidimensional arrays, the `max_gap` argument of `interpolate_na` is currently not working correctly at the array boundaries. This is likely due to a missing ""dim"" argument in the max() aggregation in `xarray.core.missing._get_nan_block_lengths`, I think.

### What did you expect to happen?

In the following code example, due to `max_gap=2`, no extrapolation should be performed for the second row. Currently, this is not the case; the output created is:

```
array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1 2 3 4
```

### Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

da=xr.DataArray([[1, 2, 3, 4, np.nan], [1, 2, np.nan, np.nan, np.nan]], coords=[('x', [0,1]), ('y', [0,1,2,3,4])])
da_interp=da.interpolate_na(dim='y', max_gap=2, fill_value='extrapolate')
print(da_interp)
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

I added the missing `dim` argument and adapted the test cases (previously, there was no test case for fully multidimensional arrays with a gap at the end).

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-135-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0
xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.8.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.10.2
distributed: None
matplotlib: 3.6.3
cartopy: None
seaborn: None
numbagg: 0.2.1
fsspec: 2022.10.0
cupy: None
pint: 0.20.1
sparse: None
flox: 0.6.8
numpy_groupies: 0.9.20
setuptools: 58.1.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.6.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7597/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue