id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1637898633,I_kwDOAMm_X85hoFmJ,7665,Interpolate_na: Rework 'limit' argument documentation/implementation,42680748,open,0,,,6,2023-03-23T16:46:39Z,2024-03-13T17:53:58Z,,CONTRIBUTOR,,,,"### What is your issue?

Currently, the 'limit' argument of `interpolate_na` shows some counterintuitive and undocumented behaviour. Take the following example:

```python
import xarray as xr
import numpy as np

n = np.nan
da = xr.DataArray([n, n, n, 4, 5, n, n, n], dims=[""y""])
da.interpolate_na('y', limit=1, fill_value='extrapolate')
```

This will produce the following result:

```
array([ 1., nan, nan, 4., 5., 6., nan, nan])
```

Two things are surprising, in my opinion:

1. The interpolated value `1` at the beginning is far from any of the given values.
2. The filling is done only towards the 'right'. This asymmetric behaviour is not mentioned in the documentation.

## Comparison to pandas

Similar behaviour can be created using pandas with the following arguments:

```python
da = xr.DataArray([n, n, n, 4, 5, n, n, n], dims=[""y""])
dap = da.to_pandas()
dap.interpolate(method='slinear', limit=1, limit_direction='forward', fill_value='extrapolate')
```

Output

```
y
0    NaN
1    NaN
2    NaN
3    4.0
4    5.0
5    6.0
6    NaN
7    NaN
dtype: float64
```

This is equivalent to the current xarray behaviour, except there is no `1` at the beginning.

## Cause

Currently, the fill mask in xarray is implemented using a rolling window operation, where values outside the array are assumed to be valid (hence the `1`). See `xarray.core.missing._get_valid_fill_mask`.

## Possible Solutions

### Boundary Issue

Concerning the `1` at the beginning: I think this should be considered a bug. It is likely not what you would expect if you specify a limit. As noted above, pandas does not produce it either.

### Asymmetric Filling

Concerning the asymmetric filling, I see two options:

1. No changes to the code, but mention in the documentation that, effectively, a forward fill is done.
2. Implement something similar to [what pandas is doing](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html). In pandas, there are two additional arguments controlling the limit behaviour: `limit_direction` controls the fill direction (forward, backward or both). `limit_area` effectively controls whether we only interpolate or allow extrapolation as well.

What do you think?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7665/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2174011115,I_kwDOAMm_X86BlMbr,8811,Rolling operations with numbagg produce invalid values after numpy.inf,42680748,open,0,,,7,2024-03-07T14:35:24Z,2024-03-12T17:42:33Z,,CONTRIBUTOR,,,,"### What is your issue?

If an array contains `np.inf` and a rolling operation is applied, all subsequent values are `nan` if numbagg is used.

Take the following example:

```python
import xarray as xr
import numpy as np

xr.set_options(use_numbagg=False)
da = xr.DataArray([1, 2, 3, np.inf, 4, 5, 6, 7, 8, 9, 10], dims=['x'])
da.rolling(x=2).sum()
```

Output

```
<xarray.DataArray (x: 11)> Size: 88B
array([nan, 3., 5., inf, inf, 9., 11., 13., 15., 17., 19.])
Dimensions without coordinates: x
```

With numbagg:

```python
xr.set_options(use_numbagg=True)
da = xr.DataArray([1, 2, 3, np.inf, 4, 5, 6, 7, 8, 9, 10], dims=['x'])
print(da.rolling(x=2).sum())
```

Output

```
<xarray.DataArray (x: 11)> Size: 88B
array([nan, 3., 5., inf, inf, nan, nan, nan, nan, nan, nan])
Dimensions without coordinates: x
```

### What did I expect?

I expected no user-visible changes in the output values when numbagg is activated. Maybe this is not a bug but expected behaviour for numbagg. The following warning was raised by the second call:

```
.../Local/virtual_environments/xarray_performance/lib/python3.10/site-packages/numbagg/decorators.py:247: RuntimeWarning: invalid value encountered in move_sum
  return gufunc(*arr, window, min_count, axis=axis, **kwargs)
```

If this is expected, I think it would be good to have a page in the documentation that lists the downsides and limitations of the various tools for accelerating xarray. From the current [installation docs](https://docs.xarray.dev/en/v2024.02.0/getting-started-guide/installing.html#for-accelerating-xarray), I assumed I only needed to install numbagg/bottleneck to make xarray faster, without any changes in output values.

### Environment

```
xarray==2024.2.0
numbagg==0.8.0
```
Package Versions ```txt anyio==4.3.0 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asttokens==2.4.1 async-lru==2.0.4 attrs==23.2.0 Babel==2.14.0 beautifulsoup4==4.12.3 bleach==6.1.0 certifi==2024.2.2 cffi==1.16.0 charset-normalizer==3.3.2 comm==0.2.1 contourpy==1.2.0 cycler==0.12.1 debugpy==1.8.1 decorator==5.1.1 defusedxml==0.7.1 exceptiongroup==1.2.0 executing==2.0.1 fastjsonschema==2.19.1 fonttools==4.49.0 fqdn==1.5.1 h11==0.14.0 httpcore==1.0.4 httpx==0.27.0 idna==3.6 ipykernel==6.29.3 ipython==8.22.2 ipywidgets==8.1.2 isoduration==20.11.0 jedi==0.19.1 Jinja2==3.1.3 json5==0.9.22 jsonpointer==2.4 jsonschema==4.21.1 jsonschema-specifications==2023.12.1 jupyter==1.0.0 jupyter-console==6.6.3 jupyter-events==0.9.0 jupyter-lsp==2.2.4 jupyter_client==8.6.0 jupyter_core==5.7.1 jupyter_server==2.13.0 jupyter_server_terminals==0.5.2 jupyterlab==4.1.4 jupyterlab_pygments==0.3.0 jupyterlab_server==2.25.3 jupyterlab_widgets==3.0.10 kiwisolver==1.4.5 llvmlite==0.42.0 MarkupSafe==2.1.5 matplotlib==3.8.3 matplotlib-inline==0.1.6 mistune==3.0.2 nbclient==0.9.0 nbconvert==7.16.2 nbformat==5.9.2 nest-asyncio==1.6.0 notebook==7.1.1 notebook_shim==0.2.4 numba==0.59.0 numbagg==0.8.0 numpy==1.26.4 overrides==7.7.0 packaging==23.2 pandas==2.2.1 pandocfilters==1.5.1 parso==0.8.3 pexpect==4.9.0 pillow==10.2.0 platformdirs==4.2.0 prometheus_client==0.20.0 prompt-toolkit==3.0.43 psutil==5.9.8 ptyprocess==0.7.0 pure-eval==0.2.2 pycparser==2.21 Pygments==2.17.2 pyparsing==3.1.2 python-dateutil==2.9.0.post0 python-json-logger==2.0.7 pytz==2024.1 PyYAML==6.0.1 pyzmq==25.1.2 qtconsole==5.5.1 QtPy==2.4.1 referencing==0.33.0 requests==2.31.0 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rpds-py==0.18.0 Send2Trash==1.8.2 six==1.16.0 sniffio==1.3.1 soupsieve==2.5 stack-data==0.6.3 terminado==0.18.0 tinycss2==1.2.1 tomli==2.0.1 tornado==6.4 traitlets==5.14.1 types-python-dateutil==2.8.19.20240106 typing_extensions==4.10.0 tzdata==2024.1 uri-template==1.3.0 urllib3==2.2.1 
wcwidth==0.2.13 webcolors==1.13 webencodings==0.5.1 websocket-client==1.7.0 widgetsnbextension==4.0.10 xarray==2024.2.0 ```
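
One possible explanation for the `nan` tail, offered as an assumption since I have not checked the numbagg source: a moving sum that keeps a single running total and subtracts the value leaving the window will compute `inf - inf` once the `np.inf` drops out, and the resulting `nan` then stays in the total. The pure-Python sketch below uses two hypothetical helpers, `running_move_sum` and `recomputed_move_sum`, that are not part of xarray or numbagg. They only contrast the two strategies on the data from the example above.

```python
import math


def running_move_sum(values, window):
    # Hypothetical helper: keeps one running total, adding the newest value
    # and subtracting the value that leaves the window.
    out = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            # inf - inf evaluates to nan, and nan then stays in the total
            total -= values[i - window]
        out.append(total if i >= window - 1 else math.nan)
    return out


def recomputed_move_sum(values, window):
    # Hypothetical helper: sums every window from scratch, so an inf only
    # affects the windows that actually contain it.
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(math.nan)
        else:
            out.append(float(sum(values[i - window + 1 : i + 1])))
    return out


data = [1, 2, 3, math.inf, 4, 5, 6, 7, 8, 9, 10]
print(running_move_sum(data, 2))
print(recomputed_move_sum(data, 2))
```

With a window of 2, the running-total version yields `nan` for every position after the two `inf` entries, which matches the numbagg output above, while the recomputing version recovers finite sums again, which matches the output with `use_numbagg=False`. This only shows that the pattern is consistent with a running-total algorithm. It does not confirm how either backend is implemented.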
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8811/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue