issues

3 rows where type = "issue" and user = 42680748 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1637898633 I_kwDOAMm_X85hoFmJ 7665 Interpolate_na: Rework 'limit' argument documentation/implementation Ockenfuss 42680748 open 0     6 2023-03-23T16:46:39Z 2024-03-13T17:53:58Z   CONTRIBUTOR      

What is your issue?

Currently, the 'limit' argument of interpolate_na shows some counterintuitive/undocumented behaviour. Take the following example:

    import xarray as xr
    import numpy as np

    n = np.nan
    da = xr.DataArray([n, n, n, 4, 5, n, n, n], dims=["y"])
    da.interpolate_na('y', limit=1, fill_value='extrapolate')

This will produce the following result:

    array([ 1., nan, nan, 4., 5., 6., nan, nan])

Two things are surprising, in my opinion:

  1. The interpolated value 1 at the beginning is far from any of the given values
  2. The filling is done only towards the 'right'. This asymmetric behaviour is not mentioned in the documentation.

Comparison to pandas

Similar behaviour can be created using pandas with the following arguments:

    import numpy as np
    import xarray as xr

    n = np.nan
    da = xr.DataArray([n, n, n, 4, 5, n, n, n], dims=["y"])
    dap = da.to_pandas()
    dap.interpolate(method='slinear', limit=1, limit_direction='forward', fill_value='extrapolate')

Output
```
y
0    NaN
1    NaN
2    NaN
3    4.0
4    5.0
5    6.0
6    NaN
7    NaN
dtype: float64
```

This is equivalent to the current xarray behaviour, except there is no 1 at the beginning.

Cause

Currently, the fill mask in xarray is implemented using a rolling window operation, where values outside the array are assumed to be valid (hence the 1 at the beginning). See xarray.core.missing._get_valid_fill_mask.
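The boundary effect can be reproduced with a small pure-Python sketch. It is not xarray's code: the helper `trailing_fill_mask` and its `pad_is_null` parameter are hypothetical names, and the sketch only assumes that each position is checked with a trailing window of `limit + 1` elements and may be filled if that window holds at most `limit` missing values.

```python
import math


def trailing_fill_mask(values, limit, pad_is_null):
    """Mark positions whose trailing window of limit + 1 elements
    contains at most `limit` missing values.

    `pad_is_null` decides how positions before the start of the array
    are treated (hypothetical parameter, for illustration only).
    """
    isnull = [math.isnan(v) for v in values]
    # Pad on the left so that every position has a full window.
    padded = [pad_is_null] * limit + isnull
    return [
        sum(padded[i : i + limit + 1]) <= limit
        for i in range(len(isnull))
    ]


def fillable_positions(values, mask):
    """Indices of missing values that the mask allows to be filled."""
    return [i for i, v in enumerate(values) if math.isnan(v) and mask[i]]


n = math.nan
values = [n, n, n, 4, 5, n, n, n]

# Padding treated as valid: the leading NaN at index 0 passes the check.
mask_valid_pad = trailing_fill_mask(values, limit=1, pad_is_null=False)
fillable_valid_pad = fillable_positions(values, mask_valid_pad)

# Padding treated as missing: only the NaN right after the 5 passes.
mask_null_pad = trailing_fill_mask(values, limit=1, pad_is_null=True)
fillable_null_pad = fillable_positions(values, mask_null_pad)

print(fillable_valid_pad)  # [0, 5]
print(fillable_null_pad)   # [5]
```

With the padding treated as valid, indices 0 and 5 are fillable, which matches the xarray result above; with the padding treated as missing, only index 5 remains, which matches the pandas output.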

Possible Solutions

Boundary Issue

Concerning the 1 at the beginning: I think this should be considered a bug. It is likely not what you would expect if you specify a limit. As stated above, pandas does not create it either.

Asymmetric Filling

Concerning the asymmetric filling, I see two options:

  1. No changes to the code, but mention in the documentation that, effectively, a forward fill is done.
  2. Make something similar to what pandas is doing. In pandas, two additional arguments control the limit behaviour: limit_direction controls the fill direction ('forward', 'backward' or 'both'), and limit_area controls whether only gaps surrounded by valid values are filled ('inside', i.e. interpolation only) or only gaps outside the valid values ('outside', i.e. extrapolation only); by default, both are filled.
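To make the second option more concrete, here is a rough pure-Python sketch of a direction-aware limit mask. The function `fillable_mask` is hypothetical and not part of xarray or pandas; only the direction names are borrowed from the pandas `limit_direction` argument, and the semantics are an assumption: a NaN counts as fillable if it lies at most `limit` steps after (forward) or before (backward) a valid value.

```python
import math


def fillable_mask(values, limit, direction="forward"):
    """Return True for NaNs lying at most `limit` steps after (forward),
    before (backward), or on either side of (both) a valid value."""

    def forward(nulls):
        out, run, seen_valid = [], 0, False
        for null in nulls:
            if null:
                run += 1
                # Leading NaNs have no valid value before them.
                out.append(seen_valid and run <= limit)
            else:
                run, seen_valid = 0, True
                out.append(False)
        return out

    isnull = [math.isnan(v) for v in values]
    fwd = forward(isnull)
    # The backward mask is the forward mask of the reversed sequence.
    bwd = forward(isnull[::-1])[::-1]
    if direction == "forward":
        return fwd
    if direction == "backward":
        return bwd
    if direction == "both":
        return [f or b for f, b in zip(fwd, bwd)]
    raise ValueError(f"unknown direction: {direction!r}")


n = math.nan
values = [n, n, n, 4, 5, n, n, n]
results = {}
for direction in ("forward", "backward", "both"):
    mask = fillable_mask(values, limit=1, direction=direction)
    results[direction] = [i for i, m in enumerate(mask) if m]
    print(direction, results[direction])
```

For the example array with limit=1, this marks index 5 for 'forward', index 2 for 'backward', and both indices for 'both', so the fill would become symmetric around the valid values.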

What do you think?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7665/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2174011115 I_kwDOAMm_X86BlMbr 8811 Rolling operations with numbagg produce invalid values after numpy.inf Ockenfuss 42680748 open 0     7 2024-03-07T14:35:24Z 2024-03-12T17:42:33Z   CONTRIBUTOR      

What is your issue?

If an array contains np.inf and a rolling operation is applied with numbagg enabled, all results after the windows that contain the inf are nan. Take the following example:

    import xarray as xr
    import numpy as np

    xr.set_options(use_numbagg=False)
    da = xr.DataArray([1, 2, 3, np.inf, 4, 5, 6, 7, 8, 9, 10], dims=['x'])
    da.rolling(x=2).sum()

Output:

    <xarray.DataArray (x: 11)> Size: 88B
    array([nan, 3., 5., inf, inf, 9., 11., 13., 15., 17., 19.])
    Dimensions without coordinates: x

With numbagg:

    xr.set_options(use_numbagg=True)
    da = xr.DataArray([1, 2, 3, np.inf, 4, 5, 6, 7, 8, 9, 10], dims=['x'])
    print(da.rolling(x=2).sum())

Output:

    <xarray.DataArray (x: 11)> Size: 88B
    array([nan, 3., 5., inf, inf, nan, nan, nan, nan, nan, nan])
    Dimensions without coordinates: x

What did I expect?

I expected no user-visible changes in the output values if numbagg is activated.

Maybe this is not a bug but expected behaviour for numbagg. The second call raised the following warning:

    .../Local/virtual_environments/xarray_performance/lib/python3.10/site-packages/numbagg/decorators.py:247: RuntimeWarning: invalid value encountered in move_sum
      return gufunc(*arr, window, min_count, axis=axis, **kwargs)

If this is expected, I think it would be good to have a page in the documentation that lists the downsides and limitations of the various tools for accelerating xarray. From the current installation docs, I assumed I just needed to install numbagg/bottleneck to make xarray faster, without any changes in output values.
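One possible explanation, which the issue itself does not confirm: if the moving sum is computed with a running accumulator (add the entering value, subtract the leaving one), the step at which inf leaves the window computes inf - inf, which is nan, and the nan then stays in the accumulator. The pure-Python sketch below uses hypothetical function names and is not numbagg's code; it only shows that such an accumulator would reproduce the reported values.

```python
import math


def moving_sum_running(values, window):
    """Moving sum with a running accumulator (assumed strategy)."""
    out, acc = [], 0.0
    for i, v in enumerate(values):
        acc += v
        if i >= window:
            # Remove the value that just left the window.
            # If that value is inf, this computes inf - inf = nan.
            acc -= values[i - window]
        out.append(acc if i >= window - 1 else math.nan)
    return out


def moving_sum_direct(values, window):
    """Moving sum that re-adds every window from scratch."""
    return [
        math.nan if i < window - 1 else sum(values[i - window + 1 : i + 1])
        for i in range(len(values))
    ]


values = [1.0, 2.0, 3.0, math.inf, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
running = moving_sum_running(values, window=2)
direct = moving_sum_direct(values, window=2)

print(running)  # nan from the first window after the inf has left
print(direct)   # finite again once the inf has left the window
```

Under this assumption, the direct sum matches the output without numbagg, while the running sum matches the output with numbagg enabled.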

Environment

xarray==2024.2.0 numbagg==0.8.0

Package Versions ```txt anyio==4.3.0 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asttokens==2.4.1 async-lru==2.0.4 attrs==23.2.0 Babel==2.14.0 beautifulsoup4==4.12.3 bleach==6.1.0 certifi==2024.2.2 cffi==1.16.0 charset-normalizer==3.3.2 comm==0.2.1 contourpy==1.2.0 cycler==0.12.1 debugpy==1.8.1 decorator==5.1.1 defusedxml==0.7.1 exceptiongroup==1.2.0 executing==2.0.1 fastjsonschema==2.19.1 fonttools==4.49.0 fqdn==1.5.1 h11==0.14.0 httpcore==1.0.4 httpx==0.27.0 idna==3.6 ipykernel==6.29.3 ipython==8.22.2 ipywidgets==8.1.2 isoduration==20.11.0 jedi==0.19.1 Jinja2==3.1.3 json5==0.9.22 jsonpointer==2.4 jsonschema==4.21.1 jsonschema-specifications==2023.12.1 jupyter==1.0.0 jupyter-console==6.6.3 jupyter-events==0.9.0 jupyter-lsp==2.2.4 jupyter_client==8.6.0 jupyter_core==5.7.1 jupyter_server==2.13.0 jupyter_server_terminals==0.5.2 jupyterlab==4.1.4 jupyterlab_pygments==0.3.0 jupyterlab_server==2.25.3 jupyterlab_widgets==3.0.10 kiwisolver==1.4.5 llvmlite==0.42.0 MarkupSafe==2.1.5 matplotlib==3.8.3 matplotlib-inline==0.1.6 mistune==3.0.2 nbclient==0.9.0 nbconvert==7.16.2 nbformat==5.9.2 nest-asyncio==1.6.0 notebook==7.1.1 notebook_shim==0.2.4 numba==0.59.0 numbagg==0.8.0 numpy==1.26.4 overrides==7.7.0 packaging==23.2 pandas==2.2.1 pandocfilters==1.5.1 parso==0.8.3 pexpect==4.9.0 pillow==10.2.0 platformdirs==4.2.0 prometheus_client==0.20.0 prompt-toolkit==3.0.43 psutil==5.9.8 ptyprocess==0.7.0 pure-eval==0.2.2 pycparser==2.21 Pygments==2.17.2 pyparsing==3.1.2 python-dateutil==2.9.0.post0 python-json-logger==2.0.7 pytz==2024.1 PyYAML==6.0.1 pyzmq==25.1.2 qtconsole==5.5.1 QtPy==2.4.1 referencing==0.33.0 requests==2.31.0 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rpds-py==0.18.0 Send2Trash==1.8.2 six==1.16.0 sniffio==1.3.1 soupsieve==2.5 stack-data==0.6.3 terminado==0.18.0 tinycss2==1.2.1 tomli==2.0.1 tornado==6.4 traitlets==5.14.1 types-python-dateutil==2.8.19.20240106 typing_extensions==4.10.0 tzdata==2024.1 uri-template==1.3.0 urllib3==2.2.1 
wcwidth==0.2.13 webcolors==1.13 webencodings==0.5.1 websocket-client==1.7.0 widgetsnbextension==4.0.10 xarray==2024.2.0 ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8811/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1615599224 I_kwDOAMm_X85gTBZ4 7597 Interpolate_na: max_gap argument not working at array boundaries Ockenfuss 42680748 closed 0     6 2023-03-08T16:56:36Z 2023-03-16T18:55:58Z 2023-03-16T18:55:58Z CONTRIBUTOR      

What happened?

In the case of multidimensional arrays, the max_gap argument of interpolate_na is currently not working correctly at the array boundaries. This is likely due to a missing "dim" argument in the max() aggregation in xarray.core.missing._get_nan_block_lengths, I think.

What did you expect to happen?

In the following code example, due to max_gap=2, no extrapolation should be performed for the second row. Currently, this is not the case: the second row is extrapolated anyway, and the output created is:

    <xarray.DataArray (x: 2, y: 5)>
    array([[1., 2., 3., 4., 5.],
           [1., 2., 3., 4., 5.]])
    Coordinates:
      * x        (x) int64 0 1
      * y        (y) int64 0 1 2 3 4

Minimal Complete Verifiable Example

    import xarray as xr
    import numpy as np

    da = xr.DataArray(
        [[1, 2, 3, 4, np.nan], [1, 2, np.nan, np.nan, np.nan]],
        coords=[('x', [0, 1]), ('y', [0, 1, 2, 3, 4])],
    )
    da_interp = da.interpolate_na(dim='y', max_gap=2, fill_value='extrapolate')
    print(da_interp)

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I added the missing dim argument and adapted the test cases (previously, there was no test case for fully multidimensional arrays with a gap at the end).
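The effect of the missing dim argument can be sketched in plain Python. The sketch assumes, as a simplification not taken from the issue text, that the length of a trailing gap is computed as the last coordinate minus the largest coordinate that holds a valid value; `trailing_gap_lengths` and `per_row` are hypothetical names, not xarray functions.

```python
import math


def trailing_gap_lengths(rows, coords, per_row):
    """Length of the gap at the end of each row.

    per_row=False mimics a max() over the whole array,
    per_row=True mimics a max() along the interpolation dimension only.
    """

    def last_valid(row):
        return max(c for c, v in zip(coords, row) if not math.isnan(v))

    global_last = max(last_valid(row) for row in rows)
    return [
        coords[-1] - (last_valid(row) if per_row else global_last)
        for row in rows
    ]


n = math.nan
rows = [[1, 2, 3, 4, n], [1, 2, n, n, n]]
coords = [0, 1, 2, 3, 4]

global_gaps = trailing_gap_lengths(rows, coords, per_row=False)
row_gaps = trailing_gap_lengths(rows, coords, per_row=True)
print(global_gaps)  # [1, 1]
print(row_gaps)     # [1, 3]
```

With the maximum taken over the whole array, both trailing gaps appear to have length 1, so max_gap=2 lets the second row be filled. With the maximum taken per row, the second row's gap has length 3, which exceeds max_gap=2.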

Environment

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 5.4.0-135-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: ('en_US', 'UTF-8')
    libhdf5: 1.12.2
    libnetcdf: 4.9.0
    xarray: 2023.2.0
    pandas: 1.5.3
    numpy: 1.23.5
    scipy: 1.8.1
    netCDF4: 1.6.1
    pydap: None
    h5netcdf: None
    h5py: None
    Nio: None
    zarr: None
    cftime: 1.6.2
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: 1.3.5
    dask: 2022.10.2
    distributed: None
    matplotlib: 3.6.3
    cartopy: None
    seaborn: None
    numbagg: 0.2.1
    fsspec: 2022.10.0
    cupy: None
    pint: 0.20.1
    sparse: None
    flox: 0.6.8
    numpy_groupies: 0.9.20
    setuptools: 58.1.0
    pip: 23.0.1
    conda: None
    pytest: None
    mypy: None
    IPython: 8.6.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7597/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
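
As an illustration of how this schema supports the filter used for this listing (type = "issue", user = 42680748, sorted by updated_at descending), here is a minimal sketch using Python's built-in sqlite3 module. It works on an in-memory database with a reduced copy of the [issues] table that keeps only the columns needed for the query; the three rows are the ones listed above. The reduced table layout is an assumption made for brevity, not the full schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the [issues] table: only the columns used below.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT,
       [type] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

rows = [
    (1637898633, 7665, 42680748, "open", "2024-03-13T17:53:58Z", "issue"),
    (2174011115, 8811, 42680748, "open", "2024-03-12T17:42:33Z", "issue"),
    (1615599224, 7597, 42680748, "closed", "2023-03-16T18:55:58Z", "issue"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort chronologically as strings.
query = """
    SELECT [number], [state], [updated_at]
    FROM [issues]
    WHERE [type] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
result = conn.execute(query, ("issue", 42680748)).fetchall()
for number, state, updated_at in result:
    print(number, state, updated_at)
```

The rows come back in the same order as in the listing: 7665, 8811, 7597.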