home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 1022478180

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1022478180 I_kwDOAMm_X8488cdk 5852 Surprising behaviour of Dataset/DataArray.interp() with NaN entries 3958036 open 0     0 2021-10-11T09:33:11Z 2021-10-11T09:33:11Z   CONTRIBUTOR      

I think this is due to documented 'undefined behaviour' of scipy.interpolate.interp1d, so not really a bug, but I think it would be more user-friendly if xarray gave an error in this case rather than producing an 'incorrect' result.

What happened:

If a DataArray contains a NaN value and is interpolated, output values that do not depend on the entry that was NaN may still be NaN.

What you expected to happen:

The docs for scipy.interpolate.interp1d say

Calling interp1d with NaNs present in input values results in undefined behaviour.

which explain the output below, and presumably mean it is not fixable on the xarray side (short of some ugly work-around). I think it would be good though to check for NaNs in DataArray/Dataset.interp(), and if they are present raise an exception (or possibly a warning?) about 'undefined behaviour'.

scipy.interpolate.interp2d has a similar note, while scipy.interpolate.interpn does not mention it (but has very limited information).

What I'd initially expected was an output would be valid at locations in the array that shouldn't depend on the NaN input: interpolating a 2d DataArray (with dims x and y) in the x-dimension, if only one y-index in the input has a NaN value, that y-index in the output might contain NaNs, but the others should be OK.

Minimal Complete Verifiable Example:

```python import numpy as np import xarray as xr

da = xr.DataArray(np.ones([3, 4]), dims=("x", "y"))

da[0, 0] = float("nan")

newx = np.linspace(0., 3., 5)

interp_da = da.interp(x=newx)

print(interp_da) ```

On my system, this gives output: <xarray.DataArray (x: 5, y: 4)> array([[nan, 1., 1., 1.], [nan, 1., 1., 1.], [ 1., 1., 1., 1.], [nan, nan, nan, nan], [nan, nan, nan, nan]]) Coordinates: * x (x) float64 0.0 0.75 1.5 2.25 3.0 Dimensions without coordinates: y [Surprisingly, I get the same output even using method="nearest".]

You might expect at least the following, with NaN only at y=0: <xarray.DataArray (x: 5, y: 4)> array([[nan, 1., 1., 1.], [nan, 1., 1., 1.], [ 1., 1., 1., 1.], [nan, 1., 1., 1.], [nan, 1., 1., 1.]]) Coordinates: * x (x) float64 0.0 0.75 1.5 2.25 3.0 Dimensions without coordinates: y

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.11.0-37-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.19.0 pandas: 1.3.1 numpy: 1.21.1 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: None h5py: 3.3.0 Nio: None zarr: None cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.07.2 distributed: 2021.07.2 matplotlib: 3.4.2 cartopy: None seaborn: 0.11.1 numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 21.2.4 conda: 4.10.3 pytest: 6.2.4 IPython: 7.26.0 sphinx: 4.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5852/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue

Links from other tables

  • 0 rows from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 0.861ms · About: xarray-datasette