issues: 1951543761

id: 1951543761
node_id: I_kwDOAMm_X850UjHR
number: 8335
title: ```DataArray.sel``` can silently pick up the nearest point, even if it is far away and the query is out of bounds
user: 8382834
state: open
locked: 0
comments: 13
created_at: 2023-10-19T08:02:44Z
updated_at: 2024-04-29T23:02:31Z
author_association: CONTRIBUTOR

What is your issue?

@paulina-t found a bug in a codebase that was caused by the behavior reported here, where it was badly messing things up.

See the example notebook at https://github.com/jerabaul29/public_bug_reports/blob/main/xarray/2023_10_18/interp.ipynb .


Problem

It is always a bit risky to interpolate, or to look up the nearest neighbor of a query point, because bad things can happen when the point lies outside the area the data covers. Fortunately, xarray returns NaN when interp is called outside the bounds of a dataset:

```python
import xarray as xr
import numpy as np

xr.__version__
# '2023.9.0'

data = np.array([[1, 2, 3], [4, 5, 6]])
lat = [10, 20]
lon = [120, 130, 140]

data_xr = xr.DataArray(data, coords={'lat': lat, 'lon': lon}, dims=['lat', 'lon'])

data_xr
# <xarray.DataArray (lat: 2, lon: 3)>
# array([[1, 2, 3],
#        [4, 5, 6]])
# Coordinates:
#   * lat      (lat) int64 10 20
#   * lon      (lon) int64 120 130 140

# interp is civilized: rather than wildly extrapolating, it returns NaN
data_xr.interp(lat=15, lon=125)
# <xarray.DataArray ()>
# array(3.)
# Coordinates:
#     lat      int64 15
#     lon      int64 125

data_xr.interp(lat=5, lon=125)
# <xarray.DataArray ()>
# array(nan)
# Coordinates:
#     lat      int64 5
#     lon      int64 125
```

Unfortunately, .sel will happily find the nearest neighbor of a point, even if the input point is outside of the dataset range:

```python
# sel is not as civilized: it happily finds the nearest neighbor,
# even if the query is entirely "on the one side" of the example data
data_xr.sel(lat=5, lon=125, method='nearest')
# <xarray.DataArray ()>
# array(2)
# Coordinates:
#     lat      int64 10
#     lon      int64 130
```

This can easily cause tricky bugs.
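
For reference, one existing mitigation is the `tolerance` argument of `.sel`, which caps how far the nearest match may be from the query. The sketch below reuses the toy grid from above; the tolerance of 5 and the query points are arbitrary choices for illustration. Note that this limits distance rather than checking bounds, so it only approximates the behavior asked for in this issue.

```python
import numpy as np
import xarray as xr

data_xr = xr.DataArray(
    np.array([[1, 2, 3], [4, 5, 6]]),
    coords={"lat": [10, 20], "lon": [120, 130, 140]},
    dims=["lat", "lon"],
)

# Within tolerance: lat=8 is 2 away from lat=10, lon=131 is 1 away from lon=130.
near = data_xr.sel(lat=8, lon=131, method="nearest", tolerance=5)

# Beyond tolerance: lat=0 is 10 away from the closest grid point (lat=10),
# so the lookup fails with a KeyError instead of silently returning a value.
try:
    data_xr.sel(lat=0, lon=131, method="nearest", tolerance=5)
    far_lookup_failed = False
except KeyError:
    far_lookup_failed = True
```

The same tolerance applies to every indexed dimension, so it has to be chosen with the coarsest grid spacing in mind.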


Discussion

Would it be possible for .sel to make the user aware of such cases? That is, to either:

  • print a warning on stderr
  • return NaN
  • raise an exception

when a .sel query falls outside the dataset range, i.e. not between two dataset points?

I understand that finding the nearest neighbor may still be useful or wanted in some cases, even outside the bounds of the dataset, but the fact that this happens silently by default has been causing bugs for us. Could this default behavior be changed, or at least be controlled by a flag (for example allow_extrapolate=False by default, so that users have to opt in consciously)?
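
To make the request concrete, here is a minimal sketch of what such behavior could look like as a user-side wrapper. `sel_nearest_checked` and its `on_outside` parameter are hypothetical names invented for this illustration, and `allow_extrapolate` is the flag proposed above, not an existing xarray argument. The sketch handles scalar queries on numeric coordinates only.

```python
import warnings

import numpy as np
import xarray as xr


def sel_nearest_checked(da, allow_extrapolate=False, on_outside="raise", **query):
    """Nearest-neighbor .sel for scalar queries, with an out-of-range check.

    Hypothetical helper, not part of xarray. `on_outside` ("raise", "warn"
    or "nan") is only consulted when allow_extrapolate is False.
    """
    if on_outside not in ("raise", "warn", "nan"):
        raise ValueError(f"unknown on_outside value: {on_outside!r}")

    # Dimensions along which the query lies outside [min, max] of the coordinate.
    outside = [
        dim
        for dim, value in query.items()
        if not (da[dim].min().item() <= value <= da[dim].max().item())
    ]

    if outside and not allow_extrapolate and on_outside == "raise":
        raise ValueError(f"query outside coordinate range along {outside}")

    result = da.sel(method="nearest", **query)
    if not outside or allow_extrapolate:
        return result

    if on_outside == "warn":
        warnings.warn(f"query outside coordinate range along {outside}")
        return result
    # on_outside == "nan": keep the shape, replace the value by NaN.
    # The coordinates of the result are still those of the nearest grid point.
    return xr.full_like(result, np.nan, dtype=float)


data_xr = xr.DataArray(
    np.array([[1, 2, 3], [4, 5, 6]]),
    coords={"lat": [10, 20], "lon": [120, 130, 140]},
    dims=["lat", "lon"],
)

inside = sel_nearest_checked(data_xr, lat=12, lon=131)  # nearest point is (10, 130)
masked = sel_nearest_checked(data_xr, lat=5, lon=125, on_outside="nan")
```

With the defaults, the query `lat=5, lon=125` from the example above raises a `ValueError`; passing `allow_extrapolate=True` restores the current silent nearest-neighbor behavior.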

reactions: total_count 1 (+1: 1)
repo: 13221727
type: issue
