issue_comments

8 rows where issue = 822320976 sorted by updated_at descending

user 5

  • dcherian 3
  • observingClouds 2
  • xiongxiongufl 1
  • snowman2 1
  • lewisblake 1

author_association 3

  • CONTRIBUTOR 3
  • MEMBER 3
  • NONE 2

issue 1

  • KeyError when selecting "nearest" data with given tolerance · 8
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1057699042 https://github.com/pydata/xarray/issues/4995#issuecomment-1057699042 https://api.github.com/repos/pydata/xarray/issues/4995 IC_kwDOAMm_X84_CzTi xiongxiongufl 3604210 2022-03-03T05:47:56Z 2022-10-25T14:35:35Z NONE

@observingClouds I think a fill_value argument in sel, as in reindex, is still warranted. Although reindex, as @dcherian suggested, works when the indexer dims match the target dims, it fails when they don't, e.g. in the examples of sel: https://xarray.pydata.org/en/stable/generated/xarray.DataArray.sel.html. In that case it raises an error: ValueError: Indexer has dimensions ('points',) that are different from that to be indexed along x

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976
1290446119 https://github.com/pydata/xarray/issues/4995#issuecomment-1290446119 https://api.github.com/repos/pydata/xarray/issues/4995 IC_kwDOAMm_X85M6qUn lewisblake 24661500 2022-10-25T12:11:45Z 2022-10-25T12:11:45Z NONE

I think the original scope of this issue is still valid. I also would expect that indices that are not within the tolerance would simply be dropped. While it might be nice in some situations, I don't really think that specifying a fill value is needed in order to accomplish this.

The issue I'm facing with reindex is that it doesn't really scale as well as sel does, significantly reducing the amount of data I can handle. I would like to humbly suggest that there still might be interest in seeing this functionality.

Unfortunately the testing logs from #4996 have expired so it's not clear why the tests failed for this PR before it was closed.
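
The drop-if-out-of-tolerance behaviour described above can be approximated without reindex by asking the underlying pandas index for positions and discarding the misses. This is only a minimal sketch, not something proposed in this thread: it assumes a one-dimensional, unique lat index, and the data values and the names requested, positions and subset are made up for illustration.

```python
import numpy as np
import xarray as xr

# Illustrative data using the same latitudes as the examples in this thread.
da = xr.DataArray(
    np.array([1, 2, 3, 4, 5]),
    dims=["lat"],
    coords={"lat": [10, 20, 30, 50, 60]},
)

requested = [5, 15, 40]

# pandas returns -1 for every requested label that has no neighbour
# within the tolerance (here 40, whose nearest labels are 10 away).
positions = da.indexes["lat"].get_indexer(
    requested, method="nearest", tolerance=5
)

# Keep only the matched positions; unmatched requests are dropped and the
# result keeps the original coordinate labels rather than the requested ones.
subset = da.isel(lat=positions[positions >= 0])
print(subset)
```

Because this selects by integer position, no new coordinate labels are invented and the data dtype is left unchanged.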

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976
1110101560 https://github.com/pydata/xarray/issues/4995#issuecomment-1110101560 https://api.github.com/repos/pydata/xarray/issues/4995 IC_kwDOAMm_X85CKs44 snowman2 8699967 2022-04-26T18:09:18Z 2022-04-26T18:12:14Z CONTRIBUTOR

Example using nearest & tolerance with reindex & sel when dims don't match based on the example in sel:

```python
import numpy
import xarray

da = xarray.DataArray(
    numpy.arange(25).reshape(5, 5),
    coords={"x": numpy.arange(5), "y": numpy.arange(5)},
    dims=("x", "y"),
)
tgt_x = numpy.linspace(0, 4, num=5) + 0.5
tgt_y = numpy.linspace(0, 4, num=5) + 0.5
da = da.reindex(
    x=tgt_x, y=tgt_y, method="nearest", tolerance=0.2, fill_value=numpy.nan
).sel(
    x=xarray.DataArray(tgt_x, dims="points"),
    y=xarray.DataArray(tgt_y, dims="points"),
)
```

Output:

```
<xarray.DataArray (points: 5)>
array([nan, nan, nan, nan, nan])
Coordinates:
    x        (points) float64 0.5 1.5 2.5 3.5 4.5
    y        (points) float64 0.5 1.5 2.5 3.5 4.5
Dimensions without coordinates: points
```

Side note: I don't think it makes sense to add `fill_value` to `sel` as it would require adding new coordinates that didn't exist previously. Calling `reindex` first makes that more clear in my opinion.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976
799047819 https://github.com/pydata/xarray/issues/4995#issuecomment-799047819 https://api.github.com/repos/pydata/xarray/issues/4995 MDEyOklzc3VlQ29tbWVudDc5OTA0NzgxOQ== observingClouds 43613877 2021-03-15T02:28:51Z 2021-03-15T02:28:51Z CONTRIBUTOR

Thanks @dcherian, this is doing the job. I'll close this issue as there seems to be no need to implement this into the sel method.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976
791145448 https://github.com/pydata/xarray/issues/4995#issuecomment-791145448 https://api.github.com/repos/pydata/xarray/issues/4995 MDEyOklzc3VlQ29tbWVudDc5MTE0NTQ0OA== dcherian 2448579 2021-03-05T04:32:29Z 2021-03-05T04:32:29Z MEMBER

Actually, does reindex do what you want? The returned coordinate labels will be what you provide.

```
ds.reindex(lat=[5,15,40], method="nearest", tolerance=5, fill_value=-999)
<xarray.DataArray (lat: 3)>
array([1, 2, -999])
Coordinates:
  * lat      (lat) int64 5 15 40
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976
791021835 https://github.com/pydata/xarray/issues/4995#issuecomment-791021835 https://api.github.com/repos/pydata/xarray/issues/4995 MDEyOklzc3VlQ29tbWVudDc5MTAyMTgzNQ== dcherian 2448579 2021-03-04T23:16:00Z 2021-03-04T23:16:00Z MEMBER

> in using a fill_value is that the indexing has to modify the data (insert e.g. -999) and also 'invent' a new coordinate point (here 40).

This seems totally doable though.

> One fill_value might not fit to all data arrays

In quite a few functions, fill_value can be a dict mapping variable name to a value so this is workable.
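
As a rough sketch of the dict form, assuming Dataset.reindex accepts a dict-like fill_value that maps variable names to fill values, as its documentation describes. The dataset mirrors the two-variable example given elsewhere in this thread, and the chosen fill values are illustrative only.

```python
import numpy as np
import xarray as xr

# Two variables of different dtypes sharing the same "lat" coordinate.
ds = xr.Dataset(
    {
        "data1": ("lat", np.array([1, 2, 3, 4, 5], dtype=int)),
        "data2": ("lat", np.array([1, 2, 3, 4, 5], dtype=float)),
    },
    coords={"lat": [10, 20, 30, 50, 60]},
)

# A separate fill value per variable: a sentinel for the integer data,
# NaN for the float data.
filled = ds.reindex(
    lat=[5, 15, 40],
    method="nearest",
    tolerance=5,
    fill_value={"data1": -999, "data2": np.nan},
)
print(filled)
```

The requested label 40 has no neighbour within the tolerance, so data1 receives -999 there and data2 receives NaN.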

Let's see what others think.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976
791019238 https://github.com/pydata/xarray/issues/4995#issuecomment-791019238 https://api.github.com/repos/pydata/xarray/issues/4995 MDEyOklzc3VlQ29tbWVudDc5MTAxOTIzOA== observingClouds 43613877 2021-03-04T23:10:11Z 2021-03-04T23:10:11Z CONTRIBUTOR

Introducing a fill_value seems like a good idea, so that the size of the output does not change compared to the intended selection. Choosing the original/requested coordinate as a label for the missing datapoint seems to be a valid choice because this position has been checked for valid data nearby without success. I would suggest that the fill_value then be determined automatically from the _FillValue, then from the datatype, and only as a last resort require fill_value to be set explicitly.

However, the shortcoming that I see in using a fill_value is that the indexing has to modify the data (insert e.g. -999) and also 'invent' a new coordinate point (here 40). This gets reasonably complex when applied to a dataset with DataArrays of different types, e.g.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset()
ds['data1'] = xr.DataArray(np.array([1,2,3,4,5], dtype=int), dims=["lat"], coords={'lat':[10,20,30,50,60]})
ds['data2'] = xr.DataArray(np.array([1,2,3,4,5], dtype=float), dims=["lat"], coords={'lat':[10,20,30,50,60]})
```

One `fill_value` might not fit all data arrays, be it because of the datatype or the actual data. E.g. `-999` might be a good `fill_value` for one DataArray but a valid datapoint in another one.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976
790878651 https://github.com/pydata/xarray/issues/4995#issuecomment-790878651 https://api.github.com/repos/pydata/xarray/issues/4995 MDEyOklzc3VlQ29tbWVudDc5MDg3ODY1MQ== dcherian 2448579 2021-03-04T19:40:29Z 2021-03-04T19:40:29Z MEMBER

```
ds.sel(lat=[5,15,40], method="nearest", tolerance=5)
<xarray.DataArray (lat: 2)>
array([1, 2])
Coordinates:
  * lat      (lat) int64 10 20
```

This is a very surprising result: you've asked for values at three points but received two back.

The following (specifying fill_value) seems like better behaviour to me, but how do you choose the coordinate label? (Here I picked 40 since that was provided to sel.)

```
ds.sel(lat=[5,15,40], method="nearest", tolerance=5, fill_value=-999)
<xarray.DataArray (lat: 3)>
array([1, 2, -999])
Coordinates:
  * lat      (lat) int64 10 20 40
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 14.097ms · About: xarray-datasette