issue_comments

2 rows where author_association = "CONTRIBUTOR", issue = 1173497454 and user = 13662783 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1084277555 https://github.com/pydata/xarray/issues/6377#issuecomment-1084277555 https://api.github.com/repos/pydata/xarray/issues/6377 IC_kwDOAMm_X85AoMMz Huite 13662783 2022-03-31T08:45:18Z 2022-03-31T08:45:18Z CONTRIBUTOR

@Jeitan

The coordinate is a DataArray as well, so the following would work:

```python
import numpy as np
import xarray as xr

# Example DataArray
da = xr.DataArray(np.ones((3, 3)), {"y": [50.0, 60.0, 70.0], "x": [1.0, 2.0, 3.0]}, ("y", "x"))

# Replace 50.0 and 60.0 by 5.0 and 6.0 in the y coordinate
# (replace_values is the method name proposed in this issue)
da["y"] = da["y"].replace_values([50.0, 60.0], [5.0, 6.0])
```

Your example in the other issue mentions one of the ways you'd replace in pandas, but for a dataframe. With a dataframe, there's quite some flexibility:

    df.replace({0: 10, 1: 100})
    df.replace({'A': 0, 'B': 5}, 100)
    df.replace({'A': {0: 100, 4: 400}})

I'd say the xarray counterpart of a DataFrame is a Dataset; the counterpart of a DataArray is a Series. Replacing the coordinates in a DataArray is akin to replacing the values of the index of a Series, which is apparently possible with series.rename(index={from: to}).
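To make the Series analogy concrete, here is a minimal pandas sketch. The Series contents and variable names are made up for illustration; it only shows that `Series.rename(index=...)` relabels the index, while `Series.replace` acts on the values.

```python
import pandas as pd

# A Series whose index plays the role of a DataArray coordinate.
s = pd.Series([1.0, 1.0, 1.0], index=[50.0, 60.0, 70.0])

# Map old index labels to new ones; labels missing from the mapping are kept.
renamed = s.rename(index={50.0: 5.0, 60.0: 6.0})

# Series.replace, by contrast, changes the values and leaves the index alone.
replaced = s.replace({1.0: 10.0})

print(list(renamed.index))
print(list(replaced.index))
```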

Other thoughts: some complexity comes in when implementing a replace_values method for a Dataset. I also think the pandas replace method signature is too complicated (scalars, lists, dicts, dicts of dicts, probably more?) and the docstring is quite extensive (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html)

I think the question is what the signature should be. You could compare to reindex (https://xarray.pydata.org/en/stable/generated/xarray.Dataset.reindex.html) and have a "replacer" argument:

```python
da = da.replace({"y": ([50.0, 60.0], [5.0, 6.0])})

da["y"] = da["y"].replace([50.0, 60.0], [5.0, 6.0])
```

The first one would also work for Datasets, but I personally prefer the second one for its simplicity; it is also maybe closer to .where (https://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html).
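As a rough sketch of how the dict-based signature could behave, the free function below emulates it with existing xarray calls. The name `replace_in_coords` is hypothetical and not part of xarray; the loop over values is deliberately naive and only meant for a handful of replacements on one-dimensional coordinates.

```python
import numpy as np
import xarray as xr


def replace_in_coords(obj, replacer):
    """Hypothetical helper: replacer maps a coordinate name to (to_replace, value)."""
    new_coords = {}
    for name, (to_replace, value) in replacer.items():
        old_values = obj[name].values
        new_values = old_values.copy()
        # Naive loop: fine for a few values, slow for thousands.
        for old, new in zip(to_replace, value):
            new_values[old_values == old] = new
        new_coords[name] = (obj[name].dims, new_values)
    # assign_coords returns a new object; the input is left unchanged.
    return obj.assign_coords(new_coords)


da = xr.DataArray(
    np.ones((3, 3)),
    {"y": [50.0, 60.0, 70.0], "x": [1.0, 2.0, 3.0]},
    ("y", "x"),
)
replaced = replace_in_coords(da, {"y": ([50.0, 60.0], [5.0, 6.0])})
print(replaced["y"].values)
```

Because it only relies on `obj[name]` and `assign_coords`, the same sketch should accept a Dataset as well as a DataArray.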

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [FEATURE]: Add a replace method 1173497454
1073776411 https://github.com/pydata/xarray/issues/6377#issuecomment-1073776411 https://api.github.com/repos/pydata/xarray/issues/6377 IC_kwDOAMm_X85AAIcb Huite 13662783 2022-03-21T11:20:26Z 2022-03-21T11:30:53Z CONTRIBUTOR

Yeah, I think maybe replace_values is a better name. "Search and replace values" is maybe how you'd describe it colloquially? remap is an option too, but I think many users won't have the right association with it (if they're coming from a less technical background).

I don't think you'd want to do this with np.select. If I understand correctly, you'd have to broadcast for the number of values to replace. This works okay with a small number of replacement values, but not with 10 000 like in my example above (but my understanding might be lacking).
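A small sketch of the concern, assuming the straightforward way of using np.select for replacement: one boolean mask is built per value to replace, each as large as the array itself. The function name `replace_with_select` is made up for this illustration.

```python
import numpy as np


def replace_with_select(arr, to_replace, value):
    # One full-size boolean mask per value to replace.
    condlist = [arr == old for old in to_replace]
    # Elements matching no condition keep their original value via `default`.
    out = np.select(condlist, list(value), default=arr)
    return out, condlist


arr = np.array([[1, 2, 3], [3, 2, 1]])
out, condlist = replace_with_select(arr, to_replace=[1, 3], value=[10, 30])

# Memory held by the masks grows as len(to_replace) * arr.size.
mask_bytes = sum(mask.nbytes for mask in condlist)
print(out)
print(len(condlist), mask_bytes)
```

With two replacement values this is harmless, but with 10 000 values the same approach would build 10 000 masks of the full array size.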

Having said that, there is a faster and much cleaner implementation using np.searchsorted on da instead.

```python
import numpy as np


def custom_replace2(da, to_replace, value):
    # to_replace and value are expected to be NumPy arrays of equal length.
    flat = da.values.ravel()

    # Locate each element of flat in to_replace via a sorted search.
    sorter = np.argsort(to_replace)
    insertion = np.searchsorted(to_replace, flat, sorter=sorter)
    indices = np.take(sorter, insertion, mode="clip")
    # Only exact matches are replaced.
    replaceable = to_replace[indices] == flat

    out = flat.copy()
    out[replaceable] = value[indices[replaceable]]
    return da.copy(data=out.reshape(da.shape))


# For the small example: 4.1 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# For the larger example: 14.4 ms ± 592 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit custom_replace2(da, to_replace, value)
```

This is equal to the implementation of remap in numpy-indexed (which is MIT-licensed): https://github.com/EelcoHoogendoorn/Numpy_arraysetops_EP

The key trick is the same, relying on sorting.
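To see the sorting trick at work on concrete numbers, here is a NumPy-only sketch of the same steps. The function name `replace_sorted` and the sample data are made up for illustration; `to_replace` is deliberately given unsorted to show what `sorter` is for.

```python
import numpy as np


def replace_sorted(values, to_replace, value):
    values = np.asarray(values)
    to_replace = np.asarray(to_replace)
    value = np.asarray(value)
    flat = values.ravel()

    # Positions of flat's elements within the sorted view of to_replace.
    sorter = np.argsort(to_replace)
    insertion = np.searchsorted(to_replace, flat, sorter=sorter)
    # Map back to positions in the unsorted to_replace; "clip" guards against
    # insertion points past the end (elements larger than every candidate).
    indices = np.take(sorter, insertion, mode="clip")
    # An element is replaced only if the candidate it landed on matches exactly.
    replaceable = to_replace[indices] == flat

    out = flat.copy()
    out[replaceable] = value[indices[replaceable]]
    return out.reshape(values.shape)


data = np.array([[50.0, 60.0], [70.0, 50.0]])
result = replace_sorted(data, to_replace=[60.0, 50.0], value=[6.0, 5.0])
print(result)
# [[ 5.  6.]
#  [70.  5.]]
```

The cost is one sort of `to_replace` plus one binary search per element, so no per-value mask over the full array is needed.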

See also, e.g.: https://stackoverflow.com/questions/16992713/translate-every-element-in-numpy-array-according-to-key

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [FEATURE]: Add a replace method 1173497454

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);