issue_comments


9 rows where author_association = "CONTRIBUTOR" and user = 13662783 sorted by updated_at descending


Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
1412027546 https://github.com/pydata/xarray/issues/7014#issuecomment-1412027546 https://api.github.com/repos/pydata/xarray/issues/7014 IC_kwDOAMm_X85UKdSa Huite 13662783 2023-02-01T13:06:09Z 2023-02-01T13:06:45Z CONTRIBUTOR

Debugging this, @headtr1ck points correctly to _determine_cmap_params:

```python
if levels is not None or isinstance(norm, mpl.colors.BoundaryNorm):
    cmap, newnorm = _build_discrete_cmap(cmap, levels, extend, filled)
    norm = newnorm if norm is None else norm
```

The problem lies in the second line. In _build_discrete_cmap, a new cmap is returned with a different number of levels. However, if the original norm is not None, you end up with a mismatch, as the old norm expects the old cmap.

What happens then is that the norm calls into the cmap. The cmap doesn't do any checks on whether the value is larger than N; it just takes the highest available entry. The examples in #4061 show this quite clearly, but to illustrate:

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(100).reshape((10, 10))
cmap = mpl.cm.get_cmap("viridis")
print(cmap.N)  # 256
boundaries = [0, 25, 50, 75, 100]
norm = mpl.colors.BoundaryNorm(boundaries, cmap.N)
fig, ax = plt.subplots()
ax.imshow(data, norm=norm, cmap=cmap)

# %%

colors = [cmap(i / 255) for i in np.linspace(0, cmap.N, len(boundaries) - 1)]
new_cmap, new_norm = mpl.colors.from_levels_and_colors(boundaries, colors)
print(new_cmap.N)  # 4
fig, ax = plt.subplots()
ax.imshow(data, norm=new_norm, cmap=new_cmap)

# %%

# Mismatched
fig, ax = plt.subplots()
ax.imshow(data, norm=norm, cmap=new_cmap)
```

This is avoided here by removing the conditional in the second line, or just making sure both cmap and norm are replaced by their new values:

```python
if levels is not None or isinstance(norm, mpl.colors.BoundaryNorm):
    cmap, norm = _build_discrete_cmap(cmap, levels, extend, filled)
```

Then, the cmap and norm remain in sync.

However, when running the tests in test_plot.py, this gives a single questionable(?) failure:

```python
def test_norm_sets_vmin_vmax(self) -> None:
    vmin = self.data.min()
    vmax = self.data.max()

    for norm, extend, levels in zip(
        [
            mpl.colors.Normalize(),
            mpl.colors.Normalize(),
            mpl.colors.Normalize(vmin + 0.1, vmax - 0.1),
            mpl.colors.Normalize(None, vmax - 0.1),
            mpl.colors.Normalize(vmin + 0.1, None),
        ],
        ["neither", "neither", "both", "max", "min"],
        [7, None, None, None, None],
    ):
        test_min = vmin if norm.vmin is None else norm.vmin
        test_max = vmax if norm.vmax is None else norm.vmax

        cmap_params = _determine_cmap_params(self.data, norm=norm, levels=levels)
        assert cmap_params["vmin"] is None
        assert cmap_params["vmax"] is None
        assert cmap_params["norm"].vmin == test_min
        assert cmap_params["norm"].vmax == test_max
        assert cmap_params["extend"] == extend
        assert cmap_params["norm"] == norm
```

which fails on the last assertion with:

```
E   assert <matplotlib.colors.BoundaryNorm object at 0x000001C7A4CB07C0> == <matplotlib.colors.Normalize object at 0x000001C7A50C4760>
```

I don't understand why the conditional is there. At first sight, it doesn't make a lot of sense to create a new norm and cmap, but then take the original norm?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray imshow and pcolormesh behave badly when the array does not contain values larger the BoundaryNorm vmax 1368027148
1084277555 https://github.com/pydata/xarray/issues/6377#issuecomment-1084277555 https://api.github.com/repos/pydata/xarray/issues/6377 IC_kwDOAMm_X85AoMMz Huite 13662783 2022-03-31T08:45:18Z 2022-03-31T08:45:18Z CONTRIBUTOR

@Jeitan

The coordinate is a DataArray as well, so the following would work:

```python
# Example DataArray
da = xr.DataArray(
    np.ones((3, 3)),
    {"y": [50.0, 60.0, 70.0], "x": [1.0, 2.0, 3.0]},
    ("y", "x"),
)

# Replace 50.0 and 60.0 by 5.0 and 6.0 in the y coordinate
da["y"] = da["y"].replace_values([50.0, 60.0], [5.0, 6.0])
```

Your example in the other issue mentions one of the ways you'd replace in pandas, but for a dataframe. With a dataframe, there's quite some flexibility:

```python
df.replace({0: 10, 1: 100})
df.replace({'A': 0, 'B': 5}, 100)
df.replace({'A': {0: 100, 4: 400}})
```

I'd say the xarray counterpart of a DataFrame is a Dataset; the counterpart of a DataArray is a Series. Replacing the coordinates in a DataArray is akin to replacing the values of the index of a Series, which is apparently possible with `series.rename(index={from: to})`.
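
For concreteness, a small sketch of that pandas-side operation (the data here is just for illustration):

```python
import pandas as pd

# A Series with a float index, analogous to a DataArray coordinate.
s = pd.Series([1.0, 2.0, 3.0], index=[50.0, 60.0, 70.0])

# Replace the 50.0 and 60.0 index labels, leaving the values alone.
s2 = s.rename(index={50.0: 5.0, 60.0: 6.0})
print(list(s2.index))  # [5.0, 6.0, 70.0]
```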

Other thoughts: some complexity comes in when implementing a replace_values method for a Dataset. I also think the pandas replace method signature is too complicated (scalars, lists, dicts, dicts of dicts, probably more?) and the docstring is quite extensive (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html)

I think the question is what the signature should be. You could compare to reindex (https://xarray.pydata.org/en/stable/generated/xarray.Dataset.reindex.html) and have a "replacer" argument:

```python
da = da.replace({"y": ([50.0, 60.0], [5.0, 6.0])})

da["y"] = da["y"].replace([50.0, 60.0], [5.0, 6.0])
```

The first one would also work for Datasets, but I personally prefer the second one for its simplicity (and it is maybe closer to .where: https://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html).
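
For comparison, the closest existing tool is .where, which replaces by condition rather than by pairs of values; a minimal sketch (one replacement pair per call):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.ones((3, 3)),
    coords={"y": [50.0, 60.0, 70.0], "x": [1.0, 2.0, 3.0]},
    dims=("y", "x"),
)

# .where keeps values where the condition holds and fills in `other`
# elsewhere, so each replacement pair needs its own call:
y = da["y"]
da = da.assign_coords(y=y.where(y != 50.0, other=5.0))
print(da["y"].values)  # [ 5. 60. 70.]
```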

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [FEATURE]: Add a replace method 1173497454
1073776411 https://github.com/pydata/xarray/issues/6377#issuecomment-1073776411 https://api.github.com/repos/pydata/xarray/issues/6377 IC_kwDOAMm_X85AAIcb Huite 13662783 2022-03-21T11:20:26Z 2022-03-21T11:30:53Z CONTRIBUTOR

Yeah, I think maybe replace_values is a better name; "search and replace values" is maybe how you'd describe it colloquially? remap is an option too, but I think many users won't have the right association with it (if they're coming from a less technical background).

I don't think you'd want to do this with np.select. If I understand correctly, you'd have to broadcast for the number of values to replace. This works okay with a small number of replacement values, but not with 10,000 like in my example above (but my understanding might be lacking).
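
To make the broadcasting concern concrete, here is a sketch of my reading of the np.select approach: it materializes one boolean array of the data's shape per replacement value, so work and memory scale with the number of pairs.

```python
import numpy as np

def replace_with_select(arr, to_replace, value):
    # One boolean condition array of arr.shape per replacement value:
    # fine for a handful of pairs, prohibitive for ~10,000.
    conditions = [arr == v for v in to_replace]
    return np.select(conditions, value, default=arr)

arr = np.array([1.0, 2.0, 3.0, 2.0])
print(replace_with_select(arr, [2.0, 3.0], [20.0, 30.0]))  # [ 1. 20. 30. 20.]
```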

Having said that, there is a faster and much cleaner implementation using np.searchsorted on da instead.

```python
def custom_replace2(da, to_replace, value):
    flat = da.values.ravel()

    sorter = np.argsort(to_replace)
    insertion = np.searchsorted(to_replace, flat, sorter=sorter)
    indices = np.take(sorter, insertion, mode="clip")
    replaceable = to_replace[indices] == flat

    out = flat.copy()
    out[replaceable] = value[indices[replaceable]]
    return da.copy(data=out.reshape(da.shape))

# For the small example: 4.1 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# For the larger example: 14.4 ms ± 592 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit custom_replace2(da, to_replace, value)
```

This is equal to the implementation of remap in numpy-indexed (which is MIT-licensed): https://github.com/EelcoHoogendoorn/Numpy_arraysetops_EP

The key trick is the same, relying on sorting.

See e.g. also: https://stackoverflow.com/questions/16992713/translate-every-element-in-numpy-array-according-to-key
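
For reference, a small usage example of the function above, with made-up data (the names here are just for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([[1.0, 2.0], [3.0, 2.0]]), dims=("y", "x"))
to_replace = np.array([2.0, 3.0])
value = np.array([20.0, 30.0])

out = custom_replace2(da, to_replace, value)
print(out.values)  # [[ 1. 20.]
                   #  [30. 20.]]
```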

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [FEATURE]: Add a replace method 1173497454
658044484 https://github.com/pydata/xarray/issues/4222#issuecomment-658044484 https://api.github.com/repos/pydata/xarray/issues/4222 MDEyOklzc3VlQ29tbWVudDY1ODA0NDQ4NA== Huite 13662783 2020-07-14T08:24:26Z 2020-07-14T08:24:26Z CONTRIBUTOR

@ChrisBarker-NOAA: thanks, I'll reply in the gridded issue: https://github.com/NOAA-ORR-ERD/gridded/issues/55

@TomNicholas Thanks for the comment! I've put a clarification in the post that I don't intend this to live in xarray. The stuff I suggest is specifically about mesh topology, which I agree is outside the scope of xarray. I realize the post wasn't that clear in that regard. I was indeed thinking of the accessor methods, but failed to mention them explicitly. Hopefully it's a bit clearer now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extension proposal: extension for UGRID and unstructured mesh utils 656165548
630662285 https://github.com/pydata/xarray/issues/4076#issuecomment-630662285 https://api.github.com/repos/pydata/xarray/issues/4076 MDEyOklzc3VlQ29tbWVudDYzMDY2MjI4NQ== Huite 13662783 2020-05-19T08:15:56Z 2020-05-19T08:15:56Z CONTRIBUTOR

Opening (and closing) a ZipStore when given a path ending in ".zip" would largely solve it, I believe -- in fact, I have tried just providing a ".zip" path to to_zarr before, hoping it would work. (And indeed, I did so because the many small files are troublesome.)

open_zarr already automatically accepts ".zip" paths, so it would be nicely symmetric as well. However, this behavior exists within Zarr, not Xarray:

https://github.com/zarr-developers/zarr-python/blob/473576f3b17ac4a460dfccbcc4860606cd8123ed/zarr/creation.py#L130-142

I'd be happy to have a try at a PR in principle.

Maybe it could be as simple as adding a `with`?

```python
if isinstance(store, (str, pathlib.Path)):
    if str(store).endswith(".zip"):
        with zarr.ZipStore(store) as zstore:
            # etc.
```

In the xarray.backends.api.to_zarr() function.
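
In the meantime, a workaround is to manage the store explicitly on the user side (a sketch, assuming zarr v2's ZipStore):

```python
import xarray as xr
import zarr

ds = xr.Dataset({"a": ("x", [1, 2, 3])})

# ZipStore is a context manager, so the with-block guarantees the
# .close() that a DirectoryStore never needed:
with zarr.ZipStore("example.zip", mode="w") as store:
    ds.to_zarr(store)
```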

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr ZipStore versus DirectoryStore: ZipStore requires .close() 620468256
571196866 https://github.com/pydata/xarray/issues/3664#issuecomment-571196866 https://api.github.com/repos/pydata/xarray/issues/3664 MDEyOklzc3VlQ29tbWVudDU3MTE5Njg2Ng== Huite 13662783 2020-01-06T16:03:16Z 2020-01-06T16:03:16Z CONTRIBUTOR

Ah, thanks. Somehow I didn't search with quite the right search terms...

For posterity, the right way of doing this is by replacing:

```python
da2["x"].values = da1["x"].values
```

with:

```python
da2 = da2.assign_coords(x=da1["x"])
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Regression 0.14.0 -> 0.14.1: __setattr__ no longer updates matching index 545762025
493798254 https://github.com/pydata/xarray/pull/2972#issuecomment-493798254 https://api.github.com/repos/pydata/xarray/issues/2972 MDEyOklzc3VlQ29tbWVudDQ5Mzc5ODI1NA== Huite 13662783 2019-05-19T22:41:49Z 2019-05-19T22:46:06Z CONTRIBUTOR

I'm definitely not convinced it's a great idea either; a pull request is hopefully the best way to get some discussion going! Putting it in _get_joiner is definitely the sensible choice, and the last suggestion is definitely clearer to me.

I've had a bit more thought, and maybe it's useful to consider what xarray's philosophy about these things is. I think flexibility is a primary concern and generally, things *just work*. The reason I'm running into issues here is that I'm writing some code which operates on DataArrays and which isn't nearly as robust or flexible; it doesn't just work.

The answer in this case might be that I should be using xarray's powerful alignment machinery to do the work for me, rather than to assume/demand certain features of the data. Of course, that requires some digging into how xarray does alignment. But I'd end up with a more flexible tool in the end.
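
For example (a minimal sketch of what I mean), instead of demanding that two DataArrays share identically ordered coordinates, the alignment can be made explicit:

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], coords={"x": [0, 1, 2]}, dims="x")
b = xr.DataArray([10, 20, 30], coords={"x": [2, 1, 0]}, dims="x")

# Match values by label rather than by position; no assumption
# about coordinate order is needed.
a2, b2 = xr.align(a, b, join="inner")
print((a2 + b2).sel(x=0).item())  # 31, i.e. 1 + 30 matched by label
```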

Perhaps that should be the general rule: if you're extending xarray, be like xarray and don't rely too much on your coordinates staying the way they are. Maybe such a description could belong on the xarray Internals page, just to make it explicit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Preserve monotonic descending index order when merging 445745470
490255662 https://github.com/pydata/xarray/issues/2947#issuecomment-490255662 https://api.github.com/repos/pydata/xarray/issues/2947 MDEyOklzc3VlQ29tbWVudDQ5MDI1NTY2Mg== Huite 13662783 2019-05-07T21:07:26Z 2019-05-07T21:07:26Z CONTRIBUTOR

I had the same thought, but I think we do need sorting in general; just descending rather than ascending for this kind of grid. Otherwise you might end up with non-monotonic coordinates, which is rarely desirable.
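
A minimal illustration of the behavior under discussion:

```python
import xarray as xr

# Both inputs have monotonic *descending* y coordinates.
a = xr.DataArray([1, 2], coords={"y": [30.0, 20.0]}, dims="y", name="a")
b = xr.DataArray([3, 4], coords={"y": [10.0, 0.0]}, dims="y", name="b")

# merge returns the union of the indexes, sorted ascending.
merged = xr.merge([a, b])
print(merged["y"].values)  # [ 0. 10. 20. 30.]
```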

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.merge always sorts indexes ascending 441341340
444071471 https://github.com/pydata/xarray/issues/2587#issuecomment-444071471 https://api.github.com/repos/pydata/xarray/issues/2587 MDEyOklzc3VlQ29tbWVudDQ0NDA3MTQ3MQ== Huite 13662783 2018-12-04T11:42:47Z 2018-12-04T11:42:58Z CONTRIBUTOR

Thanks, I'll indeed use cftime.datetime objects directly.

cftime.DatetimeProlepticGregorian seems the obvious default to me as well.
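
For posterity, a minimal sketch of constructing the coordinate from cftime objects directly:

```python
import cftime
import xarray as xr

# Build the time coordinate from cftime datetimes, bypassing the
# coercion to np.datetime64[ns]:
times = [cftime.DatetimeProlepticGregorian(2018, 12, d) for d in (1, 2, 3)]
da = xr.DataArray([0.1, 0.2, 0.3], coords={"time": times}, dims="time")
print(da.indexes["time"])  # a CFTimeIndex, not a pandas DatetimeIndex
```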

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray constructor still coerces to np.datetime64[ns], not cftime in 0.11.0 386596872


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);