html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7014#issuecomment-1412027546,https://api.github.com/repos/pydata/xarray/issues/7014,1412027546,IC_kwDOAMm_X85UKdSa,13662783,2023-02-01T13:06:09Z,2023-02-01T13:06:45Z,CONTRIBUTOR,"Debugging this, @headtr1ck points correctly to _determine_cmap_params:
```python
if levels is not None or isinstance(norm, mpl.colors.BoundaryNorm):
cmap, newnorm = _build_discrete_cmap(cmap, levels, extend, filled)
norm = newnorm if norm is None else norm
```
The problem lies in the second line. In `_build_discrete_cmap`, a new cmap is returned with a different number of levels. However, if the original `norm` is not None, you end up with a mismatch, as the old norm expects the old cmap.
What then happens, is that the norm calls into the cmap. Calling into cmap doesn't do any checks whether the value is larger than N, it just takes the highest available value. The examples in #4061 show this quite clearly, but to illustrate:
```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
data = np.arange(100).reshape((10, 10))
cmap = mpl.cm.get_cmap(""viridis"")
print(cmap.N) # 256
boundaries = [0, 25, 50, 75, 100]
norm = mpl.colors.BoundaryNorm(boundaries, cmap.N)
fig, ax = plt.subplots()
ax.imshow(data, norm=norm, cmap=cmap)
# %%
colors = [cmap(i/255) for i in np.linspace(0, cmap.N, len(boundaries) - 1)]
new_cmap, new_norm = mpl.colors.from_levels_and_colors(boundaries, colors)
print(new_cmap.N) # 4
fig, ax = plt.subplots()
ax.imshow(data, norm=new_norm, cmap=new_cmap)
# %%
# Mismatched
fig, ax = plt.subplots()
ax.imshow(data, norm=norm, cmap=new_cmap)
# %%
```
This is avoided here by removing the conditional in the second line, or just making sure both cmap and norm are replaced by their new values:
```python
if levels is not None or isinstance(norm, mpl.colors.BoundaryNorm):
cmap, norm = _build_discrete_cmap(cmap, levels, extend, filled)
```
Then, the cmap and norm remain in sync.
However, when running the tests in `test_plot.py`, this gives a single questionable(?) failure:
```python
def test_norm_sets_vmin_vmax(self) -> None:
vmin = self.data.min()
vmax = self.data.max()
for norm, extend, levels in zip(
[
mpl.colors.Normalize(),
mpl.colors.Normalize(),
mpl.colors.Normalize(vmin + 0.1, vmax - 0.1),
mpl.colors.Normalize(None, vmax - 0.1),
mpl.colors.Normalize(vmin + 0.1, None),
],
[""neither"", ""neither"", ""both"", ""max"", ""min""],
[7, None, None, None, None],
):
test_min = vmin if norm.vmin is None else norm.vmin
test_max = vmax if norm.vmax is None else norm.vmax
cmap_params = _determine_cmap_params(self.data, norm=norm, levels=levels)
assert cmap_params[""vmin""] is None
assert cmap_params[""vmax""] is None
assert cmap_params[""norm""].vmin == test_min
assert cmap_params[""norm""].vmax == test_max
assert cmap_params[""extend""] == extend
> assert cmap_params[""norm""] == norm
E assert ==
```
I don't understand why the conditional is there. At first sight, it doesn't make a lot of sense to create a new norm and cmap, but then take the original norm?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1368027148
https://github.com/pydata/xarray/issues/6377#issuecomment-1084277555,https://api.github.com/repos/pydata/xarray/issues/6377,1084277555,IC_kwDOAMm_X85AoMMz,13662783,2022-03-31T08:45:18Z,2022-03-31T08:45:18Z,CONTRIBUTOR,"@Jeitan
The coordinate is a DataArray as well, so the following would work:
```python
# Example DataArray
da = xr.DataArray(np.ones((3, 3)), {""y"": [50.0, 60.0, 70.0], ""x"": [1.0, 2.0, 3.0]}, (""y"", ""x""))
# Replace 50.0 and 60.0 by 5.0 and 6.0 in the y coordinate
da[""y""] = da[""y""].replace_values([50.0, 60.0], [5.0, 6.0])
```
Your example in the other issue mentions one of the ways you'd replace in pandas, but for a dataframe. With a dataframe, there's quite some flexibility:
```python
df.replace({0: 10, 1: 100})
df.replace({'A': 0, 'B': 5}, 100)
df.replace({'A': {0: 100, 4: 400}})
```
I'd say the xarray counterpart of a Dataframe is a Dataset; the counterpart of a DataArray is a Series.
Replacing the coordinates in a DataArray is akin to replacing the values of the index of a Series, which is apparently possible with `series.rename(index={from: to})`.
Other thoughts: some complexity comes in when implementing a `replace_values` method for a Dataset.
I also think the pandas `replace` method signature is too complicated (scalars, lists, dicts, dicts of dicts, probably more?) and the docstring is quite extensive (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html)
I think the question is what the signature should be. You could compare to reindex (https://xarray.pydata.org/en/stable/generated/xarray.Dataset.reindex.html) and have an ""replacer"" argument:
```python
da = da.replace({""y"": ([50.0, 60.0], [5.0, 6.0])})
da[""y""] = da[""y""].replace([50.0, 60.0], [5.0, 6.0])
```
The first one would also work for Datasets, but I personally prefer the second one for it's simplicity (and which is maybe closer to `.where` : https://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1173497454
https://github.com/pydata/xarray/issues/6377#issuecomment-1073776411,https://api.github.com/repos/pydata/xarray/issues/6377,1073776411,IC_kwDOAMm_X85AAIcb,13662783,2022-03-21T11:20:26Z,2022-03-21T11:30:53Z,CONTRIBUTOR,"Yeah I think maybe `replace_values` is better name. ""search and replace values"" is maybe how you'd describe it colloquially?`remap` is an option too, but I think many users won't have the right assocation with it (if they're coming from a less technical background).
I don't think you'd want to this with `np.select`. If I understand correctly, you'd have to broadcast for the number of values to replace. This work okay with a small number of replacement values, but not with 10 000 like in my example above (but my understanding might be lacking).
Having said that, there is a faster and much cleaner implementation using `np.seachsorted` on `da` instead.
```python
def custom_replace2(da, to_replace, value):
flat = da.values.ravel()
sorter = np.argsort(to_replace)
insertion = np.searchsorted(to_replace, flat, sorter=sorter)
indices = np.take(sorter, insertion, mode=""clip"")
replaceable = (to_replace[indices] == flat)
out = flat.copy()
out[replaceable] = value[indices[replaceable]]
return da.copy(data=out.reshape(da.shape))
# For small example: 4.1 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# For the larger example: # 14.4 ms ± 592 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit custom_replace2(da, to_replace, value)
```
This is equal to the implementation of `remap` in `numpy-indexed` (which is MIT-licensed):
https://github.com/EelcoHoogendoorn/Numpy_arraysetops_EP
The key trick is the same, relying on sorting.
See e.g. also: https://stackoverflow.com/questions/16992713/translate-every-element-in-numpy-array-according-to-key
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1173497454
https://github.com/pydata/xarray/issues/4222#issuecomment-658044484,https://api.github.com/repos/pydata/xarray/issues/4222,658044484,MDEyOklzc3VlQ29tbWVudDY1ODA0NDQ4NA==,13662783,2020-07-14T08:24:26Z,2020-07-14T08:24:26Z,CONTRIBUTOR,"@ChrisBarker-NOAA: thanks, I'll reply in the gridded issue: https://github.com/NOAA-ORR-ERD/gridded/issues/55
@TomNicholas Thanks for the comment! I've put a clarification in the post, that I don't intend this to live in Xarray. The stuff I suggest is specifically about mesh topology, which I'd agree on as being outside of the scope of xarray. I realize the post wasn't that clear in that regard. I was indeed thinking of the accessor methods, but failed to explicitly mention them. Hopefully should be bit clearer now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,656165548
https://github.com/pydata/xarray/issues/4076#issuecomment-630662285,https://api.github.com/repos/pydata/xarray/issues/4076,630662285,MDEyOklzc3VlQ29tbWVudDYzMDY2MjI4NQ==,13662783,2020-05-19T08:15:56Z,2020-05-19T08:15:56Z,CONTRIBUTOR,"Opening (and closing) a ZipStore when given a path ending in `"".zip""` would largely solve it I believe -- in fact, I tried just providing a path with `"".zip""` to `to_zarr` before, hoping it would work. (And indeed, I did so because the many small files are troublesome.)
`open_zarr` already automatically accepts path `"".zip""`, so it would be nicely symmetric as well.
However, this behavior exists within Zarr, not Xarray.
https://github.com/zarr-developers/zarr-python/blob/473576f3b17ac4a460dfccbcc4860606cd8123ed/zarr/creation.py#L130-142
I'd be happy to have a try at a PR in principle.
Maybe it could be as simple as adding a `with`?
```python
if isinstance(store, (str, pathlib.Path)):
if str(store).endswith('.zip'):
with zarr.ZipStore(store) as zstore:
# etc
```
In the `xarray.backends.api.to_zarr()` function.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620468256
https://github.com/pydata/xarray/issues/3664#issuecomment-571196866,https://api.github.com/repos/pydata/xarray/issues/3664,571196866,MDEyOklzc3VlQ29tbWVudDU3MTE5Njg2Ng==,13662783,2020-01-06T16:03:16Z,2020-01-06T16:03:16Z,CONTRIBUTOR,"Ah, thanks. Somehow I didn't search with quite the right search terms...
For posterity, the right way of doing this is by replacing:
```python
da2[""x""].values = da1[""x""].values
```
by
```python
da2 = da2.assign_coords(x=da1[""x""])
````","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,545762025
https://github.com/pydata/xarray/pull/2972#issuecomment-493798254,https://api.github.com/repos/pydata/xarray/issues/2972,493798254,MDEyOklzc3VlQ29tbWVudDQ5Mzc5ODI1NA==,13662783,2019-05-19T22:41:49Z,2019-05-19T22:46:06Z,CONTRIBUTOR,"I'm definitely not convinced it's a great idea either; a pull request is hopefully the best way to get some discussion going! Putting it in the `_get_joiner` is definitely the sensible choice, and the last suggestion is definitely more clear to me.
I've had a bit more thought and maybe it's useful to consider what xarray's philosophy about these things is. I think flexibility is a primary concern and generally, things \*just work\*. The reason I'm running into issues here is because I'm writing some code which operates on DataArrays, which isn't nearly as robust or flexible, and **doesn't** just work.
The answer in this case might be that I should be using xarray's powerful alignment machinery to do the work for me, rather than to assume/demand certain features of the data. Of course, that requires some digging into how xarray does alignment. But I'd end up with a more flexible tool in the end.
Perhaps that should be the general rule: if you're extending xarray, like xarray, *don't* rely too much about your coordinates staying the way they are. Maybe such a description could belong in the [xarray Internals page](http://xarray.pydata.org/en/stable/internals.html) just to make it explicit.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,445745470
https://github.com/pydata/xarray/issues/2947#issuecomment-490255662,https://api.github.com/repos/pydata/xarray/issues/2947,490255662,MDEyOklzc3VlQ29tbWVudDQ5MDI1NTY2Mg==,13662783,2019-05-07T21:07:26Z,2019-05-07T21:07:26Z,CONTRIBUTOR,"I had the same thought, but I think we do need sorting in general; just descending rather than ascending for this kind of grid. Otherwise you might end up with non-monotonic coordinates, which is rarely desirable.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,441341340
https://github.com/pydata/xarray/issues/2587#issuecomment-444071471,https://api.github.com/repos/pydata/xarray/issues/2587,444071471,MDEyOklzc3VlQ29tbWVudDQ0NDA3MTQ3MQ==,13662783,2018-12-04T11:42:47Z,2018-12-04T11:42:58Z,CONTRIBUTOR,"Thanks, I'll indeed use `cftime.datetime` objects directly.
`cftime.DatetimeProlepticGregorian` seems the obvious default to me as well.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,386596872