id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1674818753,I_kwDOAMm_X85j07TB,7768,Supplying multidimensional initial guess to `curvefit`,20118130,closed,0,,,5,2023-04-19T12:37:53Z,2024-03-25T20:02:14Z,2023-05-31T12:43:09Z,CONTRIBUTOR,,,,"### Is your feature request related to a problem?

Hi, I'm trying to use `DataArray.curvefit` to fit a bunch of data. Let's say the data dimensions are `(x, experiment_index)`, and I'm trying to fit `m * x + b`, where `m` will be different for each `experiment_index`. I would like to supply an initial guess `p0` to `curvefit` that depends on `experiment_index`, but it seems like this is not supported. Here's a minimal example:

```python
import numpy as np
import xarray as xr

x = xr.DataArray(coords=[(""x"", np.linspace(0, 10, 101))]).x
i = xr.DataArray(coords=[(""experiment_index"", [1, 2, 3])]).experiment_index
data = 2.0 * i * x + 5

m_guess = 2 * i

data.curvefit(
    ""x"",
    lambda x, m, b: m * x + b,
    p0={""m"": m_guess}  # I would like to provide a guess for 'm' as a function of `experiment_index`
)
```

### Describe the solution you'd like

I would like to be able to provide arrays as the values of `p0`, so that I can have different initial guesses for different slices of the data. I suppose this could also be implemented for bounds.

### Describe alternatives you've considered

I could wrap `curvefit` in a for-loop, for example

```python
result = []
for y in data.transpose(""experiment_index"", ...):
    result.append(y.curvefit(
        ""x"",
        lambda x, m, b: m * x + b,
        p0={""m"": m_guess.sel(experiment_index=y.experiment_index).item()},
    ))
result = xr.concat(result, dim=""experiment_index"")
```

But this is quite cumbersome, especially for multidimensional data.
### Additional context

The above example gives the error

```
*** ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
```

because `curve_fit` tries to do `np.atleast_1d([m_guess, 1])`, but it should be `np.atleast_1d([m_guess[0], 1])`.

~~The above example gives the error~~

```
ValueError: operands could not be broadcast together with shapes (3,) (101,)
```

~~which comes from `scipy.curve_fit` trying to compute `m * x`, where `m` is the DataArray `m_guess`, but `x` is a plain NumPy array, basically `x.data`.~~ - this applies for scipy 1.7.

This toy example of course works with just a scalar guess like `p0={""m"": 2}`, but in my case the function is more complicated and the fit might fail if the initial guess is too far off.

The initial guess is inserted into `kwargs` passed to `curve_fit` here:
https://github.com/pydata/xarray/blob/c75ac8b7ab33be95f74d9e6f10b8173c68828751/xarray/core/dataset.py#L8659","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7768/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1698656265,I_kwDOAMm_X85lP3AJ,7823,DataArray.to_dataset(dim) silently drops variable if it is already a dim,20118130,closed,0,,,3,2023-05-06T14:37:59Z,2023-11-14T22:28:18Z,2023-11-14T22:28:18Z,CONTRIBUTOR,,,,"### What happened?

If I have a DataArray `da` which I split into a Dataset using `da.to_dataset(dim)`, and one of the values of `da[dim]` also happens to be one of the dimensions of `da`, that variable is silently missing from the resulting dataset.

### What did you expect to happen?

If a variable cannot be created because it is already a dimension, it should raise an exception, or possibly issue a warning and rename the variable, so that no data is lost.
### Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.zeros((3, 3)),
    coords={
        # note how 'foo' is one of the coordinate values, and also the name of a dimension
        ""x"": [""foo"", ""bar"", ""baz""],
        ""foo"": [1, 2, 3],
    }
)

# this produces a Dataset with the variables 'bar' and 'baz', 'foo' is missing (because it is already a coordinate)
print(da.to_dataset(""x""))

# this produces a dataset with the variables 'foo', 'bar', and 'baz', as expected
print(da.rename({""foo"": ""qux""}).to_dataset(""x""))
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

```Python
# Output of first conversion
Dimensions:  (foo: 3)
Coordinates:
  * foo      (foo) int64 1 2 3
Data variables:
    bar      (foo) float64 0.0 0.0 0.0
    baz      (foo) float64 0.0 0.0 0.0

# Output of second conversion
Dimensions:  (qux: 3)
Coordinates:
  * qux      (qux) int64 1 2 3
Data variables:
    foo      (qux) float64 0.0 0.0 0.0
    bar      (qux) float64 0.0 0.0 0.0
    baz      (qux) float64 0.0 0.0 0.0
```

### Anything else we need to know?

This came up when I did `to_dataset(""param"")` on the fit result returned by `curvefit`, and one of the data dimensions happened to be the same as one of the arguments of the function which I was fitting. I was initially very confused by this.

### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.10.10 (main, Mar 01 2023, 21:10:14) [GCC] python-bits: 64 OS: Linux OS-release: 6.2.12-1-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.12.2 libnetcdf: None xarray: 2023.4.2 pandas: 2.0.1 numpy: 1.23.5 scipy: 1.10.1 netCDF4: None pydap: None h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: 2.14.2 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2023.4.1 distributed: None matplotlib: 3.7.1 cartopy: None seaborn: 0.12.2 numbagg: None fsspec: 2023.4.0 cupy: None pint: None sparse: 0.14.0 flox: None numpy_groupies: None setuptools: 65.5.0 pip: 22.3.1 conda: None pytest: 7.3.1 mypy: None IPython: 8.13.2 sphinx: 6.2.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7823/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1985010450,PR_kwDOAMm_X85fAHx-,8433,Raise exception in to_dataset if resulting variable is also the name of a coordinate,20118130,closed,0,,,12,2023-11-09T07:38:20Z,2023-11-14T22:28:17Z,2023-11-14T22:28:17Z,CONTRIBUTOR,,0,pydata/xarray/pulls/8433," Let me know if you think the error message is unclear or too verbose or too fancy or something. - [x] Closes #7823 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8433/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1857713530,PR_kwDOAMm_X85YTTNH,8089,WIP: Factor out a function for checking dimension-related errors,20118130,open,0,,,4,2023-08-19T13:35:29Z,2023-09-12T18:59:32Z,,CONTRIBUTOR,,1,pydata/xarray/pulls/8089,"This is a WIP follow-up for #8079 and I think also for #7051. The pattern ```python missing_dims = set(dims) - set(self.dims) if missing_dims: raise ValueError(f""Dimensions {missing_dims} not found in data dimensions {tuple(self.dims)}"") ``` occurs in many methods, with small variations in the way `missing_dims` is calculated, the error message, and also if it's `ValueError` or `KeyError`. So it would make sense to factor it out. But I'm not familiar enough with the context around #7051 to know how to deal with sets vs tuples, so this is just a sketch for now. 
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8089/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1855291078,PR_kwDOAMm_X85YLGz2,8079,Consistently report all dimensions in error messages if invalid dimensions are given,20118130,closed,0,,,11,2023-08-17T16:03:53Z,2023-09-09T04:55:43Z,2023-09-09T04:55:43Z,CONTRIBUTOR,,0,pydata/xarray/pulls/8079,"Hello,

I noticed that `arr.min(""nonexistent"")` raises an error with a very helpful message

```
ValueError: 'nonexistent' not found in array dimensions ('x', 'y', 'z')
```

while `arr.idxmin(""nonexistent"")` raises

```
KeyError: 'Dimension ""nonexistent"" not in dimension' [sic]
```

IMO, the list of dimensions should always be shown in the error message for these kinds of errors; it makes debugging much easier. With this PR, I have implemented this behavior for all such functions that I could find.

There is quite a consistent pattern which I think could be factored out into a function, but I didn't have a clear enough picture of the structure of the whole code to do it.

I haven't fixed the tests yet; I'll do it if you think this can be merged.
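
As a rough sketch of what such a factored-out check might look like: the name `check_dims_exist` and its signature are hypothetical and not part of xarray, and raising `ValueError` in every case is an assumption, since the existing methods differ in the exception type they use.

```python
from typing import Hashable, Iterable


def check_dims_exist(dims: Iterable[Hashable], available: Iterable[Hashable]) -> None:
    '''Raise ValueError naming every requested dimension that is missing.

    Hypothetical helper for illustration only; not an xarray function.
    '''
    available = tuple(available)
    # Keep the order in which the caller passed the dimensions, dropping duplicates.
    missing = tuple(d for d in dict.fromkeys(dims) if d not in available)
    if missing:
        raise ValueError(
            f'Dimensions {missing} not found in data dimensions {available}'
        )


# Valid dimensions pass silently; invalid ones report all available dimensions.
check_dims_exist(['x'], ('x', 'y', 'z'))
try:
    check_dims_exist(['nonexistent'], ('x', 'y', 'z'))
except ValueError as err:
    message = str(err)
```

A method would then call it as `check_dims_exist(dims, self.dims)` before doing any work, so the wording of the message lives in one place.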
- [x] Searched list of issues, couldn't find one related to this
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8079/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1752541983,I_kwDOAMm_X85odasf,7908,"`plot.scatter(hue_style=""invalid"")` does not raise an exception",20118130,closed,0,,,0,2023-06-12T11:30:22Z,2023-07-13T23:17:50Z,2023-07-13T23:17:50Z,CONTRIBUTOR,,,,"### What happened?

If I do a scatterplot with `hue_style=x`, where `x` is not ""continuous"" or ""discrete"", the result is the same as passing `hue_style=""continuous""`. Probably related to #7907.

### What did you expect to happen?

An invalid value should raise an exception.

### Minimal Complete Verifiable Example

```Python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

x = xr.DataArray(
    np.random.default_rng().random((10, 3)),
    coords=[
        (""idx"", np.linspace(0, 1, 10)),
        (""color"", [1, 2, 3]),
    ]
)
x.plot.scatter(x=""idx"", hue=""color"", hue_style=""invalid"", ax=plt.figure().gca())
plt.show()
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.14.0-1059-oem machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2023.1.0 pandas: 1.4.3 numpy: 1.23.0 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.3 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 44.0.0 pip: 20.0.2 conda: None pytest: None mypy: None IPython: 8.12.2 sphinx: None
I also tried this on main at 3459e6fa; the behavior is the same.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7908/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1752520008,I_kwDOAMm_X85odVVI,7907,"`plot.scatter(hue_style=""discrete"")` does nothing",20118130,closed,0,,,4,2023-06-12T11:21:33Z,2023-07-13T23:17:49Z,2023-07-13T23:17:49Z,CONTRIBUTOR,,,,"### What happened?

I was trying to do a scatterplot of my data with one dimension determining the color. The dimension has only a few values so I used `hue_style=""discrete""` to have a different color for each value. However, the resulting scatterplot has a continuous colorbar, which is the same as when I pass `hue_style=""continuous""`:

![image](https://github.com/pydata/xarray/assets/20118130/5767e8ba-4635-463f-9f4b-b405b72a3597)

### What did you expect to happen?

The colorbar should have discrete colors. I was also expecting the colors to be from the default matplotlib color palette, C0, C1, etc, when there are fewer than 10 items, like this:

![image](https://github.com/pydata/xarray/assets/20118130/7cd2978e-2556-4aee-86cf-f4a5f3647ed1)

Although the [examples in the documentation](https://docs.xarray.dev/en/stable/user-guide/plotting.html#scatter) show the discrete case also using viridis.

What I was *really* expecting is a plot like one would get by passing `add_colorbar=False, add_legend=True`:

![image](https://github.com/pydata/xarray/assets/20118130/f0bef900-d701-4817-bff9-514b668172d8)

But that may be a bit too automagical.
### Minimal Complete Verifiable Example

```Python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

x = xr.DataArray(
    np.random.default_rng().random((10, 3)),
    coords=[
        (""idx"", np.linspace(0, 1, 10)),
        (""color"", [1, 2, 3]),
    ]
)
y = x + np.random.default_rng().random(x.shape)
ds = xr.Dataset({
    ""x"": x,
    ""y"": y,
})

# the output is the same regardless of hue_style=""discrete"" or ""continuous"" or just leaving it out
ds.plot.scatter(x=""x"", y=""y"", hue=""color"", hue_style=""discrete"", ax=plt.figure().gca())
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

This is the code for the ""expected"" plot:

```python
from matplotlib.colors import ListedColormap

ds.plot.scatter(
    x=""x"", y=""y"", hue=""color"", hue_style=""discrete"",
    ax=plt.figure().gca(),
    # these lines added in addition to the MVCE
    cmap=ListedColormap([""C0"", ""C1"", ""C2""]),
    vmin=0.5, vmax=3.5,
    cbar_kwargs=dict(ticks=ds.color.data),
)
```

### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.14.0-1059-oem machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2023.1.0 pandas: 1.4.3 numpy: 1.23.0 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.3 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 44.0.0 pip: 20.0.2 conda: None pytest: None mypy: None IPython: 8.12.2 sphinx: None
I also tried this on main at 3459e6fa; the behavior is the same.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7907/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1740268634,PR_kwDOAMm_X85SHW1Z,7891,Add errors option to curvefit,20118130,closed,0,,,3,2023-06-04T09:43:06Z,2023-06-16T03:15:07Z,2023-06-16T03:15:06Z,CONTRIBUTOR,,0,pydata/xarray/pulls/7891,"- [x] Closes #6317 and closes #6515
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

This is a rebased version of #6515, with the arg `errors = ""raise"" | ""ignore""` added to `Dataset` and `DataArray`, and with tests. Let me know if the tests should be expanded further.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7891/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1741050111,PR_kwDOAMm_X85SJ-xN,7893,Fix flaky doctest for curvefit,20118130,closed,0,,,1,2023-06-05T06:10:30Z,2023-06-09T15:38:58Z,2023-06-09T15:38:58Z,CONTRIBUTOR,,0,pydata/xarray/pulls/7893,"Fix flaky doctest introduced in #7821, see https://github.com/pydata/xarray/pull/7821#issuecomment-1537142237.

This uses the `NUMBER` option to compare the output with less decimal precision.
It's not part of standard doctest but an extension from pytest: https://docs.pytest.org/en/7.1.x/how-to/doctest.html#using-doctest-options

Another option would be to use `...` and the built-in `+ELLIPSIS` option, but IMO the current version is less confusing for someone reading the example.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7893/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1698626185,PR_kwDOAMm_X85P6owK,7821,Implement multidimensional initial guess and bounds for `curvefit`,20118130,closed,0,,,6,2023-05-06T13:09:49Z,2023-06-01T15:51:40Z,2023-05-31T12:43:07Z,CONTRIBUTOR,,0,pydata/xarray/pulls/7821,"- [x] Closes #7768
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

With this PR, it's possible to pass an initial guess to `curvefit` that is a DataArray, which will be broadcast to the data dimensions. This way, the initial guess can vary with the data coordinates.

I also added examples of using `curvefit` to the documentation, both a basic example and one with the multidimensional guess.

I have a couple of questions:

- Should we change the signature to `p0: dict[str, float | DataArray] | None`, instead of `dict[str, Any]` (and same for bounds)? scipy only optimizes over scalars, so I think it would be safe to assume that the values should either be those, or arrays that can be broadcast.
- The usage example of curvefit is only in the docstring for DataArray, so now the docs differ between DA and dataset.
But the example uses a DataArray only, so this should be ok, right?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7821/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1698632575,PR_kwDOAMm_X85P6qCY,7822,Fix typos in contribution guide,20118130,closed,0,,,1,2023-05-06T13:29:22Z,2023-05-07T09:12:57Z,2023-05-07T07:34:56Z,CONTRIBUTOR,,0,pydata/xarray/pulls/7822,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7822/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1345816120,PR_kwDOAMm_X849h97w,6944,Fix step plots with hue,20118130,closed,0,,,2,2022-08-22T05:00:14Z,2022-08-28T12:39:33Z,2022-08-25T15:56:11Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6944,"This PR fixes the broadcasting error when trying to plot multiple step plots, like `arr.plot.step(..., hue=...)` or `arr.plot(..., drawstyle=""steps-mid"")`. Previously, this raised a shape error, as mentioned in https://github.com/pydata/xarray/issues/4288#issuecomment-666485140.

Some other relevant work was started (but apparently unfinished) in #4868 and #4866; this PR doesn't implement those.

- [x] Tests added
- [x] Fixes applied
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6944/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull