html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/7821#issuecomment-1537162245,https://api.github.com/repos/pydata/xarray/issues/7821,1537162245,IC_kwDOAMm_X85bnzwF,39069044,2023-05-06T15:12:57Z,2023-05-06T15:12:57Z,CONTRIBUTOR,This looks pretty good to me on first glance! I would vote to do `p0` and `bounds` in one PR. It could surprise a user if one of these can take an array and the other only scalars.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1698626185
https://github.com/pydata/xarray/pull/7798#issuecomment-1532986706,https://api.github.com/repos/pydata/xarray/issues/7798,1532986706,IC_kwDOAMm_X85bX4VS,39069044,2023-05-03T12:58:35Z,2023-05-03T12:58:35Z,CONTRIBUTOR,Would it be possible to run another bug fix release with this incorporated? Or I guess we're already on to `2023.5.0` according to the date.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1689773381
https://github.com/pydata/xarray/pull/7798#issuecomment-1529050645,https://api.github.com/repos/pydata/xarray/issues/7798,1529050645,IC_kwDOAMm_X85bI3YV,39069044,2023-04-30T15:20:47Z,2023-04-30T15:20:47Z,CONTRIBUTOR,Thanks for the quick fix! Not sure about the bitshift test but I'm assuming @headtr1ck is right.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1689773381
https://github.com/pydata/xarray/issues/7768#issuecomment-1516681755,https://api.github.com/repos/pydata/xarray/issues/7768,1516681755,IC_kwDOAMm_X85aZrob,39069044,2023-04-20T17:16:39Z,2023-04-20T17:16:39Z,CONTRIBUTOR,"This should be doable. I think we would have to rework the `apply_ufunc` wrapper to pass `p0` and `bounds` as DataArrays through `*args` instead of simple dictionaries through `kwargs`, so that `apply_ufunc` can broadcast them and handle dask chunks.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1674818753
https://github.com/pydata/xarray/issues/7587#issuecomment-1457345587,https://api.github.com/repos/pydata/xarray/issues/7587,1457345587,IC_kwDOAMm_X85W3VQz,39069044,2023-03-07T01:34:12Z,2023-03-07T01:34:12Z,CONTRIBUTOR,"Your `m0tot` variable is also being broadcast in the `fami` dimension. So, an additional 10x384x1233x8/1e6=37MB.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1457080267,https://api.github.com/repos/pydata/xarray/issues/7587,1457080267,IC_kwDOAMm_X85W2UfL,39069044,2023-03-06T22:06:11Z,2023-03-06T22:06:11Z,CONTRIBUTOR,Same issue as #1234. This has tripped me up before as well. A kwarg to control this behavior would be a nice enhancement to `.where()`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1457061064,https://api.github.com/repos/pydata/xarray/issues/7587,1457061064,IC_kwDOAMm_X85W2PzI,39069044,2023-03-06T21:55:14Z,2023-03-06T21:55:14Z,CONTRIBUTOR,"Since you're using `tp` (dims `fami, time, site`) as the condition, these dimensions are broadcast across all other variables in the dataset. The problem looks to be your variable `wshedOut`, which is now broadcast across all 5 dimensions in the dataset, hence greatly increased memory usage.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7581#issuecomment-1454968267,https://api.github.com/repos/pydata/xarray/issues/7581,1454968267,IC_kwDOAMm_X85WuQ3L,39069044,2023-03-05T02:45:42Z,2023-03-05T02:45:42Z,CONTRIBUTOR,"`xr.where(cond, x, y, keep_attrs=True)` unambiguously takes the attributes of `x` as described in the docs, even if `x` is a scalar and has no attributes. This can be inconvenient at times but is intentional for consistency. You can invert the condition as you suggest, or better yet use `DataArray.where`/`Dataset.where` to ensure you are getting the attrs you expect.
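A tiny illustration of the two behaviors (a minimal sketch; the attrs are made up):
```python
import xarray as xr

a = xr.DataArray([0.0, 1.0], attrs={'units': 'K'})

# x is the scalar 0 here, so the scalar's (empty) attrs win:
print(xr.where(a != 0, 0, a, keep_attrs=True).attrs)  # {}

# the method form keeps the attrs of the calling object:
print(a.where(a == 0, 0).attrs)  # {'units': 'K'}
```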
In what case would `a!=0` not mirror `a==0`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1608352581
https://github.com/pydata/xarray/issues/7522#issuecomment-1450841385,https://api.github.com/repos/pydata/xarray/issues/7522,1450841385,IC_kwDOAMm_X85WehUp,39069044,2023-03-01T21:01:48Z,2023-03-01T21:01:48Z,CONTRIBUTOR,"Yeah, that seems to be it. Dask's write neatly packs all the needed metadata at the beginning of the file: we can scale this up to a many-GB file with dozens of variables and still read it in ~100ms. Xarray's write leaves the metadata less well organized, so we have to go seeking into the middle of the byte range. `cache_type=""first""` does provide some improvement, but still not as good as on the dask-written file.
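In case it's useful, here's roughly how I'm passing that option (a sketch; the path is hypothetical):
```python
import fsspec
import xarray as xr

# cache_type='first' keeps the first block of the file cached, which helps
# when the metadata lives at the start of the byte range
with fsspec.open('gs://bucket/data.nc', mode='rb', cache_type='first') as f:
    ds = xr.open_dataset(f, engine='h5netcdf')
    print(ds)
```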
FWIW, I inspected the actual bytes of the dask and xarray written files and they are identical for a single variable, but diverge when multiple variables are being written. So, the important differences are probably associated with this step:
> It does set up the whole set of variables as a initialisation stage before writing any data - I don't know if xarray does this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1581046647
https://github.com/pydata/xarray/issues/7522#issuecomment-1449302032,https://api.github.com/repos/pydata/xarray/issues/7522,1449302032,IC_kwDOAMm_X85WYpgQ,39069044,2023-03-01T04:04:25Z,2023-03-01T04:04:25Z,CONTRIBUTOR,"The slow file:

And the fast file:

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1581046647
https://github.com/pydata/xarray/issues/7522#issuecomment-1428872842,https://api.github.com/repos/pydata/xarray/issues/7522,1428872842,IC_kwDOAMm_X85VKt6K,39069044,2023-02-13T23:49:31Z,2023-02-13T23:49:31Z,CONTRIBUTOR,"I did try many loops and different orders of operations to make sure this isn't a caching or auth issue. You can see the std dev of the `timeit` calls above is pretty consistent.
For my actual use case, the difference is very apparent, with `open_dataset` taking about 9 seconds on the numpy-saved file and <1 second on the dask-saved one. I can also clearly see when monitoring network traffic that the slow version has to read in hundreds of MB of data to open the dataset, while the fast one only reads the tiny headers.
I also inspected the actual header bytes of these two files and see they are indeed different.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1581046647
https://github.com/pydata/xarray/issues/7435#issuecomment-1401116054,https://api.github.com/repos/pydata/xarray/issues/7435,1401116054,IC_kwDOAMm_X85Tg1WW,39069044,2023-01-23T22:53:14Z,2023-01-23T22:53:14Z,CONTRIBUTOR,"You do get a `FileNotFoundError` if you explicitly specify an engine with `xarray.load_dataset('not-existing-file.h5', engine='h5netcdf')`.
It looks like neither `NetCDF4BackendEntrypoint` nor `H5netcdfBackendEntrypoint` includes `.h5` in the set of openable extensions handled in `guess_can_open`, but both will work if they can peer into the file and detect valid HDF5 content. Probably some good reason for this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1532317033
https://github.com/pydata/xarray/issues/7403#issuecomment-1379504132,https://api.github.com/repos/pydata/xarray/issues/7403,1379504132,IC_kwDOAMm_X85SOZAE,39069044,2023-01-11T21:26:10Z,2023-01-11T21:26:10Z,CONTRIBUTOR,"> Personally, I'll make the argument that we should switch (carefully) from `-w` to `w` for the default case.
This makes sense in principle for consistency with other I/O methods, but I think it could go very wrong in practice, especially for inexperienced users.
For example, a misplaced `ds.to_zarr(""gs://bucket_name"")` with the default `mode=""w""` would overwrite a user's entire bucket, and the same goes for local storage. `to_netcdf` doesn't have the capability to `rm -r` a directory, so I would argue `to_zarr(mode=""w"")` is **much** more dangerous.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1512290017
https://github.com/pydata/xarray/issues/7403#issuecomment-1366882900,https://api.github.com/repos/pydata/xarray/issues/7403,1366882900,IC_kwDOAMm_X85RePpU,39069044,2022-12-28T19:53:05Z,2022-12-28T19:53:05Z,CONTRIBUTOR,"Your MCVE has `mode=""a""`; is that intentional? As for the defaults, the docstring says:
> The default mode is “a” if append_dim is set. Otherwise, it is “r+” if region is set and w- otherwise.
So without specifying mode, you get `mode=""w-""`, which will not overwrite anything. With `mode=""a""`, your MCVE runs fine for me.
From using these a bit, my understanding is:
- `mode=""w-""` is the safe default that won't overwrite anything
- `mode=""w""` is only if you want to rewrite the **entire** zarr store, not for rewriting a subset of existing data
- `mode=""r+""` is for modifying values on some subset of the existing store specified by the index with `region`, but doesn't allow changing any dimension sizes
- `mode=""a""` is for completely overwriting existing variables*, appending new variables, or appending data along an existing dimension, paired with `append_dim`
\* I actually didn't realize this behavior until testing it out just now, but it is described in the docstring
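To make that concrete, here's a quick sketch of the four modes (store path and data made up, and you may need to adapt the `region` bounds to your store):
```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'a': ('time', np.arange(4.0))}, coords={'time': np.arange(4)})

ds.to_zarr('store.zarr', mode='w-')  # safe default: errors if store.zarr already exists
ds.to_zarr('store.zarr', mode='w')   # wipes and rewrites the *entire* store
ds.to_zarr('store.zarr', mode='a', append_dim='time')  # append along an existing dimension
ds.to_zarr('store.zarr', mode='r+', region={'time': slice(0, 4)})  # overwrite values in an existing region
```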
","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,1512290017
https://github.com/pydata/xarray/issues/7362#issuecomment-1341818601,https://api.github.com/repos/pydata/xarray/issues/7362,1341818601,IC_kwDOAMm_X85P-obp,39069044,2022-12-08T00:54:26Z,2022-12-08T00:54:26Z,CONTRIBUTOR,"So many edge cases and different ways to use this function! I'm also a bit surprised `numpy`-only inputs work here, but if they do, we may as well support it. #7364 would be an easy way to handle this case.","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1478060173
https://github.com/pydata/xarray/pull/7229#issuecomment-1320983738,https://api.github.com/repos/pydata/xarray/issues/7229,1320983738,IC_kwDOAMm_X85OvJy6,39069044,2022-11-19T22:32:51Z,2022-11-19T22:32:51Z,CONTRIBUTOR,"> the change to explicitly constructing the `attrs` instead of working around quirks of `apply_ufunc` sounds good to me: when discussing this in the last meeting we did get the feeling that in the long run it would be better to think about redesigning that part of `apply_ufunc`.
Yeah I think this would be worth doing eventually. Trying to index a list of attributes of unpredictable length doesn't feel very xarray-like.
Any further refinements to the current approach of reconstructing attributes after `apply_ufunc` here, or is this good to go?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1424732975
https://github.com/pydata/xarray/pull/7229#issuecomment-1306498356,https://api.github.com/repos/pydata/xarray/issues/7229,1306498356,IC_kwDOAMm_X85N35U0,39069044,2022-11-08T01:45:59Z,2022-11-08T01:45:59Z,CONTRIBUTOR,"The latest commit should do what we want, consistently taking attrs of `x`. Casting to `DataArray` along with the explicit `broadcast` operation gets us empty attrs on both the variable and the coords.
The only place it deviates from this (spelled out in the tests) is in pulling coord attrs from `x`, then `y`, then `cond` if any of these are scalars. I think this makes sense, because if I pass `[DataArray, scalar, scalar]`, for example, I wouldn't expect it to drop the coordinate attrs.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1424732975
https://github.com/pydata/xarray/pull/7229#issuecomment-1304244938,https://api.github.com/repos/pydata/xarray/issues/7229,1304244938,IC_kwDOAMm_X85NvTLK,39069044,2022-11-04T20:55:02Z,2022-11-05T03:37:35Z,CONTRIBUTOR,"I considered the `[scalar, data_array, whatever]` possibility, but this seems like a serious edge case. Passing a scalar condition is the same as just taking either `x` or `y`, so what's the point?
As far as passing bare arrays goes, despite what the docstrings say, it seems like you can actually do this with `DataArray.where`, etc.:
```python
>>> x = xr.DataArray([1, 1], coords={""x"": [0, 1]})
>>> cond = xr.DataArray([True, False], coords={""x"": [0, 1]})
>>> x.where(cond, other=np.array([0, 2]))
<xarray.DataArray (x: 2)>
array([1, 2])
Coordinates:
* x (x) int64 0 1
```
Which I don't think makes sense, but that's mostly a separate issue. You do get a broadcast error if `y` is a bare array with a different size.
After poking around, I agree that this isn't easy to totally fix. I sort of started to go down the route of option 1, but it looked quite complicated. Option 2, wrapping non-xarray objects with `xr.Variable` only in `xr.where`, might help us out a little here; I can try it. But I think the current solution in this PR gets us back to the pre-#6461 behavior while still allowing for scalars and giving predictable behavior for everything except `cond=scalar`, which I don't think is very important.
I'm just keen to get this merged in some form, because the regression from #6461 is pretty bad. For example:
```python
ds = xr.tutorial.load_dataset('air_temperature')
xr.where(ds.air>10, ds.air, 10, keep_attrs=True).to_netcdf('foo.nc')
# completely fails because the time attrs have been overwritten by ds.air attrs
ValueError: failed to prevent overwriting existing key units in attrs on variable 'time'. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.
```
I hit exactly this issue on some existing scripts so this is preventing me from upgrading beyond `xarray=2022.03.0`.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1424732975
https://github.com/pydata/xarray/pull/7249#issuecomment-1302804069,https://api.github.com/repos/pydata/xarray/issues/7249,1302804069,IC_kwDOAMm_X85NpzZl,39069044,2022-11-03T23:57:33Z,2022-11-03T23:57:33Z,CONTRIBUTOR,Can we get #7229 in?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1433815234
https://github.com/pydata/xarray/issues/7220#issuecomment-1291957109,https://api.github.com/repos/pydata/xarray/issues/7220,1291957109,IC_kwDOAMm_X85NAbN1,39069044,2022-10-26T12:26:55Z,2022-10-26T12:26:55Z,CONTRIBUTOR,"Anything that uses `attrs[1]` and the `_get_all_of_type` helper is going to have a hard time guaranteeing the behavior stated in the docstring, which is that we take the attrs of `x`. If `x` is a scalar, then `_get_all_of_type` returns a list of length 2 and `attrs[1]` ends up being the attrs of `y`. I think we may want to rework this; will try a few things later today.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423114234
https://github.com/pydata/xarray/pull/6978#issuecomment-1256225233,https://api.github.com/repos/pydata/xarray/issues/6978,1256225233,IC_kwDOAMm_X85K4HnR,39069044,2022-09-23T13:38:04Z,2022-09-23T13:38:04Z,CONTRIBUTOR,"> Is that something that will be deprecated or is it planned to keep the support for the kwargs dict forever?
Not sure if there are any strong opinions here? I don't see much harm in keeping it around but we could also deprecate.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1359368857
https://github.com/pydata/xarray/issues/7062#issuecomment-1254063029,https://api.github.com/repos/pydata/xarray/issues/7062,1254063029,IC_kwDOAMm_X85Kv3u1,39069044,2022-09-21T18:15:46Z,2022-09-21T18:15:46Z,CONTRIBUTOR,"Without `bottleneck`, I guess both options from the MVCE above get routed through `_mean`:
```python
>>> xr.set_options(use_bottleneck=False)
>>> da = xr.DataArray([1,2,3], coords={'x':[1,2,3]}).astype('float32')
>>> da.rolling(x=3, min_periods=1).mean().dtype
dtype('float64')
>>> da.chunk({'x':1}).rolling(x=3, min_periods=1).mean().dtype
dtype('float64')
```
And after #7063:
```python
>>> xr.set_options(use_bottleneck=False)
>>> da = xr.DataArray([1,2,3], coords={'x':[1,2,3]}).astype('float32')
>>> da.rolling(x=3, min_periods=1).mean().dtype
dtype('float32')
>>> da.chunk({'x':1}).rolling(x=3, min_periods=1).mean().dtype
dtype('float32')
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1381294181
https://github.com/pydata/xarray/pull/6978#issuecomment-1235974050,https://api.github.com/repos/pydata/xarray/issues/6978,1235974050,IC_kwDOAMm_X85Jq3ei,39069044,2022-09-02T23:23:51Z,2022-09-02T23:23:51Z,CONTRIBUTOR,"> Thanks for doing this @slevang ! Would you mind adding a tiny regression test too?
Right, I guess this actually breaks the previous way of passing kwargs and that is why the docs build failed. Hmmm. To go with the current changes, thoughts on adding something like this to the parsing logic:
```python
if kwargs is None:
    kwargs = {}
elif ""kwargs"" in kwargs:
    kwargs = {**kwargs.pop(""kwargs""), **kwargs}
```
to allow for the desired functionality but also handle the old case where someone passes `kwargs` as a dict? Feels wonky, but it works. Is there a smarter way of doing this?
BTW, it took me a minute to figure out what happened here. The docstring in the [original PR](https://github.com/pydata/xarray/pull/4849/files#diff-386f35dda14ca12d315f2eb71f0ab35d7fdc73419b5dc5175ea5720f3206903bR4470) was actually correct (requiring a dict, albeit maybe not the best way of passing kwargs), but it then got changed in [#5182](https://github.com/pydata/xarray/commit/5b2257ee4fadcf0ebbf63ef30a836962dddde6f7#diff-763e3002fd954d544b05858d8d138b828b66b6a2a0ae3cd58d2040a652f14638R7173) to suggest that kwargs could be passed on their own. I see `.interp()` behaves the same way, where only a dict of kwargs can be provided and passed to the underlying scipy interpolation. So it could be worth generalizing this for other methods where we pass kwargs to a child function of some sort, to allow both of these patterns.
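Concretely, the two calling patterns this would support (function and kwarg purely illustrative):
```python
import numpy as np
import xarray as xr

def f(x, a, b):
    return a * x + b

da = xr.DataArray(np.arange(4.0), coords={'x': np.arange(4.0)})

# old pattern: fit kwargs bundled into a dict
da.curvefit('x', func=f, kwargs={'maxfev': 100})
# new pattern: bare keyword args forwarded to scipy.optimize.curve_fit
da.curvefit('x', func=f, maxfev=100)
```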
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1359368857
https://github.com/pydata/xarray/issues/6968#issuecomment-1234368670,https://api.github.com/repos/pydata/xarray/issues/6968,1234368670,IC_kwDOAMm_X85Jkvie,39069044,2022-09-01T14:34:05Z,2022-09-01T14:34:05Z,CONTRIBUTOR,"I don't think `curvefit` works well with datetime coordinates at this point, because everything gets coerced to float by `apply_ufunc`. Probably room for improvement there. At a minimum you would need to specify good guesses (`p0`) and/or bounds in terms of `datetime64[ns]` values.
An easy enough workaround is to assign a separate non-datetime coordinate. This works:
```python
import numpy as np
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')
ds = ds.assign_coords(
    {'day': (ds.time - ds.time[0]) / np.timedelta64(1, 'D')}
).swap_dims({'time': 'day'})

def periodic_season(x, a0, a1, a2, a3):
    # periodic function with both amplitude and phase-shift parameters
    return a0 + a1 * np.cos(a2 * x - a3)

dn = ds.curvefit(
    'day',
    func=periodic_season,
    p0={'a0': 275, 'a1': 15, 'a2': 2 * np.pi / 365},
)
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1355733363
https://github.com/pydata/xarray/issues/6799#issuecomment-1188348238,https://api.github.com/repos/pydata/xarray/issues/6799,1188348238,IC_kwDOAMm_X85G1MFO,39069044,2022-07-18T21:43:11Z,2022-07-18T21:43:11Z,CONTRIBUTOR,"The chunking structure on disk is pretty instrumental to my application, which requires fast retrievals of full slices in the time dimension. The loop option in my first post only takes about 10 seconds with `ni=1000` which is fine for my use case, so I'll probably go with that for now. It would be interesting to dig deeper though and see if there is a way to handle this better in the interp logic.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1307112340
https://github.com/pydata/xarray/issues/6799#issuecomment-1187729258,https://api.github.com/repos/pydata/xarray/issues/6799,1187729258,IC_kwDOAMm_X85Gy09q,39069044,2022-07-18T16:44:40Z,2022-07-18T16:44:40Z,CONTRIBUTOR,Interpolating on chunked dimensions doesn't work at all prior to #4155. The changes in #4069 are also relevant.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1307112340
https://github.com/pydata/xarray/pull/6461#issuecomment-1094166371,https://api.github.com/repos/pydata/xarray/issues/6461,1094166371,IC_kwDOAMm_X85BN6dj,39069044,2022-04-10T03:16:24Z,2022-04-10T03:16:24Z,CONTRIBUTOR,Good point @keewis. This version should maintain the expected behavior in that case.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1198058137
https://github.com/pydata/xarray/issues/6444#issuecomment-1093627256,https://api.github.com/repos/pydata/xarray/issues/6444,1093627256,IC_kwDOAMm_X85BL214,39069044,2022-04-09T03:09:49Z,2022-04-09T03:09:49Z,CONTRIBUTOR,I ran into this error as well with a bunch of broken `xr.where` calls where I had set a global option for `keep_attrs=True`. I pushed a possible fix in #6461.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1193704369
https://github.com/pydata/xarray/issues/6317#issuecomment-1057116313,https://api.github.com/repos/pydata/xarray/issues/6317,1057116313,IC_kwDOAMm_X84_AlCZ,39069044,2022-03-02T16:24:37Z,2022-03-02T16:24:37Z,CONTRIBUTOR,"These errors can often be eliminated by passing appropriate initial guesses for each parameter through the `p0` argument (which it looks like you're already doing), and/or by passing additional kwargs through to `scipy.optimize.leastsq()`, like `ftol` and `maxfev`, which set the fit tolerance and max number of iterations.
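For example, something along these lines (function, guesses, and tolerances all illustrative):
```python
import numpy as np
import xarray as xr

def decay(t, amp, tau):
    return amp * np.exp(-t / tau)

t = np.arange(0.0, 10.0, 0.1)
da = xr.DataArray(3.0 * np.exp(-t / 2.0), coords={'t': t})

fit = da.curvefit(
    't',
    func=decay,
    p0={'amp': 1.0, 'tau': 1.0},              # reasonable starting guesses
    kwargs={'maxfev': 10000, 'ftol': 1e-10},  # forwarded to the scipy fitter
)
```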
Even so, there will be cases where some of your data just doesn't fit the desired function and you want to move on and return a result anyway. Your proposal seems fine as long as errors are raised by default and the documentation adequately describes the behavior when set to True.","{""total_count"": 5, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1157140667
https://github.com/pydata/xarray/pull/5933#issuecomment-1016805677,https://api.github.com/repos/pydata/xarray/issues/5933,1016805677,IC_kwDOAMm_X848mzkt,39069044,2022-01-19T19:39:57Z,2022-01-19T19:39:57Z,CONTRIBUTOR,"Not sure I understand the blockwise approach well enough to make a PR, but maybe I'll give it a try at some point.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1043746973
https://github.com/pydata/xarray/pull/5933#issuecomment-1015887354,https://api.github.com/repos/pydata/xarray/issues/5933,1015887354,IC_kwDOAMm_X848jTX6,39069044,2022-01-18T22:21:49Z,2022-01-18T22:45:16Z,CONTRIBUTOR,"> @slevang are you still interested to continue this PR? I think that would be a worthwhile addition and should not be too much left to do. (What would be nice, however, are tests for the issues this fixes.)
Definitely! I got distracted is all, and @dcherian posted a nice solution in [#5629](https://github.com/pydata/xarray/issues/5629#issuecomment-960133879) that could allow us to preserve the ability to fit along a chunked dimension using blockwise operations and the dask `lstsq` implementation used by the existing `polyfit` code.
I'm happy to pick this back up and finish it off if there is consensus on the right way forward, but the blockwise approach seemed promising so I put this on hold.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1043746973
https://github.com/pydata/xarray/issues/5629#issuecomment-885216203,https://api.github.com/repos/pydata/xarray/issues/5629,885216203,IC_kwDOAMm_X840w1PL,39069044,2021-07-22T20:37:04Z,2021-07-22T20:37:04Z,CONTRIBUTOR,"Potentially a useful comparison: here is the task graph for the same problem posed to `da.curvefit`, which uses `xr.apply_ufunc(scipy.optimize.curve_fit, ..., vectorize=True)`. Looks a lot cleaner, and I think `xr.polyfit` could be re-implemented this way without much trouble.
```python
import xarray as xr
import dask.array as dsa
def linear(x, m, b):
    return m * x + b

da = xr.DataArray(dsa.random.random((4, 6, 100), chunks=(1, 2, 100)), dims=['x', 'y', 'time'])
fit = da.curvefit('time', linear).curvefit_coefficients
```

","{""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 3, ""rocket"": 0, ""eyes"": 0}",,950882492
https://github.com/pydata/xarray/issues/5165#issuecomment-820706339,https://api.github.com/repos/pydata/xarray/issues/5165,820706339,MDEyOklzc3VlQ29tbWVudDgyMDcwNjMzOQ==,39069044,2021-04-15T20:20:26Z,2021-04-15T20:20:26Z,CONTRIBUTOR,"In fact an even simpler calculation also shows the same problem:
```python
import xarray as xr
import pandas as pd
import numpy as np
import dask.array as da
dates = pd.date_range('1980-01-01', '2019-12-31', freq='D')
ds = xr.Dataset(
    data_vars={
        'x1': (
            ('time', 'lat', 'lon'),
            da.random.random(size=(dates.size, 360, 720), chunks=(1, -1, -1))),
        'x2': (
            ('time', 'lat', 'lon'),
            da.random.random(size=(dates.size, 360, 720), chunks=(1, -1, -1))),
    },
    coords={
        'time': dates,
        'lat': np.arange(-90, 90, .5),
        'lon': np.arange(-180, 180, .5),
    },
)
ds['x3'] = ds.x2 - ds.x1
ds[['x3']].to_zarr('test-anom', mode='w')
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,859218255
https://github.com/pydata/xarray/pull/4849#issuecomment-810635014,https://api.github.com/repos/pydata/xarray/issues/4849,810635014,MDEyOklzc3VlQ29tbWVudDgxMDYzNTAxNA==,39069044,2021-03-30T23:04:52Z,2021-03-30T23:04:52Z,CONTRIBUTOR,I think so. I pushed a merge commit to get this up to date with the current release.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,797302408
https://github.com/pydata/xarray/pull/4849#issuecomment-782731258,https://api.github.com/repos/pydata/xarray/issues/4849,782731258,MDEyOklzc3VlQ29tbWVudDc4MjczMTI1OA==,39069044,2021-02-20T18:46:55Z,2021-02-20T18:46:55Z,CONTRIBUTOR,"Thanks for the review @dcherian! The latest commit has been refactored with a couple helper functions and associated tests, and any steps that served no purpose other than consistency with `polyfit` have been removed.
If you can think of any more specific test cases that should be included, happy to add them.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,797302408
https://github.com/pydata/xarray/issues/4894#issuecomment-780953464,https://api.github.com/repos/pydata/xarray/issues/4894,780953464,MDEyOklzc3VlQ29tbWVudDc4MDk1MzQ2NA==,39069044,2021-02-18T00:51:04Z,2021-02-18T00:51:04Z,CONTRIBUTOR,#4849 should be almost ready to go. Would be good for someone else to confirm the API makes sense. Additional features and performance enhancements could be added in later releases as long as the API is acceptable.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,806436724
https://github.com/pydata/xarray/issues/4826#issuecomment-780900086,https://api.github.com/repos/pydata/xarray/issues/4826,780900086,MDEyOklzc3VlQ29tbWVudDc4MDkwMDA4Ng==,39069044,2021-02-17T22:36:08Z,2021-02-17T22:36:08Z,CONTRIBUTOR,"I ran into this as well with the basic netcdf backends:
```python
import xarray as xr
ds = xr.Dataset(
    data_vars={""foo"": ([""x""], [False, True, False])},
    coords={""x"": [1, 2, 3]},
)
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
```
Gives:
```
bool
int8
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,789410367
https://github.com/pydata/xarray/pull/4849#issuecomment-777602757,https://api.github.com/repos/pydata/xarray/issues/4849,777602757,MDEyOklzc3VlQ29tbWVudDc3NzYwMjc1Nw==,39069044,2021-02-11T16:02:54Z,2021-02-11T16:02:54Z,CONTRIBUTOR,"I've been playing around with this some more, and found the performance to be much better using a process-heavy dask scheduler. For example:
```python
import xarray as xr
import numpy as np
import time
import dask
import dask.distributed

def exponential(x, a, xc):
    return np.exp((x - xc) / a)

x = np.arange(-5, 5, 0.001)
t = np.arange(-5, 5, 0.01)
X, T = np.meshgrid(x, t)
Z1 = np.random.uniform(low=-5, high=5, size=X.shape)
Z2 = exponential(Z1, 3, X) + np.random.normal(scale=0.1, size=Z1.shape)

ds = xr.Dataset(
    data_vars=dict(var1=([""t"", ""x""], Z1), var2=([""t"", ""x""], Z2)),
    coords={""t"": t, ""x"": x},
)
ds = ds.chunk({'x': 10})

def test_fit():
    start = time.time()
    fit = ds.var2.curvefit(
        coords=ds.var1,
        func=exponential,
        reduce_dim=""t"",
    ).compute()
    print(f'Fitting time: {time.time() - start:.2f}s')

with dask.config.set(scheduler='threads'):
    test_fit()
with dask.config.set(scheduler='processes'):
    test_fit()
with dask.distributed.Client() as client:
    test_fit()
with dask.distributed.Client(n_workers=8, threads_per_worker=1) as client:
    test_fit()
```
On my 8-core machine, this takes:
```
Fitting time: 8.32s
Fitting time: 2.71s
Fitting time: 4.40s
Fitting time: 3.43s
```
According to [this](https://github.com/scipy/scipy/pull/8740), the underlying scipy routines should be thread-safe.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,797302408
https://github.com/pydata/xarray/pull/4849#issuecomment-772874760,https://api.github.com/repos/pydata/xarray/issues/4849,772874760,MDEyOklzc3VlQ29tbWVudDc3Mjg3NDc2MA==,39069044,2021-02-03T22:37:50Z,2021-02-03T22:37:50Z,CONTRIBUTOR,"Added some checks that will raise errors if `inspect.signature` is not able to determine the function arguments, either for functions with variable-length args or for things like numpy ufuncs that don't seem to work with `signature`. In these cases, `param_names` can be passed manually.
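Roughly, the kind of inspection involved (a sketch, not the exact PR code):
```python
import inspect
import numpy as np

def exponential(x, a, xc):
    return np.exp((x - xc) / a)

# parameter names after the independent variable fall out of the signature
print(list(inspect.signature(exponential).parameters)[1:])  # ['a', 'xc']

# variadic wrappers hide the real parameters, so param_names is required:
def wrapper(x, *args):
    return exponential(x, *args)
print(list(inspect.signature(wrapper).parameters)[1:])  # ['args'] - ambiguous

# and many numpy ufuncs have no retrievable signature at all:
# inspect.signature(np.exp)  # typically raises ValueError
```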
Also added minimal tests, but these should probably be expanded.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,797302408
https://github.com/pydata/xarray/pull/4849#issuecomment-771907303,https://api.github.com/repos/pydata/xarray/issues/4849,771907303,MDEyOklzc3VlQ29tbWVudDc3MTkwNzMwMw==,39069044,2021-02-02T19:19:01Z,2021-02-02T19:19:01Z,CONTRIBUTOR,"Added a couple usage examples in the docs, including one that replicates the [scipy](https://scipython.com/blog/non-linear-least-squares-fitting-of-a-two-dimensional-data/) example of fitting multiple peaks. Because of the wrapper function and variable args, this requires supplying `param_names` manually. Works nicely though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,797302408
https://github.com/pydata/xarray/pull/4849#issuecomment-771320891,https://api.github.com/repos/pydata/xarray/issues/4849,771320891,MDEyOklzc3VlQ29tbWVudDc3MTMyMDg5MQ==,39069044,2021-02-02T03:12:41Z,2021-02-02T03:12:41Z,CONTRIBUTOR,"Some more progress here.
1. I liked the idea of being able to supply `p0` and `bounds` as dictionaries mapping parameter names to values, which is much more in the style of xarray. So I've included the logic to translate between this and the scipy args, which are just ordered lists. The logic first prioritizes any of these values supplied directly in keyword args, then looks for defaults in the function definition, and otherwise falls back to the scipy defaults.
2. There's now a working implementation of multi-dimensional fits, using `ravel` to flatten the coordinate arrays. One limitation is that `func` must be specified like `f((x, y), z)`, since scipy takes a single `x` argument for the fitting coordinate (see the sketch just below this list). Maybe there is a clever way around this, but I think it's ok.
3. Some changes to the API.
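For point 2, a minimal sketch of what such a function looks like (names and values illustrative):
```python
import numpy as np

# scipy.optimize.curve_fit passes a single independent-variable argument,
# so a 2D fitting function receives both (raveled) coordinate arrays as a tuple:
def plane(coords, a, b, c):
    x, y = coords
    return a * x + b * y + c

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 4.0])
print(plane((x, y), 2.0, 1.0, 0.5))  # [0.5 3.5 8.5]
```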
The best way to specify the fitting coordinates is a bit tricky to figure out. My original use case for this was needing to fit a relationship between two time/lat/lon dataarrays with the fit done over all time. But probably a more common use would be to just fit a curve over one or two dimensions that already exist in your data. So it would be great to handle these possibilities seamlessly.
What I've settled on for now is a `coords` argument that takes a list of coordinates, which should be the same length as the input coordinates of your fitting function. Additionally, there is a `reduce_dim` arg so that, say, if you want to fit a function in time but aggregate over all lat and lon, you can supply those dimensions here. So with a 3D x/y/time array, any of the following should work:
```python
# Fit a 1d function in time, returns parameters with dims (x, y)
da.curvefit(coords='time', ...)
# Fit a 2d function in space, returns parameters with dims (t)
da.curvefit(coords=['x', 'y'], ...)
# Fit a 1d function with another 3d dataarray and aggregate over time, returns parameters with dims (x, y)
da.curvefit(coords=da1, reduce_dim='time', ...)
```
The logic to make this work got a bit complicated, since we need to supply the right `input_core_dims` to `apply_ufunc`, and also to explicitly broadcast the coordinates to ensure the `ravel` operation works. Any thoughts on cleanup here appreciated.
Will eventually need to add tests and improve docs and examples. Tests especially I could use some help on.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,797302408
https://github.com/pydata/xarray/pull/4849#issuecomment-770395357,https://api.github.com/repos/pydata/xarray/issues/4849,770395357,MDEyOklzc3VlQ29tbWVudDc3MDM5NTM1Nw==,39069044,2021-01-31T14:59:12Z,2021-01-31T14:59:12Z,CONTRIBUTOR,"> 1. Different fit coefficients as differently-named variables in the output, rather than indexed with a coordinate.
I think the way I configured things now does replicate the polyfit results. For example:
```python
ds = xr.tutorial.open_dataset('air_temperature')
ds['air2'] = ds.air.copy()
ds.polyfit(dim='time', deg=2)
<xarray.Dataset>
Dimensions: (degree: 3, lat: 25, lon: 53)
Coordinates:
* degree (degree) int64 2 1 0
* lat (lat) float64 75.0 72.5 70.0 ... 20.0 17.5 15.0
* lon (lon) float64 200.0 202.5 205.0 ... 327.5 330.0
Data variables:
air_polyfit_coefficients (degree, lat, lon) float64 -1.162e-32 ... 1.13...
air2_polyfit_coefficients (degree, lat, lon) float64 -1.162e-32 ... 1.14...
```
Compared to this:
```python
def square(x, a, b, c):
    return a * np.power(x, 2) + b * x + c

ds.curvefit(x=ds.time, dim='time', func=square)
<xarray.Dataset>
Dimensions: (lat: 25, lon: 53, param: 3)
Coordinates:
* lat (lat) float32 75.0 72.5 70.0 ... 20.0 17.5 15.0
* lon (lon) float32 200.0 202.5 205.0 ... 327.5 330.0
* param (param) <U1 'a' 'b' 'c'
...
```

> 2. Initial guesses for each fit parameter.
Yeah this would be good. Should be easy to look for default values in the function itself using `inspect`.
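e.g. something like this (function and defaults illustrative):
```python
import inspect
import numpy as np

def exponential(x, a=1.0, xc=0.0):
    return np.exp((x - xc) / a)

# defaults declared in the function signature can seed p0:
defaults = {
    name: p.default
    for name, p in inspect.signature(exponential).parameters.items()
    if p.default is not inspect.Parameter.empty
}
print(defaults)  # {'a': 1.0, 'xc': 0.0}
```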
> 3. (Stretch goal) Ability to fit >1D functions
Looks like this could be possible with a call to `ravel` like [this](https://scipython.com/blog/non-linear-least-squares-fitting-of-a-two-dimensional-data/). I'll do some experimenting.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,797302408
https://github.com/pydata/xarray/issues/4300#issuecomment-770133047,https://api.github.com/repos/pydata/xarray/issues/4300,770133047,MDEyOklzc3VlQ29tbWVudDc3MDEzMzA0Nw==,39069044,2021-01-30T01:38:18Z,2021-01-30T01:38:18Z,CONTRIBUTOR,"I needed this functionality for a project and, piggy-backing off the last couple of comments, decided that the `curve_fit`-wrapped-in-`apply_ufunc` approach works quite well. I put together a PR in #4849. Any feedback welcome!","{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 2, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,671609109