
issue_comments


43 rows where author_association = "CONTRIBUTOR" and user = 39069044 sorted by updated_at descending


issue 26

  • Basic curvefit implementation 7
  • Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` 3
  • Differences in `to_netcdf` for dask and numpy backed arrays 3
  • xr.where increase the bytes of the dataset 3
  • Reimplement `.polyfit()` with `apply_ufunc` 2
  • `interp` performance with chunked dimensions 2
  • fix passing of curvefit kwargs 2
  • Zarr error when trying to overwrite part of existing store 2
  • Fix groupby binary ops when grouped array is subset relative to other 2
  • General curve fitting method 1
  • Reading and writing a zarr dataset multiple times casts bools to int8 1
  • release 0.17.0 1
  • Poor memory management with dask=2021.4.0 1
  • Polyfit performance on large datasets - Suboptimal dask task graph 1
  • Handle scipy fitting errors in `xarray.curve_fit` 1
  • xr.where with scalar as second argument fails with keep_attrs=True 1
  • Fix `xr.where(..., keep_attrs=True)` bug 1
  • Extracting annual amplitude and phase of xarray dataset for each pixel using xarray.DataArray.curvefit 1
  • Rolling mean on dask array does not preserve dtype 1
  • `xr.where(..., keep_attrs=True)` overwrites coordinate attributes 1
  • whats-new for 2022.11.0 1
  • `xr.where()` with numpy inputs produces error if global options changed 1
  • misleading error message for attempting to load non existing file 1
  • xr.where loses attributes despite keep_attrs=True 1
  • Supplying multidimensional initial guess to `curvefit` 1
  • Implement multidimensional initial guess and bounds for `curvefit` 1

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at ▲ (sort), author_association, body, reactions, performed_via_github_app, issue
1537162245 https://github.com/pydata/xarray/pull/7821#issuecomment-1537162245 https://api.github.com/repos/pydata/xarray/issues/7821 IC_kwDOAMm_X85bnzwF slevang 39069044 2023-05-06T15:12:57Z 2023-05-06T15:12:57Z CONTRIBUTOR

This looks pretty good to me on first glance! I would vote to do p0 and bounds in one PR. It could surprise a user if one of these can take an array and the other only scalars.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement multidimensional initial guess and bounds for `curvefit` 1698626185
1532986706 https://github.com/pydata/xarray/pull/7798#issuecomment-1532986706 https://api.github.com/repos/pydata/xarray/issues/7798 IC_kwDOAMm_X85bX4VS slevang 39069044 2023-05-03T12:58:35Z 2023-05-03T12:58:35Z CONTRIBUTOR

Would it be possible to run another bug fix release with this incorporated? Or I guess we're already on to 2023.5.0 according to the date.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix groupby binary ops when grouped array is subset relative to other 1689773381
1529050645 https://github.com/pydata/xarray/pull/7798#issuecomment-1529050645 https://api.github.com/repos/pydata/xarray/issues/7798 IC_kwDOAMm_X85bI3YV slevang 39069044 2023-04-30T15:20:47Z 2023-04-30T15:20:47Z CONTRIBUTOR

Thanks for the quick fix! Not sure about the bitshift test but I'm assuming @headtr1ck is right.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix groupby binary ops when grouped array is subset relative to other 1689773381
1516681755 https://github.com/pydata/xarray/issues/7768#issuecomment-1516681755 https://api.github.com/repos/pydata/xarray/issues/7768 IC_kwDOAMm_X85aZrob slevang 39069044 2023-04-20T17:16:39Z 2023-04-20T17:16:39Z CONTRIBUTOR

This should be doable. I think we would have to rework the apply_ufunc wrapper to pass p0 and bounds as DataArrays through *args instead of as simple dictionaries through kwargs, so that apply_ufunc can broadcast them and handle dask chunks.
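As a rough illustration of why that helps, here is a minimal, hypothetical sketch (not the actual xarray wrapper) of apply_ufunc broadcasting an extra DataArray argument alongside the data being fit:

```python
import numpy as np
import xarray as xr

def fit_1d(y, p0):
    # stand-in for a per-pixel fit: just combines the guess with the data
    return p0 + y.mean()

da = xr.DataArray(np.random.rand(3, 10), dims=["x", "t"])
p0 = xr.DataArray([0.0, 1.0, 2.0], dims=["x"])  # one initial guess per x

# apply_ufunc broadcasts p0 against da (and would align dask chunks),
# then loops over x because of vectorize=True
result = xr.apply_ufunc(
    fit_1d, da, p0,
    input_core_dims=[["t"], []],
    vectorize=True,
)
```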

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Supplying multidimensional initial guess to `curvefit` 1674818753
1457345587 https://github.com/pydata/xarray/issues/7587#issuecomment-1457345587 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W3VQz slevang 39069044 2023-03-07T01:34:12Z 2023-03-07T01:34:12Z CONTRIBUTOR

Your m0tot variable is also being broadcast in the fami dimension. So, an additional 10x384x1233x8/1e6=37MB.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1457080267 https://github.com/pydata/xarray/issues/7587#issuecomment-1457080267 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W2UfL slevang 39069044 2023-03-06T22:06:11Z 2023-03-06T22:06:11Z CONTRIBUTOR

Same issue as #1234. This has tripped me up before as well. A kwarg to control this behavior would be a nice enhancement to .where().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1457061064 https://github.com/pydata/xarray/issues/7587#issuecomment-1457061064 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W2PzI slevang 39069044 2023-03-06T21:55:14Z 2023-03-06T21:55:14Z CONTRIBUTOR

Since you're using tp (dims fami, time, site) as the condition, these dimensions are broadcast across all other variables in the dataset. The problem looks to be your variable wshedOut, which is now broadcast across all 5 dimensions in the dataset, hence the greatly increased memory usage.
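A toy illustration of that broadcasting (made-up sizes and only two variables, not the original dataset):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "tp": ("time", np.random.rand(4) > 0.5),             # condition with dims (time,)
        "wshedOut": (("lat", "lon"), np.random.rand(3, 5)),   # no time dim
    }
)

# xr.where broadcasts the condition against every variable, so wshedOut
# gains a time dimension and grows by a factor of len(time)
out = xr.where(ds.tp, ds, 0)
print(dict(ds.wshedOut.sizes), "->", dict(out.wshedOut.sizes))
```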

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1454968267 https://github.com/pydata/xarray/issues/7581#issuecomment-1454968267 https://api.github.com/repos/pydata/xarray/issues/7581 IC_kwDOAMm_X85WuQ3L slevang 39069044 2023-03-05T02:45:42Z 2023-03-05T02:45:42Z CONTRIBUTOR

xr.where(cond, x, y, keep_attrs=True) unambiguously takes the attributes of x as described in the docs, even if x is a scalar and has no attributes. This can be inconvenient at times but is intentional for consistency. You can invert the condition as you suggest, or better yet use DataArray.where/Dataset.where to ensure you are getting the attrs you expect.
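A small sketch of the difference (assumed to match the documented behavior):

```python
import xarray as xr

a = xr.DataArray([0.0, 1.0, 2.0], dims="x", attrs={"units": "m"})

# xr.where takes the attrs of x (the second argument); a scalar x has none
print(xr.where(a != 0, 0, a, keep_attrs=True).attrs)  # {}

# DataArray.where keeps the attrs of the object it is called on
print(a.where(a != 0, 0).attrs)  # {'units': 'm'}
```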

In what case would a!=0 not mirror a==0?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where loses attributes despite keep_attrs=True 1608352581
1450841385 https://github.com/pydata/xarray/issues/7522#issuecomment-1450841385 https://api.github.com/repos/pydata/xarray/issues/7522 IC_kwDOAMm_X85WehUp slevang 39069044 2023-03-01T21:01:48Z 2023-03-01T21:01:48Z CONTRIBUTOR

Yeah that seems to be it. Dask's write neatly packs all the needed metadata at the beginning of the file, since we can scale this up to a many-GB file with dozens of variables and still read it in ~100ms. Xarray, by contrast, does a less well-organized write of the metadata, so we have to go seeking into the middle of the byte range. cache_type="first" does provide some improvement but still not as good as on the dask-written file.
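For reference, cache_type is an fsspec option rather than an xarray one; a hedged sketch of how it might be applied (hypothetical bucket path):

```python
import fsspec
import xarray as xr

url = "gs://some-bucket/some-file.nc"  # hypothetical remote file

# cache the first block of the file so repeated header reads don't refetch bytes
with fsspec.open(url, mode="rb", cache_type="first") as f:
    ds = xr.open_dataset(f, engine="h5netcdf")
    print(ds)
```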

FWIW, I inspected the actual bytes of the dask and xarray written files and they are identical for a single variable, but diverge when multiple variables are being written. So, the important differences are probably associated with this step:

It does set up the whole set of variables as a initialisation stage before writing any data - I don't know if xarray does this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Differences in `to_netcdf` for dask and numpy backed arrays 1581046647
1449302032 https://github.com/pydata/xarray/issues/7522#issuecomment-1449302032 https://api.github.com/repos/pydata/xarray/issues/7522 IC_kwDOAMm_X85WYpgQ slevang 39069044 2023-03-01T04:04:25Z 2023-03-01T04:04:25Z CONTRIBUTOR

The slow file:

And the fast file:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Differences in `to_netcdf` for dask and numpy backed arrays 1581046647
1428872842 https://github.com/pydata/xarray/issues/7522#issuecomment-1428872842 https://api.github.com/repos/pydata/xarray/issues/7522 IC_kwDOAMm_X85VKt6K slevang 39069044 2023-02-13T23:49:31Z 2023-02-13T23:49:31Z CONTRIBUTOR

I did try many loops and different order of operations to make sure this isn't a caching or auth issue. You can see the std dev of the timeit calls above is pretty consistent.

For my actual use case, the difference is very apparent, with open_dataset taking about 9 seconds on the numpy-saved file and <1 second on the dask-saved one. I can also clearly see when monitoring network traffic that the slow version has to read in hundreds of MB of data to open the dataset, while the fast one only reads the tiny headers.

I also inspected the actual header bytes of these two files and see they are indeed different.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Differences in `to_netcdf` for dask and numpy backed arrays 1581046647
1401116054 https://github.com/pydata/xarray/issues/7435#issuecomment-1401116054 https://api.github.com/repos/pydata/xarray/issues/7435 IC_kwDOAMm_X85Tg1WW slevang 39069044 2023-01-23T22:53:14Z 2023-01-23T22:53:14Z CONTRIBUTOR

You do get a FileNotFoundError if you explicitly specify an engine with xarray.load_dataset('not-existing-file.h5', engine='h5netcdf').

It looks like neither NetCDF4BackendEntrypoint nor H5netcdfBackendEntrypoint includes .h5 in the set of openable extensions they handle in guess_can_open, but both will work if they can peer into the file and detect a valid .h5. There is probably some good reason for this.
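For example (hypothetical filename):

```python
import xarray as xr

# Without an engine, guess_can_open doesn't claim ".h5", so the error is a
# generic "no matching backend" message; with an engine you get the real cause.
try:
    xr.load_dataset("not-existing-file.h5", engine="h5netcdf")
except FileNotFoundError as e:
    print(e)
```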

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  misleading error message for attempting to load non existing file 1532317033
1379504132 https://github.com/pydata/xarray/issues/7403#issuecomment-1379504132 https://api.github.com/repos/pydata/xarray/issues/7403 IC_kwDOAMm_X85SOZAE slevang 39069044 2023-01-11T21:26:10Z 2023-01-11T21:26:10Z CONTRIBUTOR

Personally, I'll make the argument that we should switch (carefully) from -w to w for the default case.

This makes sense in principle for consistency with other I/O methods, but I think it could go very wrong in practice, especially for inexperienced users.

For example, a misplaced ds.to_zarr("gs://bucket_name") with default mode="w" overwrites a user's entire bucket, and the same goes for local storage. to_netcdf doesn't have the capability to rm -r a directory, so I would argue to_zarr(mode="w") is much more dangerous.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr error when trying to overwrite part of existing store 1512290017
1366882900 https://github.com/pydata/xarray/issues/7403#issuecomment-1366882900 https://api.github.com/repos/pydata/xarray/issues/7403 IC_kwDOAMm_X85RePpU slevang 39069044 2022-12-28T19:53:05Z 2022-12-28T19:53:05Z CONTRIBUTOR

Your MCVE has mode="a", is that intentional? As far as defaults, what the docstring says is:

The default mode is “a” if append_dim is set. Otherwise, it is “r+” if region is set and w- otherwise.

So without specifying mode, you get mode="w-" which will not overwrite anything. With mode="a", your MCVE runs fine for me.

From using these a bit, my understanding is as follows (see the short sketch after this list):

  • mode="w-" is the safe default that won't overwrite anything
  • mode="w" is only if you want to rewrite the entire zarr store, not for rewriting a subset of existing data
  • mode="r+" is for modifying values on some subset of the existing store specified by the index with region, but doesn't allow changing any dimension sizes
  • mode="a" is for completely overwriting existing variables*, appending new variables, or appending data along an existing dimension, paired with append_dim

* I actually didn't realize this behavior until testing it out just now, but it is described in the docstring
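A compact sketch of those modes in action (toy local store; exact error types vary by version, so this is illustrative rather than definitive):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("time", np.arange(10.0))}, coords={"time": np.arange(10)})

ds.to_zarr("store.zarr", mode="w")       # rewrite the whole store (destructive)
try:
    ds.to_zarr("store.zarr", mode="w-")  # safe default: refuses to touch an existing store
except Exception as e:
    print(type(e).__name__)

# r+: overwrite values in an existing region; dims and shapes must already match
ds.isel(time=slice(0, 5)).drop_vars("time").to_zarr(
    "store.zarr", mode="r+", region={"time": slice(0, 5)}
)

# a: append along an existing dimension
ds.assign_coords(time=ds.time + 10).to_zarr("store.zarr", mode="a", append_dim="time")
```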

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  Zarr error when trying to overwrite part of existing store 1512290017
1341818601 https://github.com/pydata/xarray/issues/7362#issuecomment-1341818601 https://api.github.com/repos/pydata/xarray/issues/7362 IC_kwDOAMm_X85P-obp slevang 39069044 2022-12-08T00:54:26Z 2022-12-08T00:54:26Z CONTRIBUTOR

So many edge cases and different ways to use this function! I'm also a bit surprised numpy-only inputs work here, but if it does, we may as well support it. #7364 would be an easy way to handle this case.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `xr.where()` with numpy inputs produces error if global options changed 1478060173
1320983738 https://github.com/pydata/xarray/pull/7229#issuecomment-1320983738 https://api.github.com/repos/pydata/xarray/issues/7229 IC_kwDOAMm_X85OvJy6 slevang 39069044 2022-11-19T22:32:51Z 2022-11-19T22:32:51Z CONTRIBUTOR

the change to explicitly constructing the attrs instead of working around quirks of apply_ufunc sounds good to me: when discussing this in the last meeting we did get the feeling that in the long run it would be better to think about redesigning that part of apply_ufunc.

Yeah I think this would be worth doing eventually. Trying to index a list of attributes of unpredictable length doesn't feel very xarray-like.

Any further refinements to the current approach of reconstructing attributes after apply_ufunc here, or is this good to go?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` 1424732975
1306498356 https://github.com/pydata/xarray/pull/7229#issuecomment-1306498356 https://api.github.com/repos/pydata/xarray/issues/7229 IC_kwDOAMm_X85N35U0 slevang 39069044 2022-11-08T01:45:59Z 2022-11-08T01:45:59Z CONTRIBUTOR

The latest commit should do what we want, consistently taking attrs of x. Casting to DataArray along with the explicit broadcast operation gets us empty attrs on both the variable and the coords.

The only way it deviates from this (spelled out in the tests) is to pull coord attrs from x, then y, then cond if any of these are scalars. I think this makes sense because if I pass [DataArray, scalar, scalar] for example I wouldn't expect it to drop the coordinate attrs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` 1424732975
1304244938 https://github.com/pydata/xarray/pull/7229#issuecomment-1304244938 https://api.github.com/repos/pydata/xarray/issues/7229 IC_kwDOAMm_X85NvTLK slevang 39069044 2022-11-04T20:55:02Z 2022-11-05T03:37:35Z CONTRIBUTOR

I considered the [scalar, data_array, whatever] possibility but this seems like a serious edge case. Passing a scalar condition is the same as just taking either x or y so what's the point?

As far as passing bare arrays, despite what the docstrings say it seems like you can actually do this with DataArray.where, etc:

```python
x = xr.DataArray([1, 1], coords={"x": [0, 1]})
cond = xr.DataArray([True, False], coords={"x": [0, 1]})
x.where(cond, other=np.array([0, 2]))

<xarray.DataArray (x: 2)>
array([1, 2])
Coordinates:
  * x        (x) int64 0 1
```

Which I don't think makes sense, but is mostly a separate issue. You do get a broadcast error if y is a bare array with a different size.

After poking around I agree that this isn't easy to totally fix. I sort of started to go down the route of 1. but it looked quite complicated. Option 2. of wrapping non-xarray objects with xr.Variable only in xr.where might help us out a little here, I can try it. But, I think the current solution in this PR gets us back to the pre-#6461 behavior while still allowing for scalars and giving predictable behavior for everything except cond=scalar which I don't think is very important.

I'm just keen to get this merged in some form because the regression of #6461 is pretty bad. For example:

```python
ds = xr.tutorial.load_dataset('air_temperature')
xr.where(ds.air > 10, ds.air, 10, keep_attrs=True).to_netcdf('foo.nc')
```

completely fails because the time attrs have been overwritten by ds.air attrs:

```
ValueError: failed to prevent overwriting existing key units in attrs on variable 'time'. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.
```

I hit exactly this issue on some existing scripts so this is preventing me from upgrading beyond xarray=2022.03.0.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` 1424732975
1302804069 https://github.com/pydata/xarray/pull/7249#issuecomment-1302804069 https://api.github.com/repos/pydata/xarray/issues/7249 IC_kwDOAMm_X85NpzZl slevang 39069044 2022-11-03T23:57:33Z 2022-11-03T23:57:33Z CONTRIBUTOR

Can we get #7229 in?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  whats-new for 2022.11.0 1433815234
1291957109 https://github.com/pydata/xarray/issues/7220#issuecomment-1291957109 https://api.github.com/repos/pydata/xarray/issues/7220 IC_kwDOAMm_X85NAbN1 slevang 39069044 2022-10-26T12:26:55Z 2022-10-26T12:26:55Z CONTRIBUTOR

Anything that uses attrs[1] and the _get_all_of_type helper is going to have a hard time guaranteeing the behavior stated in the docstring, which is that we take the attrs of x. If x is a scalar, then _get_all_of_type returns a list of length 2 and attrs[1] ends up being the attrs of y. I think we may want to rework this; I'll try a few things later today.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `xr.where(..., keep_attrs=True)` overwrites coordinate attributes 1423114234
1256225233 https://github.com/pydata/xarray/pull/6978#issuecomment-1256225233 https://api.github.com/repos/pydata/xarray/issues/6978 IC_kwDOAMm_X85K4HnR slevang 39069044 2022-09-23T13:38:04Z 2022-09-23T13:38:04Z CONTRIBUTOR

Is that something that will be deprecated or is it planned to keep the support for the kwargs dict forever?

Not sure if there are any strong opinions here? I don't see much harm in keeping it around but we could also deprecate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix passing of curvefit kwargs 1359368857
1254063029 https://github.com/pydata/xarray/issues/7062#issuecomment-1254063029 https://api.github.com/repos/pydata/xarray/issues/7062 IC_kwDOAMm_X85Kv3u1 slevang 39069044 2022-09-21T18:15:46Z 2022-09-21T18:15:46Z CONTRIBUTOR

Without bottleneck, I guess both options from the MVCE above get routed through _mean:

```python
xr.set_options(use_bottleneck=False)
da = xr.DataArray([1, 2, 3], coords={'x': [1, 2, 3]}).astype('float32')
da.rolling(x=3, min_periods=1).mean().dtype                  # dtype('float64')
da.chunk({'x': 1}).rolling(x=3, min_periods=1).mean().dtype  # dtype('float64')
```

And after #7063:

```python
xr.set_options(use_bottleneck=False)
da = xr.DataArray([1, 2, 3], coords={'x': [1, 2, 3]}).astype('float32')
da.rolling(x=3, min_periods=1).mean().dtype                  # dtype('float32')
da.chunk({'x': 1}).rolling(x=3, min_periods=1).mean().dtype  # dtype('float32')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling mean on dask array does not preserve dtype 1381294181
1235974050 https://github.com/pydata/xarray/pull/6978#issuecomment-1235974050 https://api.github.com/repos/pydata/xarray/issues/6978 IC_kwDOAMm_X85Jq3ei slevang 39069044 2022-09-02T23:23:51Z 2022-09-02T23:23:51Z CONTRIBUTOR

Thanks for doing this @slevang ! Would you mind adding a tiny regression test too?

Right, I guess this actually breaks the previous way of passing kwargs and that is why the docs build failed. Hmmm. To go with the current changes, thoughts on adding something like this to the parsing logic:

```python
if kwargs is None:
    kwargs = {}
elif "kwargs" in kwargs:
    kwargs = {**kwargs.pop("kwargs"), **kwargs}
```

to allow for the desired functionality but also handle the old case when someone passes kwargs as a dict? Feels wonky but it works. Is there a smarter way of doing this?

BTW it took me a minute to figure out what happened here because the docstring in the original PR was actually correct (requiring a dict, albeit maybe not the best way of passing kwargs), but then got changed in #5182 to suggest that kwargs could be passed on their own. I see .interp() behaves the same, where only a dict of kwargs can be provided and passed to the underlying scipy interpolation. So it could be worth generalizing this for other methods where we are passing kwargs to a child function of some sort, to allow both of these patterns.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix passing of curvefit kwargs 1359368857
1234368670 https://github.com/pydata/xarray/issues/6968#issuecomment-1234368670 https://api.github.com/repos/pydata/xarray/issues/6968 IC_kwDOAMm_X85Jkvie slevang 39069044 2022-09-01T14:34:05Z 2022-09-01T14:34:05Z CONTRIBUTOR

I don't think curvefit works well with datetime coordinates at this point, because everything gets coerced to float by apply_ufunc. Probably room for improvement there. At a minimum you would need to specify good guesses (p0) and/or bounds in terms of datetime64[ns] values.

An easy enough workaround is to assign a separate non-datetime coordinate. This works:

```python
ds = xr.tutorial.open_dataset('air_temperature')
ds = ds.assign_coords({'day': (ds.time - ds.time[0]) / np.timedelta64(1, 'D')}).swap_dims({'time': 'day'})

def periodic_season(x, a0, a1, a2, a3):
    # periodic function with both phase amplitude and shift parameters
    return a0 + a1 * np.cos(a2 * x - a3)

dn = ds.curvefit(
    'day',
    func=periodic_season,
    p0={'a0': 275, 'a1': 15, 'a2': 2 * np.pi / 365},
)
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extracting annual amplitude and phase of xarray dataset for each pixel using xarray.DataArray.curvefit 1355733363
1188348238 https://github.com/pydata/xarray/issues/6799#issuecomment-1188348238 https://api.github.com/repos/pydata/xarray/issues/6799 IC_kwDOAMm_X85G1MFO slevang 39069044 2022-07-18T21:43:11Z 2022-07-18T21:43:11Z CONTRIBUTOR

The chunking structure on disk is pretty instrumental to my application, which requires fast retrievals of full slices in the time dimension. The loop option in my first post only takes about 10 seconds with ni=1000 which is fine for my use case, so I'll probably go with that for now. It would be interesting to dig deeper though and see if there is a way to handle this better in the interp logic.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `interp` performance with chunked dimensions 1307112340
1187729258 https://github.com/pydata/xarray/issues/6799#issuecomment-1187729258 https://api.github.com/repos/pydata/xarray/issues/6799 IC_kwDOAMm_X85Gy09q slevang 39069044 2022-07-18T16:44:40Z 2022-07-18T16:44:40Z CONTRIBUTOR

Interpolating on chunked dimensions doesn't work at all prior to #4155. The changes in #4069 are also relevant.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `interp` performance with chunked dimensions 1307112340
1094166371 https://github.com/pydata/xarray/pull/6461#issuecomment-1094166371 https://api.github.com/repos/pydata/xarray/issues/6461 IC_kwDOAMm_X85BN6dj slevang 39069044 2022-04-10T03:16:24Z 2022-04-10T03:16:24Z CONTRIBUTOR

Good point @keewis. This version should maintain the expected behavior in that case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix `xr.where(..., keep_attrs=True)` bug 1198058137
1093627256 https://github.com/pydata/xarray/issues/6444#issuecomment-1093627256 https://api.github.com/repos/pydata/xarray/issues/6444 IC_kwDOAMm_X85BL214 slevang 39069044 2022-04-09T03:09:49Z 2022-04-09T03:09:49Z CONTRIBUTOR

I ran into this error as well with a bunch of broken xr.where calls where I had set a global option for keep_attrs=True. I pushed a possible fix in #6461.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where with scalar as second argument fails with keep_attrs=True 1193704369
1057116313 https://github.com/pydata/xarray/issues/6317#issuecomment-1057116313 https://api.github.com/repos/pydata/xarray/issues/6317 IC_kwDOAMm_X84_AlCZ slevang 39069044 2022-03-02T16:24:37Z 2022-03-02T16:24:37Z CONTRIBUTOR

These errors can often be eliminated by passing appropriate initial guesses for each parameter through the p0 argument (which it looks like you're already doing), and/or by passing additional kwargs through to scipy.optimize.leastsq(), like ftol and maxfev which set the fit tolerance and max number of iterations.

Even so, there will be cases where some of your data just doesn't fit the desired function and you want to move on and return a result anyway. Your proposal seems fine as long as errors are raised by default and the documentation adequately describes the behavior when set to True.
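For instance, something along these lines (illustrative toy data and parameter values):

```python
import numpy as np
import xarray as xr

def decay(t, amp, tau):
    return amp * np.exp(-t / tau)

t = np.linspace(0, 10, 200)
da = xr.DataArray(
    3.0 * np.exp(-t / 2.0) + np.random.normal(scale=0.05, size=t.size),
    coords={"t": t},
    dims="t",
)

fit = da.curvefit(
    "t",
    decay,
    p0={"amp": 1.0, "tau": 1.0},             # initial guess per parameter
    kwargs={"maxfev": 2000, "ftol": 1e-10},  # forwarded to scipy's least-squares routine
)
print(fit.curvefit_coefficients)
```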

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle scipy fitting errors in `xarray.curve_fit` 1157140667
1016805677 https://github.com/pydata/xarray/pull/5933#issuecomment-1016805677 https://api.github.com/repos/pydata/xarray/issues/5933 IC_kwDOAMm_X848mzkt slevang 39069044 2022-01-19T19:39:57Z 2022-01-19T19:39:57Z CONTRIBUTOR

Not sure I understand the blockwise approach well enough to make a PR, but maybe I'll give it a try at some point.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reimplement `.polyfit()` with `apply_ufunc` 1043746973
1015887354 https://github.com/pydata/xarray/pull/5933#issuecomment-1015887354 https://api.github.com/repos/pydata/xarray/issues/5933 IC_kwDOAMm_X848jTX6 slevang 39069044 2022-01-18T22:21:49Z 2022-01-18T22:45:16Z CONTRIBUTOR

@slevang are you still interested to continue this PR? I think that would be a worthwhile addition and should not be too much left to do. (What would be nice, however, are tests for the issues this fixes.)

Definitely! I got distracted is all, and @dcherian posted a nice solution in #5629 that could allow us to preserve the ability to fit along a chunked dimension using blockwise operations and the dask lstsq implementation used by the existing polyfit code.

I'm happy to pick this back up and finish it off if there is consensus on the right way forward, but the blockwise approach seemed promising so I put this on hold.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reimplement `.polyfit()` with `apply_ufunc` 1043746973
885216203 https://github.com/pydata/xarray/issues/5629#issuecomment-885216203 https://api.github.com/repos/pydata/xarray/issues/5629 IC_kwDOAMm_X840w1PL slevang 39069044 2021-07-22T20:37:04Z 2021-07-22T20:37:04Z CONTRIBUTOR

Potentially a useful comparison: here is the task graph for the same problem posed to da.curvefit, which uses xr.apply_ufunc(scipy.optimize.curvefit, ..., vectorize=True). Looks a lot cleaner, and I think xr.polyfit could be re-implemented this way without much trouble.

```python
import xarray as xr
import dask.array as dsa

def linear(x, m, b):
    return m * x + b

da = xr.DataArray(dsa.random.random((4, 6, 100), chunks=(1, 2, 100)), dims=['x', 'y', 'time'])
fit = da.curvefit('time', linear).curvefit_coefficients
```

{
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 3,
    "rocket": 0,
    "eyes": 0
}
  Polyfit performance on large datasets - Suboptimal dask task graph 950882492
820706339 https://github.com/pydata/xarray/issues/5165#issuecomment-820706339 https://api.github.com/repos/pydata/xarray/issues/5165 MDEyOklzc3VlQ29tbWVudDgyMDcwNjMzOQ== slevang 39069044 2021-04-15T20:20:26Z 2021-04-15T20:20:26Z CONTRIBUTOR

In fact an even simpler calculation also shows the same problem:

```python
import xarray as xr
import pandas as pd
import numpy as np
import dask.array as da

dates = pd.date_range('1980-01-01', '2019-12-31', freq='D')
ds = xr.Dataset(
    data_vars={
        'x1': (('time', 'lat', 'lon'), da.random.random(size=(dates.size, 360, 720), chunks=(1, -1, -1))),
        'x2': (('time', 'lat', 'lon'), da.random.random(size=(dates.size, 360, 720), chunks=(1, -1, -1))),
    },
    coords={
        'time': dates,
        'lat': np.arange(-90, 90, .5),
        'lon': np.arange(-180, 180, .5),
    },
)

ds['x3'] = ds.x2 - ds.x1
ds[['x3']].to_zarr('test-anom', mode='w')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Poor memory management with dask=2021.4.0 859218255
810635014 https://github.com/pydata/xarray/pull/4849#issuecomment-810635014 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDgxMDYzNTAxNA== slevang 39069044 2021-03-30T23:04:52Z 2021-03-30T23:04:52Z CONTRIBUTOR

I think so. I pushed a merge commit to get this up to date with the current release.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
782731258 https://github.com/pydata/xarray/pull/4849#issuecomment-782731258 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc4MjczMTI1OA== slevang 39069044 2021-02-20T18:46:55Z 2021-02-20T18:46:55Z CONTRIBUTOR

Thanks for the review @dcherian! The latest commit has been refactored with a couple helper functions and associated tests, and any steps that served no purpose other than consistency with polyfit have been removed.

If you can think of any more specific test cases that should be included, happy to add them.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
780953464 https://github.com/pydata/xarray/issues/4894#issuecomment-780953464 https://api.github.com/repos/pydata/xarray/issues/4894 MDEyOklzc3VlQ29tbWVudDc4MDk1MzQ2NA== slevang 39069044 2021-02-18T00:51:04Z 2021-02-18T00:51:04Z CONTRIBUTOR

#4849 should be almost ready to go. Would be good for someone else to confirm the API makes sense. Additional features and performance enhancements could be added in later releases as long as the API is acceptable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  release 0.17.0 806436724
780900086 https://github.com/pydata/xarray/issues/4826#issuecomment-780900086 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc4MDkwMDA4Ng== slevang 39069044 2021-02-17T22:36:08Z 2021-02-17T22:36:08Z CONTRIBUTOR

I ran into this as well with the basic netcdf backends:

```python
import xarray as xr

ds = xr.Dataset(
    data_vars={"foo": (["x"], [False, True, False])},
    coords={"x": [1, 2, 3]},
)
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
```

Gives:

```
bool
int8
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
777602757 https://github.com/pydata/xarray/pull/4849#issuecomment-777602757 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3NzYwMjc1Nw== slevang 39069044 2021-02-11T16:02:54Z 2021-02-11T16:02:54Z CONTRIBUTOR

I've been playing around with this some more, and found the performance to be much better using a process-heavy dask scheduler. For example:

```python
import xarray as xr
import numpy as np
import time
import dask
import dask.distributed

def exponential(x, a, xc):
    return np.exp((x - xc) / a)

x = np.arange(-5, 5, 0.001)
t = np.arange(-5, 5, 0.01)
X, T = np.meshgrid(x, t)
Z1 = np.random.uniform(low=-5, high=5, size=X.shape)
Z2 = exponential(Z1, 3, X) + np.random.normal(scale=0.1, size=Z1.shape)

ds = xr.Dataset(
    data_vars=dict(var1=(["t", "x"], Z1), var2=(["t", "x"], Z2)),
    coords={"t": t, "x": x},
)

ds = ds.chunk({'x': 10})

def test_fit():
    start = time.time()
    fit = ds.var2.curvefit(
        coords=ds.var1,
        func=exponential,
        reduce_dim="t",
    ).compute()
    print(f'Fitting time: {time.time() - start:.2f}s')

with dask.config.set(scheduler='threads'):
    test_fit()
with dask.config.set(scheduler='processes'):
    test_fit()
with dask.distributed.Client() as client:
    test_fit()
with dask.distributed.Client(n_workers=8, threads_per_worker=1) as client:
    test_fit()
```

On my 8-core machine, this takes:

```
Fitting time: 8.32s
Fitting time: 2.71s
Fitting time: 4.40s
Fitting time: 3.43s
```

According to this the underlying scipy routines should be thread safe.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
772874760 https://github.com/pydata/xarray/pull/4849#issuecomment-772874760 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3Mjg3NDc2MA== slevang 39069044 2021-02-03T22:37:50Z 2021-02-03T22:37:50Z CONTRIBUTOR

Added some checks that will raise errors if inspect.signature is not able to determine the function arguments, either for functions with variable-length args or things like numpy ufuncs that don't seem to work with signature. In these cases, param_names can be passed manually.
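Roughly the kind of check involved (a simplified sketch, not the PR's exact code):

```python
import inspect
import numpy as np

def exponential(x, a, xc):
    return np.exp((x - xc) / a)

def variadic(x, *params):
    return sum(p * x**i for i, p in enumerate(params))

# For a plain function, the fit parameters are everything after the coordinate argument
print(list(inspect.signature(exponential).parameters)[1:])  # ['a', 'xc']

# With *params the names can't be recovered, so param_names must be given manually;
# numpy ufuncs similarly don't expose a usable signature
kinds = [p.kind for p in inspect.signature(variadic).parameters.values()]
print(inspect.Parameter.VAR_POSITIONAL in kinds)  # True
```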

Also added minimal tests, but these should probably be expanded.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
771907303 https://github.com/pydata/xarray/pull/4849#issuecomment-771907303 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3MTkwNzMwMw== slevang 39069044 2021-02-02T19:19:01Z 2021-02-02T19:19:01Z CONTRIBUTOR

Added a couple usage examples in the docs, including one that replicates the scipy example of fitting multiple peaks. Because of the wrapper function and variable args, this requires supplying param_names manually. Works nicely though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
771320891 https://github.com/pydata/xarray/pull/4849#issuecomment-771320891 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3MTMyMDg5MQ== slevang 39069044 2021-02-02T03:12:41Z 2021-02-02T03:12:41Z CONTRIBUTOR

Some more progress here.

  1. I liked the idea of being able to supply p0 and bounds as a dictionary mapping between parameter names and values, which is much more in the style of xarray. So I've included the logic to translate between this and the scipy args which are just ordered lists. The logic first prioritizes any of these values supplied directly in keyword args, then looks for defaults in the function definition, and otherwise falls back to the scipy defaults.
  2. There's now a working implementation of multi-dimensional fits using ravel to flatten the coordinate arrays. One limitation is that func must be specified like f((x, y), z) as scipy takes a single x argument for the fitting coordinate. Maybe there is a clever way around this, but I think it's ok.
  3. Some changes to the API.

The best way to specify the fitting coordinates is a bit tricky to figure out. My original use case for this was needing to fit a relationship between two time/lat/lon dataarrays with the fit done over all time. But probably a more common use would be to just fit a curve over one or two dimensions that already exist in your data. So it would be great to handle these possibilities seamlessly.

What I've settled on for now is a coords argument that takes a list of coordinates that should be the same length as the input coordinates of your fitting function. Additionally, there is a reduce_dim arg, so that, say, if you want to fit a function in time but aggregate over all lat and lon, you can supply those dimensions here. So with a 3D x/y/time array, any of the following should work:

```python
# Fit a 1d function in time, returns parameters with dims (x, y)
da.curvefit(coords='time', ...)

# Fit a 2d function in space, returns parameters with dims (t)
da.curvefit(coords=['x', 'y'], ...)

# Fit a 1d function with another 3d dataarray and aggregate over time, returns parameters with dims (x, y)
da.curvefit(coords=da1, reduce_dim='time', ...)
```

The logic to make this work got a bit complicated, since we need to supply the right input_core_dims to apply_ufunc, and also to explicitly broadcast the coordinates to ensure the ravel operation works. Any thoughts on cleanup here appreciated.

Will eventually need to add tests and improve docs and examples. Tests especially I could use some help on.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
770395357 https://github.com/pydata/xarray/pull/4849#issuecomment-770395357 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3MDM5NTM1Nw== slevang 39069044 2021-01-31T14:59:12Z 2021-01-31T14:59:12Z CONTRIBUTOR
  1. Different fit coefficients as differently-named variables in the output, rather than indexed with a coordinate.

I think the way I configured things now does replicate the polyfit results. For example:

```python
ds = xr.tutorial.open_dataset('air_temperature')
ds['air2'] = ds.air.copy()
ds.polyfit(dim='time', deg=2)

<xarray.Dataset>
Dimensions:                    (degree: 3, lat: 25, lon: 53)
Coordinates:
  * degree                     (degree) int64 2 1 0
  * lat                        (lat) float64 75.0 72.5 70.0 ... 20.0 17.5 15.0
  * lon                        (lon) float64 200.0 202.5 205.0 ... 327.5 330.0
Data variables:
    air_polyfit_coefficients   (degree, lat, lon) float64 -1.162e-32 ... 1.13...
    air2_polyfit_coefficients  (degree, lat, lon) float64 -1.162e-32 ... 1.14...
```

Compared to this:

```python
def square(x, a, b, c):
    return a * np.power(x, 2) + b * x + c

ds.curvefit(x=ds.time, dim='time', func=square)

<xarray.Dataset>
Dimensions:                     (lat: 25, lon: 53, param: 3)
Coordinates:
  * lat                         (lat) float32 75.0 72.5 70.0 ... 20.0 17.5 15.0
  * lon                         (lon) float32 200.0 202.5 205.0 ... 327.5 330.0
  * param                       (param) <U1 'a' 'b' 'c'
Data variables:
    air_curvefit_coefficients   (param, lat, lon) float64 -1.162e-32 ... 1.13...
    air2_curvefit_coefficients  (param, lat, lon) float64 -1.162e-32 ... 1.14...
```

In both cases, each variable in the dataset returns a separate coefficients variable, and all fittable coefficients are stacked along a dimension, degree for the more specific polyfit case and a generic param for curvefit.

  2. Initial guesses for each fit parameter.

Yeah this would be good. Should be easy to look for default values in the function itself using inspect.

  3. (Stretch goal) Ability to fit >1D functions

Looks like this could be possible with a call to ravel like this. I'll do some experimenting.
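A minimal sketch of the ravel trick with plain scipy (an assumed example, not the code linked above):

```python
import numpy as np
from scipy.optimize import curve_fit

def plane(coords, a, b, c):
    x, y = coords
    return a * x + b * y + c

x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 30)
X, Y = np.meshgrid(x, y)
Z = plane((X, Y), 2.0, -1.0, 0.5) + np.random.normal(scale=0.01, size=X.shape)

# scipy wants 1D arrays, so flatten the coordinate grids and the data
popt, _ = curve_fit(plane, (X.ravel(), Y.ravel()), Z.ravel())
print(popt)  # approximately [2.0, -1.0, 0.5]
```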

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
770133047 https://github.com/pydata/xarray/issues/4300#issuecomment-770133047 https://api.github.com/repos/pydata/xarray/issues/4300 MDEyOklzc3VlQ29tbWVudDc3MDEzMzA0Nw== slevang 39069044 2021-01-30T01:38:18Z 2021-01-30T01:38:18Z CONTRIBUTOR

I needed this functionality for a project, and piggy-backing off the last couple of comments decided the curve_fit wrapped by apply_ufunc approach works quite well. I put together a PR in #4849. Any feedback welcome!

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  General curve fitting method 671609109
