
issue_comments


43 rows where author_association = "CONTRIBUTOR" and user = 39069044 sorted by updated_at descending


issue 26

  • Basic curvefit implementation 7
  • Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` 3
  • Differences in `to_netcdf` for dask and numpy backed arrays 3
  • xr.where increase the bytes of the dataset 3
  • Reimplement `.polyfit()` with `apply_ufunc` 2
  • `interp` performance with chunked dimensions 2
  • fix passing of curvefit kwargs 2
  • Zarr error when trying to overwrite part of existing store 2
  • Fix groupby binary ops when grouped array is subset relative to other 2
  • General curve fitting method 1
  • Reading and writing a zarr dataset multiple times casts bools to int8 1
  • release 0.17.0 1
  • Poor memory management with dask=2021.4.0 1
  • Polyfit performance on large datasets - Suboptimal dask task graph 1
  • Handle scipy fitting errors in `xarray.curve_fit` 1
  • xr.where with scalar as second argument fails with keep_attrs=True 1
  • Fix `xr.where(..., keep_attrs=True)` bug 1
  • Extracting annual amplitude and phase of xarray dataset for each pixel using xarray.DataArray.curvefit 1
  • Rolling mean on dask array does not preserve dtype 1
  • `xr.where(..., keep_attrs=True)` overwrites coordinate attributes 1
  • whats-new for 2022.11.0 1
  • `xr.where()` with numpy inputs produces error if global options changed 1
  • misleading error message for attempting to load non existing file 1
  • xr.where loses attributes despite keep_attrs=True 1
  • Supplying multidimensional initial guess to `curvefit` 1
  • Implement multidimensional initial guess and bounds for `curvefit` 1

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at ▲ (sort), author_association, body, reactions, performed_via_github_app, issue
1537162245 https://github.com/pydata/xarray/pull/7821#issuecomment-1537162245 https://api.github.com/repos/pydata/xarray/issues/7821 IC_kwDOAMm_X85bnzwF slevang 39069044 2023-05-06T15:12:57Z 2023-05-06T15:12:57Z CONTRIBUTOR

This looks pretty good to me on first glance! I would vote to do p0 and bounds in one PR. It could surprise a user if one of these can take an array and the other only scalars.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement multidimensional initial guess and bounds for `curvefit` 1698626185
1532986706 https://github.com/pydata/xarray/pull/7798#issuecomment-1532986706 https://api.github.com/repos/pydata/xarray/issues/7798 IC_kwDOAMm_X85bX4VS slevang 39069044 2023-05-03T12:58:35Z 2023-05-03T12:58:35Z CONTRIBUTOR

Would it be possible to run another bug fix release with this incorporated? Or I guess we're already on to 2023.5.0 according to the date.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix groupby binary ops when grouped array is subset relative to other 1689773381
1529050645 https://github.com/pydata/xarray/pull/7798#issuecomment-1529050645 https://api.github.com/repos/pydata/xarray/issues/7798 IC_kwDOAMm_X85bI3YV slevang 39069044 2023-04-30T15:20:47Z 2023-04-30T15:20:47Z CONTRIBUTOR

Thanks for the quick fix! Not sure about the bitshift test but I'm assuming @headtr1ck is right.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix groupby binary ops when grouped array is subset relative to other 1689773381
1516681755 https://github.com/pydata/xarray/issues/7768#issuecomment-1516681755 https://api.github.com/repos/pydata/xarray/issues/7768 IC_kwDOAMm_X85aZrob slevang 39069044 2023-04-20T17:16:39Z 2023-04-20T17:16:39Z CONTRIBUTOR

This should be doable. I think we would have to rework the apply_ufunc wrapper to pass p0 and bounds as DataArrays through *args instead of as simple dictionaries through kwargs, so that apply_ufunc can broadcast them and handle dask chunks.
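As a rough illustration of why that helps, here is a minimal, hypothetical sketch (not the actual xarray wrapper) of apply_ufunc broadcasting an extra DataArray argument alongside the data being fit:

```python
import numpy as np
import xarray as xr

def fit_1d(y, p0):
    # stand-in for a per-pixel fit: just combines the guess with the data
    return p0 + y.mean()

da = xr.DataArray(np.random.rand(3, 10), dims=["x", "t"])
p0 = xr.DataArray([0.0, 1.0, 2.0], dims=["x"])  # one initial guess per x

# apply_ufunc broadcasts p0 against da (and would align dask chunks),
# then loops over x because of vectorize=True
result = xr.apply_ufunc(
    fit_1d, da, p0,
    input_core_dims=[["t"], []],
    vectorize=True,
)
```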

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Supplying multidimensional initial guess to `curvefit` 1674818753
1457345587 https://github.com/pydata/xarray/issues/7587#issuecomment-1457345587 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W3VQz slevang 39069044 2023-03-07T01:34:12Z 2023-03-07T01:34:12Z CONTRIBUTOR

Your m0tot variable is also being broadcast in the fami dimension. So, an additional 10x384x1233x8/1e6=37MB.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1457080267 https://github.com/pydata/xarray/issues/7587#issuecomment-1457080267 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W2UfL slevang 39069044 2023-03-06T22:06:11Z 2023-03-06T22:06:11Z CONTRIBUTOR

Same issue as #1234. This has tripped me up before as well. A kwarg to control this behavior would be a nice enhancement to .where().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1457061064 https://github.com/pydata/xarray/issues/7587#issuecomment-1457061064 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W2PzI slevang 39069044 2023-03-06T21:55:14Z 2023-03-06T21:55:14Z CONTRIBUTOR

Since you're using tp (dims fami, time, site) as the condition, these dimensions are broadcast across all other variables in the dataset. The problem looks to be your variable wshedOut, which is now broadcast across all 5 dimensions in the dataset, hence the greatly increased memory usage.
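A toy illustration of that broadcasting (made-up sizes and only two variables, not the original dataset):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "tp": ("time", np.random.rand(4) > 0.5),             # condition with dims (time,)
        "wshedOut": (("lat", "lon"), np.random.rand(3, 5)),   # no time dim
    }
)

# xr.where broadcasts the condition against every variable, so wshedOut
# gains a time dimension and grows by a factor of len(time)
out = xr.where(ds.tp, ds, 0)
print(dict(ds.wshedOut.sizes), "->", dict(out.wshedOut.sizes))
```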

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1454968267 https://github.com/pydata/xarray/issues/7581#issuecomment-1454968267 https://api.github.com/repos/pydata/xarray/issues/7581 IC_kwDOAMm_X85WuQ3L slevang 39069044 2023-03-05T02:45:42Z 2023-03-05T02:45:42Z CONTRIBUTOR

xr.where(cond, x, y, keep_attrs=True) unambiguously takes the attributes of x as described in the docs, even if x is a scalar and has no attributes. This can be inconvenient at times but is intentional for consistency. You can invert the condition as you suggest, or better yet use DataArray.where/Dataset.where to ensure you are getting the attrs you expect.
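A small sketch of the difference (assumed to match the documented behavior):

```python
import xarray as xr

a = xr.DataArray([0.0, 1.0, 2.0], dims="x", attrs={"units": "m"})

# xr.where takes the attrs of x (the second argument); a scalar x has none
print(xr.where(a != 0, 0, a, keep_attrs=True).attrs)  # {}

# DataArray.where keeps the attrs of the object it is called on
print(a.where(a != 0, 0).attrs)  # {'units': 'm'}
```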

In what case would a!=0 not mirror a==0?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where loses attributes despite keep_attrs=True 1608352581
1450841385 https://github.com/pydata/xarray/issues/7522#issuecomment-1450841385 https://api.github.com/repos/pydata/xarray/issues/7522 IC_kwDOAMm_X85WehUp slevang 39069044 2023-03-01T21:01:48Z 2023-03-01T21:01:48Z CONTRIBUTOR

Yeah that seems to be it. Dask's write neatly packs all the needed metadata at the beginning of the file, since we can scale this up to a many-GB file with dozens of variables and still read it in ~100ms. Xarray, by contrast, does a less well-organized write of the metadata, so we have to go seeking into the middle of the byte range. cache_type="first" does provide some improvement but still not as good as on the dask-written file.
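For reference, cache_type is an fsspec option rather than an xarray one; a hedged sketch of how it might be applied (hypothetical bucket path):

```python
import fsspec
import xarray as xr

url = "gs://some-bucket/some-file.nc"  # hypothetical remote file

# cache the first block of the file so repeated header reads don't refetch bytes
with fsspec.open(url, mode="rb", cache_type="first") as f:
    ds = xr.open_dataset(f, engine="h5netcdf")
    print(ds)
```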

FWIW, I inspected the actual bytes of the dask and xarray written files and they are identical for a single variable, but diverge when multiple variables are being written. So, the important differences are probably associated with this step:

It does set up the whole set of variables as a initialisation stage before writing any data - I don't know if xarray does this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Differences in `to_netcdf` for dask and numpy backed arrays 1581046647
1449302032 https://github.com/pydata/xarray/issues/7522#issuecomment-1449302032 https://api.github.com/repos/pydata/xarray/issues/7522 IC_kwDOAMm_X85WYpgQ slevang 39069044 2023-03-01T04:04:25Z 2023-03-01T04:04:25Z CONTRIBUTOR

The slow file:

And the fast file:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Differences in `to_netcdf` for dask and numpy backed arrays 1581046647
1428872842 https://github.com/pydata/xarray/issues/7522#issuecomment-1428872842 https://api.github.com/repos/pydata/xarray/issues/7522 IC_kwDOAMm_X85VKt6K slevang 39069044 2023-02-13T23:49:31Z 2023-02-13T23:49:31Z CONTRIBUTOR

I did try many loops and different order of operations to make sure this isn't a caching or auth issue. You can see the std dev of the timeit calls above is pretty consistent.

For my actual use case, the difference is very apparent, with open_dataset taking about 9 seconds on the numpy-saved file and <1 second on the dask-saved one. I can also clearly see when monitoring network traffic that the slow version has to read in hundreds of MB of data to open the dataset, while the fast one only reads the tiny headers.

I also inspected the actual header bytes of these two files and see they are indeed different.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Differences in `to_netcdf` for dask and numpy backed arrays 1581046647
1401116054 https://github.com/pydata/xarray/issues/7435#issuecomment-1401116054 https://api.github.com/repos/pydata/xarray/issues/7435 IC_kwDOAMm_X85Tg1WW slevang 39069044 2023-01-23T22:53:14Z 2023-01-23T22:53:14Z CONTRIBUTOR

You do get a FileNotFoundError if you explicitly specify an engine with xarray.load_dataset('not-existing-file.h5', engine='h5netcdf').

It looks like neither NetCDF4BackendEntrypoint nor H5netcdfBackendEntrypoint includes .h5 in the set of openable extensions they handle in guess_can_open, but both will work if they can peer into the file and detect a valid .h5. There is probably some good reason for this.
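For example (hypothetical filename):

```python
import xarray as xr

# Without an engine, guess_can_open doesn't claim ".h5", so the error is a
# generic "no matching backend" message; with an engine you get the real cause.
try:
    xr.load_dataset("not-existing-file.h5", engine="h5netcdf")
except FileNotFoundError as e:
    print(e)
```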

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  misleading error message for attempting to load non existing file 1532317033
1379504132 https://github.com/pydata/xarray/issues/7403#issuecomment-1379504132 https://api.github.com/repos/pydata/xarray/issues/7403 IC_kwDOAMm_X85SOZAE slevang 39069044 2023-01-11T21:26:10Z 2023-01-11T21:26:10Z CONTRIBUTOR

Personally, I'll make the argument that we should switch (carefully) from -w to w for the default case.

This makes sense in principle for consistency with other I/O methods, but I think it could go very wrong in practice, especially for inexperienced users.

For example, a misplaced ds.to_zarr("gs://bucket_name") with default mode="w" overwrites a user's entire bucket, and the same goes for local storage. to_netcdf doesn't have the capability to rm -r a directory, so I would argue to_zarr(mode="w") is much more dangerous.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr error when trying to overwrite part of existing store 1512290017
1366882900 https://github.com/pydata/xarray/issues/7403#issuecomment-1366882900 https://api.github.com/repos/pydata/xarray/issues/7403 IC_kwDOAMm_X85RePpU slevang 39069044 2022-12-28T19:53:05Z 2022-12-28T19:53:05Z CONTRIBUTOR

Your MCVE has mode="a", is that intentional? As far as defaults, what the docstring says is:

The default mode is “a” if append_dim is set. Otherwise, it is “r+” if region is set and w- otherwise.

So without specifying mode, you get mode="w-" which will not overwrite anything. With mode="a", your MCVE runs fine for me.

From using these a bit, my understanding is as follows (see the short sketch after this list):

  • mode="w-" is the safe default that won't overwrite anything
  • mode="w" is only if you want to rewrite the entire zarr store, not for rewriting a subset of existing data
  • mode="r+" is for modifying values on some subset of the existing store specified by the index with region, but doesn't allow changing any dimension sizes
  • mode="a" is for completely overwriting existing variables*, appending new variables, or appending data along an existing dimension, paired with append_dim

* I actually didn't realize this behavior until testing it out just now, but it is described in the docstring
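A compact sketch of those modes in action (toy local store; exact error types vary by version, so this is illustrative rather than definitive):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("time", np.arange(10.0))}, coords={"time": np.arange(10)})

ds.to_zarr("store.zarr", mode="w")       # rewrite the whole store (destructive)
try:
    ds.to_zarr("store.zarr", mode="w-")  # safe default: refuses to touch an existing store
except Exception as e:
    print(type(e).__name__)

# r+: overwrite values in an existing region; dims and shapes must already match
ds.isel(time=slice(0, 5)).drop_vars("time").to_zarr(
    "store.zarr", mode="r+", region={"time": slice(0, 5)}
)

# a: append along an existing dimension
ds.assign_coords(time=ds.time + 10).to_zarr("store.zarr", mode="a", append_dim="time")
```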

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  Zarr error when trying to overwrite part of existing store 1512290017
1341818601 https://github.com/pydata/xarray/issues/7362#issuecomment-1341818601 https://api.github.com/repos/pydata/xarray/issues/7362 IC_kwDOAMm_X85P-obp slevang 39069044 2022-12-08T00:54:26Z 2022-12-08T00:54:26Z CONTRIBUTOR

So many edge cases and different ways to use this function! I'm also a bit surprised numpy-only inputs work here, but if it does, we may as well support it. #7364 would be an easy way to handle this case.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `xr.where()` with numpy inputs produces error if global options changed 1478060173
1320983738 https://github.com/pydata/xarray/pull/7229#issuecomment-1320983738 https://api.github.com/repos/pydata/xarray/issues/7229 IC_kwDOAMm_X85OvJy6 slevang 39069044 2022-11-19T22:32:51Z 2022-11-19T22:32:51Z CONTRIBUTOR

the change to explicitly constructing the attrs instead of working around quirks of apply_ufunc sounds good to me: when discussing this in the last meeting we did get the feeling that in the long run it would be better to think about redesigning that part of apply_ufunc.

Yeah I think this would be worth doing eventually. Trying to index a list of attributes of unpredictable length doesn't feel very xarray-like.

Any further refinements to the current approach of reconstructing attributes after apply_ufunc here, or is this good to go?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` 1424732975
1306498356 https://github.com/pydata/xarray/pull/7229#issuecomment-1306498356 https://api.github.com/repos/pydata/xarray/issues/7229 IC_kwDOAMm_X85N35U0 slevang 39069044 2022-11-08T01:45:59Z 2022-11-08T01:45:59Z CONTRIBUTOR

The latest commit should do what we want, consistently taking attrs of x. Casting to DataArray along with the explicit broadcast operation gets us empty attrs on both the variable and the coords.

The only way it deviates from this (spelled out in the tests) is to pull coord attrs from x, then y, then cond if any of these are scalars. I think this makes sense because if I pass [DataArray, scalar, scalar] for example I wouldn't expect it to drop the coordinate attrs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` 1424732975
1304244938 https://github.com/pydata/xarray/pull/7229#issuecomment-1304244938 https://api.github.com/repos/pydata/xarray/issues/7229 IC_kwDOAMm_X85NvTLK slevang 39069044 2022-11-04T20:55:02Z 2022-11-05T03:37:35Z CONTRIBUTOR

I considered the [scalar, data_array, whatever] possibility but this seems like a serious edge case. Passing a scalar condition is the same as just taking either x or y so what's the point?

As far as passing bare arrays, despite what the docstrings say it seems like you can actually do this with DataArray.where, etc:

```python
x = xr.DataArray([1, 1], coords={"x": [0, 1]})
cond = xr.DataArray([True, False], coords={"x": [0, 1]})
x.where(cond, other=np.array([0, 2]))

<xarray.DataArray (x: 2)>
array([1, 2])
Coordinates:
  * x        (x) int64 0 1
```

Which I don't think makes sense, but is mostly a separate issue. You do get a broadcast error if y is a bare array with a different size.

After poking around I agree that this isn't easy to totally fix. I sort of started to go down the route of 1. but it looked quite complicated. Option 2. of wrapping non-xarray objects with xr.Variable only in xr.where might help us out a little here, I can try it. But, I think the current solution in this PR gets us back to the pre-#6461 behavior while still allowing for scalars and giving predictable behavior for everything except cond=scalar which I don't think is very important.

I'm just keen to get this merged in some form because the regression of #6461 is pretty bad. For example:

```python
ds = xr.tutorial.load_dataset('air_temperature')
xr.where(ds.air > 10, ds.air, 10, keep_attrs=True).to_netcdf('foo.nc')
```

completely fails because the time attrs have been overwritten by ds.air attrs:

```
ValueError: failed to prevent overwriting existing key units in attrs on variable 'time'. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.
```

I hit exactly this issue on some existing scripts so this is preventing me from upgrading beyond xarray=2022.03.0.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` 1424732975
1302804069 https://github.com/pydata/xarray/pull/7249#issuecomment-1302804069 https://api.github.com/repos/pydata/xarray/issues/7249 IC_kwDOAMm_X85NpzZl slevang 39069044 2022-11-03T23:57:33Z 2022-11-03T23:57:33Z CONTRIBUTOR

Can we get #7229 in?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  whats-new for 2022.11.0 1433815234
1291957109 https://github.com/pydata/xarray/issues/7220#issuecomment-1291957109 https://api.github.com/repos/pydata/xarray/issues/7220 IC_kwDOAMm_X85NAbN1 slevang 39069044 2022-10-26T12:26:55Z 2022-10-26T12:26:55Z CONTRIBUTOR

Anything that uses attrs[1] and the _get_all_of_type helper is going to have a hard time guaranteeing the behavior stated in the docstring, which is that we take the attrs of x. If x is a scalar, then _get_all_of_type returns a list of length 2 and attrs[1] ends up being the attrs of y. I think we may want to rework this; I'll try a few things later today.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `xr.where(..., keep_attrs=True)` overwrites coordinate attributes 1423114234
1256225233 https://github.com/pydata/xarray/pull/6978#issuecomment-1256225233 https://api.github.com/repos/pydata/xarray/issues/6978 IC_kwDOAMm_X85K4HnR slevang 39069044 2022-09-23T13:38:04Z 2022-09-23T13:38:04Z CONTRIBUTOR

Is that something that will be deprecated or is it planned to keep the support for the kwargs dict forever?

Not sure if there are any strong opinions here? I don't see much harm in keeping it around but we could also deprecate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix passing of curvefit kwargs 1359368857
1254063029 https://github.com/pydata/xarray/issues/7062#issuecomment-1254063029 https://api.github.com/repos/pydata/xarray/issues/7062 IC_kwDOAMm_X85Kv3u1 slevang 39069044 2022-09-21T18:15:46Z 2022-09-21T18:15:46Z CONTRIBUTOR

Without bottleneck, I guess both options from the MVCE above get routed through _mean:

```python
xr.set_options(use_bottleneck=False)
da = xr.DataArray([1, 2, 3], coords={'x': [1, 2, 3]}).astype('float32')
da.rolling(x=3, min_periods=1).mean().dtype                  # dtype('float64')
da.chunk({'x': 1}).rolling(x=3, min_periods=1).mean().dtype  # dtype('float64')
```

And after #7063:

```python
xr.set_options(use_bottleneck=False)
da = xr.DataArray([1, 2, 3], coords={'x': [1, 2, 3]}).astype('float32')
da.rolling(x=3, min_periods=1).mean().dtype                  # dtype('float32')
da.chunk({'x': 1}).rolling(x=3, min_periods=1).mean().dtype  # dtype('float32')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling mean on dask array does not preserve dtype 1381294181
1235974050 https://github.com/pydata/xarray/pull/6978#issuecomment-1235974050 https://api.github.com/repos/pydata/xarray/issues/6978 IC_kwDOAMm_X85Jq3ei slevang 39069044 2022-09-02T23:23:51Z 2022-09-02T23:23:51Z CONTRIBUTOR

Thanks for doing this @slevang ! Would you mind adding a tiny regression test too?

Right, I guess this actually breaks the previous way of passing kwargs and that is why the docs build failed. Hmmm. To go with the current changes, thoughts on adding something like this to the parsing logic:

```python
if kwargs is None:
    kwargs = {}
elif "kwargs" in kwargs:
    kwargs = {**kwargs.pop("kwargs"), **kwargs}
```

to allow for the desired functionality but also handle the old case when someone passes kwargs as a dict? Feels wonky but it works. Is there a smarter way of doing this?

BTW it took me a minute to figure out what happened here because the docstring in the original PR was actually correct (requiring a dict, albeit maybe not the best way of passing kwargs), but then got changed in #5182 to suggest that kwargs could be passed on their own. I see .interp() behaves the same, where only a dict of kwargs can be provided and passed to the underlying scipy interpolation. So it could be worth generalizing this for other methods where we are passing kwargs to a child function of some sort, to allow both of these patterns.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix passing of curvefit kwargs 1359368857
1234368670 https://github.com/pydata/xarray/issues/6968#issuecomment-1234368670 https://api.github.com/repos/pydata/xarray/issues/6968 IC_kwDOAMm_X85Jkvie slevang 39069044 2022-09-01T14:34:05Z 2022-09-01T14:34:05Z CONTRIBUTOR

I don't think curvefit works well with datetime coordinates at this point, because everything gets coerced to float by apply_ufunc. Probably room for improvement there. At a minimum you would need to specify good guesses (p0) and/or bounds in terms of datetime64[ns] values.

An easy enough workaround is to assign a separate non-datetime coordinate. This works:

```python
ds = xr.tutorial.open_dataset('air_temperature')
ds = ds.assign_coords({'day': (ds.time - ds.time[0]) / np.timedelta64(1, 'D')}).swap_dims({'time': 'day'})

def periodic_season(x, a0, a1, a2, a3):
    # periodic function with both phase amplitude and shift parameters
    return a0 + a1 * np.cos(a2 * x - a3)

dn = ds.curvefit(
    'day',
    func=periodic_season,
    p0={'a0': 275, 'a1': 15, 'a2': 2 * np.pi / 365},
)
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Extracting annual amplitude and phase of xarray dataset for each pixel using xarray.DataArray.curvefit 1355733363
1188348238 https://github.com/pydata/xarray/issues/6799#issuecomment-1188348238 https://api.github.com/repos/pydata/xarray/issues/6799 IC_kwDOAMm_X85G1MFO slevang 39069044 2022-07-18T21:43:11Z 2022-07-18T21:43:11Z CONTRIBUTOR

The chunking structure on disk is pretty instrumental to my application, which requires fast retrievals of full slices in the time dimension. The loop option in my first post only takes about 10 seconds with ni=1000 which is fine for my use case, so I'll probably go with that for now. It would be interesting to dig deeper though and see if there is a way to handle this better in the interp logic.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `interp` performance with chunked dimensions 1307112340
1187729258 https://github.com/pydata/xarray/issues/6799#issuecomment-1187729258 https://api.github.com/repos/pydata/xarray/issues/6799 IC_kwDOAMm_X85Gy09q slevang 39069044 2022-07-18T16:44:40Z 2022-07-18T16:44:40Z CONTRIBUTOR

Interpolating on chunked dimensions doesn't work at all prior to #4155. The changes in #4069 are also relevant.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `interp` performance with chunked dimensions 1307112340
1094166371 https://github.com/pydata/xarray/pull/6461#issuecomment-1094166371 https://api.github.com/repos/pydata/xarray/issues/6461 IC_kwDOAMm_X85BN6dj slevang 39069044 2022-04-10T03:16:24Z 2022-04-10T03:16:24Z CONTRIBUTOR

Good point @keewis. This version should maintain the expected behavior in that case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix `xr.where(..., keep_attrs=True)` bug 1198058137
1093627256 https://github.com/pydata/xarray/issues/6444#issuecomment-1093627256 https://api.github.com/repos/pydata/xarray/issues/6444 IC_kwDOAMm_X85BL214 slevang 39069044 2022-04-09T03:09:49Z 2022-04-09T03:09:49Z CONTRIBUTOR

I ran into this error as well with a bunch of broken xr.where calls where I had set a global option for keep_attrs=True. I pushed a possible fix in #6461.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where with scalar as second argument fails with keep_attrs=True 1193704369
1057116313 https://github.com/pydata/xarray/issues/6317#issuecomment-1057116313 https://api.github.com/repos/pydata/xarray/issues/6317 IC_kwDOAMm_X84_AlCZ slevang 39069044 2022-03-02T16:24:37Z 2022-03-02T16:24:37Z CONTRIBUTOR

These errors can often be eliminated by passing appropriate initial guesses for each parameter through the p0 argument (which it looks like you're already doing), and/or by passing additional kwargs through to scipy.optimize.leastsq(), like ftol and maxfev which set the fit tolerance and max number of iterations.

Even so, there will be cases where some of your data just doesn't fit the desired function and you want to move on and return a result anyway. Your proposal seems fine as long as errors are raised by default and the documentation adequately describes the behavior when set to True.
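For instance, something along these lines (illustrative toy data and parameter values):

```python
import numpy as np
import xarray as xr

def decay(t, amp, tau):
    return amp * np.exp(-t / tau)

t = np.linspace(0, 10, 200)
da = xr.DataArray(
    3.0 * np.exp(-t / 2.0) + np.random.normal(scale=0.05, size=t.size),
    coords={"t": t},
    dims="t",
)

fit = da.curvefit(
    "t",
    decay,
    p0={"amp": 1.0, "tau": 1.0},             # initial guess per parameter
    kwargs={"maxfev": 2000, "ftol": 1e-10},  # forwarded to scipy's least-squares routine
)
print(fit.curvefit_coefficients)
```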

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle scipy fitting errors in `xarray.curve_fit` 1157140667
1016805677 https://github.com/pydata/xarray/pull/5933#issuecomment-1016805677 https://api.github.com/repos/pydata/xarray/issues/5933 IC_kwDOAMm_X848mzkt slevang 39069044 2022-01-19T19:39:57Z 2022-01-19T19:39:57Z CONTRIBUTOR

Not sure I understand the blockwise approach well enough to make a PR, but maybe I'll give it a try at some point.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reimplement `.polyfit()` with `apply_ufunc` 1043746973
1015887354 https://github.com/pydata/xarray/pull/5933#issuecomment-1015887354 https://api.github.com/repos/pydata/xarray/issues/5933 IC_kwDOAMm_X848jTX6 slevang 39069044 2022-01-18T22:21:49Z 2022-01-18T22:45:16Z CONTRIBUTOR

@slevang are you still interested to continue this PR? I think that would be a worthwhile addition and should not be too much left to do. (What would be nice, however, are tests for the issues this fixes.)

Definitely! I got distracted is all, and @dcherian posted a nice solution in #5629 that could allow us to preserve the ability to fit along a chunked dimension using blockwise operations and the dask lstsq implementation used by the existing polyfit code.

I'm happy to pick this back up and finish it off if there is consensus on the right way forward, but the blockwise approach seemed promising so I put this on hold.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reimplement `.polyfit()` with `apply_ufunc` 1043746973
885216203 https://github.com/pydata/xarray/issues/5629#issuecomment-885216203 https://api.github.com/repos/pydata/xarray/issues/5629 IC_kwDOAMm_X840w1PL slevang 39069044 2021-07-22T20:37:04Z 2021-07-22T20:37:04Z CONTRIBUTOR

Potentially a useful comparison: here is the task graph for the same problem posed to da.curvefit, which uses xr.apply_ufunc(scipy.optimize.curvefit, ..., vectorize=True). Looks a lot cleaner, and I think xr.polyfit could be re-implemented this way without much trouble.

```python
import xarray as xr
import dask.array as dsa

def linear(x, m, b):
    return m * x + b

da = xr.DataArray(dsa.random.random((4, 6, 100), chunks=(1, 2, 100)), dims=['x', 'y', 'time'])
fit = da.curvefit('time', linear).curvefit_coefficients
```

{
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 3,
    "rocket": 0,
    "eyes": 0
}
  Polyfit performance on large datasets - Suboptimal dask task graph 950882492
820706339 https://github.com/pydata/xarray/issues/5165#issuecomment-820706339 https://api.github.com/repos/pydata/xarray/issues/5165 MDEyOklzc3VlQ29tbWVudDgyMDcwNjMzOQ== slevang 39069044 2021-04-15T20:20:26Z 2021-04-15T20:20:26Z CONTRIBUTOR

In fact an even simpler calculation also shows the same problem:

```python
import xarray as xr
import pandas as pd
import numpy as np
import dask.array as da

dates = pd.date_range('1980-01-01', '2019-12-31', freq='D')
ds = xr.Dataset(
    data_vars={
        'x1': (('time', 'lat', 'lon'), da.random.random(size=(dates.size, 360, 720), chunks=(1, -1, -1))),
        'x2': (('time', 'lat', 'lon'), da.random.random(size=(dates.size, 360, 720), chunks=(1, -1, -1))),
    },
    coords={
        'time': dates,
        'lat': np.arange(-90, 90, .5),
        'lon': np.arange(-180, 180, .5),
    },
)

ds['x3'] = ds.x2 - ds.x1
ds[['x3']].to_zarr('test-anom', mode='w')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Poor memory management with dask=2021.4.0 859218255
810635014 https://github.com/pydata/xarray/pull/4849#issuecomment-810635014 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDgxMDYzNTAxNA== slevang 39069044 2021-03-30T23:04:52Z 2021-03-30T23:04:52Z CONTRIBUTOR

I think so. I pushed a merge commit to get this up to date with the current release.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
782731258 https://github.com/pydata/xarray/pull/4849#issuecomment-782731258 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc4MjczMTI1OA== slevang 39069044 2021-02-20T18:46:55Z 2021-02-20T18:46:55Z CONTRIBUTOR

Thanks for the review @dcherian! The latest commit has been refactored with a couple helper functions and associated tests, and any steps that served no purpose other than consistency with polyfit have been removed.

If you can think of any more specific test cases that should be included, happy to add them.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
780953464 https://github.com/pydata/xarray/issues/4894#issuecomment-780953464 https://api.github.com/repos/pydata/xarray/issues/4894 MDEyOklzc3VlQ29tbWVudDc4MDk1MzQ2NA== slevang 39069044 2021-02-18T00:51:04Z 2021-02-18T00:51:04Z CONTRIBUTOR

#4849 should be almost ready to go. Would be good for someone else to confirm the API makes sense. Additional features and performance enhancements could be added in later releases as long as the API is acceptable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  release 0.17.0 806436724
780900086 https://github.com/pydata/xarray/issues/4826#issuecomment-780900086 https://api.github.com/repos/pydata/xarray/issues/4826 MDEyOklzc3VlQ29tbWVudDc4MDkwMDA4Ng== slevang 39069044 2021-02-17T22:36:08Z 2021-02-17T22:36:08Z CONTRIBUTOR

I ran into this as well with the basic netcdf backends:

```python
import xarray as xr

ds = xr.Dataset(
    data_vars={"foo": (["x"], [False, True, False])},
    coords={"x": [1, 2, 3]},
)
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
```

Gives:

```
bool
int8
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
777602757 https://github.com/pydata/xarray/pull/4849#issuecomment-777602757 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3NzYwMjc1Nw== slevang 39069044 2021-02-11T16:02:54Z 2021-02-11T16:02:54Z CONTRIBUTOR

I've been playing around with this some more, and found the performance to be much better using a process-heavy dask scheduler. For example:

```python
import xarray as xr
import numpy as np
import time
import dask
import dask.distributed

def exponential(x, a, xc):
    return np.exp((x - xc) / a)

x = np.arange(-5, 5, 0.001)
t = np.arange(-5, 5, 0.01)
X, T = np.meshgrid(x, t)
Z1 = np.random.uniform(low=-5, high=5, size=X.shape)
Z2 = exponential(Z1, 3, X) + np.random.normal(scale=0.1, size=Z1.shape)

ds = xr.Dataset(
    data_vars=dict(var1=(["t", "x"], Z1), var2=(["t", "x"], Z2)),
    coords={"t": t, "x": x},
)

ds = ds.chunk({'x': 10})

def test_fit():
    start = time.time()
    fit = ds.var2.curvefit(
        coords=ds.var1,
        func=exponential,
        reduce_dim="t",
    ).compute()
    print(f'Fitting time: {time.time() - start:.2f}s')

with dask.config.set(scheduler='threads'):
    test_fit()
with dask.config.set(scheduler='processes'):
    test_fit()
with dask.distributed.Client() as client:
    test_fit()
with dask.distributed.Client(n_workers=8, threads_per_worker=1) as client:
    test_fit()
```

On my 8-core machine, this takes:

```
Fitting time: 8.32s
Fitting time: 2.71s
Fitting time: 4.40s
Fitting time: 3.43s
```

According to this the underlying scipy routines should be thread safe.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
772874760 https://github.com/pydata/xarray/pull/4849#issuecomment-772874760 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3Mjg3NDc2MA== slevang 39069044 2021-02-03T22:37:50Z 2021-02-03T22:37:50Z CONTRIBUTOR

Added some checks that will raise errors if inspect.signature is not able to determine the function arguments, either for functions with variable-length args or things like numpy ufuncs that don't seem to work with signature. In these cases, param_names can be passed manually.
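Roughly the kind of check involved (a simplified sketch, not the PR's exact code):

```python
import inspect
import numpy as np

def exponential(x, a, xc):
    return np.exp((x - xc) / a)

def variadic(x, *params):
    return sum(p * x**i for i, p in enumerate(params))

# For a plain function, the fit parameters are everything after the coordinate argument
print(list(inspect.signature(exponential).parameters)[1:])  # ['a', 'xc']

# With *params the names can't be recovered, so param_names must be given manually;
# numpy ufuncs similarly don't expose a usable signature
kinds = [p.kind for p in inspect.signature(variadic).parameters.values()]
print(inspect.Parameter.VAR_POSITIONAL in kinds)  # True
```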

Also added minimal tests, but these should probably be expanded.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
771907303 https://github.com/pydata/xarray/pull/4849#issuecomment-771907303 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3MTkwNzMwMw== slevang 39069044 2021-02-02T19:19:01Z 2021-02-02T19:19:01Z CONTRIBUTOR

Added a couple usage examples in the docs, including one that replicates the scipy example of fitting multiple peaks. Because of the wrapper function and variable args, this requires supplying param_names manually. Works nicely though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
771320891 https://github.com/pydata/xarray/pull/4849#issuecomment-771320891 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3MTMyMDg5MQ== slevang 39069044 2021-02-02T03:12:41Z 2021-02-02T03:12:41Z CONTRIBUTOR

Some more progress here.

  1. I liked the idea of being able to supply p0 and bounds as a dictionary mapping between parameter names and values, which is much more in the style of xarray. So I've included the logic to translate between this and the scipy args which are just ordered lists. The logic first prioritizes any of these values supplied directly in keyword args, then looks for defaults in the function definition, and otherwise falls back to the scipy defaults.
  2. There's now a working implementation of multi-dimensional fits using ravel to flatten the coordinate arrays. One limitation is that func must be specified like f((x, y), z) as scipy takes a single x argument for the fitting coordinate. Maybe there is a clever way around this, but I think it's ok.
  3. Some changes to the API.

The best way to specify the fitting coordinates is a bit tricky to figure out. My original use case for this was needing to fit a relationship between two time/lat/lon dataarrays with the fit done over all time. But probably a more common use would be to just fit a curve over one or two dimensions that already exist in your data. So it would be great to handle these possibilities seamlessly.

What I've settled on for now is a coords argument that takes a list of coordinates that should be the same length as the input coordinates of your fitting function. Additionally, there is a reduce_dim arg, so that, say, if you want to fit a function in time but aggregate over all lat and lon, you can supply those dimensions here. So with a 3D x/y/time array, any of the following should work:

```python
# Fit a 1d function in time, returns parameters with dims (x, y)
da.curvefit(coords='time', ...)

# Fit a 2d function in space, returns parameters with dims (t)
da.curvefit(coords=['x', 'y'], ...)

# Fit a 1d function with another 3d dataarray and aggregate over time, returns parameters with dims (x, y)
da.curvefit(coords=da1, reduce_dim='time', ...)
```

The logic to make this work got a bit complicated, since we need to supply the right input_core_dims to apply_ufunc, and also to explicitly broadcast the coordinates to ensure the ravel operation works. Any thoughts on cleanup here appreciated.

Will eventually need to add tests and improve docs and examples. Tests especially I could use some help on.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
770395357 https://github.com/pydata/xarray/pull/4849#issuecomment-770395357 https://api.github.com/repos/pydata/xarray/issues/4849 MDEyOklzc3VlQ29tbWVudDc3MDM5NTM1Nw== slevang 39069044 2021-01-31T14:59:12Z 2021-01-31T14:59:12Z CONTRIBUTOR
  1. Different fit coefficients as differently-named variables in the output, rather than indexed with a coordinate.

I think the way I configured things now does replicate the polyfit results. For example:

```python
ds = xr.tutorial.open_dataset('air_temperature')
ds['air2'] = ds.air.copy()
ds.polyfit(dim='time', deg=2)

<xarray.Dataset>
Dimensions:                    (degree: 3, lat: 25, lon: 53)
Coordinates:
  * degree                     (degree) int64 2 1 0
  * lat                        (lat) float64 75.0 72.5 70.0 ... 20.0 17.5 15.0
  * lon                        (lon) float64 200.0 202.5 205.0 ... 327.5 330.0
Data variables:
    air_polyfit_coefficients   (degree, lat, lon) float64 -1.162e-32 ... 1.13...
    air2_polyfit_coefficients  (degree, lat, lon) float64 -1.162e-32 ... 1.14...
```

Compared to this:

```python
def square(x, a, b, c):
    return a * np.power(x, 2) + b * x + c

ds.curvefit(x=ds.time, dim='time', func=square)

<xarray.Dataset>
Dimensions:                     (lat: 25, lon: 53, param: 3)
Coordinates:
  * lat                         (lat) float32 75.0 72.5 70.0 ... 20.0 17.5 15.0
  * lon                         (lon) float32 200.0 202.5 205.0 ... 327.5 330.0
  * param                       (param) <U1 'a' 'b' 'c'
Data variables:
    air_curvefit_coefficients   (param, lat, lon) float64 -1.162e-32 ... 1.13...
    air2_curvefit_coefficients  (param, lat, lon) float64 -1.162e-32 ... 1.14...
```

In both cases, each variable in the dataset returns a separate coefficients variable, and all fittable coefficients are stacked along a dimension, degree for the more specific polyfit case and a generic param for curvefit.

  2. Initial guesses for each fit parameter.

Yeah this would be good. Should be easy to look for default values in the function itself using inspect.

  3. (Stretch goal) Ability to fit >1D functions

Looks like this could be possible with a call to ravel like this. I'll do some experimenting.
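A minimal sketch of the ravel trick with plain scipy (an assumed example, not the code linked above):

```python
import numpy as np
from scipy.optimize import curve_fit

def plane(coords, a, b, c):
    x, y = coords
    return a * x + b * y + c

x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 30)
X, Y = np.meshgrid(x, y)
Z = plane((X, Y), 2.0, -1.0, 0.5) + np.random.normal(scale=0.01, size=X.shape)

# scipy wants 1D arrays, so flatten the coordinate grids and the data
popt, _ = curve_fit(plane, (X.ravel(), Y.ravel()), Z.ravel())
print(popt)  # approximately [2.0, -1.0, 0.5]
```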

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Basic curvefit implementation 797302408
770133047 https://github.com/pydata/xarray/issues/4300#issuecomment-770133047 https://api.github.com/repos/pydata/xarray/issues/4300 MDEyOklzc3VlQ29tbWVudDc3MDEzMzA0Nw== slevang 39069044 2021-01-30T01:38:18Z 2021-01-30T01:38:18Z CONTRIBUTOR

I needed this functionality for a project, and piggy-backing off the last couple of comments decided the curve_fit wrapped by apply_ufunc approach works quite well. I put together a PR in #4849. Any feedback welcome!

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  General curve fitting method 671609109
