id,node_id,number,state,locked,title,user,body,created_at,updated_at,closed_at,merged_at,merge_commit_sha,assignee,milestone,draft,head,base,author_association,auto_merge,repo,url,merged_by
369184294,MDExOlB1bGxSZXF1ZXN0MzY5MTg0Mjk0,3733,closed,0,Implementation of polyfit and polyval,20629530," - [x] Closes #3349 - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API Following discussions in #3349, I suggest here an implementation of `polyfit` and `polyval` for xarray. However, this is still work in progress: a lot of testing is missing and all docstrings are missing. But, mainly, I have questions on how to properly conduct this. My implementation mostly duplicates the code of `np.polyfit`, but makes use of `dask.array.linalg.lstsq` and `dask.array.apply_along_axis` for dask arrays. It uses the same method as `xscale.signal.fitting.polyfit`, but I add NaN-awareness in a 1-D manner. The numpy version is also slightly different from `np.polyfit` because of the NaN skipping, but I wanted the function to replicate its behaviour. It returns a variable number of DataArrays, depending on the keyword arguments (coefficients, [ residuals, matrix rank, singular values ] / [covariance matrix]). This gives a medium-length function with a lot of code duplicated from `numpy.polyfit`. I thought of simply using `xr.apply_ufunc`, but that forbids chunking along the fitted dimension and makes it difficult to return the ancillary results (residuals, rank, covariance matrix...). Questions: 1) Are the functions where they should go? 2) Should xarray's implementation really replicate the behaviour of numpy's? A lot of extra code could be removed if we decided to only compute and return the residuals and the coefficients. All the other variables are a few lines of code away for the user who really wants them, and they don't need the power of xarray and dask anyway.",2020-01-30T16:58:51Z,2020-03-26T00:22:17Z,2020-03-25T17:17:45Z,2020-03-25T17:17:45Z,ec215daecec642db94102dc24156448f8440f52d,,,0,7eeba59ff487d5bc51809da4ae824e7283b5b2aa,009aa66620b3437cf0de675013fa7d1ff231963c,CONTRIBUTOR,,13221727,https://github.com/pydata/xarray/pull/3733,
413713886,MDExOlB1bGxSZXF1ZXN0NDEzNzEzODg2,4033,closed,0,xr.infer_freq,20629530," - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API This PR adds an `xr.infer_freq` function that mirrors pandas' `infer_freq` but works on `CFTimeIndex` objects. I tried to subclass pandas' `_FrequencyInferer` and to override as little as possible. Two things are problematic right now and I would like feedback on how to implement them if this PR gets the devs' approval. 1) `pd.DatetimeIndex.asi8` returns integers representing _nanoseconds_ since 1970-1-1, while `xr.CFTimeIndex.asi8` returns _microseconds_. In order not to break the API, I patched the `_CFTimeFrequencyInferer` to store 1000x the values. Not sure if this is the best, but it works. 2) As of now, `xr.infer_freq` will fail on weekly indexes. This is because pandas uses `datetime.weekday()` at some point but cftime objects do not implement that (they use `dayofwk` instead). I'm not sure what to do: cftime could implement it to completely mirror Python's datetime, or pandas could use `dayofwk` since it's available on `Timestamp` objects. 
Another option, cleaner but longer, would be to reimplement `_FrequencyInferer` from scratch. I may have time for this, because I really think an `xr.infer_freq` method would be useful.",2020-05-05T19:39:05Z,2020-05-30T18:11:36Z,2020-05-30T18:08:27Z,2020-05-30T18:08:27Z,fd9e620a84389170138cc014ee5a0213718beb78,,,0,9a553edae8b2b4f52e5044d89b0f0354d51b003c,d1f7cb8fd95d588d3f7a7e90916c25747b90ad5a,CONTRIBUTOR,,13221727,https://github.com/pydata/xarray/pull/4033,
424048387,MDExOlB1bGxSZXF1ZXN0NDI0MDQ4Mzg3,4099,closed,0,Allow non-unique and non-monotonic coordinates in get_clean_interp_index and polyfit,20629530," - [ ] Closes #xxxx - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API Pull #3733 added `da.polyfit` and `xr.polyval`, using `xr.core.missing.get_clean_interp_index` to get the fitting coordinate. However, this method is stricter than what polyfit needs: as in `numpy.polyfit`, non-unique and non-monotonic indexes are acceptable. This PR adds a `strict` keyword argument to `get_clean_interp_index` so we can skip the uniqueness and monotonicity tests. `ds.polyfit` and `xr.polyval` were modified to use that keyword. I only added tests for `get_clean_interp_index`, but could add more for `polyfit` if requested.",2020-05-27T18:48:58Z,2020-06-05T15:46:00Z,2020-06-05T15:46:00Z,2020-06-05T15:46:00Z,09df5ca4036d84620373fa4bccd11d1f1d4bec28,,,0,fedfbf5ccdf52cac82ac0c072ae8882d630a2f51,e5cc19cd8f8a69e0743f230f5bf51b7778a0ff96,CONTRIBUTOR,,13221727,https://github.com/pydata/xarray/pull/4099,
431889644,MDExOlB1bGxSZXF1ZXN0NDMxODg5NjQ0,4135,closed,0,Correct dask handling for 1D idxmax/min on ND data,20629530," - [x] Closes #4123 - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API Based on comments on dask/dask#3096, I fixed the dask indexing error that occurred when `idxmax/idxmin` were called on ND data (where N > 2). The added tests are very simplistic: I believe the 1D and 2D tests already cover most cases, so I just wanted to check that it was indeed working on ND data, assuming that non-dask data was already treated properly. I believe this doesn't conflict with #3936.",2020-06-09T15:36:09Z,2020-06-25T16:09:59Z,2020-06-25T03:59:52Z,2020-06-25T03:59:51Z,f4638afe009fde5f53de1a1b80cc71f62593c463,,,0,76e82e90948aae14f170c595dc2ee61fdf1770cf,fb5fe79a2881055065cc2c0ed3f49f5448afdf32,CONTRIBUTOR,,13221727,https://github.com/pydata/xarray/pull/4135,
443610926,MDExOlB1bGxSZXF1ZXN0NDQzNjEwOTI2,4193,closed,0,Fix polyfit fail on deficient rank,20629530," - [x] Closes #4190 - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` Fixes #4190. In cases where the input matrix had a deficient rank (matrix rank != order) because of the number of NaN values, polyfit would fail, simply because numpy's lstsq returned an empty array for the residuals (instead of a size-1 array). This fixes the problem by catching that case and returning `np.nan` instead. The other point in the issue was that `RankWarning` is also not raised in that case. That was because `da.polyfit` was computing the rank from the coordinate (Vandermonde) matrix instead of the masked data. 
Thus, if a given line had too many NaN values, its deficient rank was not detected. I added a test and a warning at all places where a rank is computed (5 different lines). Also, to match `np.polyfit`'s behaviour of not warning when `full=True`, I changed the warning filters using a context manager, ignoring the `RankWarning` in that case. Overall, it feels a bit ugly because of the duplicated code, and it will print the warning for every line of an array that has a deficient rank, which can be a lot... ",2020-07-02T16:00:21Z,2020-08-20T14:20:43Z,2020-08-20T08:34:45Z,2020-08-20T08:34:45Z,efabe74b1ce8f0666b93658ebb48104aa37b26ac,,,0,04be2e0fa1f96762798761f08aca7c37d7d8c67d,26547d19d477cc77461c09b3aadd55f7eb8b4dbf,CONTRIBUTOR,,13221727,https://github.com/pydata/xarray/pull/4193,
625530046,MDExOlB1bGxSZXF1ZXN0NjI1NTMwMDQ2,5233,closed,0,Calendar utilities,20629530," - [x] Closes #5155 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` So: - Added `coding.cftime_offsets.date_range` and `coding.cftime_offsets.date_range_like`. The first simply switches between `pd.date_range` and `xarray.cftime_range` according to the arguments. The second infers start, end and freq from an existing datetime array and returns a similar range in another calendar. - Added `coding/calendar_ops.py` with `convert_calendar` and `interp_calendar`. Didn't know where to put them, so there they are. - Added `DataArray.dt.calendar`. When the datetime objects are backed by numpy, it always returns `""proleptic_gregorian""`. I'm not sure where to expose these functions. Should the range-generators be accessible directly like `xr.date_range`? `convert_calendar` and `interp_calendar` could be implemented as methods of `DataArray` and `Dataset`; should I do that? ",2021-04-28T20:01:33Z,2021-12-30T22:54:49Z,2021-12-30T22:54:11Z,2021-12-30T22:54:11Z,b14e2d8400da5c036f1ebb5486939f7f587b9f27,,,0,5aa747079ce32c51645ca245b1423cbacaf0cb1b,2694046c748a51125de6d460073635f1d789958e,CONTRIBUTOR,,13221727,https://github.com/pydata/xarray/pull/5233,
657205536,MDExOlB1bGxSZXF1ZXN0NjU3MjA1NTM2,5402,open,0,`dt.to_pytimedelta` to allow arithmetic with cftime objects,20629530," - [ ] Closes #xxxx - [x] Tests added - [x] Passes `pre-commit run --all-files` - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` When playing with cftime objects, a problem I encountered many times is that I can subtract two arrays but then can't add the result back to another. Subtracting two cftime datetime arrays results in an array of `np.timedelta64`, and when trying to add that back to another cftime array, we get a `UFuncTypeError` because the two arrays have incompatible dtypes: ' - [ ] Closes #xxxx - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` Simply add an `encodings` argument to `save_mfdataset`. As with the other arguments, it expects a list of dictionaries, with encoding information to be passed to `to_netcdf` for each dataset. 
Added a minimal test, simply to check that the argument is taken into account.",2021-09-08T21:24:13Z,2022-10-06T21:44:18Z,,,d86b32087d7108dc866e34569653033973160827,,,0,23acbb84683f3dab9f593ee63a0323433b2b3638,d1e4164f3961d7bbb3eb79037e96cae14f7182f8,CONTRIBUTOR,,13221727,https://github.com/pydata/xarray/pull/5781,
1673012286,PR_kwDOAMm_X85juCQ-,8603,closed,0,Convert 360_day calendars by choosing random dates to drop or add,20629530," - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` Small PR to add a new ""method"" to convert to and from 360_day calendars. The current two methods (chosen with the `align_on` keyword) will always remove or add the same day-of-year for all years of the same length. This new option will randomly choose the days, one for each fifth of the year (a 72-day period). It emulates the method of the LOCA datasets (see [web page](https://loca.ucsd.edu/loca-calendar/) and [article](https://journals.ametsoc.org/view/journals/hydr/15/6/jhm-d-14-0082_1.xml)). February 29th is always removed/added when the source/target is a leap year. I copied the implementation from xclim (which I wrote), [see code here](https://github.com/Ouranosinc/xclim/blob/fb29b8a8e400c7d8aaf4e1233a6b37a300126257/xclim/core/calendar.py#L112-L134).",2024-01-10T19:13:31Z,2024-04-16T14:53:42Z,2024-04-16T14:53:42Z,2024-04-16T14:53:42Z,239309f881ba0d7e02280147bc443e6e286e6a63,,,0,b581e1f700382207c3bd0fd03860f44f33b29b79,b004af5174a4b0e32519df792a4f625d5548a9f0,CONTRIBUTOR,,13221727,https://github.com/pydata/xarray/pull/8603,
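The PR bodies above describe `polyfit`/`polyval`, `xr.infer_freq` and the calendar utilities in prose only; below is a minimal usage sketch, assuming a recent xarray release where these features are exposed as `xr.date_range`, `xr.infer_freq`, `DataArray.polyfit`, `xr.polyval`, `DataArray.convert_calendar` and `DataArray.dt.calendar` (exact signatures may differ between versions):

```python
import numpy as np
import xarray as xr

# Monthly time axis in the standard calendar (numpy datetime64 backed).
time = xr.date_range("2000-01-01", periods=120, freq="MS")
da = xr.DataArray(np.random.rand(120), dims="time", coords={"time": time})

# Frequency inference; also works on CFTimeIndex-backed coordinates (#4033).
print(xr.infer_freq(da.time))  # "MS"

# Least-squares polynomial fit along "time", then evaluate the fitted line (#3733).
fit = da.polyfit(dim="time", deg=1)
trend = xr.polyval(da.time, fit.polyfit_coefficients)

# Convert the series to a noleap (365_day) calendar; the cftime-backed result
# reports its calendar through the .dt accessor (#5233).
da_noleap = da.convert_calendar("noleap")
print(da.dt.calendar, "->", da_noleap.dt.calendar)
```

When converting to or from a 360_day calendar, `convert_calendar` additionally needs the `align_on` argument whose options are discussed in #8603.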