id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2075019328,PR_kwDOAMm_X85juCQ-,8603,Convert 360_day calendars by choosing random dates to drop or add,20629530,closed,0,,,3,2024-01-10T19:13:31Z,2024-04-16T14:53:42Z,2024-04-16T14:53:42Z,CONTRIBUTOR,,0,pydata/xarray/pulls/8603," - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` Small PR to add a new ""method"" to convert to and from 360_day calendars. The current two methods (chosen with the `align_on` keyword) will always remove or add the same day-of-year for all years of the same length. This new option randomly chooses the days, one for each fifth of the year (72-day periods). It emulates the method of the LOCA datasets (see [web page](https://loca.ucsd.edu/loca-calendar/) and [article](https://journals.ametsoc.org/view/journals/hydr/15/6/jhm-d-14-0082_1.xml)). February 29th is always removed/added when the source/target is a leap year. I copied the implementation from xclim (which I wrote), [see code here](https://github.com/Ouranosinc/xclim/blob/fb29b8a8e400c7d8aaf4e1233a6b37a300126257/xclim/core/calendar.py#L112-L134).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8603/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1347026292,I_kwDOAMm_X85QSf10,6946,reset_index not resetting levels of MultiIndex,20629530,closed,0,4160723,,3,2022-08-22T21:47:04Z,2022-09-27T10:35:39Z,2022-09-27T10:35:39Z,CONTRIBUTOR,,,,"### What happened? I'm not sure my use case is the simplest way to demonstrate the issue, but let's try anyway. I have a DataArray with two coordinates and I stack them into a new multi-index. I want to pass the levels of that new multi-index into a function, but as dask arrays. It turns out it is not straightforward to chunk these variables, because they act like `IndexVariable` objects and refuse to be chunked. Thus, I reset the multi-index and drop it, but the variables still don't want to be chunked! ### What did you expect to happen? I expected the levels to be chunkable after the sequence: stack, reset_index. ### Minimal Complete Verifiable Example
```Python
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')
ds = ds.stack(spatial=['lon', 'lat'])
ds = ds.reset_index('spatial', drop=True)  # I don't think the drop is important here.

lon_chunked = ds.lon.chunk()  # oops, doesn't do anything!
type(ds.lon.variable)  # xarray.core.variable.IndexVariable
# I assumed either the stack or the reset_index would have modified this type into a normal variable.
```
### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output _No response_ ### Anything else we need to know? Seems kinda related to the issues around `reset_index`.
I think this is related to (but not a duplicate of) #4366. ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.6.0 pandas: 1.4.3 numpy: 1.22.4 scipy: 1.9.0 netCDF4: 1.6.0 pydap: None h5netcdf: None h5py: 3.7.0 Nio: None zarr: 2.12.0 cftime: 1.6.1 nc_time_axis: 1.4.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.5 dask: 2022.8.0 distributed: 2022.8.0 matplotlib: 3.5.2 cartopy: 0.20.3 seaborn: None numbagg: None fsspec: 2022.7.1 cupy: None pint: 0.19.2 sparse: 0.13.0 flox: 0.5.9 numpy_groupies: 0.9.19 setuptools: 63.4.2 pip: 22.2.2 conda: None pytest: None IPython: 8.4.0 sphinx: 5.1.1
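For completeness, here is a rough workaround sketch (an untested idea on my side, assuming the level is still reachable as `ds.lon` like in the MVCE above, and with `lon_plain` / `lon_chunked` being names I made up for the sketch): rebuilding the level from its plain values gives a normal variable, so chunking actually does something.
```Python
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')
ds = ds.stack(spatial=['lon', 'lat'])
ds = ds.reset_index('spatial', drop=True)

# Rebuild the level from its plain numpy values; the result is a normal Variable
# rather than an IndexVariable, so .chunk() produces a real dask-backed array.
lon_plain = xr.DataArray(ds.lon.values, dims='spatial', name='lon')
lon_chunked = lon_plain.chunk()
print(lon_chunked.chunks)          # real chunks now
print(type(lon_chunked.variable))  # xarray.core.variable.Variable
```
This is obviously just a stopgap; the point of the issue is that the stack/reset_index sequence should leave chunkable variables behind by itself.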
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6946/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 824917345,MDU6SXNzdWU4MjQ5MTczNDU=,5010,DataArrays inside apply_ufunc with dask=parallelized,20629530,closed,0,,,3,2021-03-08T20:19:41Z,2021-03-08T20:37:15Z,2021-03-08T20:35:01Z,CONTRIBUTOR,,,," **Is your feature request related to a problem? Please describe.** Currently, when using apply_ufunc with `dask=parallelized` the wrapped function receives numpy arrays upon computation. Some xarray operations generate enormous amount of chunks (best example : `da.groupby('time.dayofyear')`, so any complex script using dask ends up with huge task graphs. Dask's scheduler becomes overloaded, sometimes even hangs, sometimes uses way more RAM than its workers. **Describe the solution you'd like** I'd want to profit from both the tools of xarray and the power of dask parallelization. I'd like to be able to do something like this: ```python3 def func(da): """"""Example of an operation not (easily) possible with numpy."""""" return da.groupby('time').mean() xr.apply_ufunc( da, func, input_core_dims=[['time']], pass_xr=True, dask='parallelized' ) ``` I'd like the wrapped func to receive DataArrays resembling the inputs (named dims, coords and all), but only with the subset of that dask chunk. Doing this, the whole function gets parallelized : dask only sees 1 task and I can code using xarray. Depending on the implementation, it might be less efficient than `dask=allowed` for small dataset, but I think this could be beneficial for long and complex computations on large datasets. **Describe alternatives you've considered** The alternative is to reduce the size of the datasets (looping on other dimensions), but that defeats the purpose of dask. Another alternative I am currently testing, is to add a layer between apply_ufunc and the `func`. That layer reconstruct a DataArray and deconstructs it before returning the result, so xarray/dask only passing by. If this works and is elegant enough, I can maybe suggest an implementation within xarray.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5010/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 612846594,MDExOlB1bGxSZXF1ZXN0NDEzNzEzODg2,4033,xr.infer_freq,20629530,closed,0,,,3,2020-05-05T19:39:05Z,2020-05-30T18:11:36Z,2020-05-30T18:08:27Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4033," - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API This PR adds a `xr.infer_freq` method to copy pandas `infer_freq` but on `CFTimeIndex` objects. I tried to subclass pandas `_FrequencyInferer` and to only override as little as possible. Two things are problematic right now and I would like to get feedback on how to implement them if this PR gets the dev's approval. 1) `pd.DatetimeIndex.asi8` returns integers representing _nanoseconds_ since 1970-1-1, while `xr.CFTimeIndex.asi8` returns _microseconds_. In order not to break the API, I patched the `_CFTimeFrequencyInferer` to store 1000x the values. Not sure if this is the best, but it works. 2) As of now, `xr.infer_freq` will fail on weekly indexes. 
This is because pandas uses `datetime.weekday()` at some point, but cftime objects do not implement that method (they use `dayofwk` instead). I'm not sure what the best fix is: cftime could implement it to completely mirror Python's datetime, or pandas could use `dayofwk` since it is available on `Timestamp` objects. Another option, cleaner but longer, would be to reimplement `_FrequencyInferer` from scratch. I may have time for this, because I really think a `xr.infer_freq` method would be useful.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4033/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull