id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1347026292,I_kwDOAMm_X85QSf10,6946,reset_index not resetting levels of MultiIndex,20629530,closed,0,4160723,,3,2022-08-22T21:47:04Z,2022-09-27T10:35:39Z,2022-09-27T10:35:39Z,CONTRIBUTOR,,,,"### What happened?

I'm not sure my use case is the simplest way to demonstrate the issue, but let's try anyway. I have a DataArray with two coordinates and I stack them into a new multi-index. I want to pass the levels of that new multi-index into a function, but as dask arrays. It turns out it is not straightforward to chunk these variables because they act like `IndexVariable` objects and refuse to be chunked. Thus, I reset the multi-index and drop it, but the variables still don't want to be chunked!

### What did you expect to happen?

I expected the levels to be chunkable after the sequence: stack, reset_index.

### Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')
ds = ds.stack(spatial=['lon', 'lat'])
ds = ds.reset_index('spatial', drop=True)  # I don't think the drop is important here.

lon_chunked = ds.lon.chunk()  # whoops, doesn't do anything!

type(ds.lon.variable)  # xarray.core.variable.IndexVariable
# I assumed either the stack or the reset_index would have changed this type into a normal Variable.
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

Seems kinda related to the other `reset_index` issues. I think this is related to (but not a duplicate of) #4366.

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.8.0
distributed: 2022.8.0
matplotlib: 3.5.2
cartopy: 0.20.3
seaborn: None
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 63.4.2
pip: 22.2.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: 5.1.1
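
In case it helps others hitting this, a possible workaround sketch (assuming `Variable.to_base_variable` keeps behaving as in xarray 2022.6.0) is to demote the leftover level coordinate to a plain `Variable` before chunking. The small synthetic dataset below is a stand-in for the tutorial dataset:

```python
import numpy as np
import xarray as xr

# Small stand-in for the tutorial dataset, same shape of problem.
ds = xr.Dataset(
    {'air': (('time', 'lat', 'lon'), np.zeros((2, 3, 4)))},
    coords={'lat': [10.0, 20.0, 30.0], 'lon': [1.0, 2.0, 3.0, 4.0]},
)
ds = ds.stack(spatial=['lon', 'lat']).reset_index('spatial')

# Demote the level coordinate to a plain Variable: a plain Variable
# accepts .chunk(), while an IndexVariable silently ignores it.
ds['lon'] = ds.lon.variable.to_base_variable()
assert not isinstance(ds.lon.variable, xr.IndexVariable)
# ds.lon.chunk() now produces a dask-backed variable (requires dask).
```

On versions where `reset_index` already returns plain variables this demotion is a no-op, so it should be safe to apply unconditionally.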
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6946/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
824917345,MDU6SXNzdWU4MjQ5MTczNDU=,5010,DataArrays inside apply_ufunc with dask=parallelized,20629530,closed,0,,,3,2021-03-08T20:19:41Z,2021-03-08T20:37:15Z,2021-03-08T20:35:01Z,CONTRIBUTOR,,,,"**Is your feature request related to a problem? Please describe.**

Currently, when using `apply_ufunc` with `dask=parallelized`, the wrapped function receives numpy arrays upon computation. Some xarray operations generate an enormous number of chunks (best example: `da.groupby('time.dayofyear')`), so any complex script using dask ends up with huge task graphs. Dask's scheduler becomes overloaded, sometimes even hangs, and sometimes uses way more RAM than its workers.

**Describe the solution you'd like**

I'd want to profit from both the tools of xarray and the power of dask parallelization. I'd like to be able to do something like this:

```python3
def func(da):
    """"""Example of an operation not (easily) possible with numpy.""""""
    return da.groupby('time').mean()

xr.apply_ufunc(
    func,
    da,
    input_core_dims=[['time']],
    pass_xr=True,
    dask='parallelized'
)
```

I'd like the wrapped `func` to receive DataArrays resembling the inputs (named dims, coords and all), but holding only the subset of data in that dask chunk. Doing this, the whole function gets parallelized: dask only sees 1 task and I can code using xarray. Depending on the implementation, it might be less efficient than `dask='allowed'` for small datasets, but I think this could be beneficial for long and complex computations on large datasets.

**Describe alternatives you've considered**

The alternative is to reduce the size of the datasets (looping on other dimensions), but that defeats the purpose of dask. Another alternative I am currently testing is to add a layer between `apply_ufunc` and the `func`.
That layer reconstructs a DataArray from the numpy chunk and deconstructs it before returning the result, so xarray/dask see only plain arrays passing by. If this works and is elegant enough, I can maybe suggest an implementation within xarray.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5010/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue