id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1495605827,I_kwDOAMm_X85ZJSJD,7376,groupby+map performance regression on MultiIndex dataset,1419010,closed,0,,,11,2022-12-14T03:56:06Z,2023-08-22T14:47:13Z,2023-08-22T14:47:13Z,NONE,,,,"### What happened?
We upgraded to version 2022.12.0 and noticed a significant performance regression (orders of magnitude) in code that involves a groupby+map. The regression appears to have been present since the 2022.6.0 release, which I understand included a number of changes (including to the groupby code paths) ([release notes](https://docs.xarray.dev/en/stable/whats-new.html#v2022-06-0-july-21-2022)).
### What did you expect to happen?
Fix the performance regression.
### Minimal Complete Verifiable Example
```Python
import contextlib
import time
from collections.abc import Iterator

import numpy as np
import pandas as pd
import xarray as xr


@contextlib.contextmanager
def log_time(label: str) -> Iterator[None]:
    """"""Logs execution time of the context block""""""
    t_0 = time.time()
    yield
    print(f""{label} took {time.time() - t_0} seconds"")


def main() -> None:
    m = 100_000
    with log_time(""creating df""):
        df = pd.DataFrame(
            {
                ""i1"": [1] * m + [2] * m + [3] * m + [4] * m,
                ""i2"": list(range(m)) * 4,
                ""d3"": np.random.randint(0, 2, 4 * m).astype(bool),
            }
        )
    ds = df.to_xarray().set_coords([""i1"", ""i2""]).set_index(index=[""i1"", ""i2""])

    with log_time(""groupby""):
        # Identity map per group: any time spent here is groupby overhead.
        def per_grp(da: xr.DataArray) -> xr.DataArray:
            return da

        ds.assign(x=lambda ds: ds[""d3""].groupby(""i1"").map(per_grp))


if __name__ == ""__main__"":
    main()
```
### MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
xarray current main (`2022.12.1.dev7+g021c73e1`), but it affects all versions since 2022.6.0 (inclusive):
> creating df took 0.10657930374145508 seconds
> groupby took 129.5521149635315 seconds
xarray 2022.3.0:
> creating df took 0.09968900680541992 seconds
> groupby took 0.19161295890808105 seconds
```
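For reference, a minimal pandas-only sketch of the same per-group identity map on the same data (an illustrative addition, not part of the reported MVCE); it runs quickly, which points at xarray's groupby path rather than the data volume:
```Python
import time

import numpy as np
import pandas as pd

m = 100_000
df = pd.DataFrame(
    {
        ""i1"": [1] * m + [2] * m + [3] * m + [4] * m,
        ""i2"": list(range(m)) * 4,
        ""d3"": np.random.randint(0, 2, 4 * m).astype(bool),
    }
)

t_0 = time.time()
# Identity map per group, mirroring per_grp in the MVCE above.
df.groupby(""i1"")[""d3""].apply(lambda s: s)
print(f""pandas groupby took {time.time() - t_0} seconds"")
```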
### Anything else we need to know?
_No response_
### Environment
Environment of the version installed from source (`2022.12.1.dev7+g021c73e1`):
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:29) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.12.1.dev7+g021c73e1
pandas: 1.5.2
numpy: 1.23.5
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7376/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
753374426,MDU6SXNzdWU3NTMzNzQ0MjY=,4623,Allow chunk spec per variable,1419010,open,0,,,3,2020-11-30T10:56:39Z,2020-12-19T17:17:23Z,,NONE,,,,"Say I have a zarr dataset with multiple variables `Foo`, `Bar`, and `Baz` (and potentially many more), and two dimensions, `x` and `y` (potentially more). Both `Foo` and `Bar` are large 2D arrays with dims `x, y`, while `Baz` is a relatively small 1D array with dim `y`. I would like to read that dataset with xarray while increasing the chunk size beyond the native zarr chunk size for `x` and `y`, but only for `Foo` and `Bar`; `Baz` should keep its native chunking. AFAIU I would currently do that with the `chunks` parameter to `open_dataset`/`open_zarr`, but if I pass, say, `dict(x=N, y=M)`, that changes the chunking for all variables that use those dimensions, which isn't exactly what I need: I need it changed only for `Foo` and `Bar`. Is there a way to do that? Should that be part of the ""harmonisation""? One could imagine xarray accepting a dict of dicts akin to `{var: {dim: chunk_spec}}` to specify chunking for specific variables.
Note that `rechunk` after reading is not what I want; I would like to specify the chunking at the read op.
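A rough sketch of how such a per-variable spec could look (hypothetical: the nested `chunks` form below is *not* an existing xarray API, and the store path and chunk sizes are made up for illustration):
```Python
import xarray as xr

# Existing behavior: a per-dimension spec rechunks *every* variable
# that uses x or y, including Baz.
ds = xr.open_zarr(""store.zarr"", chunks={""x"": 10_000, ""y"": 10_000})

# Proposed (hypothetical) nested form: per-variable chunk specs.
# Variables not listed (e.g. Baz) would keep their native zarr chunking.
ds = xr.open_zarr(
    ""store.zarr"",
    chunks={
        ""Foo"": {""x"": 10_000, ""y"": 10_000},
        ""Bar"": {""x"": 10_000, ""y"": 10_000},
    },
)
```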
_Originally posted by @ravwojdyla in https://github.com/pydata/xarray/issues/4496#issuecomment-732486436_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4623/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue