id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1495605827,I_kwDOAMm_X85ZJSJD,7376,groupby+map performance regression on MultiIndex dataset,1419010,closed,0,,,11,2022-12-14T03:56:06Z,2023-08-22T14:47:13Z,2023-08-22T14:47:13Z,NONE,,,,"### What happened?

We upgraded to version 2022.12.0 and noticed a significant performance regression (orders of magnitude) in code that involves a groupby+map. The regression seems to have been present since the 2022.6.0 release, which I understand included a number of changes (including to the groupby code paths) ([release notes](https://docs.xarray.dev/en/stable/whats-new.html#v2022-06-0-july-21-2022)).

### What did you expect to happen?

Fix the performance regression.

### Minimal Complete Verifiable Example

```Python
import contextlib
import os
import time
from collections.abc import Iterator

import numpy as np
import pandas as pd
import xarray as xr


@contextlib.contextmanager
def log_time(label: str) -> Iterator[None]:
    """"""Logs execution time of the context block""""""
    t_0 = time.time()
    yield
    print(f""{label} took {time.time() - t_0} seconds"")


def main() -> None:
    m = 100_000

    with log_time(""creating df""):
        df = pd.DataFrame(
            {
                ""i1"": [1] * m + [2] * m + [3] * m + [4] * m,
                ""i2"": list(range(m)) * 4,
                ""d3"": np.random.randint(0, 2, 4 * m).astype(bool),
            }
        )
        ds = df.to_xarray().set_coords([""i1"", ""i2""]).set_index(index=[""i1"", ""i2""])

    with log_time(""groupby""):

        def per_grp(da: xr.DataArray) -> xr.DataArray:
            return da

        (ds.assign(x=lambda ds: ds[""d3""].groupby(""i1"").map(per_grp)))


if __name__ == ""__main__"":
    main()
```

### MVCE confirmation

- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

```Python
xarray current main `2022.12.1.dev7+g021c73e1`, but affects all versions since 2022.6.0 (inclusive).

> creating df took 0.10657930374145508 seconds
> groupby took 129.5521149635315 seconds