issues: 1495605827
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1495605827 | I_kwDOAMm_X85ZJSJD | 7376 | groupby+map performance regression on MultiIndex dataset | 1419010 | closed | 0 | 11 | 2022-12-14T03:56:06Z | 2023-08-22T14:47:13Z | 2023-08-22T14:47:13Z | NONE |
What happened?

We have upgraded to version 2022.12.0 and noticed a significant performance regression (orders of magnitude) in code that involves a groupby+map. This seems to have been an issue since the 2022.6.0 release, which I understand had a number of changes, including to the groupby code paths (release notes).

What did you expect to happen?

Fix the performance regression.

Minimal Complete Verifiable Example

```Python
import contextlib
import os
import time
from collections.abc import Iterator

import numpy as np
import pandas as pd
import xarray as xr


@contextlib.contextmanager
def log_time(label: str) -> Iterator[None]:
    """Logs execution time of the context block"""
    t_0 = time.time()
    yield
    print(f"{label} took {time.time() - t_0} seconds")


def main() -> None:
    m = 100_000

    with log_time("creating df"):
        df = pd.DataFrame(
            {
                "i1": [1] * m + [2] * m + [3] * m + [4] * m,
                "i2": list(range(m)) * 4,
                "d3": np.random.randint(0, 2, 4 * m).astype(bool),
            }
        )


if __name__ == "__main__":
    main()
```

MVCE confirmation
Relevant log output
xarray 2022.3.0:
Anything else we need to know?

No response

Environment

Environment of the version installed from source (
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:29) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.12.1.dev7+g021c73e1
pandas: 1.5.2
numpy: 1.23.5
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7376/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |