issues
2 rows where user = 1419010 sorted by updated_at descending
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1495605827 | I_kwDOAMm_X85ZJSJD | 7376 | groupby+map performance regression on MultiIndex dataset | ravwojdyla 1419010 | closed | 0 | 11 | 2022-12-14T03:56:06Z | 2023-08-22T14:47:13Z | 2023-08-22T14:47:13Z | NONE | What happened? We have upgraded to the 2022.12.0 version, and noticed a significant performance regression (orders of magnitude) in code that involves a groupby+map. This seems to be the issue since the 2022.6.0 release, which I understand had a number of changes (including to the groupby code paths) (release notes). What did you expect to happen? Fix the performance regression. Minimal Complete Verifiable Example
```Python
import contextlib
import os
import time
from collections.abc import Iterator

import numpy as np
import pandas as pd
import xarray as xr


@contextlib.contextmanager
def log_time(label: str) -> Iterator[None]:
    """Logs execution time of the context block"""
    t_0 = time.time()
    yield
    print(f"{label} took {time.time() - t_0} seconds")


def main() -> None:
    m = 100_000

    with log_time("creating df"):
        df = pd.DataFrame(
            {
                "i1": [1] * m + [2] * m + [3] * m + [4] * m,
                "i2": list(range(m)) * 4,
                "d3": np.random.randint(0, 2, 4 * m).astype(bool),
            }
        )

if __name__ == "__main__":
    main()
```
MVCE confirmation
Relevant log output
xarray 2022.3.0:
Anything else we need to know? No response
Environment
Environment of the version installed from source (
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:29) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.12.1.dev7+g021c73e1
pandas: 1.5.2
numpy: 1.23.5
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7376/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue |
| 753374426 | MDU6SXNzdWU3NTMzNzQ0MjY= | 4623 | Allow chunk spec per variable | ravwojdyla 1419010 | open | 0 | 3 | 2020-11-30T10:56:39Z | 2020-12-19T17:17:23Z | NONE | Say, I have a zarr dataset with multiple variables Note that Originally posted by @ravwojdyla in https://github.com/pydata/xarray/issues/4496#issuecomment-732486436 |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/4623/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue |
CREATE TABLE [issues] (
[id] INTEGER PRIMARY KEY,
[node_id] TEXT,
[number] INTEGER,
[title] TEXT,
[user] INTEGER REFERENCES [users]([id]),
[state] TEXT,
[locked] INTEGER,
[assignee] INTEGER REFERENCES [users]([id]),
[milestone] INTEGER REFERENCES [milestones]([id]),
[comments] INTEGER,
[created_at] TEXT,
[updated_at] TEXT,
[closed_at] TEXT,
[author_association] TEXT,
[active_lock_reason] TEXT,
[draft] INTEGER,
[pull_request] TEXT,
[body] TEXT,
[reactions] TEXT,
[performed_via_github_app] TEXT,
[state_reason] TEXT,
[repo] INTEGER REFERENCES [repos]([id]),
[type] TEXT
);
CREATE INDEX [idx_issues_repo]
ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
ON [issues] ([user]);
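
The view at the top of this page (rows where user = 1419010, sorted by updated_at descending) can be reproduced against this schema with a plain SQL query. The following is a minimal sketch using Python's standard `sqlite3` module. It assumes a cut-down version of the `issues` table (only a handful of columns, no foreign keys) populated in memory with the two rows listed above; the helper name `issues_for_user` is hypothetical and not part of any tool.

```python
import json
import sqlite3

# Assumption: a simplified subset of the [issues] schema shown above.
# Omitted columns and the REFERENCES clauses are left out so the sketch
# runs without the [users], [milestones] and [repos] tables.
SCHEMA = """
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER,
   [state] TEXT,
   [updated_at] TEXT,
   [reactions] TEXT
);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
"""

# The two rows from the table above; reactions is stored as JSON text.
ROWS = [
    (753374426, 4623, "Allow chunk spec per variable", 1419010, "open",
     "2020-12-19T17:17:23Z", json.dumps({"total_count": 0})),
    (1495605827, 7376,
     "groupby+map performance regression on MultiIndex dataset",
     1419010, "closed", "2023-08-22T14:47:13Z",
     json.dumps({"total_count": 0})),
]


def issues_for_user(conn, user_id):
    """Return (number, title, state, reactions) tuples, newest update first."""
    cursor = conn.execute(
        "SELECT [number], [title], [state], [reactions] FROM [issues] "
        "WHERE [user] = ? ORDER BY [updated_at] DESC",
        (user_id,),
    )
    # Decode the JSON text stored in the reactions column.
    return [(n, t, s, json.loads(r)) for n, t, s, r in cursor]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

result = issues_for_user(conn, 1419010)
for number, title, state, reactions in result:
    print(number, state, title, reactions["total_count"])
```

Sorting on `updated_at` as text works here because the timestamps are ISO 8601 strings in a uniform format, so lexicographic order matches chronological order. The filter on `user` is the lookup that `idx_issues_user` is there to serve.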