home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 1657036222

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1657036222 I_kwDOAMm_X85ixF2- 7730 flox performance regression for cftime resampling 10194086 closed 0     8 2023-04-06T09:38:03Z 2023-10-15T03:48:44Z 2023-10-15T03:48:44Z MEMBER      

What happened?

Running an in-memory groupby operation took much longer than expected. Turning off flox fixed this - but I don't think that's the idea ;-)

What did you expect to happen?

flox to be at least on par with our naive implementation

Minimal Complete Verifiable Example

```Python import numpy as np import xarray as xr

arr = np.random.randn(10, 10, 36530) time = xr.date_range("2000", periods=30365, calendar="noleap") da = xr.DataArray(arr, dims=("y", "x", "time"), coords={"time": time})

using max

print("max:") xr.set_options(use_flox=True) %timeit da.groupby("time.year").max("time") %timeit da.groupby("time.year").max("time", engine="flox")

xr.set_options(use_flox=False) %timeit da.groupby("time.year").max("time")

as reference

%timeit [da.sel(time=str(year)).max("time") for year in range(2000, 2030)]

using mean

print("mean:") xr.set_options(use_flox=True) %timeit da.groupby("time.year").mean("time") %timeit da.groupby("time.year").mean("time", engine="flox")

xr.set_options(use_flox=False) %timeit da.groupby("time.year").mean("time")

as reference

%timeit [da.sel(time=str(year)).mean("time") for year in range(2000, 2030)] ```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python max: 158 ms ± 4.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 28.1 ms ± 318 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 11.5 ms ± 52.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

mean: 95.6 ms ± 10.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 34.8 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 15.2 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: f8127fc9ad24fe8b41cce9f891ab2c98eb2c679a python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 5.15.0-69-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.1 xarray: main pandas: 1.5.3 numpy: 1.23.5 scipy: 1.10.1 netCDF4: 1.6.3 pydap: installed h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: 2.14.2 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: 3.2.2 iris: 3.4.1 bottleneck: 1.3.7 dask: 2023.3.2 distributed: 2023.3.2.1 matplotlib: 3.7.1 cartopy: 0.21.1 seaborn: 0.12.2 numbagg: 0.2.2 fsspec: 2023.3.0 cupy: None pint: 0.20.1 sparse: 0.14.0 flox: 0.6.10 numpy_groupies: 0.9.20 setuptools: 67.6.1 pip: 23.0.1 conda: None pytest: 7.2.2 mypy: None IPython: 8.12.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7730/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 3 rows from issues_id in issues_labels
  • 6 rows from issue in issue_comments
Powered by Datasette · Queries took 0.785ms · About: xarray-datasette