
issues

4 rows where repo = 13221727, state = "closed" and user = 56583917 sorted by updated_at descending

id: 2108557477 · node_id: PR_kwDOAMm_X85lfTRf · number: 8684 · title: Enable `numbagg` in calculation of quantiles · user: maawoo (56583917) · state: closed · locked: 0 · comments: 5 · created_at: 2024-01-30T18:59:55Z · updated_at: 2024-02-11T22:31:26Z · closed_at: 2024-02-07T16:28:04Z · author_association: CONTRIBUTOR · draft: 0 · pull_request: pydata/xarray/pulls/8684
  • [x] Closes #7377
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

Just saw your message in the related issue, @max-sixty. This is what I came up with earlier. I also did a quick test comparing the calculation with and without numbagg for a dummy 3D DataArray. I was only wondering whether the default use of numbagg (given that it is available and method='linear') should be noted somewhere in the docstrings and/or the docs in general.
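
A rough sketch of such a timing comparison (not part of the original PR), assuming numbagg is installed and an xarray version whose quantile calculation honours the `use_numbagg` option:

```Python
# Time da.quantile with numbagg disabled vs. enabled on a dummy 3D DataArray.
# Assumes numbagg is installed and that xarray consults `use_numbagg` here.
import time

import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(20, 500, 500), dims=["time", "x", "y"])
da[0, 0, 0] = np.nan  # make sure the skipna code path is exercised

for flag in (False, True):
    with xr.set_options(use_numbagg=flag):
        start = time.perf_counter()
        da.quantile(0.95, dim="time", skipna=True)
        print(f"use_numbagg={flag}: {time.perf_counter() - start:.2f} s")
```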

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8684/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
id: 1497031605 · node_id: I_kwDOAMm_X85ZOuO1 · number: 7377 · title: Aggregating a dimension using the Quantiles method with `skipna=True` is very slow · user: maawoo (56583917) · state: closed · locked: 0 · comments: 17 · created_at: 2022-12-14T16:52:35Z · updated_at: 2024-02-07T16:28:05Z · closed_at: 2024-02-07T16:28:05Z · author_association: CONTRIBUTOR

What happened?

Hi all, as the title already summarizes, I'm running into performance issues when aggregating over the time dimension of a 3D DataArray using the quantile method with skipna=True. See the section below for some dummy data that represents what I'm working with (e.g., similar to this). Aggregating over the time dimension of this dummy data, I get the following wall times:

| # | Call | Wall time |
| --- | --- | --- |
| 1 | da.median(dim='time', skipna=True) | 1.35 s |
| 2 | da.quantile(0.95, dim='time', skipna=False) | 5.95 s |
| 3 | da.quantile(0.95, dim='time', skipna=True) | 6 min 6 s |

I'm currently using a compute node with 40 CPUs and 180 GB RAM. Here is what the resource utilization looks like: the first small bump corresponds to 1 and 2; the second, longer peak is 3.

In this small example, the process at least finishes after a few seconds. With my actual dataset the quantile calculation takes hours...

I guess the following issue is relevant and should be revived: https://github.com/numpy/numpy/issues/16575
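
The relevance is easy to check in isolation with plain NumPy (a small sketch, not part of the original report): on the same array, np.nanquantile is far slower than np.quantile, which is exactly what that issue is about.

```Python
# Compare np.quantile vs. np.nanquantile on the same 3D array to show that
# the slowdown comes from NumPy's nanquantile, not from xarray itself.
import time

import numpy as np

data = np.random.rand(20, 500, 500)
idx = np.random.choice(data.size, data.size // 100, replace=False)
data.ravel()[idx] = np.nan  # sprinkle in some NaNs

for func in (np.quantile, np.nanquantile):
    start = time.perf_counter()
    func(data, 0.95, axis=0)
    print(f"{func.__name__}: {time.perf_counter() - start:.2f} s")
```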

Are there any possible work-arounds?

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```Python
import pandas as pd
import numpy as np
import xarray as xr

# Create dummy data with 20% random NaNs
size_spatial = 2000
size_temporal = 20
n_nan = int(size_spatial**2 * 0.2)

time = pd.date_range("2000-01-01", periods=size_temporal)
lat = np.random.uniform(low=-90, high=90, size=size_spatial)
lon = np.random.uniform(low=-180, high=180, size=size_spatial)
data = np.random.rand(size_temporal, size_spatial, size_spatial)
index_nan = np.random.choice(data.size, n_nan, replace=False)
data.ravel()[index_nan] = np.nan

# Create DataArray
da = xr.DataArray(data=data, dims=['time', 'x', 'y'], coords={'time': time, 'x': lon, 'y': lat}, attrs={'nodata': np.nan})

# Calculate 95th quantile over time-dimension
da.quantile(0.95, dim='time', skipna=True)
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-125-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0
xarray: 2022.12.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.3
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.10.0
distributed: 2022.10.0
matplotlib: 3.6.1
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.3
conda: 4.12.0
pytest: None
mypy: None
IPython: 8.5.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7377/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
id: 2108212331 · node_id: PR_kwDOAMm_X85leIWu · number: 8683 · title: Docs: Fix url in "Contribute to xarray" guide · user: maawoo (56583917) · state: closed · locked: 0 · comments: 3 · created_at: 2024-01-30T15:59:37Z · updated_at: 2024-01-30T18:13:36Z · closed_at: 2024-01-30T18:13:26Z · author_association: CONTRIBUTOR · draft: 0 · pull_request: pydata/xarray/pulls/8683

The URL in the section about creating a local development environment was pointing to itself. The new URL points to the (I assume) correct section further down in the same guide.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8683/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
id: 2098488235 · node_id: I_kwDOAMm_X859FGOr · number: 8654 · title: Inconsistent preservation of chunk alignment for groupby-/resample-reduce operations w/o using flox · user: maawoo (56583917) · state: closed · locked: 0 · comments: 2 · created_at: 2024-01-24T15:12:38Z · updated_at: 2024-01-24T16:23:20Z · closed_at: 2024-01-24T15:58:22Z · author_association: CONTRIBUTOR

What happened?

When performing groupby-/resample-reduce operations (e.g., ds.resample(time="6h").mean() as shown here), the chunk alignment is not preserved when flox is disabled, whereas it is preserved when flox is enabled.

What did you expect to happen?

The alignment of chunks is preserved whether using flox or not.

Minimal Complete Verifiable Example

```Python
import pandas as pd
import numpy as np
import xarray as xr

size_spatial = 1000
size_temporal = 200
time = pd.date_range("2000-01-01", periods=size_temporal, freq='h')
lat = np.random.uniform(low=-90, high=90, size=size_spatial)
lon = np.random.uniform(low=-180, high=180, size=size_spatial)
data = np.random.rand(size_temporal, size_spatial, size_spatial)

da = xr.DataArray(data=data, dims=['time', 'x', 'y'], coords={'time': time, 'x': lon, 'y': lat}).chunk({'time': -1, 'x': 'auto', 'y': 'auto'})

# Chunk alignment not preserved
with xr.set_options(use_flox=False):
    da_1 = da.copy(deep=True)
    da_1 = da_1.resample(time="6h").mean()

# Chunk alignment preserved
with xr.set_options(use_flox=True):
    da_2 = da.copy(deep=True)
    da_2 = da_2.resample(time="6h").mean()
```
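
A small follow-up (not part of the original report), assuming the example above has just been run: the difference is visible directly in the chunk layouts of the two results.

```Python
# Compare the chunk layouts produced with and without flox
# (da, da_1 and da_2 come from the example above).
print("input:         ", da.chunks)
print("use_flox=False:", da_1.chunks)
print("use_flox=True: ", da_2.chunks)
```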

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:38:07) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.1.0
distributed: 2024.1.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: 0.7.1
fsspec: 2023.12.2
cupy: None
pint: None
sparse: None
flox: 0.9.0
numpy_groupies: 0.10.2
setuptools: 69.0.3
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: 8.20.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8654/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: not_planned · repo: xarray (13221727) · type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
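
As a usage sketch (the database filename is a placeholder), the filter described at the top of this page corresponds to a straightforward query against this schema:

```Python
# Reproduce this page's filter against a local SQLite copy of the data.
# "github.db" is an assumed filename for a database containing this
# `issues` table (e.g. one built with github-to-sqlite).
import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT number, type, title, state_reason, updated_at
    FROM issues
    WHERE repo = 13221727 AND state = 'closed' AND user = 56583917
    ORDER BY updated_at DESC
    """
).fetchall()
for number, kind, title, state_reason, updated_at in rows:
    print(number, kind, title, state_reason, updated_at)
```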