id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1497031605,I_kwDOAMm_X85ZOuO1,7377,Aggregating a dimension using the Quantiles method with `skipna=True` is very slow,56583917,closed,0,,,17,2022-12-14T16:52:35Z,2024-02-07T16:28:05Z,2024-02-07T16:28:05Z,CONTRIBUTOR,,,,"### What happened?

Hi all,
as the title already summarizes, I'm running into performance issues when aggregating over the time-dimension of a 3D DataArray using the [quantiles](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.quantile.html?highlight=quantiles) method with `skipna=True`. 
See the section below for some dummy data that represents what I'm working with (e.g., similar to [this](https://planetarycomputer.microsoft.com/dataset/sentinel-1-rtc)). Aggregating over the time-dimension of this dummy data I'm getting the following wall times: 

|   |   |  |
| --------------- | --------------- | --------------- | 
| 1 | `da.median(dim='time', skipna=True)` | 1.35 s |
| 2 | `da.quantile(0.95, dim='time', skipna=False)` | 5.95 s |
| 3 | `da.quantile(0.95, dim='time', skipna=True)` | 6 min 6s |


I'm currently using a compute node with 40 CPUs and 180 GB RAM. Here is what the resource utilization looks like. First small bump are 1 and 2. Second longer peak is 3.

![Screenshot 2022-12-14 at 17 33 14](https://user-images.githubusercontent.com/56583917/207654729-7ccecfc9-93f9-49f3-9bff-18f8643996d3.png)

In this small example, the process at least finishes after a few seconds. With my actual dataset the quantile calculation takes hours... 

I guess the following issue is relevant and should be revived:  https://github.com/numpy/numpy/issues/16575

Are there any possible work-arounds?


### What did you expect to happen?

_No response_

### Minimal Complete Verifiable Example

```Python
import pandas as pd
import numpy as np
import xarray as xr

# Create dummy data with 20% random NaNs
size_spatial = 2000
size_temporal = 20
n_nan = int(size_spatial**2*0.2)

time = pd.date_range(""2000-01-01"", periods=size_temporal)
lat = np.random.uniform(low=-90, high=90, size=size_spatial)
lon = np.random.uniform(low=-180, high=180, size=size_spatial)
data = np.random.rand(size_temporal, size_spatial, size_spatial)
index_nan = np.random.choice(data.size, n_nan, replace=False)
data.ravel()[index_nan] = np.nan

# Create DataArray
da = xr.DataArray(data=data, 
                  dims=['time', 'x', 'y'], 
                  coords={'time': time, 'x': lon, 'y': lat}, 
                  attrs={'nodata': np.nan})

# Calculate 95th quantile over time-dimension
da.quantile(0.95, dim='time', skipna=True)
```


### MVCE confirmation

- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment

<details>

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-125-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0

xarray: 2022.12.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.3
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.10.0
distributed: 2022.10.0
matplotlib: 3.6.1
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.3
conda: 4.12.0
pytest: None
mypy: None
IPython: 8.5.0
sphinx: None
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7377/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2098488235,I_kwDOAMm_X859FGOr,8654,Inconsistent preservation of chunk alignment for groupby-/resample-reduce operations w/o using flox,56583917,closed,0,,,2,2024-01-24T15:12:38Z,2024-01-24T16:23:20Z,2024-01-24T15:58:22Z,CONTRIBUTOR,,,,"### What happened?

When performing groupby-/resample-reduce operations (e.g., `ds.resample(time=""6h"").mean()` as shown [here](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)) the alignment of chunks is **not preserved when flox is disabled**:

![image](https://github.com/pydata/xarray/assets/56583917/2d365cfb-294d-40cc-9456-a825c1914446)

...whereas the alignment **is preserved when flox is enabled**:

![image](https://github.com/pydata/xarray/assets/56583917/ff0dc739-826b-45f5-98f1-5cab8fe6011f)


### What did you expect to happen?

The alignment of chunks is preserved whether using flox or not.

### Minimal Complete Verifiable Example

```Python
import pandas as pd
import numpy as np
import xarray as xr

size_spatial = 1000
size_temporal = 200
time = pd.date_range(""2000-01-01"", periods=size_temporal, freq='h')
lat = np.random.uniform(low=-90, high=90, size=size_spatial)
lon = np.random.uniform(low=-180, high=180, size=size_spatial)
data = np.random.rand(size_temporal, size_spatial, size_spatial)

da = xr.DataArray(data=data, 
                  dims=['time', 'x', 'y'], 
                  coords={'time': time, 'x': lon, 'y': lat}).chunk({'time': -1, 'x': 'auto', 'y': 'auto'}) 

# Chunk alignment not preserved
with xr.set_options(use_flox=False):
    da_1 = da.copy(deep=True)
    da_1 = da_1.resample(time=""6h"").mean()

# Chunk alignment preserved
with xr.set_options(use_flox=True):
    da_2 = da.copy(deep=True)
    da_2 = da_2.resample(time=""6h"").mean()
```


### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:38:07) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.1.0
distributed: 2024.1.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: 0.7.1
fsspec: 2023.12.2
cupy: None
pint: None
sparse: None
flox: 0.9.0
numpy_groupies: 0.10.2
setuptools: 69.0.3
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: 8.20.0
sphinx: None
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8654/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue
1963071630,I_kwDOAMm_X851AhiO,8378,Extend DatetimeAccessor with `snap`-method,56583917,open,0,,,2,2023-10-26T09:16:24Z,2023-10-27T08:08:58Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem?

With satellite remote sensing data, you sometimes end up with a blown up DataArray/Dataset because individual acquisitions have been saved in slices:

![group_acq_slices_1](https://github.com/pydata/xarray/assets/56583917/e439c153-b0f6-4025-85b2-9b31e1daf784)

One could then aggregate these slices with something like this: 

```python
ds.coords['time'] = ds.time.dt.floor('1H')  # or .ceil
ds = ds_copy.groupby('time').mean()
```

However, this would miss cases where one slice has been acquired before and the other after a specific hour. The  [`pandas.DatetimeIndex.snap`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.snap.html) method could be a good alternative for such cases.


### Describe the solution you'd like

In addition to the [`floor`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.dt.floor.html), [`ceil`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.dt.ceil.html) and [`round`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.dt.round.html) methods, it would be great to also implement [`pandas.DatetimeIndex.snap`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.snap.html).

### Describe alternatives you've considered

_No response_

### Additional context

_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8378/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1497131525,I_kwDOAMm_X85ZPGoF,7378,Improve docstrings for better discoverability,56583917,open,0,,,9,2022-12-14T17:59:20Z,2023-04-02T04:26:57Z,,CONTRIBUTOR,,,,"### What is your issue?

I noticed that the docstrings of the [aggregation methods](https://docs.xarray.dev/en/stable/api.html#aggregation) are mostly written in the same style, e.g.: ""Reduce this Dataset's data by applying xy along some dimension(s)."".  Let's say a user is interested in calculating the variance and searches for the appropriate method. Neither [xarray.DataArray.var](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.var.html#xarray.DataArray.var) nor [xarray.Dataset.var](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.var.html#xarray.Dataset.var) will be returned (see [here](https://docs.xarray.dev/en/stable/search.html?q=variance#)), because ""variance"" is not mentioned at all in the docstrings. Same problem exists for other methods like `.std`, `.prod`, `.cumsum`, `.cumprod`, and probably others.

https://github.com/pydata/xarray/issues/6793 is related, but I guess it already has enough tasks.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7378/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue