id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1962680526,I_kwDOAMm_X850_CDO,8377,Slow performance with groupby using a custom DataArray grouper,33886395,closed,0,,,6,2023-10-26T04:28:00Z,2024-02-15T22:44:18Z,2024-02-15T22:44:18Z,NONE,,,,"### What is your issue?

I have code that calculates a per-pixel nearest-neighbor match between two datasets and then performs a groupby + aggregation. The calculation I perform is generally lazy, using dask. I recently noticed slow performance of groupby used this way, with the lazy setup of the computation taking in excess of 10 minutes for an index of approximately 4000 by 4000. After some digging I found that the slow line is [this](https://github.com/pydata/xarray/blob/main/xarray/core/indexing.py#L1429):

```Python
Timer unit: 1e-09 s

Total time: 0.263679 s
File: /env/lib/python3.10/site-packages/xarray/core/duck_array_ops.py
Function: array_equiv at line 260

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   260                                           def array_equiv(arr1, arr2):
   261                                               """"""Like np.array_equal, but also allows values to be NaN in both arrays""""""
   262     22140   96490101.0   4358.2     36.6      arr1 = asarray(arr1)
   263     22140   34155953.0   1542.7     13.0      arr2 = asarray(arr2)
   264     22140  119855572.0   5413.5     45.5      lazy_equiv = lazy_array_equiv(arr1, arr2)
   265     22140    7390478.0    333.8      2.8      if lazy_equiv is None:
   266                                                   with warnings.catch_warnings():
   267                                                       warnings.filterwarnings(""ignore"", ""In the future, 'NAT == x'"")
   268                                                       flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))
   269                                                       return bool(flag_array.all())
   270                                               else:
   271     22140    5787053.0    261.4      2.2          return lazy_equiv

Total time: 242.247 s
File: /env/lib/python3.10/site-packages/xarray/core/indexing.py
Function: __getitem__ at line 1419

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1419                                           def __getitem__(self, key):
  1420     22140   26764337.0   1208.9      0.0      if not isinstance(key, VectorizedIndexer):
  1421                                                   # if possible, short-circuit when keys are effectively slice(None)
  1422                                                   # This preserves dask name and passes lazy array equivalence checks
  1423                                                   # (see duck_array_ops.lazy_array_equiv)
  1424     22140   10513930.0    474.9      0.0          rewritten_indexer = False
  1425     22140    4602305.0    207.9      0.0          new_indexer = []
  1426     66420   61804870.0    930.5      0.0          for idim, k in enumerate(key.tuple):
  1427     88560   78516641.0    886.6      0.0              if isinstance(k, Iterable) and (
  1428     22140  151748667.0   6854.1      0.1                  not is_duck_dask_array(k)
  1429     22140        2e+11    1e+07     93.6                  and duck_array_ops.array_equiv(k, np.arange(self.array.shape[idim]))
  1430                                                       ):
  1431                                                           new_indexer.append(slice(None))
  1432                                                           rewritten_indexer = True
  1433                                                       else:
  1434     44280   40322984.0    910.6      0.0                  new_indexer.append(k)
  1435     22140    4847251.0    218.9      0.0          if rewritten_indexer:
  1436                                                       key = type(key)(tuple(new_indexer))
  1437
  1438     22140   24251221.0   1095.4      0.0      if isinstance(key, BasicIndexer):
  1439                                                   return self.array[key.tuple]
  1440     22140    9613954.0    434.2      0.0      elif isinstance(key, VectorizedIndexer):
  1441                                                   return self.array.vindex[key.tuple]
  1442                                               else:
  1443     22140    8618414.0    389.3      0.0          assert isinstance(key, OuterIndexer)
  1444     22140   26601491.0   1201.5      0.0          key = key.tuple
  1445     22140    6010672.0    271.5      0.0          try:
  1446     22140        2e+10  678487.7      6.2              return self.array[key]
  1447                                                   except NotImplementedError:
  1448                                                       # manual orthogonal indexing.
  1449                                                       # TODO: port this upstream into dask in a saner way.
  1450                                                       value = self.array
  1451                                                       for axis, subkey in reversed(list(enumerate(key))):
  1452                                                           value = value[(slice(None),) * axis + (subkey,)]
  1453                                                       return value
```

The test `duck_array_ops.array_equiv(k, np.arange(self.array.shape[idim]))` is repeated many times; each call is reasonably fast, but together they account for most of the runtime. Much of that could be avoided by first testing for equal length, like:

```python
if isinstance(k, Iterable) and (
    not is_duck_dask_array(k)
    and len(k) == self.array.shape[idim]
    and duck_array_ops.array_equiv(k, np.arange(self.array.shape[idim]))
):
```

This would work better because, although an equivalent shape check is already performed [by array_equiv](https://github.com/pydata/xarray/blob/main/xarray/core/duck_array_ops.py#L233), the array to test against is currently always created with `np.arange` first, and that allocation is ultimately the bottleneck:

```Python
74992059 function calls (73375414 primitive calls) in 298.934 seconds

Ordered by: internal time

        ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         22140  225.296    0.010  225.296    0.010 {built-in method numpy.arange}
        177123    3.192    0.000    3.670    0.000 inspect.py:2920(__init__)
 110702/110701    2.180    0.000    2.180    0.000 {built-in method numpy.asarray}
11690863/11668723   2.036    0.000    5.043    0.000 {built-in method builtins.isinstance}
        287827    1.876    0.000    3.768    0.000 utils.py:25(meta_from_array)
        132843    1.872    0.000    7.649    0.000 inspect.py:2280(_signature_from_function)
        974166    1.485    0.000    2.558    0.000 inspect.py:2637(__init__)
```
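
To get a feel for the cost, here is a minimal standalone sketch (not xarray's actual code; the sizes are assumptions matching the figures above) comparing always materializing the `np.arange` comparison array against short-circuiting on a length mismatch first:

```python
import time

import numpy as np

n = 4000 * 4000     # roughly the index size reported above
k = np.arange(100)  # a small indexer that cannot possibly match

start = time.perf_counter()
for _ in range(100):
    np.array_equal(k, np.arange(n))  # always pays for the allocation
print('no pre-check:  ', time.perf_counter() - start)

start = time.perf_counter()
for _ in range(100):
    _ = len(k) == n and np.array_equal(k, np.arange(n))  # allocation skipped
print('with pre-check:', time.perf_counter() - start)
```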
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8377/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,completed,13221727,issue
1397532790,I_kwDOAMm_X85TTKh2,7132,Saving a DataArray of datetime objects as zarr is not a lazy operation despite compute=False,33886395,closed,0,,,2,2022-10-05T09:50:34Z,2024-01-29T19:12:32Z,2024-01-29T19:12:32Z,NONE,,,,"### What happened?

Trying to save a lazy xr.DataArray of datetime objects as zarr forces a dask.compute operation and pulls the data back to the local notebook. This is generally not a problem for indexes of datetime objects, as those are already stored locally and are generally small in size. However, if the whole underlying array holds datetime objects, it can be a serious problem: in my case it simply crashed the scheduler upon attempting to retrieve the data persisted on the workers. I managed to isolate the problem to the call stack below. The issue is in the `encode_cf_datetime` function.

### What did you expect to happen?

I expected storing the data in zarr format to be performed directly by the dask workers (bypassing the scheduler/Client) when `compute=True`, and to be a completely lazy operation when `compute=False`.

### Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr
import dask.array as da

test = xr.DataArray(
    data=da.full((20000, 20000), np.datetime64('2005-02-25T03:30', 'ns')),
    coords={'x': range(20000), 'y': range(20000)}
).to_dataset(name='test')

print(test.test.dtype)  # dtype('<M8[ns]')
```

Attempting to write this dataset with `to_zarr(..., compute=False)` triggers the computation; the relevant part of the call stack:

```Python
--> 2036 return to_zarr(
   2037     self,
   2038     store=store,
   2039     chunk_store=chunk_store,
   2040     storage_options=storage_options,
   2041     mode=mode,
   2042     synchronizer=synchronizer,
   2043     group=group,
   2044     encoding=encoding,
   2045     compute=compute,
   2046     consolidated=consolidated,
   2047     append_dim=append_dim,
   2048     region=region,
   2049     safe_chunks=safe_chunks,
   2050 )

File /env/lib/python3.8/site-packages/xarray/backends/api.py:1431, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1429 writer = ArrayWriter()
   1430 # TODO: figure out how to properly handle unlimited_dims
-> 1431 dump_to_store(dataset, zstore, writer, encoding=encoding)
   1432 writes = writer.sync(compute=compute)
   1434 if compute:

File /env/lib/python3.8/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1116 if encoder:
   1117     variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File /env/lib/python3.8/site-packages/xarray/backends/zarr.py:500, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    498 new_variables = set(variables) - existing_variable_names
    499 variables_without_encoding = {vn: variables[vn] for vn in new_variables}
--> 500 variables_encoded, attributes = self.encode(
    501     variables_without_encoding, attributes
    502 )
    504 if existing_variable_names:
    505     # Decode variables directly, without going via xarray.Dataset to
    506     # avoid needing to load index variables into memory.
    507     # TODO: consider making loading indexes lazy again?
    508     existing_vars, _, _ = conventions.decode_cf_variables(
    509         self.get_variables(), self.get_attrs()
    510     )

File /env/lib/python3.8/site-packages/xarray/backends/common.py:200, in AbstractWritableDataStore.encode(self, variables, attributes)
    183 def encode(self, variables, attributes):
    184     """"""
    185     Encode the variables and attributes in this store
    186    (...)
    198
    199     """"""
--> 200     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    201     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    202     return variables, attributes

File /env/lib/python3.8/site-packages/xarray/backends/common.py:200, in <dictcomp>(.0)
    183 def encode(self, variables, attributes):
    184     """"""
    185     Encode the variables and attributes in this store
    186    (...)
    198
    199     """"""
--> 200     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    201     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    202     return variables, attributes

File /env/lib/python3.8/site-packages/xarray/backends/zarr.py:459, in ZarrStore.encode_variable(self, variable)
    458 def encode_variable(self, variable):
--> 459     variable = encode_zarr_variable(variable)
    460     return variable

File /env/lib/python3.8/site-packages/xarray/backends/zarr.py:258, in encode_zarr_variable(var, needs_copy, name)
    237 def encode_zarr_variable(var, needs_copy=True, name=None):
    238     """"""
    239     Converts an Variable into an Variable which follows some
    240     of the CF conventions:
   (...)
    255     A variable which has been encoded as described above.
    256     """"""
--> 258     var = conventions.encode_cf_variable(var, name=name)
    260     # zarr allows unicode, but not variable-length strings, so it's both
    261     # simpler and more compact to always encode as UTF-8 explicitly.
    262     # TODO: allow toggling this explicitly via dtype in encoding.
    263     coder = coding.strings.EncodedStringCoder(allows_unicode=True)

File /env/lib/python3.8/site-packages/xarray/conventions.py:273, in encode_cf_variable(var, needs_copy, name)
    264 ensure_not_multiindex(var, name=name)
    266 for coder in [
    267     times.CFDatetimeCoder(),
    268     times.CFTimedeltaCoder(),
   (...)
    271     variables.UnsignedIntegerCoder(),
    272 ]:
--> 273     var = coder.encode(var, name=name)
    275 # TODO(shoyer): convert all of these to use coders, too:
    276 var = maybe_encode_nonstring_dtype(var, name=name)

File /env/lib/python3.8/site-packages/xarray/coding/times.py:659, in CFDatetimeCoder.encode(self, variable, name)
    655 dims, data, attrs, encoding = unpack_for_encoding(variable)
    656 if np.issubdtype(data.dtype, np.datetime64) or contains_cftime_datetimes(
    657     variable
    658 ):
--> 659     (data, units, calendar) = encode_cf_datetime(
    660         data, encoding.pop(""units"", None), encoding.pop(""calendar"", None)
    661     )
    662     safe_setitem(attrs, ""units"", units, name=name)
    663     safe_setitem(attrs, ""calendar"", calendar, name=name)

File /env/lib/python3.8/site-packages/xarray/coding/times.py:592, in encode_cf_datetime(dates, units, calendar)
    582 def encode_cf_datetime(dates, units=None, calendar=None):
    583     """"""Given an array of datetime objects, returns the tuple `(num, units,
    584     calendar)` suitable for a CF compliant time variable.
   (...)
    590     cftime.date2num
    591     """"""
--> 592     dates = np.asarray(dates)
    594     if units is None:
    595         units = infer_datetime_units(dates)
```

### Anything else we need to know?

Our system uses dask_gateway in an AWS infrastructure (S3 for storage).
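
A workaround sketch that keeps the write lazy (an assumption on my part, not a tested fix: it bypasses CF datetime encoding entirely by storing raw integer nanoseconds, with `'test.zarr'` as a placeholder store path):

```python
import numpy as np
import xarray as xr
import dask.array as da

# Cast the datetimes to int64 nanoseconds ourselves, chunk by chunk, so that
# CFDatetimeCoder never sees a datetime dtype and never calls np.asarray on
# the full dask array.
data = da.full((20000, 20000), np.datetime64('2005-02-25T03:30', 'ns'))
encoded = data.astype('int64')  # still lazy

ds = xr.DataArray(
    encoded,
    coords={'x': range(20000), 'y': range(20000)},
    attrs={'units': 'nanoseconds since 1970-01-01'},
).to_dataset(name='test')

delayed = ds.to_zarr('test.zarr', compute=False)  # no compute is triggered
```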
### Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.209-116.367.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.7.3

xarray: 2022.3.0
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.9.2
distributed: 2022.9.2
matplotlib: 3.6.0
cartopy: 0.20.2
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: 0.13.0
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7132/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1307523148,I_kwDOAMm_X85N7zhM,6803,Passing a distributed.Future to the kwargs of apply_ufunc should resolve the future,33886395,closed,0,,,10,2022-07-18T07:31:28Z,2024-01-09T18:21:15Z,2023-12-19T05:40:20Z,NONE,,,,"### What is your issue? I am trying to scatter an large array and pass it as keyword argument to a function applied using `apply_ufunc` but that is currently not working. The same function works if providing the actual array, but if providing the Future linked to the scatter data the task fails. Here is a minimal example to reproduce this issue ```python import dask.array as da import xarray as xr import numpy as np data = xr.DataArray(data=da.random.random((15, 15, 20)), coords={'x': range(15), 'y': range(15), 'z': range(20)}, dims=('x', 'y', 'z')) test = np.full((20,), 30) test_future = client.scatter(test, broadcast=True) def _copy_test(d, test=None): return test new_data_actual = xr.apply_ufunc( _copy_test, data, input_core_dims=[['z']], output_core_dims=[['new_z']], vectorize=True, dask='parallelized', output_dtypes=""float64"", kwargs={'test':test}, dask_gufunc_kwargs = {'output_sizes':{'new_z':20}} ) new_data_future = xr.apply_ufunc( _copy_test, data, input_core_dims=[['z']], output_core_dims=[['new_z']], vectorize=True, dask='parallelized', output_dtypes=""float64"", kwargs={'test':test_future}, dask_gufunc_kwargs = {'output_sizes':{'new_z':20}} ) data[0, 0].compute() #[0.3034994 , 0.08172002, 0.34731092, ...] new_data_actual[0, 0].compute() #[30.0, 30.0, 30.0, ...] new_data_future[0,0].compute() #KilledWorker ``` I tried different versions of this, going from explicitly calling `test.result()` to change the way the Future was passed, but nothing worked. I also tried to raise exceptions within the function and various way to print information, but that also did not work. This last issue makes me think that if passing a Future I actually don't get to the scope of that function Am I trying to do something completely silly? or is this an unexpected behavior? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6803/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 584865241,MDU6SXNzdWU1ODQ4NjUyNDE=,3872,Operations resulting in np.timedelta64 are not properly coerced,33886395,closed,0,,,2,2020-03-20T06:14:30Z,2020-03-23T20:55:54Z,2020-03-23T20:55:53Z,NONE,,,,"It seems that operations that are resulting in `timedelta64` (for example `datetime64` arithmetic) are not properly coerced. 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6803/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
584865241,MDU6SXNzdWU1ODQ4NjUyNDE=,3872,Operations resulting in np.timedelta64 are not properly coerced,33886395,closed,0,,,2,2020-03-20T06:14:30Z,2020-03-23T20:55:54Z,2020-03-23T20:55:53Z,NONE,,,,"It seems that operations resulting in `timedelta64` (for example `datetime64` arithmetic) are not properly coerced.

In fact, the result of such an operation is an xarray object whose `.dt` accessor is of type `xarray.core.accessor_dt.DatetimeAccessor` instead of the expected `xarray.core.accessor_dt.TimedeltaAccessor`.

This follows the numpy documentation describing datetime arithmetic as resulting in timedelta objects (http://lagrange.univ-lyon1.fr/docs/numpy/1.11.0/reference/arrays.datetime.html#datetime-and-timedelta-arithmetic).

#### MCVE Code Sample

```python
import numpy as np
import pandas as pd
import xarray as xr

# this is a DataArray of type np.datetime64
da = xr.DataArray(data=pd.date_range('2020-01-01', '2020-01-30', freq='D'))

# this simple arithmetic will result in np.timedelta64
delta = da - np.datetime64('2020-01-01')

type(delta.data[0])  # > numpy.timedelta64
type(delta.dt)       # > xarray.core.accessor_dt.DatetimeAccessor
```

#### Expected Output

```python
type(delta.dt)  # > xarray.core.accessor_dt.TimedeltaAccessor
```

#### Problem Description

A DataArray whose `.data` is of type `timedelta64` should expose a `.dt` accessor of type `TimedeltaAccessor`. That would allow representing such `timedelta64` values in the relevant time units, such as `days`.
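
A workaround sketch in the meantime (assuming, as in the MCVE above, that `delta` is one-dimensional): go through pandas, whose `TimedeltaIndex` already exposes the fields the `TimedeltaAccessor` would provide.

```python
# pandas' TimedeltaIndex exposes the fields the TimedeltaAccessor would
# provide; here we recover whole days from the timedelta64 values.
days = xr.DataArray(pd.TimedeltaIndex(delta.values).days, dims=delta.dims)
print(days.values[:3])  # > [0 1 2]
```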
#### Versions

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.10 | packaged by conda-forge | (default, Mar 5 2020, 10:05:08) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-95.32-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.14.1
pandas: 0.24.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.8.0
h5py: 2.9.0
Nio: None
zarr: 2.2.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.12.0
distributed: 2.12.0
matplotlib: 3.0.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.2.0.post20200209
pip: 20.0.2
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3872/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue