
issues


34 rows where repo = 13221727 and user = 20629530 sorted by updated_at descending




type 2

  • issue 25
  • pull 9

state 2

  • closed 25
  • open 9

repo 1

  • xarray 34
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at, closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type
#8603 · Convert 360_day calendars by choosing random dates to drop or add · pull request (pydata/xarray/pulls/8603) · user: aulemahal (20629530) · state: closed · comments: 3 · created: 2024-01-10T19:13:31Z · updated: 2024-04-16T14:53:42Z · closed: 2024-04-16T14:53:42Z · CONTRIBUTOR · id: 2075019328 · node_id: PR_kwDOAMm_X85juCQ- · repo: xarray (13221727)
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

Small PR to add a new "method" to convert to and from 360_day calendars. The current two methods (chosen with the align_on keyword) will always remove or add the same day-of-year for all years of the same length.

This new option will randomly choose the days to drop or add, one in each fifth of the year (72-day periods). It emulates the method of the LOCA datasets (see the web page and article). February 29th is always removed/added when the source/target is a leap year.

I copied the implementation from xclim (which I wrote), see the code here.
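For context, a minimal usage sketch, assuming the new `align_on="random"` value this PR adds to the existing `convert_calendar` API:

```python
import xarray as xr

da = xr.DataArray(
    list(range(360)),
    dims=("time",),
    coords={"time": xr.date_range("2000-01-01", periods=360, freq="D", calendar="360_day")},
)

# "year" and "date" are the two existing alignment methods; "random" is
# the one proposed in this PR.
converted = da.convert_calendar("noleap", align_on="random")
```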

#8039 · Update assign_coords with a MultiIndex to match new Coordinates API · issue · user: aulemahal (20629530) · state: closed (completed) · comments: 11 · created: 2023-08-01T20:22:41Z · updated: 2023-08-29T14:23:30Z · closed: 2023-08-29T14:23:30Z · CONTRIBUTOR · id: 1831975171 · node_id: I_kwDOAMm_X85tMbkD · repo: xarray (13221727)

What is your issue?

A pattern we used in xclim (and elsewhere) seems to be broken on the master.

See MWE:

```python
import pandas as pd
import xarray as xr

da = xr.DataArray(
    [1] * 730,
    coords={"time": xr.date_range('1900-01-01', periods=730, freq='D', calendar='noleap')},
)
mulind = pd.MultiIndex.from_arrays(
    (da.time.dt.year.values, da.time.dt.dayofyear.values), names=('year', 'doy')
)

# Override previous time axis with new MultiIndex
da.assign_coords(time=mulind).unstack('time')
```

Now this works OK with both the current master and the latest release. However, if we chunk `da`, the last line now fails:

```python
da.chunk(time=50).assign_coords(time=mulind).unstack('time')
```

On the master, this gives: `ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}`

Full traceback:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[44], line 1
----> 1 da.chunk(time=50).assign_coords(time=mulind).unstack("time")

File ~/Projets/xarray/xarray/core/dataarray.py:2868, in DataArray.unstack(self, dim, fill_value, sparse)
   2808 def unstack(
   2809     self,
   2810     dim: Dims = None,
   2811     fill_value: Any = dtypes.NA,
   2812     sparse: bool = False,
   2813 ) -> DataArray:
   2814     """
   2815     Unstack existing dimensions corresponding to MultiIndexes into
   2816     multiple new dimensions.
   (...)
   2866     DataArray.stack
   2867     """
-> 2868     ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
   2869     return self._from_temp_dataset(ds)

File ~/Projets/xarray/xarray/core/dataset.py:5481, in Dataset.unstack(self, dim, fill_value, sparse)
   5479 for d in dims:
   5480     if needs_full_reindex:
-> 5481         result = result._unstack_full_reindex(
   5482             d, stacked_indexes[d], fill_value, sparse
   5483         )
   5484     else:
   5485         result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)

File ~/Projets/xarray/xarray/core/dataset.py:5365, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse)
   5362 else:
   5363     # TODO: we may depreciate implicit re-indexing with a pandas.MultiIndex
   5364     xr_full_idx = PandasMultiIndex(full_idx, dim)
-> 5365     indexers = Indexes(
   5366         {k: xr_full_idx for k in index_vars},
   5367         xr_full_idx.create_variables(index_vars),
   5368     )
   5369     obj = self._reindex(
   5370         indexers, copy=False, fill_value=fill_value, sparse=sparse
   5371     )
   5373 for name, var in obj.variables.items():

File ~/Projets/xarray/xarray/core/indexes.py:1435, in Indexes.__init__(self, indexes, variables, index_type)
   1433 unmatched_keys = set(indexes) ^ set(variables)
   1434 if unmatched_keys:
-> 1435     raise ValueError(
   1436         f"unmatched keys found in indexes and variables: {unmatched_keys}"
   1437     )
   1439 if any(not isinstance(idx, index_type) for idx in indexes.values()):
   1440     index_type_str = f"{index_type.__module__}.{index_type.__name__}"

ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}
```

This seems related to PR #7368.

The reason for the title of this issue is that in both versions, I now realize `da.assign_coords(time=mulind)` prints as:

```
<xarray.DataArray (time: 730)>
dask.array<xarray-<this-array>, shape=(730,), dtype=int64, chunksize=(50,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) object MultiIndex
```

Something's fishy, because the two "sub" indexes are not showing.

And indeed, with the current master, I can get this to work by doing (again changing the last line):

```python
da2 = xr.DataArray(da.data, coords=xr.Coordinates.from_pandas_multiindex(mulind, 'time'))
da2.chunk(time=50).unstack('time')
```

But it seems a bit odd to me that we need to reconstruct the DataArray to replace its coordinate with a "MultiIndex" one.

Thus, my questions are:

  1. How does one properly override a coordinate with a MultiIndex? Is there a way to use assign_coords? If not, then this issue becomes a feature request.
  2. Is this a regression? Or was I just "lucky" before? (A sketch of a possible assign_coords spelling follows below.)
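For reference, a minimal sketch of an `assign_coords`-based spelling under the new Coordinates API. This is an assumption, not a confirmed answer: it supposes `assign_coords` accepts the `Coordinates` object built by `from_pandas_multiindex`.

```python
import pandas as pd
import xarray as xr

da = xr.DataArray(
    [1] * 730,
    coords={"time": xr.date_range("1900-01-01", periods=730, freq="D", calendar="noleap")},
)
mulind = pd.MultiIndex.from_arrays(
    (da.time.dt.year.values, da.time.dt.dayofyear.values), names=("year", "doy")
)

# Build proper multi-index coordinates first, then assign them, so the
# 'year' and 'doy' level variables stay attached to the index.
coords = xr.Coordinates.from_pandas_multiindex(mulind, "time")
da2 = da.drop_vars("time").assign_coords(coords)
da2.chunk(time=50).unstack("time")
```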
#7602 · Inconsistent coordinate attributes handling in apply_ufunc · issue · user: aulemahal (20629530) · state: open · comments: 0 · created: 2023-03-09T20:19:25Z · updated: 2023-03-15T17:04:35Z · CONTRIBUTOR · id: 1617939324 · node_id: I_kwDOAMm_X85gb8t8 · repo: xarray (13221727)

What happened?

When calling apply_ufunc with keep_attrs=False, the coordinate attributes are dropped only if there is more than one argument to the call.

What did you expect to happen?

I expected the behaviour to be the same, no matter the number of arguments.

I also expected the coordinate attributes to be preserved if that coordinate appeared on only one argument.

Minimal Complete Verifiable Example

```Python
import xarray as xr

def wrapper(ar1, ar2=None):
    return ar1.mean(axis=-1)

ds = xr.tutorial.open_dataset("air_temperature")

o1 = xr.apply_ufunc(
    wrapper, ds.air, ds.time, input_core_dims=[['time'], ['time']], keep_attrs=False
)
print(o1.lat.attrs)  # {}

o2 = xr.apply_ufunc(
    wrapper, ds.air, input_core_dims=[['time']], keep_attrs=False
)
print(o2.lat.attrs)  # {'standard_name': ... }
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

The behaviour stems from this if/else:

https://github.com/pydata/xarray/blob/6d771fc82228bdaf8a4b77d0ceec1cc444ebd090/xarray/core/computation.py#L252-L260

The upper part (one argument) doesn't touch the attributes, but in the else branch (more than one argument), two levels deeper in merge_coordinates_without_align, we have:

https://github.com/pydata/xarray/blob/6d771fc82228bdaf8a4b77d0ceec1cc444ebd090/xarray/core/merge.py#L283-L286

When apply_ufunc is called with keep_attrs=False, the combine_attrs above is "drop". In merge_attrs, even though only one attribute dict is passed for lat, an empty dict is returned.

My preference would be for keep_attrs to refer only to the data attributes, with coordinate attributes preserved, or even merged if needed. This was my expectation here, as this is the behaviour in many other places in xarray. For example:

```python
with xr.set_options(keep_attrs=False):
    o = ds.air.mean('time')
```

This drops the attributes of air, but preserves those of lat and lon.

I see no easy way out here, except by handling it explicitly somewhere in apply_ufunc. If the decision is that preservation of "untouched" coordinate attributes is not ensured by xarray, I think it would be worth noting somewhere (but I don't know where), and I would change my code to "manually" preserve those where appropriate (sketched below).
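A minimal sketch of that manual preservation, assuming one simply copies coordinate attributes back from the inputs after the call:

```python
import xarray as xr

def wrapper(ar1, ar2=None):
    return ar1.mean(axis=-1)

ds = xr.tutorial.open_dataset("air_temperature")
out = xr.apply_ufunc(
    wrapper, ds.air, ds.time, input_core_dims=[["time"], ["time"]], keep_attrs=False
)

# Copy the attributes of coordinates that survived the operation back
# from the source dataset.
for name, coord in ds.coords.items():
    if name in out.coords:
        out.coords[name].attrs.update(coord.attrs)
```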

Environment

INSTALLED VERSIONS ------------------ commit: 6d771fc82228bdaf8a4b77d0ceec1cc444ebd090 python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 6.1.11-100.fc36.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: fr_CA.UTF-8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.1 xarray: 2023.2.0 pandas: 1.5.3 numpy: 1.23.5 scipy: 1.10.1 netCDF4: 1.6.3 pydap: installed h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: 2.13.6 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: 3.2.2 rasterio: 1.3.6 cfgrib: 0.9.10.3 iris: 3.4.1 bottleneck: 1.3.7 dask: 2023.3.0 distributed: 2023.3.0 matplotlib: 3.7.1 cartopy: 0.21.1 seaborn: 0.12.2 numbagg: 0.2.2 fsspec: 2023.3.0 cupy: None pint: 0.20.1 sparse: 0.14.0 flox: 0.6.8 numpy_groupies: 0.9.20 setuptools: 67.6.0 pip: 23.0.1 conda: None pytest: 7.2.2 mypy: None IPython: 8.11.0 sphinx: None
#7275 · REG: `nc_time_axis` not imported anymore · issue · user: aulemahal (20629530) · state: closed (completed) · comments: 1 · created: 2022-11-09T17:02:59Z · updated: 2022-11-10T21:45:28Z · closed: 2022-11-10T21:45:28Z · CONTRIBUTOR · id: 1442443970 · node_id: I_kwDOAMm_X85V-fLC · repo: xarray (13221727)

What happened?

With xarray 2022.11.0, plotting a DataArray with a cftime time axis fails.

It fails with a matplotlib error: `TypeError: float() argument must be a string or a real number, not 'cftime._cftime.DatetimeNoLeap'`

What did you expect to happen?

With previous versions of xarray, the nc_time_axis package was imported by xarray and these errors were avoided.

Minimal Complete Verifiable Example

```Python
import xarray as xr
da = xr.DataArray(
    list(range(10)),
    dims=('time',),
    coords={'time': xr.cftime_range('1900-01-01', periods=10, calendar='noleap', freq='D')}
)
da.plot()
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [1], line 7
      1 import xarray as xr
      2 da = xr.DataArray(
      3     list(range(10)),
      4     dims=('time',),
      5     coords={'time': xr.cftime_range('1900-01-01', periods=10, calendar='noleap', freq='D')}
      6 )
----> 7 da.plot()

File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/xarray/plot/accessor.py:46, in DataArrayPlotAccessor.__call__(self, **kwargs)
     44 @functools.wraps(dataarray_plot.plot, assigned=("__doc__", "__annotations__"))
     45 def __call__(self, **kwargs) -> Any:
---> 46     return dataarray_plot.plot(self._da, **kwargs)

File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/xarray/plot/dataarray_plot.py:312, in plot(darray, row, col, col_wrap, ax, hue, subplot_kws, **kwargs)
    308     plotfunc = hist
    310 kwargs["ax"] = ax
--> 312 return plotfunc(darray, **kwargs)

File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/xarray/plot/dataarray_plot.py:517, in line(darray, row, col, figsize, aspect, size, ax, hue, x, y, xincrease, yincrease, xscale, yscale, xticks, yticks, xlim, ylim, add_legend, _labels, *args, **kwargs)
    513 ylabel = label_from_attrs(yplt, extra=y_suffix)
    515 _ensure_plottable(xplt_val, yplt_val)
--> 517 primitive = ax.plot(xplt_val, yplt_val, *args, **kwargs)
    519 if _labels:
    520     if xlabel is not None:

File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/axes/_axes.py:1664, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
   1662 lines = [*self._get_lines(*args, data=data, **kwargs)]
   1663 for line in lines:
-> 1664     self.add_line(line)
   1665 if scalex:
   1666     self._request_autoscale_view("x")

File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/axes/_base.py:2340, in _AxesBase.add_line(self, line)
   2337 if line.get_clip_path() is None:
   2338     line.set_clip_path(self.patch)
-> 2340 self._update_line_limits(line)
   2341 if not line.get_label():
   2342     line.set_label(f'_child{len(self._children)}')

File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/axes/_base.py:2363, in _AxesBase._update_line_limits(self, line)
   2359 def _update_line_limits(self, line):
   2360     """
   2361     Figures out the data limit of the given line, updating self.dataLim.
   2362     """
-> 2363     path = line.get_path()
   2364     if path.vertices.size == 0:
   2365         return

File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/lines.py:1031, in Line2D.get_path(self)
   1029 """Return the ~matplotlib.path.Path associated with this line."""
   1030 if self._invalidy or self._invalidx:
-> 1031     self.recache()
   1032 return self._path

File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/lines.py:659, in Line2D.recache(self, always)
    657 if always or self._invalidx:
    658     xconv = self.convert_xunits(self._xorig)
--> 659     x = _to_unmasked_float_array(xconv).ravel()
    660 else:
    661     x = self._x

File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/cbook/__init__.py:1369, in _to_unmasked_float_array(x)
   1367     return np.ma.asarray(x, float).filled(np.nan)
   1368 else:
-> 1369     return np.asarray(x, float)

TypeError: float() argument must be a string or a real number, not 'cftime._cftime.DatetimeNoLeap'
```

Anything else we need to know?

I suspect #7179.

This line: https://github.com/pydata/xarray/blob/cc7e09a3507fa342b3790b5c109e700fa12f0b17/xarray/plot/utils.py#L27 does not import nc_time_axis. Further down, the variable gets checked: if it is False an error is raised, but the package is still not imported if it is True.

Previously we had: https://github.com/pydata/xarray/blob/fc9026b59d38146a21769cc2d3026a12d58af059/xarray/plot/utils.py#L27-L32 where the package is always imported.

Maybe there's a way to import nc_time_axis only when needed? (A sketch follows below.)
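A minimal sketch of such a lazy import, with a hypothetical helper name (this is not the actual xarray code): the import happens only when a cftime axis is about to be plotted, so that the module's matplotlib converters get registered.

```python
def _lazy_import_nc_time_axis():
    # Hypothetical helper: importing nc_time_axis has the side effect of
    # registering cftime converters with matplotlib's unit registry.
    try:
        import nc_time_axis  # noqa: F401
    except ImportError as err:
        raise ImportError(
            "Plotting arrays of cftime.datetime objects requires the "
            "optional dependency nc_time_axis."
        ) from err
```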

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0] python-bits: 64 OS: Linux OS-release: 6.0.5-200.fc36.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: fr_CA.UTF-8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2022.11.0 pandas: 1.5.1 numpy: 1.23.4 scipy: 1.8.1 netCDF4: 1.6.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.5 dask: 2022.10.2 distributed: 2022.10.2 matplotlib: 3.6.2 cartopy: None seaborn: None numbagg: None fsspec: 2022.10.0 cupy: None pint: 0.20.1 sparse: None flox: None numpy_groupies: None setuptools: 65.5.1 pip: 22.3.1 conda: None pytest: 7.2.0 IPython: 8.6.0 sphinx: 5.3.0
#5781 · Add encodings to save_mfdataset · pull request (pydata/xarray/pulls/5781) · user: aulemahal (20629530) · state: open · comments: 1 · created: 2021-09-08T21:24:13Z · updated: 2022-10-06T21:44:18Z · CONTRIBUTOR · id: 991544027 · node_id: MDExOlB1bGxSZXF1ZXN0NzI5OTkzMTE0 · repo: xarray (13221727)
  • [ ] Closes #xxxx
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Simply adds an encodings argument to save_mfdataset. As for the other arguments, it expects a list of dictionaries with encoding information to be passed to to_netcdf for each dataset. Added a minimal test, simply to see if the argument is taken into account. (A usage sketch follows.)
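A minimal usage sketch, assuming the encodings keyword proposed here (one dict per dataset, forwarded to the matching to_netcdf call):

```python
import xarray as xr

ds1 = xr.Dataset({"tas": ("time", [280.0, 281.5])})
ds2 = xr.Dataset({"tas": ("time", [282.0, 283.5])})

# One encoding mapping per dataset, in the same order as the datasets and paths.
xr.save_mfdataset(
    [ds1, ds2],
    ["out1.nc", "out2.nc"],
    encodings=[{"tas": {"dtype": "float32"}}, {"tas": {"dtype": "float32"}}],
)
```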

#6946 · reset_index not resetting levels of MultiIndex · issue · user: aulemahal (20629530) · assignee: benbovy (4160723) · state: closed (completed) · comments: 3 · created: 2022-08-22T21:47:04Z · updated: 2022-09-27T10:35:39Z · closed: 2022-09-27T10:35:39Z · CONTRIBUTOR · reactions: +1 ×2 · id: 1347026292 · node_id: I_kwDOAMm_X85QSf10 · repo: xarray (13221727)

What happened?

I'm not sure my usecase is the simplest way to demonstrate the issue, but let's try anyway.

I have a DataArray with two coordinates and I stack them into a new multi-index. I want to pass the levels of that new multi-index into a function, but as dask arrays. Turns out, it is not straightforward to chunk these variables because they act like IndexVariable objects and refuse to be chunked.

Thus, I reset the multi-index, drop it, but the variables still don't want to be chunked!

What did you expect to happen?

I expected the levels to be chunkable after the sequence: stack, reset_index.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')

ds = ds.stack(spatial=['lon', 'lat'])
ds = ds.reset_index('spatial', drop=True)  # I don't think the drop is important here.
lon_chunked = ds.lon.chunk()  # whoops, doesn't do anything!

type(ds.lon.variable)  # xarray.core.variable.IndexVariable
# I assumed either the stack or the reset_index would have modified this type into a normal variable.
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

Seems kinda related to the issues around reset_index. I think this is related to (but not a duplicate of) #4366. (A workaround sketch follows.)
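A hedged workaround sketch, assuming it is acceptable to rebuild the level as a plain (non-index) variable so that .chunk() takes effect:

```python
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')
ds = ds.stack(spatial=['lon', 'lat']).reset_index('spatial', drop=True)

# Wrap the raw values in a fresh DataArray, shedding the IndexVariable wrapper.
lon_plain = xr.DataArray(ds.lon.values, dims=('spatial',), attrs=ds.lon.attrs)
lon_chunked = lon_plain.chunk({'spatial': 100})  # now actually a dask array
```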

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.6.0 pandas: 1.4.3 numpy: 1.22.4 scipy: 1.9.0 netCDF4: 1.6.0 pydap: None h5netcdf: None h5py: 3.7.0 Nio: None zarr: 2.12.0 cftime: 1.6.1 nc_time_axis: 1.4.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.5 dask: 2022.8.0 distributed: 2022.8.0 matplotlib: 3.5.2 cartopy: 0.20.3 seaborn: None numbagg: None fsspec: 2022.7.1 cupy: None pint: 0.19.2 sparse: 0.13.0 flox: 0.5.9 numpy_groupies: 0.9.19 setuptools: 63.4.2 pip: 22.2.2 conda: None pytest: None IPython: 8.4.0 sphinx: 5.1.1
#6607 · Coordinate promotion workaround broken · issue · user: aulemahal (20629530) · assignee: benbovy (4160723) · state: closed (completed) · comments: 4 · created: 2022-05-13T21:20:25Z · updated: 2022-09-27T09:33:41Z · closed: 2022-09-27T09:33:41Z · CONTRIBUTOR · id: 1235725650 · node_id: I_kwDOAMm_X85Jp61S · repo: xarray (13221727)

What happened?

Ok so this one is a bit weird. I'm not sure this is a bug, but code that worked before doesn't anymore, so it is some sort of regression.

I have a dataset with one dimension and one coordinate along it, but they have different names. I want to transform this so that the coordinate name becomes the dimension name, making it a proper dimension coordinate (I don't know what else to call it). After renaming the dim to the coord's name, it all looks good in the repr, but the coord is still missing an index for that dimension (crd.indexes is empty, see MCVE). There was a workaround through reset_coords for this, but it doesn't work anymore.

Instead, the last line of the MCVE downgrades the variable; the final lon doesn't have coords anymore.

What did you expect to happen?

In the MCVE below, I show what the old "workaround" was. I expected lon.indexes to contain a 'lon' index at the end of the procedure.

Minimal Complete Verifiable Example

```Python
import xarray as xr

# A dataset with a 1d variable along a dimension
ds = xr.Dataset({'lon': xr.DataArray([1, 2, 3], dims=('x',))})

# Promote to coord. This still is not a proper crd-dim (different name)
ds = ds.set_coords(['lon'])

# Rename dim:
ds = ds.rename(x='lon')

# Now do we have a proper coord-dim? No, not yet, because:
ds.indexes  # is empty

# Workaround that was used up to the last release
lon = ds.lon.reset_coords(drop=True)

# Because of the missing indexes, the next line fails on the master
lon - lon.diff('lon')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

My guess is that this line is causing reset_coords to drop the coordinate from itself: https://github.com/pydata/xarray/blob/c34ef8a60227720724e90aa11a6266c0026a812a/xarray/core/dataarray.py#L866

It would be nice if the renaming was sufficient for the indexes to appear.

My example is weird, I know. The real use case is a script where we receive a 2D coordinate where all lines are the same, so we take the first line and promote it to a proper dimension coordinate. But the current code fails on the master at the lon - lon.diff('lon') step that happens afterwards. (A sketch of an alternative spelling follows.)
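For reference, a hedged sketch of an alternative that sidesteps the rename, assuming swap_dims fits the use case: it promotes the coordinate to a dimension coordinate and builds its index in one step.

```python
import xarray as xr

ds = xr.Dataset({'lon': xr.DataArray([1, 2, 3], dims=('x',))})
ds = ds.set_coords(['lon'])

# swap_dims makes 'lon' the dimension coordinate along the old 'x' dim
# and creates the corresponding index.
ds = ds.swap_dims({'x': 'lon'})
print(ds.indexes)  # now contains 'lon'
```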

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:22:55) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 5.13.19-2-MANJARO machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.UTF-8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2022.3.1.dev104+gc34ef8a6 pandas: 1.4.2 numpy: 1.22.2 scipy: 1.8.0 netCDF4: None pydap: installed h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2022.02.1 distributed: 2022.2.1 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2022.3.0 cupy: None pint: None sparse: 0.13.0 setuptools: 59.8.0 pip: 22.0.3 conda: None pytest: 7.0.1 IPython: 8.3.0 sphinx: None
#5402 · `dt.to_pytimedelta` to allow arithmetic with cftime objects · pull request (pydata/xarray/pulls/5402) · user: aulemahal (20629530) · state: open · comments: 1 · created: 2021-05-28T22:48:50Z · updated: 2022-06-09T14:50:16Z · CONTRIBUTOR · id: 906175200 · node_id: MDExOlB1bGxSZXF1ZXN0NjU3MjA1NTM2 · repo: xarray (13221727)
  • [ ] Closes #xxxx
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

When playing with cftime objects, a problem I encountered many times is that I can subtract two arrays but then can't add the result back to another. Subtracting two cftime datetime arrays results in an array of np.timedelta64, and when trying to add it back to another cftime array, we get a UFuncTypeError because the two arrays have incompatible dtypes: '<m8[ns]' and 'O'.

Example:

```python
import xarray as xr
da = xr.DataArray(xr.cftime_range('1900-01-01', freq='D', periods=10), dims=('time',))

# An array of timedelta64[ns]
dt = da - da[0]

da[-1] + dt  # Fails
```

However, if the two arrays were both of 'O' dtype, the operation would be handled by cftime, which supports datetime.timedelta objects.

This solution adds a to_pytimedelta method to the TimedeltaAccessor, mirroring the name of the similar function on pd.Series.dt. It uses a monkeypatching workaround to prevent xarray from casting the array back into numpy objects.

The user still has to check whether the data is cftime or numpy to adapt the operation (calling dt.to_pytimedelta or not), but custom workarounds were always overly complicated for such a simple problem; this helps. (See the usage sketch below.)

Also, this doesn't work with dask arrays, because loading a dask array triggers the variable constructor and thus recasts the array of datetime.timedelta to numpy.timedelta64.

I realize I maybe should have opened an issue before, but I had this idea and it all rushed along.
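A minimal usage sketch, assuming the dt.to_pytimedelta accessor this PR adds:

```python
import xarray as xr

da = xr.DataArray(xr.cftime_range('1900-01-01', freq='D', periods=10), dims=('time',))
dt = da - da[0]  # an array of timedelta64[ns]

# Converting to datetime.timedelta objects ('O' dtype) lets cftime
# handle the addition.
da[-1] + dt.dt.to_pytimedelta()
```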

#6613 · Flox can't handle cftime objects · issue · user: aulemahal (20629530) · state: closed (completed) · comments: 2 · created: 2022-05-16T18:35:56Z · updated: 2022-06-02T23:23:20Z · closed: 2022-06-02T23:23:20Z · CONTRIBUTOR · id: 1237552666 · node_id: I_kwDOAMm_X85Jw44a · repo: xarray (13221727)

What happened?

I use resampling to count the number of timesteps within time periods. So the simple way is: `da.time.resample(time='YS').count()`. With the current master, a non-standard calendar and flox installed, this fails: flox can't handle the cftime objects of the time coordinate.

What did you expect to happen?

I expected the count of elements for each period to be returned.

Minimal Complete Verifiable Example

```Python
import xarray as xr

timeNP = xr.DataArray(xr.date_range('2009-01-01', '2012-12-31', use_cftime=False), dims=('time',), name='time')
timeCF = xr.DataArray(xr.date_range('2009-01-01', '2012-12-31', use_cftime=True), dims=('time',), name='time')

timeNP.resample(time='YS').count()  # works
timeCF.resample(time='YS').count()  # Fails
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 a.resample(time='YS').count()

File ~/Python/myxarray/xarray/core/_reductions.py:5456, in DataArrayResampleReductions.count(self, dim, keep_attrs, **kwargs)
   5401 """
   5402 Reduce this DataArray's data by applying count along some dimension(s).
   5403 (...)
   5453     * time  (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
   5454 """
   5455 if flox and OPTIONS["use_flox"] and contains_only_dask_or_numpy(self._obj):
-> 5456     return self._flox_reduce(
   5457         func="count",
   5458         dim=dim,
   5459         # fill_value=fill_value,
   5460         keep_attrs=keep_attrs,
   5461         **kwargs,
   5462     )
   5463 else:
   5464     return self.reduce(
   5465         duck_array_ops.count,
   5466         dim=dim,
   5467         keep_attrs=keep_attrs,
   5468         **kwargs,
   5469     )

File ~/Python/myxarray/xarray/core/resample.py:44, in Resample._flox_reduce(self, dim, **kwargs)
     41 labels = np.repeat(self._unique_coord.data, repeats)
     42 group = DataArray(labels, dims=(self._group_dim,), name=self._unique_coord.name)
---> 44 result = super()._flox_reduce(dim=dim, group=group, **kwargs)
     45 result = self._maybe_restore_empty_groups(result)
     46 result = result.rename({RESAMPLE_DIM: self._group_dim})

File ~/Python/myxarray/xarray/core/groupby.py:661, in GroupBy._flox_reduce(self, dim, **kwargs)
    658     expected_groups = (self._unique_coord.values,)
    659     isbin = False
--> 661 result = xarray_reduce(
    662     self._original_obj.drop_vars(non_numeric),
    663     group,
    664     dim=dim,
    665     expected_groups=expected_groups,
    666     isbin=isbin,
    667     **kwargs,
    668 )
    670 # Ignore error when the groupby reduction is effectively
    671 # a reduction of the underlying dataset
    672 result = result.drop_vars(unindexed_dims, errors="ignore")

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/xarray.py:308, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, split_out, fill_value, method, engine, keep_attrs, skipna, min_count, reindex, *by, **finalize_kwargs)
    305 input_core_dims = _get_input_core_dims(group_names, dim, ds, grouper_dims)
    306 input_core_dims += [input_core_dims[-1]] * (len(by) - 1)
--> 308 actual = xr.apply_ufunc(
    309     wrapper,
    310     ds.drop_vars(tuple(missing_dim)).transpose(..., *grouper_dims),
    311     *by,
    312     input_core_dims=input_core_dims,
    313     # for xarray's test_groupby_duplicate_coordinate_labels
    314     exclude_dims=set(dim),
    315     output_core_dims=[group_names],
    316     dask="allowed",
    317     dask_gufunc_kwargs=dict(output_sizes=group_sizes),
    318     keep_attrs=keep_attrs,
    319     kwargs={
    320         "func": func,
    321         "axis": axis,
    322         "sort": sort,
    323         "split_out": split_out,
    324         "fill_value": fill_value,
    325         "method": method,
    326         "min_count": min_count,
    327         "skipna": skipna,
    328         "engine": engine,
    329         "reindex": reindex,
    330         "expected_groups": tuple(expected_groups),
    331         "isbin": isbin,
    332         "finalize_kwargs": finalize_kwargs,
    333     },
    334 )
    336 # restore non-dim coord variables without the core dimension
    337 # TODO: shouldn't apply_ufunc handle this?
    338 for var in set(ds.variables) - set(ds.dims):

File ~/Python/myxarray/xarray/core/computation.py:1170, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
   1168 # feed datasets apply_variable_ufunc through apply_dataset_vfunc
   1169 elif any(is_dict_like(a) for a in args):
-> 1170     return apply_dataset_vfunc(
   1171         variables_vfunc,
   1172         *args,
   1173         signature=signature,
   1174         join=join,
   1175         exclude_dims=exclude_dims,
   1176         dataset_join=dataset_join,
   1177         fill_value=dataset_fill_value,
   1178         keep_attrs=keep_attrs,
   1179     )
   1180 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1181 elif any(isinstance(a, DataArray) for a in args):

File ~/Python/myxarray/xarray/core/computation.py:460, in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, *args)
    455 list_of_coords, list_of_indexes = build_output_coords_and_indexes(
    456     args, signature, exclude_dims, combine_attrs=keep_attrs
    457 )
    458 args = [getattr(arg, "data_vars", arg) for arg in args]
--> 460 result_vars = apply_dict_of_variables_vfunc(
    461     func, *args, signature=signature, join=dataset_join, fill_value=fill_value
    462 )
    464 if signature.num_outputs > 1:
    465     out = tuple(
    466         _fast_dataset(*args)
    467         for args in zip(result_vars, list_of_coords, list_of_indexes)
    468     )

File ~/Python/myxarray/xarray/core/computation.py:402, in apply_dict_of_variables_vfunc(func, signature, join, fill_value, *args)
    400 result_vars = {}
    401 for name, variable_args in zip(names, grouped_by_name):
--> 402     result_vars[name] = func(*variable_args)
    404 if signature.num_outputs > 1:
    405     return _unpack_dict_tuples(result_vars, signature.num_outputs)

File ~/Python/myxarray/xarray/core/computation.py:750, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    745 if vectorize:
    746     func = _vectorize(
    747         func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims
    748     )
--> 750 result_data = func(*input_data)
    752 if signature.num_outputs == 1:
    753     result_data = (result_data,)

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/xarray.py:291, in xarray_reduce.<locals>.wrapper(array, func, skipna, *by, **kwargs)
    288 if "nan" not in func and func not in ["all", "any", "count"]:
    289     func = f"nan{func}"
--> 291 result, groups = groupby_reduce(array, *by, func=func, **kwargs)
    292 return result

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/core.py:1553, in groupby_reduce(array, func, expected_groups, sort, isbin, axis, fill_value, min_count, split_out, method, engine, reindex, finalize_kwargs, *by)
   1550 agg = _initialize_aggregation(func, array.dtype, fill_value, min_count, finalize_kwargs)
   1552 if not has_dask:
-> 1553     results = _reduce_blockwise(
   1554         array, by, agg, expected_groups=expected_groups, reindex=reindex, **kwargs
   1555     )
   1556     groups = (results["groups"],)
   1557     result = results[agg.name]

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/core.py:1008, in _reduce_blockwise(array, by, agg, axis, expected_groups, fill_value, engine, sort, reindex)
   1005     finalize_kwargs = (finalize_kwargs,)
   1006 finalize_kwargs = finalize_kwargs + ({},) + ({},)
-> 1008 results = chunk_reduce(
   1009     array,
   1010     by,
   1011     func=agg.numpy,
   1012     axis=axis,
   1013     expected_groups=expected_groups,
   1014     # This fill_value should only apply to groups that only contain NaN observations
   1015     # BUT there is funkiness when axis is a subset of all possible values
   1016     # (see below)
   1017     fill_value=agg.fill_value["numpy"],
   1018     dtype=agg.dtype["numpy"],
   1019     kwargs=finalize_kwargs,
   1020     engine=engine,
   1021     sort=sort,
   1022     reindex=reindex,
   1023 )  # type: ignore
   1025 if _is_arg_reduction(agg):
   1026     results["intermediates"][0] = np.unravel_index(results["intermediates"][0], array.shape)[-1]

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/core.py:677, in chunk_reduce(array, by, func, expected_groups, axis, fill_value, dtype, reindex, engine, kwargs, sort)
    675     result = reduction(group_idx, array, **kwargs)
    676 else:
--> 677     result = generic_aggregate(
    678         group_idx, array, axis=-1, engine=engine, func=reduction, **kwargs
    679     ).astype(dt, copy=False)
    680 if np.any(props.nanmask):
    681     # remove NaN group label which should be last
    682     result = result[..., :-1]

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/aggregations.py:49, in generic_aggregate(group_idx, array, engine, func, axis, size, fill_value, dtype, **kwargs)
     44 else:
     45     raise ValueError(
     46         f"Expected engine to be one of ['flox', 'numpy', 'numba']. Received {engine} instead."
     47     )
---> 49 return method(
     50     group_idx, array, axis=axis, size=size, fill_value=fill_value, dtype=dtype, **kwargs
     51 )

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/aggregate_flox.py:86, in nanlen(group_idx, array, *args, **kwargs)
     85 def nanlen(group_idx, array, *args, **kwargs):
---> 86     return sum(group_idx, (~np.isnan(array)).astype(int), *args, **kwargs)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```

Anything else we need to know?

I was able to resolve this by modifying xarray.core.utils.contains_only_dask_or_numpy so that it returns False if the input's dtype is 'O'. This check seems to only be used when choosing between flox and the old algorithms. Does this make sense? (An interim workaround is sketched below.)
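Until such a check lands, a minimal workaround sketch is to disable flox around the offending reduction:

```python
import xarray as xr

timeCF = xr.DataArray(
    xr.date_range('2009-01-01', '2012-12-31', use_cftime=True),
    dims=('time',), name='time',
)

# Fall back to the non-flox groupby path, which handles object dtypes.
with xr.set_options(use_flox=False):
    counts = timeCF.resample(time='YS').count()
```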

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.17.5-arch1-2 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.1.dev16+g3ead17ea pandas: 1.4.2 numpy: 1.21.6 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.0 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2022.04.1 distributed: 2022.4.1 matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.07.0 cupy: None pint: 0.18 sparse: None flox: 0.5.1 numpy_groupies: 0.9.16 setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 8.2.0 sphinx: 4.1.2
#6623 · Cftime arrays not supported by polyval · issue · user: aulemahal (20629530) · state: closed (completed) · comments: 1 · created: 2022-05-19T22:19:14Z · updated: 2022-05-31T17:16:04Z · closed: 2022-05-31T17:16:04Z · CONTRIBUTOR · id: 1242388766 · node_id: I_kwDOAMm_X85KDVke · repo: xarray (13221727)

What happened?

I was trying to use polyval with a cftime coordinate and it failed with `TypeError: unsupported operand type(s) for *: 'float' and 'cftime._cftime.DatetimeNoLeap'`. The error seems to originate from #6548, where the process transforming coordinates to numerical values was modified. The new _ensure_numeric method seems to ignore the possibility of cftime arrays.

What did you expect to happen?

A polynomial to be evaluated along my coordinate.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

# use_cftime=False will work
t = xr.date_range('2001-01-01', periods=100, use_cftime=True, freq='YS')
da = xr.DataArray(np.arange(100) ** 3, dims=('time',), coords={'time': t})
coeffs = da.polyfit('time', 4)
da2 = xr.polyval(da.time, coeffs).polyfit_coefficients
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [5], in <cell line: 4>()
      2 da = xr.DataArray(np.arange(100) ** 3, dims=('time',), coords={'time': t})
      3 coeffs = da.polyfit('time', 4)
----> 4 da2 = xr.polyval(da.time, coeffs).polyfit_coefficients

File ~/Python/xarray/xarray/core/computation.py:1931, in polyval(coord, coeffs, degree_dim)
   1929 res = zeros_like(coord) + coeffs.isel({degree_dim: max_deg}, drop=True)
   1930 for deg in range(max_deg - 1, -1, -1):
-> 1931     res *= coord
   1932     res += coeffs.isel({degree_dim: deg}, drop=True)
   1934 return res

File ~/Python/xarray/xarray/core/_typed_ops.py:103, in DatasetOpsMixin.__imul__(self, other)
    102 def __imul__(self, other):
--> 103     return self._inplace_binary_op(other, operator.imul)

File ~/Python/xarray/xarray/core/dataset.py:6107, in Dataset._inplace_binary_op(self, other, f)
   6105     other = other.reindex_like(self, copy=False)
   6106 g = ops.inplace_to_noninplace_op(f)
-> 6107 ds = self._calculate_binary_op(g, other, inplace=True)
   6108 self._replace_with_new_dims(
   6109     ds._variables,
   6110     ds._coord_names,
   (...)
   6113     inplace=True,
   6114 )
   6115 return self

File ~/Python/xarray/xarray/core/dataset.py:6154, in Dataset._calculate_binary_op(self, f, other, join, inplace)
   6152 else:
   6153     other_variable = getattr(other, "variable", other)
-> 6154     new_vars = {k: f(self.variables[k], other_variable) for k in self.data_vars}
   6155 ds._variables.update(new_vars)
   6156 ds._dims = calculate_dimensions(ds._variables)

File ~/Python/xarray/xarray/core/dataset.py:6154, in <dictcomp>(.0)
   6152 else:
   6153     other_variable = getattr(other, "variable", other)
-> 6154     new_vars = {k: f(self.variables[k], other_variable) for k in self.data_vars}
   6155 ds._variables.update(new_vars)
   6156 ds._dims = calculate_dimensions(ds._variables)

File ~/Python/xarray/xarray/core/_typed_ops.py:402, in VariableOpsMixin.__mul__(self, other)
    401 def __mul__(self, other):
--> 402     return self._binary_op(other, operator.mul)

File ~/Python/xarray/xarray/core/variable.py:2494, in Variable._binary_op(self, other, f, reflexive)
   2491 attrs = self._attrs if keep_attrs else None
   2492 with np.errstate(all="ignore"):
   2493     new_data = (
-> 2494         f(self_data, other_data) if not reflexive else f(other_data, self_data)
   2495     )
   2496 result = Variable(dims, new_data, attrs=attrs)
   2497 return result

TypeError: unsupported operand type(s) for *: 'float' and 'cftime._cftime.DatetimeGregorian'
```

Anything else we need to know?

I also noticed that since the Horner PR, polyfit and polyval do not use the same function to convert coordinates into numerical values. Isn't this dangerous?

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.3.1.dev267+gd711d58 pandas: 1.4.2 numpy: 1.21.6 scipy: 1.8.0 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: 3.6.0 Nio: None zarr: 2.11.3 cftime: 1.6.0 nc_time_axis: 1.4.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.4 dask: 2022.04.1 distributed: 2022.4.1 matplotlib: 3.5.1 cartopy: 0.20.2 seaborn: None numbagg: None fsspec: 2022.3.0 cupy: None pint: 0.19.2 sparse: 0.13.0 flox: 0.5.0 numpy_groupies: 0.9.15 setuptools: 62.1.0 pip: 22.0.4 conda: None pytest: None IPython: 8.2.0 sphinx: 4.5.0
#6615 · Flox grouping does not cast bool to int in summation · issue · user: aulemahal (20629530) · state: closed (completed) · comments: 0 · created: 2022-05-16T19:06:45Z · updated: 2022-05-17T02:24:32Z · closed: 2022-05-17T02:24:32Z · CONTRIBUTOR · id: 1237587122 · node_id: I_kwDOAMm_X85JxBSy · repo: xarray (13221727)

What happened?

In my code I used the implicit cast from bool to int that xarray/numpy perform for certain operations. This is the case for sum: a resampling sum on a boolean array actually returns the number of True values, not the OR of all values.

However, when flox is activated, it does return the OR of all values. Digging a bit, I see that the flox aggregation uses np.add and not np.sum, so this may in fact be an issue for flox? It felt like the xarray devs should know about this potential regression anyway.

What did you expect to happen?

I expected a sum of boolean to actually be the count of True values.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")

# Count the monthly number of 6-hour periods with tas over 300K

with xr.set_options(use_flox=False):
    # this works as expected
    outOLD = (ds.air > 300).resample(time='MS').sum()

with xr.set_options(use_flox=True):
    # this doesn't fail, but returns True or False:
    # the OR and not the expected sum.
    outFLOX = (ds.air > 300).resample(time='MS').sum()
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I wrote a quick test for basic operations and sum seems the only really problematic one. prod does return a different dtype, but the values are not impacted.

```python
for op in ['any', 'all', 'count', 'sum', 'prod', 'mean', 'var', 'std', 'max', 'min']:
    with xr.set_options(use_flox=False):
        outO = getattr((ds.air > 300).resample(time='YS'), op)()
    with xr.set_options(use_flox=True):
        outF = getattr((ds.air > 300).resample(time='YS'), op)()
    print(op, outO.dtype, outF.dtype, outO.equals(outF))
```

returns

```
any    bool     bool     True
all    bool     bool     True
count  int64    int64    True
sum    int64    bool     False
prod   int64    bool     True
mean   float64  float64  True
var    float64  float64  True
std    float64  float64  True
max    bool     bool     True
min    bool     bool     True
```
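In the meantime, a simple workaround sketch is to make the intent explicit with a cast, which gives the count under both code paths:

```python
# Casting to int before reducing returns the count of True values
# whether or not flox is enabled.
out = (ds.air > 300).astype(int).resample(time='MS').sum()
```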

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.17.5-arch1-2 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.1.dev16+g3ead17ea pandas: 1.4.2 numpy: 1.21.6 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.0 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2022.04.1 distributed: 2022.4.1 matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.07.0 cupy: None pint: 0.18 sparse: None flox: 0.5.1 numpy_groupies: 0.9.16 setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 8.2.0 sphinx: 4.1.2
#5701 · Performance issues using map_blocks with cftime indexes · issue · user: aulemahal (20629530) · state: open · comments: 1 · created: 2021-08-12T15:47:29Z · updated: 2022-04-19T02:44:37Z · CONTRIBUTOR · id: 969079775 · node_id: MDU6SXNzdWU5NjkwNzk3NzU= · repo: xarray (13221727)

What happened: When using map_blocks on an object that is dask-backed and has a CFTimeIndex coordinate, the construction step (no computation done) is very slow. I've seen it up to 100x slower than for an equivalent object with a numpy datetime index.

What you expected to happen: I would understand a performance difference since numpy/pandas objects are usually more optimized than cftime/xarray objects, but the difference is quite large here.

Minimal Complete Verifiable Example:

Here is a MCVE that I ran in a jupyter notebook. Performance is basically measured by execution time (wall time). I included the current workaround I have for my usecase.

```python
import numpy as np
import pandas as pd
import xarray as xr
import dask.array as da
from dask.distributed import Client

c = Client(n_workers=1, threads_per_worker=8)

# Test Data
Nt = 10_000
Nx = Ny = 100
chks = (Nt, 10, 10)

A = xr.DataArray(
    da.zeros((Nt, Ny, Nx), chunks=chks),
    dims=('time', 'y', 'x'),
    coords={
        'time': pd.date_range('1900-01-01', freq='D', periods=Nt),
        'x': np.arange(Nx),
        'y': np.arange(Ny)
    },
    name='data'
)

# Copy of A, but with a cftime coordinate
B = A.copy()
B['time'] = xr.cftime_range('1900-01-01', freq='D', periods=Nt, calendar='noleap')

# A dumb function to apply
def func(data):
    return data + data

# Test 1 : numpy-backed time coordinate
%time outA = A.map_blocks(func, template=A)
%time outA.load();
# Res on my machine:
# CPU times: user 130 ms, sys: 6.87 ms, total: 136 ms
# Wall time: 127 ms
# CPU times: user 3.01 s, sys: 8.09 s, total: 11.1 s
# Wall time: 13.4 s

# Test 2 : cftime-backed time coordinate
%time outB = B.map_blocks(func, template=B)
%time outB.load();
# Res on my machine
# CPU times: user 4.42 s, sys: 219 ms, total: 4.64 s
# Wall time: 4.48 s
# CPU times: user 13.2 s, sys: 3.1 s, total: 16.3 s
# Wall time: 26 s

# Workaround in my code
def func_cf(data):
    data['time'] = xr.decode_cf(data.coords.to_dataset()).time
    return data + data

def map_blocks_cf(func, data):
    data2 = data.copy()
    data2['time'] = xr.conventions.encode_cf_variable(data.time)
    return data2.map_blocks(func, template=data)

# Test 3 : cftime time coordinate with encoding-decoding
%time outB2 = map_blocks_cf(func_cf, B)
%time outB2.load();
# Res
# CPU times: user 536 ms, sys: 10.5 ms, total: 546 ms
# Wall time: 528 ms
# CPU times: user 9.57 s, sys: 2.23 s, total: 11.8 s
# Wall time: 21.7 s
```

Anything else we need to know?

After exploration I found 2 culprits for this slowness. I used %%prun to profile the construction phase of map_blocks and found that, in the second case (cftime time coordinate):

  1. In map_blocks calls to dask.base.tokenize take the most time. Precisely, tokenizing a numpy ndarray of O dtype goes through the pickling process of the array. This is already quite slow and cftime objects take even more time to pickle. See Unidata/cftime#253 for the corresponding issue. Most of the construction phase execution time is spent pickling the same datetime array at least once per chunk.
  2. Second, but only significant when the time coordinate is very large (55000 in my use case): CFTimeIndex.__new__ is called more than twice as many times as there are chunks. And within the object creation there is this line: https://github.com/pydata/xarray/blob/3956b73a7792f41e4410349f2c40b9a9a80decd2/xarray/coding/cftimeindex.py#L228 The larger the array, the more time is spent in this iteration. Changing the example above to use Nt = 50_000, the code spent a total of 25 s in dask.base.tokenize calls and 5 s in CFTimeIndex.__new__ calls.

My workaround is not the best, but it was easy to code without touching xarray. The encoding of the time coordinate changes it to an integer array, which is super fast to tokenize. And the speed-up of the construction phase is because there is only one call to encode_cf_variable compared to N_chunks calls to the pickling. As shown above, I have not seen a slowdown in the computation phase. I think this is mostly because the added decode_cf calls are done in parallel, but there might be other reasons I do not understand.

I do not know for sure how/why this tokenization works, but I guess the best improvement in xarray could be to:

  • Look into the inputs of map_blocks and spot cftime-backed coordinates.
  • Convert those coordinates to an ndarray of a basic dtype.
  • At the moment of tokenization of the time coordinates, do a switcheroo and pass the converted arrays instead.

I have no idea if that would work, but if it does that would be the best speed-up I think.

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-514.2.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.1.dev18+g4bb9d9c.d20210810 pandas: 1.3.1 numpy: 1.21.1 scipy: 1.7.0 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.07.1 distributed: 2021.07.1 matplotlib: 3.4.2 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 21.2.1 conda: None pytest: None IPython: 7.25.0 sphinx: None
#6393 · DataArray groupby returning Dataset broken in some cases · issue · user: aulemahal (20629530) · state: closed (completed) · comments: 1 · created: 2022-03-21T14:17:25Z · updated: 2022-03-21T15:26:20Z · closed: 2022-03-21T15:26:20Z · CONTRIBUTOR · id: 1175454678 · node_id: I_kwDOAMm_X85GEAPW · repo: xarray (13221727)

What happened?

This is the reverse problem of #6379: the DataArrayGroupBy._combine method seems broken when the mapped function returns a Dataset (which worked before #5692).

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")

ds.air.resample(time="YS").map(lambda grp: grp.mean("time").to_dataset())
```

Relevant log output

```Python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [3], in <module>
----> 1 ds.air.resample(time="YS").map(lambda grp: grp.mean("time").to_dataset())

File ~/Python/myxarray/xarray/core/resample.py:223, in DataArrayResample.map(self, func, shortcut, args, **kwargs)
    180 """Apply a function to each array in the group and concatenate them
    181 together into a new array.
    182 (...)
    219 The result of splitting, applying and combining this array.
    220 """
    221 # TODO: the argument order for Resample doesn't match that for its parent,
    222 # GroupBy
--> 223 combined = super().map(func, shortcut=shortcut, args=args, **kwargs)
    225 # If the aggregation function didn't drop the original resampling
    226 # dimension, then we need to do so before we can rename the proxy
    227 # dimension we used.
    228 if self._dim in combined.coords:

File ~/Python/myxarray/xarray/core/groupby.py:835, in DataArrayGroupByBase.map(self, func, shortcut, args, **kwargs)
    833 grouped = self._iter_grouped_shortcut() if shortcut else self._iter_grouped()
    834 applied = (maybe_wrap_array(arr, func(arr, *args, **kwargs)) for arr in grouped)
--> 835 return self._combine(applied, shortcut=shortcut)

File ~/Python/myxarray/xarray/core/groupby.py:869, in DataArrayGroupByBase._combine(self, applied, shortcut)
    867 index, index_vars = create_default_index_implicit(coord)
    868 indexes = {k: index for k in index_vars}
--> 869 combined = combined._overwrite_indexes(indexes, coords=index_vars)
    870 combined = self._maybe_restore_empty_groups(combined)
    871 combined = self._maybe_unstack(combined)

TypeError: _overwrite_indexes() got an unexpected keyword argument 'coords'
```

Anything else we need to know?

I guess the same solution as #6386 could be used!

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.16.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.1.dev16+g3ead17ea pandas: 1.4.0 numpy: 1.20.3 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.0 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.08.0 distributed: 2021.08.0 matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.07.0 cupy: None pint: 0.18 sparse: None setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 8.0.1 sphinx: 4.1.2
#6379 · Dataset groupby returning DataArray broken in some cases · issue · user: aulemahal (20629530) · state: closed (completed) · comments: 1 · created: 2022-03-18T20:07:37Z · updated: 2022-03-20T18:55:26Z · closed: 2022-03-20T18:55:26Z · CONTRIBUTOR · reactions: +1 ×1, eyes ×1 · id: 1173980959 · node_id: I_kwDOAMm_X85F-Ycf · repo: xarray (13221727)

What happened?

Got a TypeError when resampling a dataset along a dimension, mapping a function to each group. The function returns a DataArray.

Failed with: `TypeError: _overwrite_indexes() got an unexpected keyword argument 'variables'`

What did you expect to happen?

This worked before the merging of #5692. A DataArray was returned as expected.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")

ds.resample(time="YS").map(lambda grp: grp.air.mean("time"))
```

Relevant log output

```Python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [37], in <module>
----> 1 ds.resample(time="YS").map(lambda grp: grp.air.mean("time"))

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/xarray/core/resample.py:300, in DatasetResample.map(self, func, args, shortcut, **kwargs)
    298 # ignore shortcut if set (for now)
    299 applied = (func(ds, *args, **kwargs) for ds in self._iter_grouped())
--> 300 combined = self._combine(applied)
    302 return combined.rename({self._resample_dim: self._dim})

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/xarray/core/groupby.py:999, in DatasetGroupByBase._combine(self, applied)
    997 index, index_vars = create_default_index_implicit(coord)
    998 indexes = {k: index for k in index_vars}
--> 999 combined = combined._overwrite_indexes(indexes, variables=index_vars)
   1000 combined = self._maybe_restore_empty_groups(combined)
   1001 combined = self._maybe_unstack(combined)

TypeError: _overwrite_indexes() got an unexpected keyword argument 'variables'
```

Anything else we need to know?

In the docstring of DatasetGroupBy.map it is not made clear that the passed function should return a Dataset, but the opposite is also not said. This worked before, and I think the issue comes from #5692, which introduced different signatures for DataArray._overwrite_indexes (which is called in my case) and Dataset._overwrite_indexes (which is expected by the new _combine).

If the function passed to Dataset.resample(...).map should only return Datasets, then I believe a more explicit error is needed, as well as some notice in the docs and a breaking-change entry in the changelog. If DataArrays should be accepted, then we have a regression here.

I may have time to help on this.
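A possible interim workaround, sketched below on the MCVE above, is to convert the result back to a Dataset inside the mapped function (this assumes the returned DataArray is named, so to_dataset() works):

```Python
# Workaround sketch: return a Dataset so the Dataset combine path is used.
ds.resample(time="YS").map(lambda grp: grp.air.mean("time").to_dataset())
```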

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.16.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.1.dev16+g3ead17ea pandas: 1.4.0 numpy: 1.20.3 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.0 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.08.0 distributed: 2021.08.0 matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.07.0 cupy: None pint: 0.18 sparse: None setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 8.0.1 sphinx: 4.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6379/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  completed xarray 13221727 issue
1173997225 I_kwDOAMm_X85F-cap 6380 Attributes of concatenation coordinate are dropped aulemahal 20629530 closed 0     1 2022-03-18T20:31:17Z 2022-03-20T18:53:46Z 2022-03-20T18:53:46Z CONTRIBUTOR      

What happened?

When concatenating two objects with xr.concat along a new dimension given through a DataArray, the attributes of this given coordinate are lost in the concatenation.

What did you expect to happen?

I expected the concatenation coordinate to be identical to the 1D DataArray I gave to concat.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")

concat_dim = xr.DataArray([1, 2], dims=("condim",), attrs={"an_attr": "yep"}, name="condim")

out = xr.concat([ds, ds], concat_dim)
out.condim.attrs
```

Before #5692, I get {'an_attr': 'yep'}; with the current master, I get {}.

Anything else we need to know?

I'm not 100% sure, but I think the change is due to xr.core.concat._calc_concat_dim_coord being replaced by xr.core.concat._calc_concat_dim_index. The former didn't touch the concatenation coordinate, while the latter casts it as an index, thus dropping the attributes in the process.

If the solution is to add a check in xr.concat, I may have time to implement something simple.
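In the meantime, a possible workaround (a sketch, reusing the MCVE above) is to copy the attributes back after concatenating:

```Python
out = xr.concat([ds, ds], concat_dim)
# Restore the coordinate attributes lost by concat:
out.condim.attrs.update(concat_dim.attrs)
```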

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.16.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.1.dev16+g3ead17ea pandas: 1.4.0 numpy: 1.20.3 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.0 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.08.0 distributed: 2021.08.0 matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.07.0 cupy: None pint: 0.18 sparse: None setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 8.0.1 sphinx: 4.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6380/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
870312451 MDExOlB1bGxSZXF1ZXN0NjI1NTMwMDQ2 5233 Calendar utilities aulemahal 20629530 closed 0     16 2021-04-28T20:01:33Z 2021-12-30T22:54:49Z 2021-12-30T22:54:11Z CONTRIBUTOR   0 pydata/xarray/pulls/5233
  • [x] Closes #5155
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

So:

  • Added coding.cftime_offsets.date_range and coding.cftime_offsets.date_range_like. The first simply switches between pd.date_range and xarray.cftime_range according to the arguments. The second infers start, end and freq from an existing datetime array and returns a similar range in another calendar.

  • Added coding/calendar_ops.py with convert_calendar and interp_calendar. Didn't know where to put them, so there they are.

  • Added DataArray.dt.calendar. When the datetime objects are backed by numpy, it always returns "proleptic_gregorian".

I'm not sure where to expose the function. Should the range-generators be accessible directly like xr.date_range?

The convert_calendar and interp_calendar could be implemented as methods of DataArray and Dataset, should I do that?
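A minimal usage sketch, assuming the functions do get exposed at the top level (one of the open questions above):

```python
import xarray as xr

times = xr.date_range("2000-01-01", periods=365, freq="D", calendar="noleap")
da = xr.DataArray(range(365), coords={"time": times}, dims="time")
da.time.dt.calendar                           # "noleap"
da_std = xr.convert_calendar(da, "standard")  # Feb 29 stays absent unless filled
```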

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5233/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
857947050 MDU6SXNzdWU4NTc5NDcwNTA= 5155 Calendar utilities aulemahal 20629530 closed 0     9 2021-04-14T14:18:48Z 2021-12-30T22:54:11Z 2021-12-30T22:54:11Z CONTRIBUTOR      

Is your feature request related to a problem? Please describe. Handling cftime and numpy time coordinates can sometimes be exhausting. Here I am thinking of the following common problems:

  1. Querying the calendar type from a time coordinate.
  2. Converting a dataset from a calendar type to another.
  3. Generating a time coordinate in the correct calendar.

Describe the solution you'd like

  1. ds.time.dt.calendar would be magic.
  2. xr.convert_calendar(ds, "new_cal") could be nice?
  3. xr.date_range(start, stop, calendar=cal), same as pandas' (see context below).

Describe alternatives you've considered We have implemented all this in [xclim](https://xclim.readthedocs.io/en/stable/api.html#calendar-handling-utilities) (and more). But it seems to make sense that some of the simplest things there could move to xarray? We had this discussion in xarray-contrib/cf-xarray#193 and suggestion was made to see what fits here before implementing this there.

Additional context At xclim, to differentiate numpy datetime64 from cftime types, we call the former "default". This way a time coordinate using cftime's "proleptic_gregorian" calendar is distinct from one using numpy's datetime64.

  1. is easy (an xclim function exists). If the datatype is numpy, return "default"; if cftime, look into the first non-null value and get the calendar. A minimal sketch is given after this list.
  2. is an xclim function too. The calendar type of each time element is transformed to the new calendar. Our way is to drop any dates that do not exist in the new calendar (like Feb 29th when going to "noleap"). In the other direction, there is an option to either fill with some fill value or simply not include them. It can't be a DataArray method, but could be a Dataset one, or simply a top-level function. Related to #5107.

We also have an interp_calendar function that reinterps data on a yearly basis. This is a bit narrower, because it only makes sense on daily data (or coarser).

  3. With the definition of a "default" calendar, date_range and date_range_like simply choose between pd.date_range and xr.cftime_range according to the target calendar.
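For item 1, a minimal sketch of the query logic described above (the helper name is made up; this assumes cftime >= 1.2, where datetime objects expose a .calendar attribute):

```python
import numpy as np

def get_calendar(time):
    arr = np.asarray(time)
    # numpy datetime64 values -> the "default" calendar
    if np.issubdtype(arr.dtype, np.datetime64):
        return "default"
    # cftime objects -> read the calendar off the first non-null value
    sample = next(
        t for t in arr.ravel()
        if not (t is None or (isinstance(t, float) and np.isnan(t)))
    )
    return sample.calendar
```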

What do you think? I have time to move whatever code makes sense to move.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5155/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1035607476 I_kwDOAMm_X849uh20 5897 ds.mean bugs with cftime objects aulemahal 20629530 open 0     1 2021-10-25T21:55:12Z 2021-10-27T14:51:07Z   CONTRIBUTOR      

What happened: Given a dataset that has a variable with cftime objects along dimension A, averaging (mean) leads to buggy behaviour:

  1. Averaging over 'A' drops the variable instead of averaging it.
  2. Averaging over any other dimension will fail if that variable is on the dask backend.

What you expected to happen:

  1. I expected the average to fail in the case of a dask-backed cftime variable, given that this code exists: https://github.com/pydata/xarray/blob/fdabf3bea5c750939a4a2ae60f80ed34a6aebd58/xarray/core/duck_array_ops.py#L562-L572

And I expected the average to work (not drop the var) in the case of the numpy backend.

  2. I expected the fact that dask is used to be irrelevant to the result. I expected the mean to conserve the cftime variable as-is since it doesn't include the averaged dimension.

Minimal Complete Verifiable Example:

```python
import xarray as xr

ds = xr.Dataset({
    'var1': (('time',), xr.cftime_range('2021-10-31', periods=10, freq='D')),
    'var2': (('x',), list(range(10))),
})
# var1 contains cftime objects
# var2 contains integers
# They do not share dims

ds.mean('time')   # var1 has disappeared instead of being averaged
ds.mean('x')      # Everything ok

dsc = ds.chunk({})
dsc.mean('time')  # var1 has disappeared. I would expect this line to fail.
dsc.mean('x')     # Raises NotImplementedError. I would expect this line to run flawlessly.
```

Anything else we need to know?: A culprit is #5393, but maybe the bug is older? I think the change introduced there causes the issue (2) above.

In duck_array_ops.py, the mean operation is declared numeric_only, which is somewhat inconsistent with the implementation allowing means of datetime objects. This setting causes my (1) above.
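A possible interim workaround for (2), sketched on the MCVE above: load the data first so the numpy code path, which does support datetime means, is taken.

```python
dsc.compute().mean('x')  # works once the cftime variable is no longer dask-backed
```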

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: fdabf3bea5c750939a4a2ae60f80ed34a6aebd58 python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.14.12-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.19.1.dev89+gfdabf3be pandas: 1.3.4 numpy: 1.21.3 scipy: 1.7.1 netCDF4: 1.5.7 pydap: installed h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.1 cftime: 1.5.1 nc_time_axis: 1.4.0 PseudoNetCDF: installed rasterio: 1.2.10 cfgrib: 0.9.9.1 iris: 3.1.0 bottleneck: 1.3.2 dask: 2021.10.0 distributed: 2021.10.0 matplotlib: 3.4.3 cartopy: 0.20.1 seaborn: 0.11.2 numbagg: 0.2.1 fsspec: 2021.10.1 cupy: None pint: 0.17 sparse: 0.13.0 setuptools: 58.2.0 pip: 21.3.1 conda: None pytest: 6.2.5 IPython: 7.28.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5897/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
932898075 MDU6SXNzdWU5MzI4OTgwNzU= 5551 cftime 1.5.0 changes behviour upon pickling : breaks get_clean_interp_index with a dask distributed scheduler. aulemahal 20629530 open 0     2 2021-06-29T16:29:45Z 2021-09-24T20:55:27Z   CONTRIBUTOR      

What happened:

Quite a specific bug! Using map_blocks to wrap a polyfit computation, using a dask client (not the local scheduler) and a time axis with a cftime calendar, I got the error: TypeError: cannot compute the time difference between dates with different calendars.

What you expected to happen:

No bug.

Minimal Complete Verifiable Example:

```python
import xarray as xr
from dask.distributed import Client

ds = xr.tutorial.open_dataset('rasm').chunk({'x': 25, 'y': 25})

templ = ds.Tair

def func(ds, verbose=False):
    # Dummy function that calls get_clean_interp_index.
    # Returns Tair as-is, just for the test.
    if verbose:
        print(ds.time)
        print(type(ds.time[0].item()))
    x = xr.core.missing.get_clean_interp_index(ds, 'time')
    return ds.Tair

# This works (time is a coordinate, so it is already loaded)
x = xr.core.missing.get_clean_interp_index(ds, 'time')

# This works too. The local scheduler is used.
out = ds.map_blocks(func, template=templ, kwargs={'verbose': False})
out.load()

# This fails!
with Client(n_workers=1, threads_per_worker=8, dashboard_address=8786, memory_limit='7GB') as c:
    out = ds.map_blocks(func, template=templ, kwargs={'verbose': True})
    out.load()
```

The full traceback is here:

```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-de89288ffcd5> in <module>
     27     kwargs={'verbose': True}
     28 )
---> 29 out.load()

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/dataarray.py in load(self, **kwargs)
    883         dask.compute
    884         """
--> 885         ds = self._to_temp_dataset().load(**kwargs)
    886         new = self._from_temp_dataset(ds)
    887         self._variable = new._variable

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    848
    849         # evaluate all the dask arrays simultaneously
--> 850         evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    851
    852         for k, data in zip(lazy_data, evaluated_data):

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
    565         postcomputes.append(x.__dask_postcompute__())
    566
--> 567     results = schedule(dsk, keys, **kwargs)
    568     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    569

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/client.py in get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2705             should_rejoin = False
   2706     try:
-> 2707         results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2708     finally:
   2709         for f in futures.values():

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   2019         else:
   2020             local_worker = None
-> 2021         return self.sync(
   2022             self._gather,
   2023             futures,

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    860             return future
    861         else:
--> 862             return sync(
    863                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    864             )

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    336     if error[0]:
    337         typ, exc, tb = error[0]
--> 338         raise exc.with_traceback(tb)
    339     else:
    340         return result[0]

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/utils.py in f()
    319             if callback_timeout is not None:
    320                 future = asyncio.wait_for(future, callback_timeout)
--> 321             result[0] = yield future
    322         except Exception:
    323             error[0] = sys.exc_info()

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/tornado/gen.py in run(self)
    760
    761                     try:
--> 762                         value = future.result()
    763                     except Exception:
    764                         exc_info = sys.exc_info()

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1884                             exc = CancelledError(key)
   1885                         else:
-> 1886                             raise exception.with_traceback(traceback)
   1887                         raise exc
   1888                     if errors == "skip":

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/parallel.py in _wrapper()
    284     ]
    285
--> 286     result = func(*converted_args, **kwargs)
    287
    288     # check all dims are present

<ipython-input-12-de89288ffcd5> in func()
      8         print(ds.time)
      9         print(type(ds.time[0].item()))
---> 10     x = xr.core.missing.get_clean_interp_index(ds, 'time')
     11
     12     return ds.Tair

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/missing.py in get_clean_interp_index()
    276         index = index.values
    277     index = Variable(
--> 278         data=datetime_to_numeric(index, offset=offset, datetime_unit="ns"),
    279         dims=(dim,),
    280     )

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/duck_array_ops.py in datetime_to_numeric()
    462     # For np.datetime64, this can silently yield garbage due to overflow.
    463     # One option is to enforce 1970-01-01 as the universal offset.
--> 464     array = array - offset
    465
    466     # Scalar is converted to 0d-array

src/cftime/_cftime.pyx in cftime._cftime.datetime.__sub__()

TypeError: cannot compute the time difference between dates with different calendars
```

The printout goes to the console: I am calling this in a Jupyter notebook, so the prints from within the workers end up in the console, not in the cell's output. I removed useless lines to shorten it.

```
<xarray.DataArray 'time' (time: 36)>
array([cftime.datetime(1980, 9, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
       cftime.datetime(1980, 10, 17, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
       cftime.datetime(1980, 11, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
       ...], dtype=object)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
Attributes:
    long_name:       time
    type_preferred:  int

<class 'cftime._cftime.datetime'>
```

And for reference:

```python
>>> ds.time
array([cftime.DatetimeNoLeap(1980, 9, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1980, 10, 17, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1980, 11, 16, 12, 0, 0, 0, has_year_zero=True),
       ...
>>> type(ds.time[0].item())
cftime._cftime.DatetimeNoLeap
```

Anything else we need to know?:

I'm not sure where the exact breaking change lies (dask or cftime?), but this worked with dask 2021.5 and cftime <= 1.4.1. The problem lies in get_clean_interp_index, specifically these lines:

https://github.com/pydata/xarray/blob/5ccb06951cecd59b890c1457e36ee3c2030a67aa/xarray/core/missing.py#L274-L280

On the original dataset, the class of the time values is DatetimeNoLeap, whereas the time coordinates received by func are of class datetime; the calendar is only a kwarg. Thus, in get_clean_interp_index, the offset is created with the default "standard" calendar and becomes incompatible with the array itself, which makes datetime_to_numeric fail.
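A minimal sketch of a possible fix, assuming cftime >= 1.2 (where a calendar can be passed directly to the cftime.datetime constructor): build the offset in the same calendar as the index, so the subtraction in datetime_to_numeric never mixes calendars.

```python
import cftime

def make_offset(index):
    # Hypothetical helper: a 1970-01-01 epoch in the index's own calendar.
    return cftime.datetime(1970, 1, 1, calendar=index[0].calendar)
```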

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-514.2.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.18.2 pandas: 1.2.5 numpy: 1.21.0 scipy: 1.7.0 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.06.2 distributed: 2021.06.2 matplotlib: 3.4.2 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 21.1.3 conda: None pytest: None IPython: 7.25.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5551/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
895842334 MDU6SXNzdWU4OTU4NDIzMzQ= 5346 Fast-track unstack doesn't work with dask aulemahal 20629530 closed 0     6 2021-05-19T20:14:26Z 2021-05-26T07:07:17Z 2021-05-26T07:07:17Z CONTRIBUTOR      

What happened: Using unstack on data with the dask backend fails with a dask error.

What you expected to happen: No failure, as with xarray 0.18.0 and earlier.

Minimal Complete Verifiable Example:

```python
import pandas as pd
import xarray as xr

da = xr.DataArray([1] * 4, dims=('x',), coords={'x': [1, 2, 3, 4]})
dac = da.chunk()

ind = pd.MultiIndex.from_arrays(([0, 0, 1, 1], [0, 1, 0, 1]), names=("y", "z"))
dac.assign_coords(x=ind).unstack("x")
```

Fails with:

```python
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-4-3c317738ec05> in <module>
      3
      4 ind = pd.MultiIndex.from_arrays(([0, 0, 1, 1], [0, 1, 0, 1]), names=("y", "z"))
----> 5 dac.assign_coords(x=ind).unstack("x")

~/Python/myxarray/xarray/core/dataarray.py in unstack(self, dim, fill_value, sparse)
   2133         DataArray.stack
   2134         """
-> 2135         ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
   2136         return self._from_temp_dataset(ds)
   2137

~/Python/myxarray/xarray/core/dataset.py in unstack(self, dim, fill_value, sparse)
   4038         ):
   4039             # Fast unstacking path:
-> 4040             result = result._unstack_once(dim, fill_value)
   4041         else:
   4042             # Slower unstacking path, examples of array types that

~/Python/myxarray/xarray/core/dataset.py in _unstack_once(self, dim, fill_value)
   3914             fill_value_ = fill_value
   3915
-> 3916         variables[name] = var._unstack_once(
   3917             index=index, dim=dim, fill_value=fill_value_
   3918         )

~/Python/myxarray/xarray/core/variable.py in _unstack_once(self, index, dim, fill_value)
   1605             # sparse doesn't support item assigment,
   1606             # https://github.com/pydata/sparse/issues/114
-> 1607             data[(..., *indexer)] = reordered
   1608
   1609         return self._replace(dims=new_dims, data=data)

~/.conda/envs/xxx/lib/python3.8/site-packages/dask/array/core.py in __setitem__(self, key, value)
   1693
   1694         out = "setitem-" + tokenize(self, key, value)
-> 1695         dsk = setitem_array(out, self, key, value)
   1696
   1697         graph = HighLevelGraph.from_collections(out, dsk, dependencies=[self])

~/.conda/envs/xxx/lib/python3.8/site-packages/dask/array/slicing.py in setitem_array(out_name, array, indices, value)
   1787
   1788     # Reformat input indices
-> 1789     indices, indices_shape, reverse = parse_assignment_indices(indices, array_shape)
   1790
   1791     # Empty slices can only be assigned size 1 values

~/.conda/envs/xxx/lib/python3.8/site-packages/dask/array/slicing.py in parse_assignment_indices(indices, shape)
   1476             n_lists += 1
   1477         if n_lists > 1:
-> 1478             raise NotImplementedError(
   1479                 "dask is currently limited to at most one "
   1480                 "dimension's assignment index being a "

NotImplementedError: dask is currently limited to at most one dimension's assignment index being a 1-d array of integers or booleans. Got: (Ellipsis, array([0, 0, 1, 1], dtype=int8), array([0, 1, 0, 1], dtype=int8))
```

The example works when I go back to xarray 0.18.0.

Anything else we need to know?: I saw no tests in "test_dataarray.py" and "test_dataset.py" for unstack+dask, but they might be elsewhere? If #5315 was successful, maybe there is something specific in my example and config that is causing the error? @max-sixty @Illviljan

Proposed test for "test_dataset.py", an adapted copy of test_unstack:

```python
@requires_dask
def test_unstack_dask(self):
    index = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["x", "y"])
    ds = Dataset({"b": ("z", [0, 1, 2, 3]), "z": index}).chunk()
    expected = Dataset(
        {"b": (("x", "y"), [[0, 1], [2, 3]]), "x": [0, 1], "y": ["a", "b"]}
    )
    for dim in ["z", ["z"], None]:
        actual = ds.unstack(dim).load()
        assert_identical(actual, expected)
```

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.11.16-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.18.2.dev2+g6d2a7301 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.6.3 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.8.1 cftime: 1.4.1 nc_time_axis: 1.2.0 PseudoNetCDF: installed rasterio: 1.2.2 cfgrib: 0.9.9.0 iris: 2.4.0 bottleneck: 1.3.2 dask: 2021.05.0 distributed: 2021.05.0 matplotlib: 3.4.1 cartopy: 0.19.0 seaborn: 0.11.1 numbagg: installed pint: 0.17 setuptools: 49.6.0.post20210108 pip: 21.1 conda: None pytest: 6.2.3 IPython: 7.22.0 sphinx: 3.5.4
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5346/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
893692903 MDU6SXNzdWU4OTM2OTI5MDM= 5326 map_blocks doesn't handle tranposed arrays aulemahal 20629530 closed 0     7 2021-05-17T20:34:58Z 2021-05-18T14:14:37Z 2021-05-18T14:14:37Z CONTRIBUTOR      

What happened:

I was using map_blocks for a complex function which returns an array with a different dimension order than the input. Because of the complexity of the wrapped func, I need to generate a template first.

When calling map_blocks and loading the result, it passes all checks in map_blocks but Variable fails when assigning the new data.

What you expected to happen: I expected no failure. Either the result would have transposed dimensions, or it would have been transposed back to fit with template.

Minimal Complete Verifiable Example:

```python
import xarray as xr

da = xr.DataArray([[0, 1, 2], [3, 4, 5]], dims=('x', 'y'))

def func(d):
    return d.transpose()

dac = da.chunk()
dac.map_blocks(func, template=dac).load()
```

Traceback:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-0da1b18a36a8> in <module>
      7
      8 dac = da.chunk()
----> 9 dac.map_blocks(func, template=dac).load()

~/.conda/envs/xclim/lib/python3.8/site-packages/xarray/core/dataarray.py in load(self, **kwargs)
    871         dask.compute
    872         """
--> 873         ds = self._to_temp_dataset().load(**kwargs)
    874         new = self._from_temp_dataset(ds)
    875         self._variable = new._variable

~/.conda/envs/xclim/lib/python3.8/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    799
    800         for k, data in zip(lazy_data, evaluated_data):
--> 801             self.variables[k].data = data
    802
    803         # load everything else sequentially

~/.conda/envs/xclim/lib/python3.8/site-packages/xarray/core/variable.py in data(self, data)
    378         data = as_compatible_data(data)
    379         if data.shape != self.shape:
--> 380             raise ValueError(
    381                 f"replacement data must match the Variable's shape. "
    382                 f"replacement data has shape {data.shape}; Variable has shape {self.shape}"

ValueError: replacement data must match the Variable's shape. replacement data has shape (3, 2); Variable has shape (2, 3)
```

If `func` is made to return `d` (no transpose), the code works.

I'm actually not sure which behaviour would be best: a result with transposed dimensions to fit the wrapped func, or a result transposed back to fit the template. The latter seems much easier to implement, by editing core/parallel.py and adding the transposition at the end of _wrapper() in map_blocks().
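A tiny runnable sketch of that second option (what map_blocks could do internally; this is not the actual implementation):

```python
import xarray as xr

da = xr.DataArray([[0, 1, 2], [3, 4, 5]], dims=('x', 'y'))
template = da

def func(d):
    return d.transpose()

# Transpose the wrapped function's result back to the template's dim order.
result = func(da).transpose(*template.dims)
assert result.dims == template.dims
```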

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.11.16-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.17.1.dev99+gc58e2aeb.d20210430 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.6.3 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.8.1 cftime: 1.4.1 nc_time_axis: 1.2.0 PseudoNetCDF: installed rasterio: 1.2.2 cfgrib: 0.9.9.0 iris: 2.4.0 bottleneck: 1.3.2 dask: 2021.04.0 distributed: 2021.04.1 matplotlib: 3.4.1 cartopy: 0.19.0 seaborn: 0.11.1 numbagg: installed pint: 0.17 setuptools: 49.6.0.post20210108 pip: 21.1 conda: None pytest: 6.2.3 IPython: 7.22.0 sphinx: 3.5.4
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5326/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
830256796 MDU6SXNzdWU4MzAyNTY3OTY= 5026 Datetime accessor fails on cftime arrays with missing values aulemahal 20629530 open 0     0 2021-03-12T16:05:43Z 2021-04-19T02:41:00Z   CONTRIBUTOR      

What happened: I have a computation that outputs dates but sometimes also outputs missing data. (It computes the start date of a run in a timeseries; if there is no run, it outputs NaN.) Afterwards, I'd like to convert those dates to dayofyear, thus I call out.dt.dayofyear. In a case where the first value of out is missing, it fails.

What you expected to happen: I expected out.dt.dayofyear to return an array where the first value would be NaN.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np

da = xr.DataArray(
    [[np.nan, np.nan], [1, 2]],
    dims=('x', 'time'),
    coords={'x': [1, 2], 'time': xr.cftime_range('2000-01-01', periods=2)},
)

# out is an "object" array, where the first element is NaN
out = da.idxmin('time')

out.dt.dayofyear
# Expected: [nan, 1.]
# Got:
```

```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-56-06aa9bdfd6b8> in <module>
----> 1 da.idxmin('time').dt.dayofyear

~/.conda/envs/xclim-dev/lib/python3.8/site-packages/xarray/core/utils.py in __get__(self, obj, cls)
    917             return self._accessor
    918
--> 919         return self._accessor(obj)
    920
    921

~/.conda/envs/xclim-dev/lib/python3.8/site-packages/xarray/core/accessor_dt.py in __new__(cls, obj)
    514         # do all the validation here.
    515         if not _contains_datetime_like_objects(obj):
--> 516             raise TypeError(
    517                 "'.dt' accessor only available for "
    518                 "DataArray with datetime64 timedelta64 dtype or "

TypeError: '.dt' accessor only available for DataArray with datetime64 timedelta64 dtype or for arrays containing cftime datetime objects.
```

Anything else we need to know?: This also triggers computation when da is lazy. A lazy .dt accessor would be useful.

The laziness of it aside, would it be meaningful to change https://github.com/pydata/xarray/blob/d4b7a608bab0e7c140937b0b59ca45115d205145/xarray/core/common.py#L1822 to cycle on the array while np.isnan(sample)?
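Something like this sketch of the sampling step (the helper name is made up):

```python
import numpy as np

def first_non_null(array):
    # Skip leading missing values so a real datetime object is inspected
    # instead of a NaN.
    for sample in np.asarray(array).ravel():
        if not (isinstance(sample, float) and np.isnan(sample)):
            return sample
    return None
```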

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.10.16-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: fr_CA.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.4.0 cftime: 1.3.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.2.1 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.12.0 distributed: 2.20.0 matplotlib: 3.3.4 cartopy: None seaborn: None numbagg: None pint: 0.16.1 setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: 5.4.3 IPython: 7.21.0 sphinx: 3.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5026/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
824917345 MDU6SXNzdWU4MjQ5MTczNDU= 5010 DataArrays inside apply_ufunc with dask=parallelized aulemahal 20629530 closed 0     3 2021-03-08T20:19:41Z 2021-03-08T20:37:15Z 2021-03-08T20:35:01Z CONTRIBUTOR      

Is your feature request related to a problem? Please describe. Currently, when using apply_ufunc with dask=parallelized the wrapped function receives numpy arrays upon computation.
Some xarray operations generate an enormous amount of chunks (best example: da.groupby('time.dayofyear')), so any complex script using dask ends up with huge task graphs. Dask's scheduler becomes overloaded, sometimes even hangs, and sometimes uses way more RAM than its workers.

Describe the solution you'd like I'd want to profit from both the tools of xarray and the power of dask parallelization. I'd like to be able to do something like this:

```python3
def func(da):
    """Example of an operation not (easily) possible with numpy."""
    return da.groupby('time').mean()

xr.apply_ufunc(
    func,
    da,
    input_core_dims=[['time']],
    pass_xr=True,  # the new keyword proposed here
    dask='parallelized',
)
```

I'd like the wrapped func to receive DataArrays resembling the inputs (named dims, coords and all), but only with the subset of that dask chunk. Doing this, the whole function gets parallelized: dask only sees 1 task and I can code using xarray. Depending on the implementation, it might be less efficient than dask='allowed' for small datasets, but I think this could be beneficial for long and complex computations on large datasets.

Describe alternatives you've considered The alternative is to reduce the size of the datasets (looping on other dimensions), but that defeats the purpose of dask.

Another alternative I am currently testing is to add a layer between apply_ufunc and the func. That layer reconstructs a DataArray and deconstructs it before returning the result, so for xarray/dask it's only numpy arrays passing by. If this works and is elegant enough, I can maybe suggest an implementation within xarray.
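A minimal sketch of that layer, assuming the core-dim coordinates are known up front (all names here are made up):

```python3
import xarray as xr

def with_xarray(func, coords, dims):
    # Rebuild a DataArray from the bare numpy block so `func` can use
    # xarray's API, then hand the raw values back to apply_ufunc.
    def _inner(arr):
        block = xr.DataArray(arr, coords=coords, dims=dims)
        return func(block).values
    return _inner
```

The wrapped _inner would then be passed to apply_ufunc with dask='parallelized' in place of the original func.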

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5010/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
625123449 MDU6SXNzdWU2MjUxMjM0NDk= 4097 CFTime offsets missing for milli- and micro-seconds aulemahal 20629530 closed 0     0 2020-05-26T19:13:37Z 2021-02-10T21:44:26Z 2021-02-10T21:44:25Z CONTRIBUTOR      

The smallest cftime offset defined in xarray.coding.cftime_offsets.py is "second" (S), but the precision of cftime objects goes down to the millisecond (L) and microsecond (U). They should be easily added.

PR #4033 adds an xr.infer_freq that supports the two, but they are currently untested as xr.cftime_range cannot generate such an index.
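Presumably they could follow the pattern of the existing Second offset; a minimal sketch (the class internals are assumptions based on that pattern):

```python
from datetime import timedelta

from xarray.coding.cftime_offsets import BaseCFTimeOffset

class Millisecond(BaseCFTimeOffset):
    _freq = "L"

    def as_timedelta(self):
        return timedelta(milliseconds=self.n)

    def __apply__(self, other):
        # Offsets are applied by adding their timedelta to a cftime datetime.
        return other + self.as_timedelta()
```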

MCVE Code Sample

```python
xr.cftime_range("2000-01-01", periods=3, freq='10L')
```

Expected Output

CFTimeIndex([2000-01-01 00:00:00, 2000-01-01 00:00:00.010000, 2000-01-01 00:00:00.020000], dtype='object')

Problem Description

An error gets raised: ValueError: Invalid frequency string provided.

Versions

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.6.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: fr_CA.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.2.dev9+g6378a711.d20200505 pandas: 1.0.3 numpy: 1.18.4 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.1.1.2 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.16.0 distributed: 2.16.0 matplotlib: 3.2.1 cartopy: None seaborn: None numbagg: None pint: 0.11 setuptools: 46.1.3.post20200325 pip: 20.0.2 conda: None pytest: 5.4.1 IPython: 7.13.0 sphinx: 3.0.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4097/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
798592803 MDU6SXNzdWU3OTg1OTI4MDM= 4853 cftime default's datetime breaks CFTimeIndex aulemahal 20629530 open 0     4 2021-02-01T18:14:21Z 2021-02-05T18:48:31Z   CONTRIBUTOR      

What happened: With cftime 1.2.0, one can create datetime objects with cftime.datetime(*args, calendar='calendar'), instead of using one of the subclasses (e.g. cftime.DatetimeNoLeap(*args)). In the latest release (1.4.0, yesterday), the subclasses have been deprecated, but kept as legacy. While all xr code still works (it is using the legacy subclasses), the CFTimeIndex object relies on the type of the datetime object in order to infer the calendar. If the datetime was created outside xarray, using the now-default constructor, the returned type is not understood and CFTimeIndex breaks.

What you expected to happen: I expected CFTimeIndex to be independent of the way the datetime object is created.

Minimal Complete Verifiable Example:

```python3
import cftime
import numpy as np
import xarray as xr

# A datetime array, not created in xarray
time = cftime.num2date(np.arange(365), "days since 2000-01-01", calendar="noleap")
a = xr.DataArray(np.zeros(365), dims=('time',), coords={'time': time})

a.indexes['time']
```

Fails with:

```python3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 342, in __repr__
    attrs_str = format_attrs(self)
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 264, in format_attrs
    attrs["freq"] = f"'{index.freq}'" if len(index) >= 3 else None
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 692, in freq
    return infer_freq(self)
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/frequencies.py", line 96, in infer_freq
    inferer = _CFTimeFrequencyInferer(index)
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/frequencies.py", line 105, in __init__
    self.values = index.asi8
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 673, in asi8
    [
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 674, in <listcomp>
    _total_microseconds(exact_cftime_datetime_difference(epoch, date))
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/core/resample_cftime.py", line 370, in exact_cftime_datetime_difference
    seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
  File "src/cftime/_cftime.pyx", line 1153, in cftime._cftime.datetime.__sub__
ValueError: cannot compute the time difference between dates with different calendars
```

Anything else we need to know?:
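A direction for a fix, sketched below (it assumes cftime >= 1.2, where every datetime object carries a .calendar attribute): infer the calendar from the attribute rather than from the subclass type.

```python3
def index_calendar(index):
    # Works for both the legacy subclasses and the new
    # cftime.datetime(..., calendar=...) objects.
    return index[0].calendar
```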

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Jul 31 2020, 02:39:48) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.10.11-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: fr_CA.UTF-8 libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.16.2 pandas: 1.2.1 numpy: 1.20.0 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.4.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.01.1 distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: 0.16.1 setuptools: 46.1.3 pip: 20.1 conda: None pytest: 6.2.2 IPython: 7.19.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4853/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
709272776 MDU6SXNzdWU3MDkyNzI3NzY= 4463 Interpolation with multiple mutlidimensional arrays sharing dims fails aulemahal 20629530 open 0     5 2020-09-25T20:45:54Z 2020-09-28T19:10:56Z   CONTRIBUTOR      

What happened: When trying to interpolate an N-D array with 2 other arrays sharing a common (new) dimension, with one (at least) being multidimensional, the operation fails. Kinda a complex edge case, I agree. Here's a MWE:

```python3
da = xr.DataArray(
    [[[1, 2, 3], [2, 3, 4]], [[1, 2, 3], [2, 3, 4]]],
    dims=('t', 'x', 'y'),
    coords={'x': [1, 2], 'y': [1, 2, 3], 't': [10, 12]},
)
dy = xr.DataArray([1.5, 2.5], dims=('u',), coords={'u': [45, 55]})
dx = xr.DataArray([[1.5, 1.5], [1.5, 1.5]], dims=('t', 'u'), coords={'u': [45, 55], 't': [10, 12]})
```

So we have da, a 3D array with dims (t, x, y). We have dy, containing the values of y along new dimension u. And dx, containing the values of x along both u and t. We want to interpolate with:

```python3
out = da.interp(y=dy, x=dx, method='linear')
```

so as to have a new array over dims t and u.

What you expected to happen: I expected (with the dummy data I gave):

```python3
xr.DataArray([[2, 3], [2, 3]], dims=('t', 'u'), coords={'u': [45, 55], 't': [10, 12]})
```

But instead it fails with ValueError: axes don't match array.

Full traceback:

```python3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-b968c6f3dae9> in <module>
----> 1 a.interp(y=y, x=x, method='linear')

~/Python/xarray/xarray/core/dataarray.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   1473                 "Given {}.".format(self.dtype)
   1474             )
-> 1475         ds = self._to_temp_dataset().interp(
   1476             coords,
   1477             method=method,

~/Python/xarray/xarray/core/dataset.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   2691                     if k in var.dims
   2692                 }
-> 2693                 variables[name] = missing.interp(var, var_indexers, method, **kwargs)
   2694             elif all(d not in indexers for d in var.dims):
   2695                 # keep unrelated object array

~/Python/xarray/xarray/core/missing.py in interp(var, indexes_coords, method, **kwargs)
    652             else:
    653                 out_dims.add(d)
--> 654     result = result.transpose(*tuple(out_dims))
    655     return result
    656

~/Python/xarray/xarray/core/variable.py in transpose(self, *dims)
   1395             return self.copy(deep=False)
   1396
-> 1397         data = as_indexable(self._data).transpose(axes)
   1398         return type(self)(dims, data, self._attrs, self._encoding, fastpath=True)
   1399

~/Python/xarray/xarray/core/indexing.py in transpose(self, order)
   1288
   1289     def transpose(self, order):
-> 1290         return self.array.transpose(order)
   1291
   1292     def __getitem__(self, key):

ValueError: axes don't match array
```

Anything else we need to know?: It works if dx doesn't vary along t, i.e. da.interp(y=dy, x=dx.isel(t=0, drop=True), method='linear') works.

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Jul 31 2020, 02:39:48) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-514.2.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: en_CA.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.16.2.dev9+gc0399d3 pandas: 1.0.3 numpy: 1.18.4 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.1.3 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.1.4 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.17.2 distributed: 2.23.0 matplotlib: 3.3.1 cartopy: 0.18.0 seaborn: None numbagg: None pint: 0.11 setuptools: 49.6.0.post20200814 pip: 20.2.2 conda: None pytest: None IPython: 7.17.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4463/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
617704483 MDU6SXNzdWU2MTc3MDQ0ODM= 4058 Interpolating with a common chunked dim fails aulemahal 20629530 closed 0     2 2020-05-13T19:38:48Z 2020-09-24T16:50:11Z 2020-09-24T16:50:10Z CONTRIBUTOR      

Interpolating a DataArray with another one fails if one of them is a dask array and they share a chunked dimension, even if the interpolation is independent of that dimension.

MCVE Code Sample

```python
import xarray as xr
import numpy as np

g = xr.DataArray(np.zeros((10, 10)), dims=('x', 'c'), coords={k: np.arange(10) for k in ['x', 'c']})
b = xr.DataArray([5, 6.6, 8.8], dims=('new',)).expand_dims(c=g.c)
gc = g.chunk({'c': 1})
gc.interp(x=b)
```

Expected Output

An array with coords "new" and "c", with values of g interpolated along x at positions in b, for each c. As there is no interpolation along c, I would expect the fact that it is chunked to be irrelevant.

Problem Description

Raises: NotImplementedError: Chunking along the dimension to be interpolated (1) is not yet supported.

I didn't see any issue about this, so I thought it ought to be noted as a needed enhancement.
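A possible workaround in the meantime (a sketch; it gives up the chunking along c):

```python
gc.chunk({'c': -1}).interp(x=b)  # merge the chunks along c before interpolating
```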

Versions

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.2 | packaged by conda-forge | (default, Apr 16 2020, 18:04:51) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-514.2.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: en_CA.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.15.2.dev42+g0cd14a5 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.1.1.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.3 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.14.0 distributed: 2.14.0 matplotlib: 3.2.1 cartopy: 0.17.0 seaborn: None numbagg: None pint: 0.11 setuptools: 46.1.3.post20200325 pip: 20.0.2 conda: None pytest: 5.4.1 IPython: 7.13.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4058/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
650044968 MDExOlB1bGxSZXF1ZXN0NDQzNjEwOTI2 4193 Fix polyfit fail on deficient rank aulemahal 20629530 closed 0     5 2020-07-02T16:00:21Z 2020-08-20T14:20:43Z 2020-08-20T08:34:45Z CONTRIBUTOR   0 pydata/xarray/pulls/4193
  • [x] Closes #4190
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Fixes #4190. In cases where the input matrix had a deficient rank (matrix rank != order) because of the number of NaN values, polyfit would fail, simply because numpy's lstsq returned an empty array for the residuals (instead of a size 1 array). This fixes the problem by catching the case and returning np.nan instead.

The other point in the issue was that RankWarning is also not raised in that case. That was due to the fact that da.polyfit was computing the rank from the coordinate (Vandermonde) matrix, instead of the masked data. Thus, if a given line has too many NaN values, its deficient rank was not detected. I added a test and warning at all places where a rank is computed (5 different lines). Also, to match np.polyfit's behaviour of no warning when full=True, I changed the warning filters using a context manager, ignoring the RankWarning in that case. Overall, it feels a bit ugly because of the duplicated code, and it will print the warning for every line of an array that has a deficient rank, which can be a lot...
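For illustration, a minimal sketch of the residuals fix described above (the function name and shapes are illustrative, not the actual diff):

```python
import numpy as np

def lstsq_nan_safe(lhs, rhs):
    # numpy's lstsq returns an empty residuals array when the rank is
    # deficient; return NaN instead so callers always get a residual value.
    coeffs, residuals, rank, _ = np.linalg.lstsq(lhs, rhs, rcond=None)
    if residuals.size == 0:
        residuals = np.full(1 if rhs.ndim == 1 else rhs.shape[1], np.nan)
    return coeffs, residuals, rank
```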

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4193/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
635542241 MDExOlB1bGxSZXF1ZXN0NDMxODg5NjQ0 4135 Correct dask handling for 1D idxmax/min on ND data aulemahal 20629530 closed 0     1 2020-06-09T15:36:09Z 2020-06-25T16:09:59Z 2020-06-25T03:59:52Z CONTRIBUTOR   0 pydata/xarray/pulls/4135
  • [x] Closes #4123
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Based on comments on dask/dask#3096, I fixed the dask indexing error that occurred when idxmax/idxmin were called on ND data (where N > 2). The added tests are very simplistic; I believe the 1D and 2D tests already cover most cases. I just wanted to test that it was indeed working on ND data, assuming that non-dask data was already treated properly.

I believe this doesn't conflict with #3936.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4135/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
631681216 MDU6SXNzdWU2MzE2ODEyMTY= 4123 idxmax/idxmin not working with dask arrays of more than 2 dims. aulemahal 20629530 closed 0     0 2020-06-05T15:19:41Z 2020-06-25T03:59:52Z 2020-06-25T03:59:51Z CONTRIBUTOR      

In opposition to argmin/argmax, idxmax/idxmin fail on DataArrays of more than 2 dimensions when the data is stored in dask arrays.

MCVE Code Sample

```python
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature').resample(time='D').mean()
dsc = ds.chunk({'time': -1, 'lat': 5, 'lon': 5})
dsc.air.argmax('time').values  # Works (I added .values to be sure all computation is done)
dsc.air.idxmin('time')         # Fails
```

Expected Output

Something like:

```
<xarray.DataArray 'time' (lat: 25, lon: 53)>
dask.array<where, shape=(25, 53), dtype=datetime64[ns], chunksize=(5, 5), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
```

Problem Description

Throws an error:

```
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-11-0b9bf50bc3ab> in <module>
      3 dsc = ds.chunk({'time':-1, 'lat': 5, 'lon': 5})
      4 dsc.air.argmax('time').values
----> 5 dsc.air.idxmin('time')

~/Python/myxarray/xarray/core/dataarray.py in idxmin(self, dim, skipna, fill_value, keep_attrs)
   3626         * y        (y) int64 -1 0 1
   3627         """
-> 3628         return computation._calc_idxminmax(
   3629             array=self,
   3630             func=lambda x, *args, **kwargs: x.argmin(*args, **kwargs),

~/Python/myxarray/xarray/core/computation.py in _calc_idxminmax(array, func, dim, skipna, fill_value, keep_attrs)
   1564         chunks = dict(zip(array.dims, array.chunks))
   1565         dask_coord = dask.array.from_array(array[dim].data, chunks=chunks[dim])
-> 1566         res = indx.copy(data=dask_coord[(indx.data,)])
   1567         # we need to attach back the dim name
   1568         res.name = dim

~/.conda/envs/xarray-xclim-dev/lib/python3.8/site-packages/dask/array/core.py in __getitem__(self, index)
   1539
   1540         if any(isinstance(i, Array) and i.dtype.kind in "iu" for i in index2):
-> 1541             self, index2 = slice_with_int_dask_array(self, index2)
   1542         if any(isinstance(i, Array) and i.dtype == bool for i in index2):
   1543             self, index2 = slice_with_bool_dask_array(self, index2)

~/.conda/envs/xarray-xclim-dev/lib/python3.8/site-packages/dask/array/slicing.py in slice_with_int_dask_array(x, index)
    934             out_index.append(slice(None))
    935         else:
--> 936             raise NotImplementedError(
    937                 "Slicing with dask.array of ints only permitted when "
    938                 "the indexer has zero or one dimensions"

NotImplementedError: Slicing with dask.array of ints only permitted when the indexer has zero or one dimensions
```

I saw #3922 and thought this PR was aiming to make this work, so I'm a bit confused.

(I tested with dask 2.17.2 also and it still fails)

Versions

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.6.15-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: fr_CA.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.2.dev9+g6378a711.d20200505 pandas: 1.0.3 numpy: 1.18.4 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.1.1.2 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.16.0 distributed: 2.17.0 matplotlib: 3.2.1 cartopy: None seaborn: None numbagg: None pint: 0.12 setuptools: 46.1.3.post20200325 pip: 20.0.2 conda: None pytest: 5.4.1 IPython: 7.13.0 sphinx: 3.0.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4123/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
607616849 MDU6SXNzdWU2MDc2MTY4NDk= 4009 Incoherencies between docs in open_mfdataset and combine_by_coords and its behaviour. aulemahal 20629530 closed 0     2 2020-04-27T14:55:33Z 2020-06-24T18:22:19Z 2020-06-24T18:22:19Z CONTRIBUTOR      

PR #3877 adds nice control over the attrs of the output, but there are some incoherencies between the docs and the behaviour that break previously fine code.

MCVE Code Sample

```python
import xarray as xr
out = xr.open_mfdataset('/files/with/*_conflicting_attrs.nc', combine='by_coords')
```

Expected Output

out having the attributes from the first file in the sorted glob list.

Problem Description

Fails with a MergeError.

In the doc of open_mfdataset it is said:

```
attrs_file : str or pathlib.Path, optional
    Path of the file used to read global attributes from. By default global
    attributes are read from the first file provided, with wildcard matches
    sorted by filename.
```

But in the code, open_mfdataset calls combine_by_coords without specifying its combine_attrs argument, which defaults to 'no_conflicts' instead of the expected 'override' or 'drop'. The attributes are managed by open_mfdataset further down anyway, but in the case of conflicts the code never reaches that point.
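A possible workaround until this is settled (a sketch; preprocess runs on each file before combining, so the global attrs can no longer conflict, at the cost of losing them from the result):

```python
def _strip_attrs(ds):
    ds.attrs = {}
    return ds

out = xr.open_mfdataset(
    '/files/with/*_conflicting_attrs.nc',
    combine='by_coords',
    preprocess=_strip_attrs,
)
```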

Also, in the doc of combine_by_coords the wrong default is specified:

```
combine_attrs : {'drop', 'identical', 'no_conflicts', 'override'}, default 'drop'
    String indicating how to combine attrs of the objects being merged:

    - 'drop': empty attrs on returned Dataset.
    - 'identical': all attrs must be the same on every object.
    - 'no_conflicts': attrs from all objects are combined, any that have
      the same name must also have the same value.
    - 'override': skip comparing and copy attrs from the first dataset to
      the result.

```

I think we expect either combine_by_coords to have 'drop' as the default or open_mfdataset to pass combine_attrs='drop'.

Versions

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.6.7-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: fr_CA.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.15.2.dev29+g7eeba59f pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.1.1.2 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.14.0 distributed: 2.14.0 matplotlib: 3.2.1 cartopy: None seaborn: None numbagg: None setuptools: 46.1.3.post20200325 pip: 20.0.2 conda: None pytest: 5.4.1 IPython: 7.13.0 sphinx: 3.0.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4009/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
625942676 MDExOlB1bGxSZXF1ZXN0NDI0MDQ4Mzg3 4099 Allow non-unique and non-monotonic coordinates in get_clean_interp_index and polyfit aulemahal 20629530 closed 0     0 2020-05-27T18:48:58Z 2020-06-05T15:46:00Z 2020-06-05T15:46:00Z CONTRIBUTOR   0 pydata/xarray/pulls/4099
  • [ ] Closes #xxxx
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Pull #3733 added da.polyfit and xr.polyval and is using xr.core.missing.get_clean_interp_index in order to get the fitting coordinate. However, this method is stricter than what polyfit needs: as in numpy.polyfit, non-unique and non-monotonic indexes are acceptable. This PR adds a strict keyword argument to get_clean_interp_index so we can skip the uniqueness and monotonicity tests.

ds.polyfit and xr.polyval were modified to use that keyword. I only added tests for get_clean_interp_index; I could add more for polyfit if requested.
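For reference, a usage sketch of the new keyword (assuming strict keeps defaulting to True):

```python
from xarray.core.missing import get_clean_interp_index

# `da` is any DataArray with an "x" coordinate; polyfit-style callers can
# now opt out of the uniqueness/monotonicity checks.
index = get_clean_interp_index(da, 'x', strict=False)
```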

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4099/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
612846594 MDExOlB1bGxSZXF1ZXN0NDEzNzEzODg2 4033 xr.infer_freq aulemahal 20629530 closed 0     3 2020-05-05T19:39:05Z 2020-05-30T18:11:36Z 2020-05-30T18:08:27Z CONTRIBUTOR   0 pydata/xarray/pulls/4033
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

This PR adds an xr.infer_freq function to copy pandas' infer_freq, but on CFTimeIndex objects. I tried to subclass pandas' _FrequencyInferer, overriding as little as possible.
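A usage sketch of the proposed function (the output assumes inference behaves like pandas'):

```python
import xarray as xr

idx = xr.cftime_range("2000-01-01", periods=4, freq="D", calendar="noleap")
xr.infer_freq(idx)  # 'D'
```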

Two things are problematic right now, and I would like feedback on how to implement them if this PR gets the devs' approval.

1) pd.DatetimeIndex.asi8 returns integers representing nanoseconds since 1970-01-01, while xr.CFTimeIndex.asi8 returns microseconds (see the sketch below). In order not to break the API, I patched _CFTimeFrequencyInferer to store 1000x the values. I'm not sure this is the best approach, but it works.
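A small sketch of that mismatch (assuming the asi8 behaviour described above):

```python
# Epoch-offset units differ: nanoseconds for pandas, microseconds for CFTimeIndex.
import pandas as pd
import xarray as xr

pd_idx = pd.date_range("2000-01-01", periods=2, freq="D")
cf_idx = xr.cftime_range("2000-01-01", periods=2, freq="D", calendar="noleap")

print(pd_idx.asi8[1] - pd_idx.asi8[0])  # 86400 * 10**9: one day in nanoseconds
print(cf_idx.asi8[1] - cf_idx.asi8[0])  # 86400 * 10**6: one day in microseconds
```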

2) As of now, xr.infer_freq fails on weekly indexes. This is because pandas uses datetime.weekday() at some point, but cftime objects do not implement it (they expose dayofwk instead), as the sketch below illustrates. I'm not sure what to do: cftime could implement weekday() to completely mirror Python's datetime, or pandas could use dayofwk, since it is available on Timestamp objects.
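To illustrate (a minimal sketch; cftime attribute names as documented, behaviour as of this PR):

```python
# cftime datetimes expose `dayofwk` but (at the time of this PR) no weekday().
import cftime

d = cftime.DatetimeNoLeap(2000, 1, 3)  # 2000-01-03, a Monday
print(d.dayofwk)  # 0 (Monday = 0, matching datetime.weekday()'s convention)
# d.weekday()     # AttributeError here is what breaks pandas' weekly inference
```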

Another option, cleaner but longer, would be to reimplement _FrequencyInferer from scratch. I may have time for this, because I really think an xr.infer_freq function would be useful.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4033/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
557627188 MDExOlB1bGxSZXF1ZXN0MzY5MTg0Mjk0 3733 Implementation of polyfit and polyval aulemahal 20629530 closed 0     9 2020-01-30T16:58:51Z 2020-03-26T00:22:17Z 2020-03-25T17:17:45Z CONTRIBUTOR   0 pydata/xarray/pulls/3733
  • [x] Closes #3349
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Following the discussion in #3349, I suggest here an implementation of polyfit and polyval for xarray. This is still work in progress: a lot of tests are missing and all docstrings are missing. Mainly, though, I have questions on how to properly conduct this.

My implementation mostly duplicates the code of np.polyfit, but makes use of dask.array.linalg.lstsq and dask.array.apply_along_axis for dask arrays. It uses the same method as xscale.signal.fitting.polyfit, but adds NaN-awareness in a 1-D manner. The numpy version also differs slightly from np.polyfit because of the NaN skipping, but otherwise I wanted the function to replicate its behaviour. It returns a variable number of DataArrays depending on the keyword arguments (coefficients, [residuals, matrix rank, singular values] / [covariance matrix]), which makes for a medium-length function with a lot of code duplicated from numpy.polyfit. I thought of simply using xr.apply_ufunc, but that forbids chunking along the fitted dimension and makes it difficult to return the ancillary results (residuals, rank, covariance matrix, ...).

Questions:

1. Are the functions where they should go?
2. Should xarray's implementation really replicate the behaviour of numpy's? A lot of extra code could be removed if we only computed and returned the coefficients and the residuals. All the other variables are a few lines of code away for a user who really wants them, and they don't need the power of xarray and dask anyway.

A small usage sketch of the proposed API follows.
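For reference, a hedged sketch of the intended usage (argument names mirror numpy.polyfit; dask is assumed installed for the chunked case):

```python
# Lazy polynomial fit over a dask-backed DataArray, keeping ancillary results.
import numpy as np
import xarray as xr

time = np.arange(100.0)
da = xr.DataArray(
    3.0 * time[:, None] + np.random.rand(100, 5),
    dims=("time", "x"),
    coords={"time": time},
).chunk({"x": 2})  # chunked along a non-fitted dimension

fit = da.polyfit(dim="time", deg=1, full=True)  # also returns residuals, rank, ...
print(fit.polyfit_coefficients.sel(degree=1).values)  # slopes, ~3.0 for each x
```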

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3733/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);