github: issues: 7 rows where comments = 1, type = "issue" and user = 20629530 sorted by updated

7 rows where comments = 1, type = "issue" and user = 20629530 sorted by updated_at descending


id	node_id	number	title	user	state	comments	created_at	updated_at	closed_at	author_association	body	reactions	state_reason	repo	type
1442443970	I_kwDOAMm_X85V-fLC	7275	REG: `nc_time_axis` not imported anymore	aulemahal 20629530	closed	1	2022-11-09T17:02:59Z	2022-11-10T21:45:28Z	2022-11-10T21:45:28Z	CONTRIBUTOR	What happened? With xarray 2022.11.0, plotting a DataArray with a `cftime` time axis fails. It fails with a matplotlib error : `TypeError: float() argument must be a string or a real number, not 'cftime._cftime.DatetimeNoLeap'` What did you expect to happen? With previous versions of xarray, the `nc_time_axis` package was imported by xarray and these errors were avoided. Minimal Complete Verifiable Example `Python import xarray as xr da = xr.DataArray( list(range(10)), dims=('time',), coords={'time': xr.cftime_range('1900-01-01', periods=10, calendar='noleap', freq='D')} ) da.plot()` MVCE confirmation [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. [X] Complete example — the example is self-contained, including all data and the text of any traceback. [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result. [X] New issue — a search of GitHub Issues suggests this is not a duplicate. 
Relevant log output ```Python TypeError Traceback (most recent call last) Cell In [1], line 7 1 import xarray as xr 2 da = xr.DataArray( 3 list(range(10)), 4 dims=('time',), 5 coords={'time': xr.cftime_range('1900-01-01', periods=10, calendar='noleap', freq='D')} 6 ) ----> 7 da.plot() File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/xarray/plot/accessor.py:46, in DataArrayPlotAccessor.call(self, kwargs) 44 @functools.wraps(dataarray_plot.plot, assigned=("doc", "annotations")) 45 def call(self, kwargs) -> Any: ---> 46 return dataarray_plot.plot(self._da, kwargs) File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/xarray/plot/dataarray_plot.py:312, in plot(darray, row, col, col_wrap, ax, hue, subplot_kws, kwargs) 308 plotfunc = hist 310 kwargs["ax"] = ax --> 312 return plotfunc(darray,** kwargs) File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/xarray/plot/dataarray_plot.py:517, in line(darray, row, col, figsize, aspect, size, ax, hue, x, y, xincrease, yincrease, xscale, yscale, xticks, yticks, xlim, ylim, add_legend, _labels, args, kwargs) 513 ylabel = label_from_attrs(yplt, extra=y_suffix) 515 _ensure_plottable(xplt_val, yplt_val) --> 517 primitive = ax.plot(xplt_val, yplt_val, args, *kwargs) 519 if _labels: 520 if xlabel is not None: File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/axes/_axes.py:1664, in Axes.plot(self, scalex, scaley, data, args,* kwargs) 1662 lines = [self._get_lines(args, data=data, kwargs)] 1663 for line in lines: -> 1664 self.add_line(line) 1665 if scalex: 1666 self._request_autoscale_view("x") File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/axes/_base.py:2340, in _AxesBase.add_line(self, line) 2337 if line.get_clip_path() is None: 2338 line.set_clip_path(self.patch) -> 2340 self._update_line_limits(line) 2341 if not line.get_label(): 2342 line.set_label(f'_child{len(self._children)}') File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/axes/_base.py:2363, in 
_AxesBase._update_line_limits(self, line) 2359 def _update_line_limits(self, line): 2360 """ 2361 Figures out the data limit of the given line, updating self.dataLim. 2362 """ -> 2363 path = line.get_path() 2364 if path.vertices.size == 0: 2365 return File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/lines.py:1031, in Line2D.get_path(self) 1029 """Return the `~matplotlib.path.Path` associated with this line.""" 1030 if self._invalidy or self._invalidx: -> 1031 self.recache() 1032 return self._path File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/lines.py:659, in Line2D.recache(self, always) 657 if always or self._invalidx: 658 xconv = self.convert_xunits(self._xorig) --> 659 x = _to_unmasked_float_array(xconv).ravel() 660 else: 661 x = self._x File ~/mambaforge/envs/xclim/lib/python3.10/site-packages/matplotlib/cbook/__init__.py:1369, in _to_unmasked_float_array(x) 1367 return np.ma.asarray(x, float).filled(np.nan) 1368 else: -> 1369 return np.asarray(x, float) TypeError: float() argument must be a string or a real number, not 'cftime._cftime.DatetimeNoLeap' ``` Anything else we need to know? I suspect #7179. This line: https://github.com/pydata/xarray/blob/cc7e09a3507fa342b3790b5c109e700fa12f0b17/xarray/plot/utils.py#L27 does *not* import `nc_time_axis`. Further down, the variable gets checked and if `False` an error is raised, but if it is `True` the package still is not imported. Previously we had: https://github.com/pydata/xarray/blob/fc9026b59d38146a21769cc2d3026a12d58af059/xarray/plot/utils.py#L27-L32 where the package is always imported. Maybe there's a way to import `nc_time_axis` only when needed? 
Environment INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 \| packaged by conda-forge \| (main, Aug 22 2022, 20:36:39) [GCC 10.4.0] python-bits: 64 OS: Linux OS-release: 6.0.5-200.fc36.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: fr_CA.UTF-8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2022.11.0 pandas: 1.5.1 numpy: 1.23.4 scipy: 1.8.1 netCDF4: 1.6.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.5 dask: 2022.10.2 distributed: 2022.10.2 matplotlib: 3.6.2 cartopy: None seaborn: None numbagg: None fsspec: 2022.10.0 cupy: None pint: 0.20.1 sparse: None flox: None numpy_groupies: None setuptools: 65.5.1 pip: 22.3.1 conda: None pytest: 7.2.0 IPython: 8.6.0 sphinx: 5.3.0	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7275/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
1242388766	I_kwDOAMm_X85KDVke	6623	Cftime arrays not supported by polyval	aulemahal 20629530	closed	1	2022-05-19T22:19:14Z	2022-05-31T17:16:04Z	2022-05-31T17:16:04Z	CONTRIBUTOR	What happened? I was trying to use polyval with a cftime coordinate and it failed with `TypeError: unsupported operand type(s) for *: 'float' and 'cftime._cftime.DatetimeNoLeap'`. The error seems to originate from #6548, where the process transforming coordinates to numerical values was modified. The new `_ensure_numeric` method seems to ignore the possibility of `cftime` arrays. What did you expect to happen? A polynomial to be evaluated along my coordinate. Minimal Complete Verifiable Example ```Python import xarray as xr import numpy as np # use_cftime=False will work t = xr.date_range('2001-01-01', periods=100, use_cftime=True, freq='YS') da = xr.DataArray(np.arange(100) ** 3, dims=('time',), coords={'time': t}) coeffs = da.polyfit('time', 4) da2 = xr.polyval(da.time, coeffs).polyfit_coefficients ``` MVCE confirmation [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. [X] Complete example — the example is self-contained, including all data and the text of any traceback. [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result. [X] New issue — a search of GitHub Issues suggests this is not a duplicate. 
Relevant log output ```Python TypeError Traceback (most recent call last) Input In [5], in <cell line: 4>() 2 da = xr.DataArray(np.arange(100) ** 3, dims=('time',), coords={'time': t}) 3 coeffs = da.polyfit('time', 4) ----> 4 da2 = xr.polyval(da.time, coeffs).polyfit_coefficients File ~/Python/xarray/xarray/core/computation.py:1931, in polyval(coord, coeffs, degree_dim) 1929 res = zeros_like(coord) + coeffs.isel({degree_dim: max_deg}, drop=True) 1930 for deg in range(max_deg - 1, -1, -1): -> 1931 res = coord 1932 res += coeffs.isel({degree_dim: deg}, drop=True) 1934 return res File ~/Python/xarray/xarray/core/_typed_ops.py:103, in DatasetOpsMixin.imul(self, other) 102 def imul(self, other): --> 103 return self._inplace_binary_op(other, operator.imul) File ~/Python/xarray/xarray/core/dataset.py:6107, in Dataset._inplace_binary_op(self, other, f) 6105 other = other.reindex_like(self, copy=False) 6106 g = ops.inplace_to_noninplace_op(f) -> 6107 ds = self._calculate_binary_op(g, other, inplace=True) 6108 self._replace_with_new_dims( 6109 ds._variables, 6110 ds._coord_names, (...) 
6113 inplace=True, 6114 ) 6115 return self File ~/Python/xarray/xarray/core/dataset.py:6154, in Dataset._calculate_binary_op(self, f, other, join, inplace) 6152 else: 6153 other_variable = getattr(other, "variable", other) -> 6154 new_vars = {k: f(self.variables[k], other_variable) for k in self.data_vars} 6155 ds._variables.update(new_vars) 6156 ds._dims = calculate_dimensions(ds._variables) File ~/Python/xarray/xarray/core/dataset.py:6154, in <dictcomp>(.0) 6152 else: 6153 other_variable = getattr(other, "variable", other) -> 6154 new_vars = {k: f(self.variables[k], other_variable) for k in self.data_vars} 6155 ds._variables.update(new_vars) 6156 ds._dims = calculate_dimensions(ds._variables) File ~/Python/xarray/xarray/core/_typed_ops.py:402, in VariableOpsMixin.mul(self, other) 401 def mul(self, other): --> 402 return self._binary_op(other, operator.mul) File ~/Python/xarray/xarray/core/variable.py:2494, in Variable._binary_op(self, other, f, reflexive) 2491 attrs = self._attrs if keep_attrs else None 2492 with np.errstate(all="ignore"): 2493 new_data = ( -> 2494 f(self_data, other_data) if not reflexive else f(other_data, self_data) 2495 ) 2496 result = Variable(dims, new_data, attrs=attrs) 2497 return result TypeError: unsupported operand type(s) for : 'float' and 'cftime._cftime.DatetimeGregorian' ``` Anything else we need to know? I also noticed that since the Horner PR, `polyfit` and `polyval` do not use the same function to convert coordinates into numerical values. Isn't this dangerous? 
Environment INSTALLED VERSIONS ------------------ commit: None python: 3.10.4 \| packaged by conda-forge \| (main, Mar 24 2022, 17:38:57) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.3.1.dev267+gd711d58 pandas: 1.4.2 numpy: 1.21.6 scipy: 1.8.0 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: 3.6.0 Nio: None zarr: 2.11.3 cftime: 1.6.0 nc_time_axis: 1.4.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.4 dask: 2022.04.1 distributed: 2022.4.1 matplotlib: 3.5.1 cartopy: 0.20.2 seaborn: None numbagg: None fsspec: 2022.3.0 cupy: None pint: 0.19.2 sparse: 0.13.0 flox: 0.5.0 numpy_groupies: 0.9.15 setuptools: 62.1.0 pip: 22.0.4 conda: None pytest: None IPython: 8.2.0 sphinx: 4.5.0	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6623/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
969079775	MDU6SXNzdWU5NjkwNzk3NzU=	5701	Performance issues using map_blocks with cftime indexes.	aulemahal 20629530	open	1	2021-08-12T15:47:29Z	2022-04-19T02:44:37Z		CONTRIBUTOR	What happened: When using `map_blocks` on an object that is dask-backed and has a `CFTimeIndex` coordinate, the construction step (not computation done) is very slow. I've seen up to 100x slower than an equivalent object with a numpy datetime index. What you expected to happen: I would understand a performance difference since numpy/pandas objects are usually more optimized than cftime/xarray objects, but the difference is quite large here. Minimal Complete Verifiable Example: Here is a MCVE that I ran in a jupyter notebook. Performance is basically measured by execution time (wall time). I included the current workaround I have for my usecase. ```python import numpy as np import pandas as pd import xarray as xr import dask.array as da from dask.distributed import Client c = Client(n_workers=1, threads_per_worker=8) Test Data Nt = 10_000 Nx = Ny = 100 chks = (Nt, 10, 10) A = xr.DataArray( da.zeros((Nt, Ny, Nx), chunks=chks), dims=('time', 'y', 'x'), coords={'time': pd.date_range('1900-01-01', freq='D', periods=Nt), 'x': np.arange(Nx), 'y': np.arange(Ny) }, name='data' ) Copy of a, but with a cftime coordinate B = A.copy() B['time'] = xr.cftime_range('1900-01-01', freq='D', periods=Nt, calendar='noleap') A dumb function to apply def func(data): return data + data Test 1 : numpy-backed time coordinate %time outA = A.map_blocks(func, template=A) # %time outA.load(); Res on my machine: CPU times: user 130 ms, sys: 6.87 ms, total: 136 ms Wall time: 127 ms CPU times: user 3.01 s, sys: 8.09 s, total: 11.1 s Wall time: 13.4 s Test 2 : cftime-backed time coordinate %time outB = B.map_blocks(func, template=B) %time outB.load(); Res on my machine CPU times: user 4.42 s, sys: 219 ms, total: 4.64 s Wall time: 4.48 s CPU times: user 13.2 s, sys: 3.1 s, total: 16.3 s Wall time: 26 s Workaround in my code 
def func_cf(data): data['time'] = xr.decode_cf(data.coords.to_dataset()).time return data + data def map_blocks_cf(func, data): data2 = data.copy() data2['time'] = xr.conventions.encode_cf_variable(data.time) return data2.map_blocks(func, template=data) Test 3 : cftime time coordinate with encoding-decoding %time outB2 = map_blocks_cf(func_cf, B) %time outB2.load(); Res CPU times: user 536 ms, sys: 10.5 ms, total: 546 ms Wall time: 528 ms CPU times: user 9.57 s, sys: 2.23 s, total: 11.8 s Wall time: 21.7 s ``` Anything else we need to know?: After exploration I found 2 culprits for this slowness. I used `%%prun` to profile the construction phase of `map_blocks` and found that in the second case (cftime time coordinate): In `map_blocks` calls to `dask.base.tokenize` take the most time. Precisely, tokenizing a numpy ndarray of O dtype goes through the pickling process of the array. This is already quite slow and cftime objects take even more time to pickle. See Unidata/cftime#253 for the corresponding issue. Most of the construction phase execution time is spent pickling the same datetime array at least once per chunk. Second, but only significant when the time coordinate is very large (55000 in my use case). `CFTimeIndex.__new__` is called more than twice as many times as there are chunks. And within the object creation there is this line : https://github.com/pydata/xarray/blob/3956b73a7792f41e4410349f2c40b9a9a80decd2/xarray/coding/cftimeindex.py#L228 The larger the array, the more time is spent in this iteration. Changing the example above to use `Nt = 50_000`, the code spent a total of 25 s in `dask.base.tokenize` calls and 5 s in `CFTimeIndex.__new__` calls. My workaround is not the best, but it was easy to code without touching xarray. The encoding of the time coordinate changes it to an integer array, which is super fast to tokenize. 
And the speed-up of the construction phase is because there is only one call to `encode_cf_variable`, compared to `N_chunks` calls to the pickling. As shown above, I have not seen a slowdown in the computation phase. I think this is mostly because the added `decode_cf` calls are done in parallel, but there might be other reasons I do not understand. I do not know for sure how/why this tokenization works, but I guess the best improvement in xarray could be to: - Look into the inputs of `map_blocks` and spot cftime-backed coordinates. - Convert those coordinates to an ndarray of a basic dtype. - At the moment of tokenization of the time coordinates, do a switcheroo and pass the converted arrays instead. I have no idea if that would work, but if it does that would be the best speed-up I think. Environment: Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 \| packaged by conda-forge \| (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-514.2.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.1.dev18+g4bb9d9c.d20210810 pandas: 1.3.1 numpy: 1.21.1 scipy: 1.7.0 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.07.1 distributed: 2021.07.1 matplotlib: 3.4.2 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 21.2.1 conda: None pytest: None IPython: 7.25.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5701/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
1175454678	I_kwDOAMm_X85GEAPW	6393	DataArray groupby returning Dataset broken in some cases	aulemahal 20629530	closed	1	2022-03-21T14:17:25Z	2022-03-21T15:26:20Z	2022-03-21T15:26:20Z	CONTRIBUTOR	What happened? This is the reverse problem of #6379: the `DataArrayGroupBy._combine` method seems broken when the mapped function returns a Dataset (which worked before #5692). What did you expect to happen? No response Minimal Complete Verifiable Example ```Python import xarray as xr ds = xr.tutorial.open_dataset("air_temperature") ds.air.resample(time="YS").map(lambda grp: grp.mean("time").to_dataset()) ``` Relevant log output ```Python TypeError Traceback (most recent call last) Input In [3], in <module> ----> 1 ds.air.resample(time="YS").map(lambda grp: grp.mean("time").to_dataset()) File ~/Python/myxarray/xarray/core/resample.py:223, in DataArrayResample.map(self, func, shortcut, args, kwargs) 180 """Apply a function to each array in the group and concatenate them 181 together into a new array. 182 (...) 219 The result of splitting, applying and combining this array. 220 """ 221 # TODO: the argument order for Resample doesn't match that for its parent, 222 # GroupBy --> 223 combined = super().map(func, shortcut=shortcut, args=args, kwargs) 225 # If the aggregation function didn't drop the original resampling 226 # dimension, then we need to do so before we can rename the proxy 227 # dimension we used. 
228 if self._dim in combined.coords: File ~/Python/myxarray/xarray/core/groupby.py:835, in DataArrayGroupByBase.map(self, func, shortcut, args, *kwargs) 833 grouped = self._iter_grouped_shortcut() if shortcut else self._iter_grouped() 834 applied = (maybe_wrap_array(arr, func(arr, args,** kwargs)) for arr in grouped) --> 835 return self._combine(applied, shortcut=shortcut) File ~/Python/myxarray/xarray/core/groupby.py:869, in DataArrayGroupByBase._combine(self, applied, shortcut) 867 index, index_vars = create_default_index_implicit(coord) 868 indexes = {k: index for k in index_vars} --> 869 combined = combined._overwrite_indexes(indexes, coords=index_vars) 870 combined = self._maybe_restore_empty_groups(combined) 871 combined = self._maybe_unstack(combined) TypeError: _overwrite_indexes() got an unexpected keyword argument 'coords' ``` Anything else we need to know? I guess the same solution as #6386 could be used! Environment INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 \| packaged by conda-forge \| (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.16.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.1.dev16+g3ead17ea pandas: 1.4.0 numpy: 1.20.3 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.0 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.08.0 distributed: 2021.08.0 matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.07.0 cupy: None pint: 0.18 sparse: None setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 8.0.1 sphinx: 4.1.2	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6393/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
1173980959	I_kwDOAMm_X85F-Ycf	6379	Dataset groupby returning DataArray broken in some cases	aulemahal 20629530	closed	1	2022-03-18T20:07:37Z	2022-03-20T18:55:26Z	2022-03-20T18:55:26Z	CONTRIBUTOR	What happened? Got a TypeError when resampling a dataset along a dimension, mapping a function to each group. The function returns a DataArray. Failed with : `TypeError: _overwrite_indexes() got an unexpected keyword argument 'variables'` What did you expect to happen? This worked before the merging of #5692. A DataArray was returned as expected. Minimal Complete Verifiable Example ```Python import xarray as xr ds = xr.tutorial.open_dataset("air_temperature") ds.resample(time="YS").map(lambda grp: grp.air.mean("time")) ``` Relevant log output ```Python TypeError Traceback (most recent call last) Input In [37], in <module> ----> 1 ds.resample(time="YS").map(lambda grp: grp.air.mean("time")) File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/xarray/core/resample.py:300, in DatasetResample.map(self, func, args, shortcut, *kwargs) 298 # ignore shortcut if set (for now) 299 applied = (func(ds, args,** kwargs) for ds in self._iter_grouped()) --> 300 combined = self._combine(applied) 302 return combined.rename({self._resample_dim: self._dim}) File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/xarray/core/groupby.py:999, in DatasetGroupByBase._combine(self, applied) 997 index, index_vars = create_default_index_implicit(coord) 998 indexes = {k: index for k in index_vars} --> 999 combined = combined._overwrite_indexes(indexes, variables=index_vars) 1000 combined = self._maybe_restore_empty_groups(combined) 1001 combined = self._maybe_unstack(combined) TypeError: _overwrite_indexes() got an unexpected keyword argument 'variables' ``` Anything else we need to know? In the docstring of `DatasetGroupBy.map` it is not made clear that the passed function should return a dataset, but the opposite is also not said. 
This worked before and I think the issue comes from #5692, which introduced different signatures for `DataArray._overwrite_indexes` (which is called in my case) and `Dataset._overwrite_indexes` (which is expected by the new `_combine`). If the function passed to `Dataset.resample(...).map` should only return `Dataset`s then I believe a more explicit error is needed, as well as some notice in the docs and a breaking change entry in the changelog. If `DataArray`s should be accepted, then we have a regression here. I may have time to help on this. Environment INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 \| packaged by conda-forge \| (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.16.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.1.dev16+g3ead17ea pandas: 1.4.0 numpy: 1.20.3 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.0 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.08.0 distributed: 2021.08.0 matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.07.0 cupy: None pint: 0.18 sparse: None setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 8.0.1 sphinx: 4.1.2	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6379/reactions", "total_count": 2, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 }	completed	xarray 13221727	issue
1173997225	I_kwDOAMm_X85F-cap	6380	Attributes of concatenation coordinate are dropped	aulemahal 20629530	closed	1	2022-03-18T20:31:17Z	2022-03-20T18:53:46Z	2022-03-20T18:53:46Z	CONTRIBUTOR	What happened? When concatenating two objects with `xr.concat` along a new dimension given through a `DataArray`, the attributes of this given coordinate are lost in the concatenation. What did you expect to happen? I expected the concatenation coordinate to be identical to the 1D DataArray I gave to `concat`. Minimal Complete Verifiable Example ```Python import xarray as xr ds = xr.tutorial.open_dataset("air_temperature") concat_dim = xr.DataArray([1, 2], dims=("condim",), attrs={"an_attr": "yep"}, name="condim") out = xr.concat([ds, ds], concat_dim) out.condim.attrs ``` Before #5692, I get: `{'an_attr': 'yep'}` with the current master, I get: `{}` Anything else we need to know? I'm not 100% sure, but I think the change is due to `xr.core.concat._calc_concat_dim_coord` being replaced by `xr.core.concat.__calc_concat_dim_index`. The former didn't touch the concatenation coordinate, while the latter casts it as an index, thus dropping the attributes in the process. If the solution is to add a check in `xr.concat`, I may have time to implement something simple. 
Environment INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 \| packaged by conda-forge \| (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.16.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.1.dev16+g3ead17ea pandas: 1.4.0 numpy: 1.20.3 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.0 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.08.0 distributed: 2021.08.0 matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.07.0 cupy: None pint: 0.18 sparse: None setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 8.0.1 sphinx: 4.1.2	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6380/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
1035607476	I_kwDOAMm_X849uh20	5897	ds.mean bugs with cftime objects	aulemahal 20629530	open	1	2021-10-25T21:55:12Z	2021-10-27T14:51:07Z		CONTRIBUTOR	What happened: Given a dataset that has a variable with cftime objects along dimension A, averaging (`mean`) leads to buggy behaviour: (1) Averaging over 'A' drops the variable instead of averaging it. (2) Averaging over any other dimension will fail if that variable is on the dask backend. What you expected to happen: I expected the average to fail in the case of a dask-backed cftime variable, given that this code exists: https://github.com/pydata/xarray/blob/fdabf3bea5c750939a4a2ae60f80ed34a6aebd58/xarray/core/duck_array_ops.py#L562-L572 And I expected the average to work (not drop the var) in the case of the numpy backend. I expected the fact that dask is used to be irrelevant to the result. I expected the mean to conserve the cftime variable as-is since it doesn't include the averaged dimension. Minimal Complete Verifiable Example: ```python # Put your MCVE code here import xarray as xr ds = xr.Dataset({ 'var1': (('time',), xr.cftime_range('2021-10-31', periods=10, freq='D')), 'var2': (('x',), list(range(10))) }) # var1 contains cftime objects # var2 contains integers # They do not share dims ds.mean('time') # var1 has disappeared instead of being averaged ds.mean('x') # Everything ok dsc = ds.chunk({}) dsc.mean('time') # var1 has disappeared. I would expect this line to fail. dsc.mean('x') # Raises NotImplementedError. I would expect this line to run flawlessly. ``` Anything else we need to know?: A culprit is #5393, but maybe the bug is older? I think the change introduced there causes the issue (2) above. In `duck_array_ops.py` the mean operation is declared `numeric_only`, which is kinda incoherent with the implementation allowing means of datetime objects. This setting causes my (1) above. 
Environment: Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: fdabf3bea5c750939a4a2ae60f80ed34a6aebd58 python: 3.9.7 \| packaged by conda-forge \| (default, Sep 29 2021, 19:20:46) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.14.12-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.19.1.dev89+gfdabf3be pandas: 1.3.4 numpy: 1.21.3 scipy: 1.7.1 netCDF4: 1.5.7 pydap: installed h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.1 cftime: 1.5.1 nc_time_axis: 1.4.0 PseudoNetCDF: installed rasterio: 1.2.10 cfgrib: 0.9.9.1 iris: 3.1.0 bottleneck: 1.3.2 dask: 2021.10.0 distributed: 2021.10.0 matplotlib: 3.4.3 cartopy: 0.20.1 seaborn: 0.11.2 numbagg: 0.2.1 fsspec: 2021.10.1 cupy: None pint: 0.17 sparse: 0.13.0 setuptools: 58.2.0 pip: 21.3.1 conda: None pytest: 6.2.5 IPython: 7.28.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5897/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
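
Issue 7275 in the table asks whether `nc_time_axis` could be imported only when needed. The pattern might look like the following minimal sketch, which uses only the standard library. The helper name `import_when_needed` is hypothetical and this is not necessarily how xarray resolved the issue; `json` stands in for the optional dependency so the snippet runs anywhere.

```python
import importlib
import importlib.util


def import_when_needed(module_name):
    """Import an optional top-level module at call time.

    Hypothetical helper for illustration only; not part of xarray.
    """
    # find_spec returns None when a top-level module cannot be found,
    # so availability can be checked without importing anything yet.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(f"{module_name!r} is required for this operation")
    # The actual import happens here, so any import-time side effects
    # the caller depends on only run when this code path is reached.
    return importlib.import_module(module_name)


# 'json' is a stand-in for the optional dependency.
mod = import_when_needed("json")
print(mod.__name__)
```

Separating the availability check from the import keeps the package's own import cheap, while still guaranteeing that the optional module is really loaded on the code path that needs it.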


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
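
The filter described in the page heading (comments = 1, type = "issue", user = 20629530, sorted by updated_at descending) can be reproduced against this schema. The sketch below uses Python's built-in `sqlite3` with an in-memory database and a reduced subset of the columns. Three rows reuse values from the table above; the fourth row (number 9999) is hypothetical and exists only to show the filter excluding it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the [issues] table: only the columns the query touches.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [comments] INTEGER,
       [updated_at] TEXT,
       [type] TEXT
    )
    """
)
rows = [
    (1035607476, 5897, "ds.mean bugs with cftime objects",
     20629530, 1, "2021-10-27T14:51:07Z", "issue"),
    (1442443970, 7275, "REG: `nc_time_axis` not imported anymore",
     20629530, 1, "2022-11-10T21:45:28Z", "issue"),
    (1242388766, 6623, "Cftime arrays not supported by polyval",
     20629530, 1, "2022-05-31T17:16:04Z", "issue"),
    # Hypothetical row: three comments, so the filter should drop it.
    (1, 9999, "hypothetical issue with three comments",
     20629530, 3, "2022-12-01T00:00:00Z", "issue"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps in the same format sort correctly as plain text.
query = """
    SELECT [number] FROM [issues]
    WHERE [comments] = ? AND [type] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
numbers = [r[0] for r in conn.execute(query, (1, "issue", 20629530))]
print(numbers)
```

Parameter placeholders keep the filter values out of the SQL string, and the text ordering works only because every `updated_at` value uses the same UTC ISO 8601 layout.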
