
issues


9 rows where state = "open" and user = 20629530 sorted by updated_at descending

Issue #7602: Inconsistent coordinate attributes handling in apply_ufunc
open · opened 2023-03-09 by aulemahal (CONTRIBUTOR) · 0 comments · updated 2023-03-15

What happened?

When calling apply_ufunc with keep_attrs=False, the coordinate attributes are dropped only if there is more than one argument to the call.

What did you expect to happen?

I expected the behaviour to be the same, no matter the number of arguments.

I also expected the coordinate attributes to be preserved if that coordinate appeared on only one argument.

Minimal Complete Verifiable Example

```python
import xarray as xr

def wrapper(ar1, ar2=None):
    return ar1.mean(axis=-1)

ds = xr.tutorial.open_dataset("air_temperature")

o1 = xr.apply_ufunc(
    wrapper, ds.air, ds.time,
    input_core_dims=[['time'], ['time']],
    keep_attrs=False,
)
print(o1.lat.attrs)  # {}

o2 = xr.apply_ufunc(
    wrapper, ds.air,
    input_core_dims=[['time']],
    keep_attrs=False,
)
print(o2.lat.attrs)  # {'standard_name': ... }
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

The behaviour stems from this if/else:

https://github.com/pydata/xarray/blob/6d771fc82228bdaf8a4b77d0ceec1cc444ebd090/xarray/core/computation.py#L252-L260

The first branch (one argument) doesn't touch the attributes, but in the else branch (more than one argument), two levels deeper in merge_coordinates_without_align, we have:

https://github.com/pydata/xarray/blob/6d771fc82228bdaf8a4b77d0ceec1cc444ebd090/xarray/core/merge.py#L283-L286

When apply_ufunc is called with keep_attrs=False, the combine_attrs above is "drop", so merge_attrs returns an empty dict even though only one attribute dict was passed for lat.
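For illustration, a minimal sketch of that behaviour (merge_attrs is private xarray API, so the import path and signature are assumptions that may change between versions):

```python
# Illustration only: merge_attrs is private xarray API.
from xarray.core.merge import merge_attrs

# Even with a single attrs dict, combine_attrs="drop" returns an empty dict:
print(merge_attrs([{"standard_name": "latitude"}], combine_attrs="drop"))  # {}
```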

My preference would be for keep_attrs to refer only to the data attributes, with coordinate attributes preserved, or even merged if needed. This was my expectation here, since this is the behaviour in many other places in xarray. For example:

```python
with xr.set_options(keep_attrs=False):
    o = ds.air.mean('time')
```

This drops the attributes of air, but preserves those of lat and lon.

I see no easy way out here, except by handling this explicitly somewhere in apply_ufunc? If the decision is that preservation of "untouched" coordinate attributes is not guaranteed by xarray, I think it would be worth noting somewhere (but I don't know where), and I would change my code to "manually" preserve those attributes where appropriate.
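As a hedged sketch of such a manual workaround (restore_coord_attrs is a hypothetical helper, not xarray API), continuing the MCVE above:

```python
# Hypothetical helper: copy coordinate attrs from the inputs back onto the result.
def restore_coord_attrs(result, *inputs):
    for name in result.coords:
        for inp in inputs:
            if name in inp.coords and inp.coords[name].attrs:
                result.coords[name].attrs = dict(inp.coords[name].attrs)
                break
    return result

o1 = restore_coord_attrs(o1, ds.air, ds.time)
print(o1.lat.attrs)  # the attrs of ds.air.lat again
```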

Environment

```
INSTALLED VERSIONS
------------------
commit: 6d771fc82228bdaf8a4b77d0ceec1cc444ebd090
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 6.1.11-100.fc36.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_CA.UTF-8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.3
pydap: installed
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.13.6
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.6
cfgrib: 0.9.10.3
iris: 3.4.1
bottleneck: 1.3.7
dask: 2023.3.0
distributed: 2023.3.0
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: 0.2.2
fsspec: 2023.3.0
cupy: None
pint: 0.20.1
sparse: 0.14.0
flox: 0.6.8
numpy_groupies: 0.9.20
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: None
IPython: 8.11.0
sphinx: None
```
Pull request #5781: Add encodings to save_mfdataset
open · opened 2021-09-08 by aulemahal (CONTRIBUTOR) · 1 comment · updated 2022-10-06
  • [ ] Closes #xxxx
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Simply adds an encodings argument to save_mfdataset. As with the other arguments, it expects a list of dictionaries with the encoding information to be passed to to_netcdf for each dataset. A minimal test was added, simply to check that the argument is taken into account.
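A hedged usage sketch of the proposed argument (the encodings keyword is this PR's addition, not released xarray API; file names and encoding values are illustrative):

```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")
years, datasets = zip(*ds.groupby("time.year"))
paths = [f"air_{y}.nc" for y in years]

# One encoding dict per dataset, forwarded to each to_netcdf call
# (this mirrors the PR's proposed `encodings` argument):
encodings = [{"air": {"zlib": True, "complevel": 4}}] * len(datasets)
xr.save_mfdataset(datasets, paths, encodings=encodings)
```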

Pull request #5402: `dt.to_pytimedelta` to allow arithmetic with cftime objects
open · opened 2021-05-28 by aulemahal (CONTRIBUTOR) · 1 comment · updated 2022-06-09
  • [ ] Closes #xxxx
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

When playing with cftime objects, a problem I have encountered many times is that I can subtract two arrays but cannot add the difference back to another. Subtracting two cftime datetime arrays results in an array of np.timedelta64, and when trying to add that back to another cftime array, we get a UFuncTypeError because the two arrays have incompatible dtypes: '<m8[ns]' and 'O'.

Example:

```python
import xarray as xr

da = xr.DataArray(
    xr.cftime_range('1900-01-01', freq='D', periods=10),
    dims=('time',),
)

# An array of timedelta64[ns]
dt = da - da[0]

da[-1] + dt  # Fails
```

However, if the two arrays were of 'O' dtype, the operation would be handled by cftime, which supports datetime.timedelta objects.

The solution here adds a to_pytimedelta method to the TimedeltaAccessor, mirroring the name of the similar method on pd.Series.dt. It uses a monkeypatching workaround to prevent xarray from casting the array back into numpy objects.

The user still has to check whether the data is cftime or numpy to adapt the operation (calling dt.to_pytimedelta or not), but custom workarounds were always overly complicated for such a simple problem, so this helps.
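A hedged sketch of the intended usage, continuing the example above (to_pytimedelta on the .dt accessor is this PR's addition, not released xarray API):

```python
# Keep both operands as object ('O') dtype so cftime handles the arithmetic:
dt_py = (da - da[0]).dt.to_pytimedelta()
da[-1] + dt_py  # works: cftime datetimes + datetime.timedelta objects
```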

Also, this doesn't work with dask arrays, because loading a dask array triggers the Variable constructor and thus recasts the array of datetime.timedelta back to numpy.timedelta64.

I realize I maybe should have opened an issue before, but I had this idea and it all rushed along.

Issue #5701: Performance issues using map_blocks with cftime indexes
open · opened 2021-08-12 by aulemahal (CONTRIBUTOR) · 1 comment · updated 2022-04-19

What happened: When using map_blocks on an object that is dask-backed and has a CFTimeIndex coordinate, the construction step (not the computation itself) is very slow. I've seen it up to 100x slower than with an equivalent object with a numpy datetime index.

What you expected to happen: I would understand a performance difference since numpy/pandas objects are usually more optimized than cftime/xarray objects, but the difference is quite large here.

Minimal Complete Verifiable Example:

Here is an MCVE that I ran in a Jupyter notebook. Performance is measured simply by execution time (wall time). I included the current workaround I have for my use case.

```python
import numpy as np
import pandas as pd
import xarray as xr
import dask.array as da
from dask.distributed import Client

c = Client(n_workers=1, threads_per_worker=8)

# Test data
Nt = 10_000
Nx = Ny = 100
chks = (Nt, 10, 10)

A = xr.DataArray(
    da.zeros((Nt, Ny, Nx), chunks=chks),
    dims=('time', 'y', 'x'),
    coords={
        'time': pd.date_range('1900-01-01', freq='D', periods=Nt),
        'x': np.arange(Nx),
        'y': np.arange(Ny),
    },
    name='data',
)

# Copy of A, but with a cftime coordinate
B = A.copy()
B['time'] = xr.cftime_range('1900-01-01', freq='D', periods=Nt, calendar='noleap')

# A dumb function to apply
def func(data):
    return data + data

# Test 1 : numpy-backed time coordinate
%time outA = A.map_blocks(func, template=A)
%time outA.load();
# Res on my machine:
# CPU times: user 130 ms, sys: 6.87 ms, total: 136 ms
# Wall time: 127 ms
# CPU times: user 3.01 s, sys: 8.09 s, total: 11.1 s
# Wall time: 13.4 s

# Test 2 : cftime-backed time coordinate
%time outB = B.map_blocks(func, template=B)
%time outB.load();
# Res on my machine:
# CPU times: user 4.42 s, sys: 219 ms, total: 4.64 s
# Wall time: 4.48 s
# CPU times: user 13.2 s, sys: 3.1 s, total: 16.3 s
# Wall time: 26 s

# Workaround in my code
def func_cf(data):
    data['time'] = xr.decode_cf(data.coords.to_dataset()).time
    return data + data

def map_blocks_cf(func, data):
    data2 = data.copy()
    data2['time'] = xr.conventions.encode_cf_variable(data.time)
    return data2.map_blocks(func, template=data)

# Test 3 : cftime time coordinate with encoding-decoding
%time outB2 = map_blocks_cf(func_cf, B)
%time outB2.load();
# Res:
# CPU times: user 536 ms, sys: 10.5 ms, total: 546 ms
# Wall time: 528 ms
# CPU times: user 9.57 s, sys: 2.23 s, total: 11.8 s
# Wall time: 21.7 s
```

Anything else we need to know?: After exploring, I found two culprits for this slowness. I used %%prun to profile the construction phase of map_blocks and found that, in the second case (cftime time coordinate):

  1. In map_blocks, calls to dask.base.tokenize take the most time. Precisely, tokenizing a numpy ndarray of 'O' dtype goes through pickling the array. This is already quite slow, and cftime objects take even more time to pickle; see Unidata/cftime#253 for the corresponding issue. Most of the construction-phase execution time is spent pickling the same datetime array at least once per chunk (see the sketch after this list).
  2. Second, but only significant when the time coordinate is very large (55000 in my use case): CFTimeIndex.__new__ is called more than twice as many times as there are chunks, and within the object creation there is this line: https://github.com/pydata/xarray/blob/3956b73a7792f41e4410349f2c40b9a9a80decd2/xarray/coding/cftimeindex.py#L228 The larger the array, the more time is spent in this iteration. Changing the example above to use Nt = 50_000, the code spent a total of 25 s in dask.base.tokenize calls and 5 s in CFTimeIndex.__new__ calls.
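For illustration, a minimal sketch of the tokenization cost described in point 1 (timings are machine-dependent; dask.base.tokenize is the call map_blocks relies on):

```python
import timeit

import numpy as np
import pandas as pd
import xarray as xr
from dask.base import tokenize

n = 10_000
np_times = pd.date_range('1900-01-01', freq='D', periods=n).values       # datetime64[ns]
cf_times = np.array(xr.cftime_range('1900-01-01', freq='D', periods=n))  # object dtype

# Tokenizing the object-dtype array goes through pickling, which is far slower:
print(timeit.timeit(lambda: tokenize(np_times), number=10))
print(timeit.timeit(lambda: tokenize(cf_times), number=10))
```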

My workaround is not the best, but it was easy to code without touching xarray. Encoding the time coordinate changes it to an integer array, which is super fast to tokenize. The speed-up of the construction phase comes from there being only one call to encode_cf_variable, compared to N_chunks calls to pickling. As shown above, I have not seen a slowdown in the computation phase; I think this is mostly because the added decode_cf calls are done in parallel, but there might be other reasons I do not understand.

I do not know for sure how/why this tokenization works, but I guess the best improvement in xarray could be to:

  • Look into the inputs of map_blocks and spot cftime-backed coordinates.
  • Convert those coordinates to an ndarray of a basic dtype.
  • At the moment of tokenization of the time coordinates, do a switcheroo and pass the converted arrays instead.

I have no idea if that would work, but if it does, that would be the best speed-up, I think.

Environment:

Output of xr.show_versions():

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.2.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.19.1.dev18+g4bb9d9c.d20210810
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.7.0
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.07.1
distributed: 2021.07.1
matplotlib: 3.4.2
cartopy: 0.19.0
seaborn: None
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.2.1
conda: None
pytest: None
IPython: 7.25.0
sphinx: None
```
Issue #5897: ds.mean bugs with cftime objects
open · opened 2021-10-25 by aulemahal (CONTRIBUTOR) · 1 comment · updated 2021-10-27

What happened: Given a dataset that has a variable with cftime objects along dimension A, averaging (mean) leads to buggy behaviour:

  1. Averaging over 'A' drops the variable instead of averaging it.
  2. Averaging over any other dimension will fail if that variable is on the dask backend.

What you expected to happen:

  1. I expected the average to fail in the case of a dask-backed cftime variable, given that this code exists: https://github.com/pydata/xarray/blob/fdabf3bea5c750939a4a2ae60f80ed34a6aebd58/xarray/core/duck_array_ops.py#L562-L572

And I expected the average to work (not drop the var) in the case of the numpy backend.

  2. I expected the fact that dask is used to be irrelevant to the result. I expected the mean to conserve the cftime variable as-is, since it doesn't include the averaged dimension.

Minimal Complete Verifiable Example:

```python
import xarray as xr

ds = xr.Dataset({
    'var1': (('time',), xr.cftime_range('2021-10-31', periods=10, freq='D')),
    'var2': (('x',), list(range(10))),
})
# var1 contains cftime objects
# var2 contains integers
# They do not share dims

ds.mean('time')   # var1 has disappeared instead of being averaged

ds.mean('x')      # Everything ok

dsc = ds.chunk({})

dsc.mean('time')  # var1 has disappeared. I would expect this line to fail.

dsc.mean('x')     # Raises NotImplementedError. I would expect this line to run flawlessly.
```

Anything else we need to know?: A culprit is #5393, but maybe the bug is older? I think the change introduced there causes issue (2) above.

In duck_array_ops.py, the mean operation is declared numeric_only, which is somewhat incoherent with the implementation allowing means of datetime objects. This setting causes my (1) above.
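For illustration, a hedged user-side workaround sketch (a band-aid, not a fix), continuing the MCVE above:

```python
# Average only the numeric variables, then re-attach var1.
# 'x' is not a dim of var1, so var1 is unchanged by the mean anyway.
cf_vars = ['var1']
out = dsc.drop_vars(cf_vars).mean('x').assign(var1=ds.var1)
```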

Environment:

Output of xr.show_versions():

```
INSTALLED VERSIONS
------------------
commit: fdabf3bea5c750939a4a2ae60f80ed34a6aebd58
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.14.12-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.19.1.dev89+gfdabf3be
pandas: 1.3.4
numpy: 1.21.3
scipy: 1.7.1
netCDF4: 1.5.7
pydap: installed
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: 1.4.0
PseudoNetCDF: installed
rasterio: 1.2.10
cfgrib: 0.9.9.1
iris: 3.1.0
bottleneck: 1.3.2
dask: 2021.10.0
distributed: 2021.10.0
matplotlib: 3.4.3
cartopy: 0.20.1
seaborn: 0.11.2
numbagg: 0.2.1
fsspec: 2021.10.1
cupy: None
pint: 0.17
sparse: 0.13.0
setuptools: 58.2.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.28.0
sphinx: None
```
Issue #5551: cftime 1.5.0 changes behaviour upon pickling: breaks get_clean_interp_index with a dask distributed scheduler
open · opened 2021-06-29 by aulemahal (CONTRIBUTOR) · 2 comments · updated 2021-09-24

What happened:

Quite a specific bug! Using map_blocks to wrap a polyfit computation, with a dask client (not the local scheduler) and a time axis with a cftime calendar, I got the error: TypeError: cannot compute the time difference between dates with different calendars.

What you expected to happen:

No bug.

Minimal Complete Verifiable Example:

```python
import xarray as xr
from dask.distributed import Client

ds = xr.tutorial.open_dataset('rasm').chunk({'x': 25, 'y': 25})

templ = ds.Tair

def func(ds, verbose=False):
    # Dummy function that calls get_clean_interp_index.
    # Returns Tair as-is, just for the test.
    if verbose:
        print(ds.time)
        print(type(ds.time[0].item()))
    x = xr.core.missing.get_clean_interp_index(ds, 'time')
    return ds.Tair

# This works (time is a coordinate, so it is already loaded)
x = xr.core.missing.get_clean_interp_index(ds, 'time')

# This works too. The local scheduler is used.
out = ds.map_blocks(func, template=templ, kwargs={'verbose': False})
out.load()

# This fails!
with Client(n_workers=1, threads_per_worker=8, dashboard_address=8786,
            memory_limit='7GB') as c:
    out = ds.map_blocks(func, template=templ, kwargs={'verbose': True})
    out.load()
```

The full traceback is here:

```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-de89288ffcd5> in <module>
     27     kwargs={'verbose': True}
     28 )
---> 29 out.load()

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/dataarray.py in load(self, **kwargs)
    883         dask.compute
    884         """
--> 885         ds = self._to_temp_dataset().load(**kwargs)
    886         new = self._from_temp_dataset(ds)
    887         self._variable = new._variable

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    848
    849         # evaluate all the dask arrays simultaneously
--> 850         evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    851
    852         for k, data in zip(lazy_data, evaluated_data):

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
    565         postcomputes.append(x.__dask_postcompute__())
    566
--> 567     results = schedule(dsk, keys, **kwargs)
    568     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    569

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/client.py in get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2705             should_rejoin = False
   2706         try:
-> 2707             results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2708         finally:
   2709             for f in futures.values():

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   2019         else:
   2020             local_worker = None
-> 2021         return self.sync(
   2022             self._gather,
   2023             futures,

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    860             return future
    861         else:
--> 862             return sync(
    863                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    864             )

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    336     if error[0]:
    337         typ, exc, tb = error[0]
--> 338         raise exc.with_traceback(tb)
    339     else:
    340         return result[0]

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/utils.py in f()
    319             if callback_timeout is not None:
    320                 future = asyncio.wait_for(future, callback_timeout)
--> 321             result[0] = yield future
    322         except Exception:
    323             error[0] = sys.exc_info()

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/tornado/gen.py in run(self)
    760
    761                     try:
--> 762                         value = future.result()
    763                     except Exception:
    764                         exc_info = sys.exc_info()

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1884                             exc = CancelledError(key)
   1885                         else:
-> 1886                             raise exception.with_traceback(traceback)
   1887                         raise exc
   1888                     if errors == "skip":

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/parallel.py in _wrapper()
    284         ]
    285
--> 286         result = func(*converted_args, **kwargs)
    287
    288         # check all dims are present

<ipython-input-12-de89288ffcd5> in func()
      8         print(ds.time)
      9         print(type(ds.time[0].item()))
---> 10     x = xr.core.missing.get_clean_interp_index(ds, 'time')
     11
     12     return ds.Tair

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/missing.py in get_clean_interp_index()
    276         index = index.values
    277     index = Variable(
--> 278         data=datetime_to_numeric(index, offset=offset, datetime_unit="ns"),
    279         dims=(dim,),
    280     )

/exec/pbourg/.conda/x38/lib/python3.9/site-packages/xarray/core/duck_array_ops.py in datetime_to_numeric()
    462     # For np.datetime64, this can silently yield garbage due to overflow.
    463     # One option is to enforce 1970-01-01 as the universal offset.
--> 464     array = array - offset
    465
    466     # Scalar is converted to 0d-array

src/cftime/_cftime.pyx in cftime._cftime.datetime.__sub__()

TypeError: cannot compute the time difference between dates with different calendars
```

The printout to the console: I am calling this in a Jupyter notebook, so the prints from within the workers go to the console, not the cell's output. I removed useless lines to shorten it.

```
<xarray.DataArray 'time' (time: 36)>
array([cftime.datetime(1980, 9, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
       cftime.datetime(1980, 10, 17, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
       cftime.datetime(1980, 11, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
       ....)], dtype=object)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
Attributes:
    long_name:       time
    type_preferred:  int

<class 'cftime._cftime.datetime'>
```

And for reference:

```python
>>> ds.time
array([cftime.DatetimeNoLeap(1980, 9, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1980, 10, 17, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1980, 11, 16, 12, 0, 0, 0, has_year_zero=True),
       ...
>>> type(ds.time[0].item())
cftime._cftime.DatetimeNoLeap
```

Anything else we need to know?:

I'm not sure where the exact breaking change lies (dask or cftime?), but this worked with dask 2021.5 and cftime <= 1.4.1. The problem lies in get_clean_interp_index, specifically these lines:

https://github.com/pydata/xarray/blob/5ccb06951cecd59b890c1457e36ee3c2030a67aa/xarray/core/missing.py#L274-L280

On the original dataset, the class of the time values is DatetimeNoLeap, whereas the time coordinates received by func are of class datetime; the calendar is only a kwarg. Thus, in get_clean_interp_index, the offset is created with the default "standard" calendar and becomes incompatible with the array itself, which makes datetime_to_numeric fail.
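For illustration, a minimal check of the suspected pickling behaviour (assuming cftime 1.5.0, where round-tripping a subclass instance may return the base cftime.datetime class):

```python
import pickle

import cftime

d = cftime.DatetimeNoLeap(1980, 9, 16, 12)
d2 = pickle.loads(pickle.dumps(d))

# With cftime 1.5.0, the round-tripped object can lose its subclass,
# keeping the calendar only as an attribute:
print(type(d), type(d2))  # DatetimeNoLeap vs (possibly) cftime.datetime
print(d2.calendar)        # 'noleap'
```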

Environment:

Output of xr.show_versions():

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.2.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.18.2
pandas: 1.2.5
numpy: 1.21.0
scipy: 1.7.0
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.06.2
distributed: 2021.06.2
matplotlib: 3.4.2
cartopy: 0.19.0
seaborn: None
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.1.3
conda: None
pytest: None
IPython: 7.25.0
sphinx: None
```
Issue #5026: Datetime accessor fails on cftime arrays with missing values
open · opened 2021-03-12 by aulemahal (CONTRIBUTOR) · 0 comments · updated 2021-04-19

What happened: I have a computation that outputs dates but sometimes also outputs missing data (it computes the start date of a run in a timeseries; if there is no run, it outputs NaN). Afterwards, I'd like to convert those dates to day-of-year, so I call out.dt.dayofyear. In a case where the first value of out is missing, it fails.

What you expected to happen: I expected out.dt.dayofyear to return an array where the first value would be NaN.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np

da = xr.DataArray(
    [[np.nan, np.nan], [1, 2]],
    dims=('x', 'time'),
    coords={'x': [1, 2], 'time': xr.cftime_range('2000-01-01', periods=2)},
)

# out is an "object" array, where the first element is NaN
out = da.idxmin('time')

out.dt.dayofyear
# Expected : [nan, 1.]
# Got:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-56-06aa9bdfd6b8> in <module>
----> 1 da.idxmin('time').dt.dayofyear

~/.conda/envs/xclim-dev/lib/python3.8/site-packages/xarray/core/utils.py in __get__(self, obj, cls)
    917             return self._accessor
    918
--> 919         return self._accessor(obj)
    920
    921

~/.conda/envs/xclim-dev/lib/python3.8/site-packages/xarray/core/accessor_dt.py in __new__(cls, obj)
    514         # do all the validation here.
    515         if not _contains_datetime_like_objects(obj):
--> 516             raise TypeError(
    517                 "'.dt' accessor only available for "
    518                 "DataArray with datetime64 timedelta64 dtype or "

TypeError: '.dt' accessor only available for DataArray with datetime64 timedelta64 dtype or for arrays containing cftime datetime objects.
```

Anything else we need to know?: This also triggers computation when da is lazy. A lazy .dt accessor would be useful.

Setting the laziness aside, would it be meaningful to change https://github.com/pydata/xarray/blob/d4b7a608bab0e7c140937b0b59ca45115d205145/xarray/core/common.py#L1822 to cycle over the array while np.isnan(sample)?
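A hedged standalone sketch of that suggestion (first_non_nan_sample is a hypothetical helper, not the actual xarray code):

```python
import numpy as np

def first_non_nan_sample(values):
    # Walk the flattened array until a non-NaN sample is found,
    # instead of only inspecting the first element.
    for sample in np.ravel(values):
        if not (isinstance(sample, float) and np.isnan(sample)):
            return sample
    return None  # all missing

arr = np.array([np.nan, np.nan, 1.5], dtype=object)
print(first_non_nan_sample(arr))  # 1.5
```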

Environment:

Output of xr.show_versions():

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.16-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: fr_CA.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.3.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.2.1
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.12.0
distributed: 2.20.0
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: 5.4.3
IPython: 7.21.0
sphinx: 3.1.2
```
Issue #4853: cftime default's datetime breaks CFTimeIndex
open · opened 2021-02-01 by aulemahal (CONTRIBUTOR) · 4 comments · updated 2021-02-05

What happened: With cftime 1.2.0, one can create datetime objects with cftime.datetime(*args, calendar='calendar') instead of using one of the subclasses (e.g. cftime.DatetimeNoLeap(*args)). In the latest release (1.4.0, yesterday), the subclasses have been deprecated but kept as legacy. While all xarray code still works (it uses the legacy subclasses), the CFTimeIndex object relies on the type of the datetime object to infer the calendar. If the datetime was created outside xarray, using the now-default constructor, the returned type is not understood and CFTimeIndex breaks.

What you expected to happen: I expected CFTimeIndex to be independent of the way the datetime object is created.

Minimal Complete Verifiable Example:

```python3
import cftime
import numpy as np
import xarray as xr

# A datetime array, not created in xarray
time = cftime.num2date(np.arange(365), "days since 2000-01-01", calendar="noleap")
a = xr.DataArray(np.zeros(365), dims=('time',), coords={'time': time})

a.indexes['time']
```

Fails with:

```python3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 342, in __repr__
    attrs_str = format_attrs(self)
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 264, in format_attrs
    attrs["freq"] = f"'{index.freq}'" if len(index) >= 3 else None
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 692, in freq
    return infer_freq(self)
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/frequencies.py", line 96, in infer_freq
    inferer = _CFTimeFrequencyInferer(index)
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/frequencies.py", line 105, in __init__
    self.values = index.asi8
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 673, in asi8
    [
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 674, in <listcomp>
    _total_microseconds(exact_cftime_datetime_difference(epoch, date))
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/core/resample_cftime.py", line 370, in exact_cftime_datetime_difference
    seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
  File "src/cftime/_cftime.pyx", line 1153, in cftime._cftime.datetime.__sub__
ValueError: cannot compute the time difference between dates with different calendars
```

Anything else we need to know?:
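For illustration, a hedged sketch of the two construction paths (assuming cftime >= 1.2, where the base-class constructor accepts a calendar argument):

```python
import cftime

legacy = cftime.DatetimeNoLeap(2000, 1, 1)                 # subclass encodes the calendar
default = cftime.datetime(2000, 1, 1, calendar='noleap')   # calendar is only an attribute

print(type(legacy))   # a DatetimeNoLeap subclass instance
print(type(default))  # the base cftime.datetime class
# Type-based calendar inference recognizes the first but not the second.
```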

Environment:

Output of xr.show_versions():

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Jul 31 2020, 02:39:48) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.10.11-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: fr_CA.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.2.1
numpy: 1.20.0
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.01.1
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 46.1.3
pip: 20.1
conda: None
pytest: 6.2.2
IPython: 7.19.0
sphinx: None
```
Issue #4463: Interpolation with multiple multidimensional arrays sharing dims fails
open · opened 2020-09-25 by aulemahal (CONTRIBUTOR) · 5 comments · updated 2020-09-28

What happened: Trying to interpolate an N-D array with two other arrays that share a common (new) dimension, with at least one of them being multidimensional, fails. Kind of a complex edge case, I agree. Here's a MWE:

```python3
da = xr.DataArray(
    [[[1, 2, 3], [2, 3, 4]], [[1, 2, 3], [2, 3, 4]]],
    dims=('t', 'x', 'y'),
    coords={'x': [1, 2], 'y': [1, 2, 3], 't': [10, 12]},
)
dy = xr.DataArray([1.5, 2.5], dims=('u',), coords={'u': [45, 55]})
dx = xr.DataArray(
    [[1.5, 1.5], [1.5, 1.5]],
    dims=('t', 'u'),
    coords={'u': [45, 55], 't': [10, 12]},
)
```

So da is a 3D array with dims (t, x, y), dy contains the values of y along the new dimension u, and dx contains the values of x along both u and t. We want to interpolate with:

```python3
out = da.interp(y=dy, x=dx, method='linear')
```

so as to get a new array over dims t and u.

What you expected to happen: I expected (with the dummy data I gave):

```python3
xr.DataArray([[2, 3], [2, 3]], dims=('t', 'u'), coords={'u': [45, 55], 't': [10, 12]})
```

But instead it fails with ValueError: axes don't match array.

Full traceback:

```python3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-b968c6f3dae9> in <module>
----> 1 a.interp(y=y, x=x, method='linear')

~/Python/xarray/xarray/core/dataarray.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   1473                 "Given {}.".format(self.dtype)
   1474             )
-> 1475         ds = self._to_temp_dataset().interp(
   1476             coords,
   1477             method=method,

~/Python/xarray/xarray/core/dataset.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   2691                     if k in var.dims
   2692                 }
-> 2693                 variables[name] = missing.interp(var, var_indexers, method, **kwargs)
   2694             elif all(d not in indexers for d in var.dims):
   2695                 # keep unrelated object array

~/Python/xarray/xarray/core/missing.py in interp(var, indexes_coords, method, **kwargs)
    652             else:
    653                 out_dims.add(d)
--> 654     result = result.transpose(*tuple(out_dims))
    655     return result
    656

~/Python/xarray/xarray/core/variable.py in transpose(self, *dims)
   1395             return self.copy(deep=False)
   1396
-> 1397         data = as_indexable(self._data).transpose(axes)
   1398         return type(self)(dims, data, self._attrs, self._encoding, fastpath=True)
   1399

~/Python/xarray/xarray/core/indexing.py in transpose(self, order)
   1288
   1289     def transpose(self, order):
-> 1290         return self.array.transpose(order)
   1291
   1292     def __getitem__(self, key):

ValueError: axes don't match array
```

Anything else we need to know?: It works if dx doesn't vary along t, i.e. da.interp(y=dy, x=dx.isel(t=0, drop=True), method='linear') works.
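A hedged per-slice workaround sketch, reusing da, dy and dx from the MWE above and assuming (as observed) that the failure only occurs because dx varies along t:

```python
import xarray as xr

# Interpolate each t-slice with the matching slice of dx, then stitch back along 't'.
out = xr.concat(
    [da.sel(t=t).interp(y=dy, x=dx.sel(t=t), method='linear') for t in da.t],
    dim='t',
)
```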

Environment:

Output of xr.show_versions():

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Jul 31 2020, 02:39:48) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.2.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4

xarray: 0.16.2.dev9+gc0399d3
pandas: 1.0.3
numpy: 1.18.4
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.4
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.17.2
distributed: 2.23.0
matplotlib: 3.3.1
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.11
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: None
IPython: 7.17.0
sphinx: None
```
