
issues


10 rows where user = 6063709 sorted by updated_at descending


type 2

  • issue 9
  • pull 1

state 2

  • closed 8
  • open 2

repo 1

  • xarray 10
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
677296128 MDU6SXNzdWU2NzcyOTYxMjg= 4336 cftime_range fails for base cftime.datetime object aidanheerdegen 6063709 open 0     8 2020-08-12T00:56:18Z 2023-11-26T22:40:03Z   CONTRIBUTOR      

What happened:

xarray.cftime_range does not accept dates that are instances of the base class cftime.datetime.

What you expected to happen:

I expected xarray.cftime_range to raise an exception that this is an unsupported cftime.datetime type and for the documentation to reflect this.

Minimal Complete Verifiable Example:

```python
import cftime
import xarray

date = cftime.datetime(10,1,1)
xarray.cftime_range(date, periods=3, freq='Y')
```

Anything else we need to know?:

Returns this error:

```python
TypeError                                 Traceback (most recent call last)
<ipython-input-29-d090ea15e436> in <module>
      2 import xarray
      3 date = cftime.datetime(10,1,1)
----> 4 xarray.cftime_range(date, periods=3, freq='Y')

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftime_offsets.py in cftime_range(start, end, periods, freq, normalize, name, closed, calendar)
    973     else:
    974         offset = to_offset(freq)
--> 975         dates = np.array(list(_generate_range(start, end, periods, offset)))
    976
    977     left_closed = False

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftime_offsets.py in _generate_range(start, end, periods, offset)
    744     """
    745     if start:
--> 746         start = offset.rollforward(start)
    747
    748     if end:

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftime_offsets.py in rollforward(self, date)
    526     def rollforward(self, date):
    527         """Roll date forward to nearest end of year"""
--> 528         if self.onOffset(date):
    529             return date
    530         else:

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftime_offsets.py in onOffset(self, date)
    522         """Check if the given date is in the set of possible dates created
    523         using a length-one version of this offset class."""
--> 524         return date.day == _days_in_month(date) and date.month == self.month
    525
    526     def rollforward(self, date):

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftime_offsets.py in _days_in_month(date)
    195     else:
    196         reference = type(date)(date.year, date.month + 1, 1)
--> 197     return (reference - timedelta(days=1)).day
    198
    199

TypeError: unsupported operand type(s) for -: 'cftime._cftime.datetime' and 'datetime.timedelta'
```

Works if a `datetime` object with a calendar is used:

```python
import cftime
import xarray

date = cftime.DatetimeGregorian(10,1,1)
xarray.cftime_range(date, periods=3, freq='Y')
```

Returns:

```python
CFTimeIndex([0010-12-31 00:00:00, 0011-12-31 00:00:00, 0012-12-31 00:00:00], dtype='object')
```

as expected.

The error occurs here

https://github.com/pydata/xarray/blob/master/xarray/coding/cftime_offsets.py#L197

because this operation is not defined for the base class

https://github.com/Unidata/cftime/blob/master/cftime/_cftime.pyx#L1054
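For readers unfamiliar with the failing step: the traceback shows `_days_in_month` building the first day of the following month and subtracting one day from it. Below is a minimal sketch of that technique, using the standard library's `datetime.date` (which does support subtracting a `timedelta`) in place of cftime. The December branch is an assumption, since the traceback only shows the `else` branch, and `days_in_month` is an illustrative name rather than xarray's function.

```python
from datetime import date, timedelta


def days_in_month(d):
    """Return the number of days in the month containing d."""
    if d.month == 12:
        # Assumed handling of December: roll over to January of the next year.
        reference = type(d)(d.year + 1, 1, 1)
    else:
        # Same construction as the line shown in the traceback.
        reference = type(d)(d.year, d.month + 1, 1)
    # This subtraction is the operation that the base cftime.datetime
    # class rejects in the report above.
    return (reference - timedelta(days=1)).day


if __name__ == "__main__":
    print(days_in_month(date(2020, 2, 1)))
```

The sketch only works because `type(d)` supports `reference - timedelta(...)`; any date type without that operation would fail at the same line as in the report.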

The relevant tests all seem to use datetime strings which are by default standard calendar:

https://github.com/pydata/xarray/blob/master/xarray/coding/cftime_offsets.py#L788

Environment:

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 2.6.32-754.18.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4

xarray: 0.16.0
pandas: 1.1.0
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: installed
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: 1.5.5
zarr: 2.4.0
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: 0.9.8.4
iris: 2.4.0
bottleneck: 1.3.2
dask: 2.22.0
distributed: 2.22.0
matplotlib: 3.3.0
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
pint: 0.14
setuptools: 49.2.0.post20200712
pip: 20.1.1
conda: installed
pytest: 6.0.1
IPython: 7.17.0
sphinx: 3.2.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4336/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
553930127 MDU6SXNzdWU1NTM5MzAxMjc= 3717 reduce on groupby auto-adds axis argument and complains when axis argument is specified aidanheerdegen 6063709 open 0     3 2020-01-23T04:29:58Z 2022-04-06T15:38:59Z   CONTRIBUTOR      

The behaviour of reduce appears to have changed in recent versions of xarray such that previous code that worked now throws errors.

MCVE Code Sample

I have repurposed someone else's nice code sample for this, thanks!

```python
import pandas as pd
import xarray as xr
import numpy as np

s_date = '1990-01-01'
e_date = '2019-05-01'
days = pd.date_range(start=s_date, end=e_date, freq='B', name='time')
items = pd.Index([str(i) for i in range(300)], name='item')
dat = xr.DataArray(np.random.rand(len(days), len(items)), coords=[days, items])

print(dat)

def simplesum(array, axis):
    print(axis)
    return np.sum(array, axis)

dat.groupby('time.month').reduce(simplesum)
dat.groupby('time.month').reduce(simplesum, axis=0)
```

The reduce appears to insert an axis argument if none is specified. This is the output of the first groupby operation with no axis argument:

```python
0
0
0
0
0
0
0
0
0
0
0
0
Out[41]:
<xarray.DataArray (month: 12, item: 300)>
array([[330.18949303, 336.97901528, 337.80472647, ..., 322.37053342, 326.84789948, 342.22782336],
       [300.3301059 , 307.79967902, 322.53148357, ..., 310.20975273, 291.04344738, 310.56010997],
       [325.71587689, 337.25153307, 331.35493521, ..., 332.43547569, 328.23330226, 326.43909063],
       ...,
       [322.96255713, 321.44723754, 312.59983716, ..., 318.79682437, 315.81592617, 314.27316547],
       [294.29894222, 291.77253983, 310.85452639, ..., 314.0461447 , 298.99012623, 326.08321702],
       [323.6778518 , 332.71638634, 324.47244831, ..., 326.82774826, 322.09233181, 327.6385762 ]])
Coordinates:
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
```

The second groupby, with the axis=0 argument, throws an error:

```python
ValueError                                Traceback (most recent call last)
<ipython-input-42-381dec6862e6> in <module>
----> 1 dat.groupby('time.month').reduce(simplesum, axis=0)

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.01/lib/python3.7/site-packages/xarray/core/groupby.py in reduce(self, func, dim, axis, keep_attrs, shortcut, **kwargs)
    836         check_reduce_dims(dim, self.dims)
    837
--> 838         return self.map(reduce_array, shortcut=shortcut)
    839
    840

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.01/lib/python3.7/site-packages/xarray/core/groupby.py in map(self, func, shortcut, args, **kwargs)
    755             grouped = self._iter_grouped()
    756         applied = (maybe_wrap_array(arr, func(arr, *args, **kwargs)) for arr in grouped)
--> 757         return self._combine(applied, shortcut=shortcut)
    758
    759     def apply(self, func, shortcut=False, args=(), **kwargs):

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.01/lib/python3.7/site-packages/xarray/core/groupby.py in _combine(self, applied, restore_coord_dims, shortcut)
    774     def _combine(self, applied, restore_coord_dims=False, shortcut=False):
    775         """Recombine the applied objects like the original."""
--> 776         applied_example, applied = peek_at(applied)
    777         coord, dim, positions = self._infer_concat_args(applied_example)
    778         if shortcut:

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.01/lib/python3.7/site-packages/xarray/core/utils.py in peek_at(iterable)
    180     """
    181     gen = iter(iterable)
--> 182     peek = next(gen)
    183     return peek, itertools.chain([peek], gen)
    184

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.01/lib/python3.7/site-packages/xarray/core/groupby.py in <genexpr>(.0)
    754         else:
    755             grouped = self._iter_grouped()
--> 756         applied = (maybe_wrap_array(arr, func(arr, *args, **kwargs)) for arr in grouped)
    757         return self._combine(applied, shortcut=shortcut)
    758

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.01/lib/python3.7/site-packages/xarray/core/groupby.py in reduce_array(ar)
    832
    833         def reduce_array(ar):
--> 834             return ar.reduce(func, dim, axis, keep_attrs=keep_attrs, **kwargs)
    835
    836         check_reduce_dims(dim, self.dims)

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.01/lib/python3.7/site-packages/xarray/core/variable.py in reduce(self, func, dim, axis, keep_attrs, keepdims, allow_lazy, **kwargs)
   1511             dim = None
   1512         if dim is not None and axis is not None:
-> 1513             raise ValueError("cannot supply both 'axis' and 'dim' arguments")
   1514
   1515         if dim is not None:

ValueError: cannot supply both 'axis' and 'dim' arguments
```

Expected Output

I would expect the output of both groupby operations to be the same. The reduce documentation says it should flatten the input if no dim or axis argument is supplied, but it does not seem to do this.

The second groupby, with the axis=0 argument, works with older versions of xarray (0.13.0).

Problem Description

It is impossible to specify a dim argument to reduce. It defaults to axis=0 and when a different axis is specified it throws an error.
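The conflict described here can be modelled with a small toy, independent of xarray. The sketch below assumes that the group-level reduce fills in a default `dim` before delegating to an inner reduce that rejects `dim` and `axis` together (the check shown at the bottom of the traceback). The function names and the `"time"` default are hypothetical, and this is not xarray's implementation.

```python
def reduce_values(func, values, dim=None, axis=None):
    """Toy inner reducer: refuses dim and axis together, like the traceback."""
    if dim is not None and axis is not None:
        raise ValueError("cannot supply both 'axis' and 'dim' arguments")
    return func(values)


def grouped_reduce(func, groups, dim=None, axis=None):
    """Toy group-by reduce that supplies a default dim before delegating."""
    if dim is None:
        dim = "time"  # assumption: default to the grouped dimension
    return {
        key: reduce_values(func, values, dim=dim, axis=axis)
        for key, values in groups.items()
    }


if __name__ == "__main__":
    groups = {1: [1.0, 2.0], 2: [3.0, 4.0]}
    print(grouped_reduce(sum, groups))  # no axis given: succeeds
    try:
        grouped_reduce(sum, groups, axis=0)  # axis given: default dim collides
    except ValueError as err:
        print(err)
```

Under that assumption, any caller-supplied `axis` collides with the automatically chosen `dim`, which would match the behaviour reported above.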

Output of xr.show_versions()

Version used and produces error:

INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-80.11.2.el8_0.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: en_AU.ISO8859-1 LOCALE: en_AU.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.1 xarray: 0.14.1 pandas: 0.25.3 numpy: 1.17.5 scipy: 1.4.1 netCDF4: 1.5.3 pydap: installed h5netcdf: 0.7.4 h5py: 2.10.0 Nio: 1.5.5 zarr: 2.4.0 cftime: 1.0.3.4 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.1.1 cfgrib: 0.9.7.6 iris: 2.3.0 bottleneck: 1.3.1 dask: 2.9.2 distributed: 2.9.3 matplotlib: 2.2.4 cartopy: 0.17.0 seaborn: 0.9.0 numbagg: None setuptools: 45.0.0.post20200113 pip: 19.3.1 conda: None pytest: 5.3.4 IPython: 7.11.1 sphinx: None None

This version of xarray does not throw an error when the axis argument is supplied:

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-80.11.2.el8_0.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: en_AU.ISO8859-1 LOCALE: en_AU.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.13.0 pandas: 0.25.1 numpy: 1.17.2 scipy: 1.2.1 netCDF4: 1.5.1.2 pydap: installed h5netcdf: 0.7.4 h5py: 2.9.0 Nio: 1.5.5 zarr: 2.3.2 cftime: 1.0.3.4 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: 0.9.7.2 iris: 2.2.1dev0 bottleneck: 1.2.1 dask: 2.4.0 distributed: 2.4.0 matplotlib: 2.2.4 cartopy: 0.17.0 seaborn: 0.9.0 numbagg: None setuptools: 41.2.0 pip: 19.2.3 conda: None pytest: 5.1.2 IPython: 7.8.0 sphinx: None None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3717/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1120276279 I_kwDOAMm_X85Cxg83 6226 open_mfdataset fails with cftime index when using parallel and dask delayed client aidanheerdegen 6063709 closed 0     6 2022-02-01T06:14:07Z 2022-02-10T22:37:37Z 2022-02-10T22:37:37Z CONTRIBUTOR      

What happened?

A call to open_mfdataset with parallel=True fails when using a dask delayed client with newer versions of cftime and xarray. This happens with cftime==1.5.2 and xarray==0.20.2 but not with cftime==1.5.1 and xarray==0.20.2.

What did you expect to happen?

I expected the call to open_mfdataset to work without error with parallel=True, as it does with parallel=False and with a previous version of cftime.

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np
from dask.distributed import Client

# Need a main routine for dask.distributed if run as script
if __name__ == "__main__":

    client = Client(n_workers=1)

    t = xr.cftime_range('20010101','20010501', closed='left', calendar='noleap')
    x = np.arange(100)
    v = np.random.random((t.size,x.size))

    da = xr.DataArray(v, coords=[('time',t), ('x',x)])
    da.to_netcdf('sample.nc')

    # Works
    xr.open_mfdataset('sample.nc', parallel=False)

    # Throws TypeError exception
    xr.open_mfdataset('sample.nc', parallel=True)
```

Relevant log output

python distributed.protocol.core - CRITICAL - Failed to deserialize [32/525] Traceback (most recent call last): File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/core.py", line 111, in loads return msgpack.loads( File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/core.py", line 103, in _decode_default return merge_and_deserialize( File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py", line 488, in merge_and_deserialize return deserialize(header, merged_frames, deserializers=deserializers) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py", line 417, in deserialize return loads(header, frames) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py", line 96, in pickle_loads return pickle.loads(x, buffers=new) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 75, in loads return pickle.loads(x) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 255, in _new_Index return cls.__new__(cls, **d) TypeError: __new__() got an unexpected keyword argument 'dtype' Traceback (most recent call last): File "/g/data/v45/aph502/notebooks/test_pickle.py", line 21, in <module> xr.open_mfdataset('sample.nc', parallel=True) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/xarray/backends/api.py", line 916, in open_mfdataset datasets, closers = dask.compute(datasets, closers) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/dask/base.py", line 571, in 
compute results = schedule(dsk, keys, **kwargs) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/client.py", line 2746, in get results = self.gather(packed, asynchronous=asynchronous, direct=direct) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/client.py", line 1946, in gather return self.sync( File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils.py", line 310, in sync return sync( File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils.py", line 364, in sync raise exc.with_traceback(tb) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils.py", line 349, in f result[0] = yield future File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/tornado/gen.py", line 762, in run value = future.result() File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/client.py", line 1840, in _gather response = await future File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/client.py", line 1891, in _gather_remote response = await retry_operation(self.scheduler.gather, keys=keys) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils_comm.py", line 385, in retry_operation return await retry( File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils_comm.py", line 370, in retry return await coro() File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/core.py", line 900, in send_recv_from_rpc return await send_recv(comm=comm, op=key, **kwargs) File 
"/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/core.py", line 669, in send_recv response = await comm.read(deserializers=deserializers) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/comm/tcp.py", line 232, in read msg = await from_frames( File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/comm/utils.py", line 78, in from_frames res = _from_frames() File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/comm/utils.py", line 61, in _from_frames return protocol.loads( File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/core.py", line 111, in loads return msgpack.loads( File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/core.py", line 103, in _decode_default return merge_and_deserialize( File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py", line 488, in merge_and_deserialize return deserialize(header, merged_frames, deserializers=deserializers) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py", line 417, in deserialize return loads(header, frames) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py", line 96, in pickle_loads return pickle.loads(x) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 75, in loads return pickle.loads(x) File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 255, in _new_Index return 
cls.__new__(cls, **d) TypeError: __new__() got an unexpected keyword argument 'dtype'

Anything else we need to know?

It seems similar to previous issues with pickling https://github.com/pydata/xarray/issues/5686 which was fixed in cftime https://github.com/Unidata/cftime/pull/252 but the tests in previous issues still work, so it isn't exactly the same.

Environment

```

INSTALLED VERSIONS

commit: None python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 4.18.0-348.2.1.el8.nci.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: en_AU.ISO8859-1 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4

xarray: 0.20.2 pandas: 1.4.0 numpy: 1.22.1 scipy: 1.7.3 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.13.1 h5py: 3.6.0 Nio: None zarr: 2.10.3 cftime: 1.5.2 nc_time_axis: 1.4.0 PseudoNetCDF: None rasterio: 1.2.6 cfgrib: 0.9.9.1 iris: 3.1.0 bottleneck: 1.3.2 dask: 2022.01.0 distributed: 2022.01.0 matplotlib: 3.5.1 cartopy: 0.19.0.post1 seaborn: 0.11.2 numbagg: None fsspec: 2022.01.0 cupy: 10.1.0 pint: 0.18 sparse: 0.13.0 setuptools: 59.8.0 pip: 21.3.1 conda: 4.11.0 pytest: 6.2.5 IPython: 8.0.1 sphinx: 4.4.0 ```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6226/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
963688125 MDU6SXNzdWU5NjM2ODgxMjU= 5686 xindexes set incorrectly for mfdataset with dask client and parallel=True aidanheerdegen 6063709 closed 0     8 2021-08-09T06:29:41Z 2021-08-09T23:44:10Z 2021-08-09T22:36:53Z CONTRIBUTOR      

What happened: Using open_mfdataset with parallel=True with a dask.distributed client active fails to set .xindexes correctly.

What you expected to happen: The indexes should contain an index that can be printed correctly. When using repr the .xindexes fails with TypeError: cannot compute the time difference between dates with different calendars due to an error in .asi8

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np
from dask.distributed import Client

# Need a main routine for dask.distributed if run as script
if __name__ == "__main__":

    client = Client(n_workers=1)

    # Create some synthetic data
    time_365_decade = xr.cftime_range(start="2100", periods=120, freq="1MS", calendar="noleap")

    ds = xr.Dataset(
        {"a": ("time", np.arange(time_365_decade.size))},
        coords={"time": time_365_decade},
    )

    index_microseconds = ds.xindexes['time'].array.asi8

    # Save to a file per year
    years, datasets = zip(*ds.groupby("time.year"))
    xr.save_mfdataset(datasets, [f"{y}.nc" for y in years])

    # Open saved files, parallel=False and asi8 ok
    assert (index_microseconds == xr.open_mfdataset('2???.nc', parallel=False).xindexes['time'].array.asi8).all()

    # Open saved files, parallel=True and asi8 fails
    assert (index_microseconds == xr.open_mfdataset('2???.nc', parallel=True).xindexes['time'].array.asi8).all()
```

Anything else we need to know?: the asi8 function fails

https://github.com/pydata/xarray/blob/main/xarray/coding/cftimeindex.py#L677

because

```python
epoch = self.date_type(1970, 1, 1)
```

returns a cftime.datetime with calendar and has_year_zero attributes that do not match the index:

```python
(Pdb) p epoch
cftime.datetime(1970, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False)
```
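As an illustration of the kind of computation involved, here is a minimal sketch of converting dates to integer microseconds since a 1970-01-01 epoch. It uses the standard library's `datetime` rather than cftime, and `as_microseconds` is a hypothetical helper, not the `asi8` implementation. The point it makes is that the subtraction only works when the epoch and the dates are of compatible kinds.

```python
from datetime import datetime, timedelta


def as_microseconds(dates, epoch=datetime(1970, 1, 1)):
    """Return integer microseconds between each date and the epoch."""
    one_microsecond = timedelta(microseconds=1)
    # Each difference requires that date and epoch can be subtracted;
    # this is the step that fails when the two are incompatible.
    return [(d - epoch) // one_microsecond for d in dates]


if __name__ == "__main__":
    print(as_microseconds([datetime(1970, 1, 1, 0, 0, 1)]))
```

With the standard library, mixing a timezone-aware epoch with naive dates raises a `TypeError` at the subtraction, which is loosely analogous to the calendar mismatch reported here.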

Previously reported this as https://github.com/pydata/xarray/issues/5677

Environment:

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.7.1.el8.nci.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: en_AU.ISO8859-1
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.19.0
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.7.1
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 2.10.0
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: 1.2.6
cfgrib: 0.9.9.0
iris: 3.0.4
bottleneck: 1.3.2
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 52.0.0.post20210125
pip: 21.1.3
conda: 4.10.3
pytest: 6.2.4
IPython: 7.26.0
sphinx: 4.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5686/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
962467654 MDU6SXNzdWU5NjI0Njc2NTQ= 5677 sel slice fails with cftime index when using dask.distributed client aidanheerdegen 6063709 closed 0     2 2021-08-06T07:16:20Z 2021-08-09T06:30:26Z 2021-08-09T06:30:26Z CONTRIBUTOR      

What happened: Tried to .sel() a time slice from a multi-file dataset when dask.distributed client active. Got this error:

```python

KeyError Traceback (most recent call last) /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance) 3360 try: -> 3361 return self._engine.get_loc(casted_key) 3362 except KeyError as err:

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: cftime.datetime(2086, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False)

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last) /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind) 5801 try: -> 5802 slc = self.get_loc(label) 5803 except KeyError as err:

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in get_loc(self, key, method, tolerance) 465 else: --> 466 return pd.Index.get_loc(self, key, method=method, tolerance=tolerance) 467

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance) 3362 except KeyError as err: -> 3363 raise KeyError(key) from err 3364

KeyError: cftime.datetime(2086, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False)

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last) src/cftime/_cftime.pyx in cftime._cftime.datetime.richcmp()

src/cftime/_cftime.pyx in cftime._cftime.datetime.change_calendar()

ValueError: change_calendar only works for real-world calendars

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last) /local/v45/aph502/tmp/ipykernel_108691/1049912036.py in <module> ----> 1 u.sel(time=slice(start_time,end_time))

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs) 1313 Dimensions without coordinates: points 1314 """ -> 1315 ds = self._to_temp_dataset().sel( 1316 indexers=indexers, 1317 drop=drop,

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs) 2472 """ 2473 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel") -> 2474 pos_indexers, new_indexes = remap_label_indexers( 2475 self, indexers=indexers, method=method, tolerance=tolerance 2476 )

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs) 419 } 420 --> 421 pos_indexers, new_indexes = indexing.remap_label_indexers( 422 obj, v_indexers, method=method, tolerance=tolerance 423 )

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance) 115 for dim, index in indexes.items(): 116 labels = grouped_indexers[dim] --> 117 idxr, new_idx = index.query(labels, method=method, tolerance=tolerance) 118 pos_indexers[dim] = idxr 119 if new_idx is not None:

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/indexes.py in query(self, labels, method, tolerance) 196 197 if isinstance(label, slice): --> 198 indexer = _query_slice(index, label, coord_name, method, tolerance) 199 elif is_dict_like(label): 200 raise ValueError(

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/indexes.py in _query_slice(index, label, coord_name, method, tolerance) 89 "cannot use method argument if any indexers are slice objects" 90 ) ---> 91 indexer = index.slice_indexer( 92 _sanitize_slice_element(label.start), 93 _sanitize_slice_element(label.stop),

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in slice_indexer(self, start, end, step, kind) 5684 slice(1, 3, None) 5685 """ -> 5686 start_slice, end_slice = self.slice_locs(start, end, step=step) 5687 5688 # return a slice

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in slice_locs(self, start, end, step, kind) 5886 start_slice = None 5887 if start is not None: -> 5888 start_slice = self.get_slice_bound(start, "left") 5889 if start_slice is None: 5890 start_slice = 0

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind) 5803 except KeyError as err: 5804 try: -> 5805 return self._searchsorted_monotonic(label, side) 5806 except ValueError: 5807 # raise the original KeyError

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in _searchsorted_monotonic(self, label, side) 5754 def _searchsorted_monotonic(self, label, side: str_t = "left"): 5755 if self.is_monotonic_increasing: -> 5756 return self.searchsorted(label, side=side) 5757 elif self.is_monotonic_decreasing: 5758 # np.searchsorted expects ascending sort order, have to reverse

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/base.py in searchsorted(self, value, side, sorter) 1219 @doc(_shared_docs["searchsorted"], klass="Index") 1220 def searchsorted(self, value, side="left", sorter=None) -> np.ndarray: -> 1221 return algorithms.searchsorted(self._values, value, side=side, sorter=sorter) 1222 1223 def drop_duplicates(self, keep="first"):

/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/algorithms.py in searchsorted(arr, value, side, sorter) 1583 arr = ensure_wrapped_if_datetimelike(arr) 1584 -> 1585 return arr.searchsorted(value, side=side, sorter=sorter) 1586 1587

src/cftime/_cftime.pyx in cftime._cftime.datetime.richcmp()

TypeError: cannot compare cftime.datetime(2086, 5, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True) and cftime.datetime(2086, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False) ```

So the slice indexing has created a bounding value with the wrong calendar: it should be noleap (365_day) but is gregorian.

```python
KeyError: cftime.datetime(2086, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False)
```

Note that this only happens when a dask.distributed client is loaded.

What you expected to happen: expected it to return the same slice it does without error if the client is not active.

Minimal Complete Verifiable Example: I tried really really hard to create a synthetic example but I couldn't make one that would fail, but loading the mfdataset from disk will make it fail reliably. I have tested multiple times.

The dataset:

xarray.DataArray
'u'
  • time: 15
  • st_ocean: 75
  • yu_ocean: 2700
  • xu_ocean: 3600
  • |       | Array | Chunk |
    | -- | -- | -- |
    | Bytes | 40.74 GiB | 3.20 MiB |
    | Shape | (15, 75, 2700, 3600) | (1, 7, 300, 400) |
    | Count | 26735 Tasks | 13365 Chunks |
    | Type | float32 | numpy.ndarray |
  • Coordinates:
    • st_ocean
      (st_ocean)
      float64
      0.5413 1.681 ... 5.709e+03
      <label for="attrs-460bfc52-3f95-4c90-80f6-fbf61ba08e31" title="Show/Hide attributes" style="box-sizing: unset; background-color: var(--xr-background-color-row-odd); margin-bottom: 0px; color: var(--xr-font-color2); cursor: pointer;"><svg class="icon xr-icon-file-text2"><use xlink:href="#icon-file-text2"></use></svg></label><label for="data-d437d9a9-1b0b-4ddf-95ea-6ec48973a4a1" title="Show/Hide data repr" style="box-sizing: unset; background-color: var(--xr-background-color-row-odd); margin-bottom: 0px; color: var(--xr-font-color2); cursor: pointer;"><svg class="icon xr-icon-database"><use xlink:href="#icon-database"></use></svg></label>
    • time
      (time)
      object
      2085-10-16 12:00:00 ... 2086-12-...
      <label for="attrs-5c3c11ea-3616-4e6c-8da5-d90a3de74cc8" title="Show/Hide attributes" style="box-sizing: unset; background-color: var(--xr-background-color-row-even); margin-bottom: 0px; color: var(--xr-font-color2); cursor: pointer;"><svg class="icon xr-icon-file-text2"><use xlink:href="#icon-file-text2"></use></svg></label><label for="data-c74ab087-7010-4076-9e77-fe8556853756" title="Show/Hide data repr" style="box-sizing: unset; background-color: var(--xr-background-color-row-even); margin-bottom: 0px; color: var(--xr-font-color2); cursor: pointer;"><svg class="icon xr-icon-database"><use xlink:href="#icon-database"></use></svg></label>
      array([cftime.datetime(2085, 10, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2085, 11, 16, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2085, 12, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 1, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 2, 15, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 3, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 4, 16, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 5, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 6, 16, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 7, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 8, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 9, 16, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 10, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 11, 16, 0, 0, 0, 0, calendar='noleap', has_year_zero=True),
             cftime.datetime(2086, 12, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True)],
            dtype=object)
    • xu_ocean
      (xu_ocean)
      float64
      -279.9 -279.8 -279.7 ... 79.9 80.0
      <label for="attrs-deb0e0ca-d92a-4695-8544-a9985caa3df3" title="Show/Hide attributes" style="box-sizing: unset; background-color: var(--xr-background-color-row-odd); margin-bottom: 0px; color: var(--xr-font-color2); cursor: pointer;"><svg class="icon xr-icon-file-text2"><use xlink:href="#icon-file-text2"></use></svg></label><label for="data-aafd5159-4edd-4505-a77a-687ba340da33" title="Show/Hide data repr" style="box-sizing: unset; background-color: var(--xr-background-color-row-odd); margin-bottom: 0px; color: var(--xr-font-color2); cursor: pointer;"><svg class="icon xr-icon-database"><use xlink:href="#icon-database"></use></svg></label>
    • yu_ocean
      (yu_ocean)
      float64
      -81.09 -81.05 -81.0 ... 89.96 90.0
      <label for="attrs-0cea6a87-ca0c-47ab-a25c-5784ea14a5ba" title="Show/Hide attributes" style="box-sizing: unset; background-color: var(--xr-background-color-row-even); margin-bottom: 0px; color: var(--xr-font-color2); cursor: pointer;"><svg class="icon xr-icon-file-text2"><use xlink:href="#icon-file-text2"></use></svg></label><label for="data-282162ad-9547-401b-976f-a22fa5efeae9" title="Show/Hide data repr" style="box-sizing: unset; background-color: var(--xr-background-color-row-even); margin-bottom: 0px; color: var(--xr-font-color2); cursor: pointer;"><svg class="icon xr-icon-database"><use xlink:href="#icon-database"></use></svg></label>
    • <label for="section-c71a9525-5800-445c-b401-78088cfc4247" class="xr-section-summary" style="box-sizing: unset; grid-column-start: 1; grid-column-end: auto; color: var(--xr-font-color2); font-weight: 500; padding-top: 4px; padding-bottom: 4px; cursor: pointer;">Attributes: 
      <dl class="xr-attrs" style="box-sizing: unset; padding: 0px; grid-column-start: 1; grid-column-end: -1; display: grid; width: 700px; overflow: hidden; margin: 0px; grid-template-columns: 125px auto;"><dt style="box-sizing: unset; display: block; font-weight: normal; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; grid-column-start: 1; grid-column-end: auto;">long_name :</dt><dd style="box-sizing: unset; display: block; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; grid-column-start: 2; grid-column-end: auto; white-space: pre-wrap; word-break: break-all;">i-current</dd><dt style="box-sizing: unset; display: block; font-weight: normal; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; grid-column-start: 1; grid-column-end: auto;">units :</dt><dd style="box-sizing: unset; display: block; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; grid-column-start: 2; grid-column-end: auto; white-space: pre-wrap; word-break: break-all;">m/sec</dd><dt style="box-sizing: unset; display: block; font-weight: normal; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; grid-column-start: 1; grid-column-end: auto;">valid_range :</dt><dd style="box-sizing: unset; display: block; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; grid-column-start: 2; grid-column-end: auto; white-space: pre-wrap; word-break: break-all;">[-10. 
10.]</dd><dt style="box-sizing: unset; display: block; font-weight: normal; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; grid-column-start: 1; grid-column-end: auto;">cell_methods :</dt><dd style="box-sizing: unset; display: block; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; grid-column-start: 2; grid-column-end: auto; white-space: pre-wrap; word-break: break-all;">time: mean</dd><dt style="box-sizing: unset; display: block; font-weight: normal; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; grid-column-start: 1; grid-column-end: auto;">time_avg_info :</dt><dd style="box-sizing: unset; display: block; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; grid-column-start: 2; grid-column-end: auto; white-space: pre-wrap; word-break: break-all;">average_T1,average_T2,average_DT</dd><dt style="box-sizing: unset; display: block; font-weight: normal; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; grid-column-start: 1; grid-column-end: auto;">coordinates :</dt><dd style="box-sizing: unset; display: block; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; grid-column-start: 2; grid-column-end: auto; white-space: pre-wrap; word-break: break-all;">geolon_c geolat_c</dd><dt style="box-sizing: unset; display: block; font-weight: normal; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; grid-column-start: 1; grid-column-end: auto;">standard_name :</dt><dd style="box-sizing: unset; display: block; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; grid-column-start: 2; grid-column-end: auto; white-space: pre-wrap; word-break: break-all;">sea_water_x_velocity</dd><dt style="box-sizing: 
unset; display: block; font-weight: normal; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; grid-column-start: 1; grid-column-end: auto;">time_bounds :</dt><dd style="box-sizing: unset; display: block; float: left; width: auto; padding: 0px 10px 0px 0px; margin: 0px; grid-column-start: 2; grid-column-end: auto; white-space: pre-wrap; word-break: break-all;"><xarray.DataArray 'time_bounds' (time: 15, nv: 2)> dask.array<concatenate, shape=(15, 2), dtype=timedelta64[ns], chunksize=(1, 2), chunktype=numpy.ndarray> Coordinates: * time (time) object 2085-10-16 12:00:00 ... 2086-12-16 12:00:00 * nv (nv) float64 1.0 2.0 Attributes: long_name: time axis boundaries calendar: NOLEAP</dd></dl>
      </label>
    </label>

```python
# FWIW
start_time = '2086-01-01'
end_time = '2086-12-31'
u.sel(time=slice(start_time, end_time))
```

Anything else we need to know?: I tried following the code execution through with pdb and it seems to start going wrong here

https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/core/indexing.py#L55

by line 63 data_obj.xindexes is already in a bad state

https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/core/indexing.py#L63

    (Pdb) data_obj.xindexes
    *** TypeError: cannot compute the time difference between dates with different calendars

It is called here

https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/core/indexing.py#L106-L108

but it isn't obvious to me how that bad state is generated.
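
One way to narrow down where the mixed calendars come from is to count which calendars actually appear in a set of dates before indexing. The following is only a sketch: `calendars_present` is a hypothetical helper, not part of xarray or cftime, and it assumes each date object exposes a `calendar` attribute, as the cftime.datetime reprs above suggest. Stand-in objects are used so the snippet runs without cftime installed.

```python
from collections import Counter
from types import SimpleNamespace


def calendars_present(dates):
    """Count how many dates use each calendar.

    Hypothetical helper: assumes every element has a ``calendar``
    attribute; elements without one are counted under ``None``.
    """
    return Counter(getattr(d, "calendar", None) for d in dates)


# Stand-ins for cftime.datetime objects (assumption: the real objects
# expose ``.calendar`` in the same way).
dates = [
    SimpleNamespace(year=2086, month=5, day=16, calendar="noleap"),
    SimpleNamespace(year=2086, month=6, day=16, calendar="noleap"),
    SimpleNamespace(year=2086, month=1, day=1, calendar="gregorian"),
]

counts = calendars_present(dates)
mixed = len(counts) > 1  # True when more than one calendar is present
```

With real data the helper could be applied to something like `u.time.values` plus the slice bounds; more than one key in the result would confirm that a bound was built with a different calendar from the index.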

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-326.el8.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: en_US.UTF-8 LOCALE: ('en_AU', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.3.1 numpy: 1.21.1 scipy: 1.7.0 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 2.10.0 Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: 1.2.6 cfgrib: 0.9.9.0 iris: 3.0.4 bottleneck: 1.3.2 dask: 2021.07.2 distributed: 2021.07.2 matplotlib: 3.4.2 cartopy: 0.19.0.post1 seaborn: 0.11.1 numbagg: None pint: 0.17 setuptools: 52.0.0.post20210125 pip: 21.1.3 conda: 4.10.3 pytest: 6.2.4 IPython: 7.26.0 sphinx: 4.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5677/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
677307460 MDU6SXNzdWU2NzczMDc0NjA= 4337 cftime_range does not support default cftime.datetime formatted output strings aidanheerdegen 6063709 closed 0     5 2020-08-12T01:28:30Z 2020-08-17T23:27:07Z 2020-08-17T23:27:07Z CONTRIBUTOR      

Is your feature request related to a problem? Please describe.

xarray.cftime_range does not support datetime strings in the default output format of cftime.datetime.strftime(), even though that is the format cftime_range itself uses internally.

    import cftime
    import xarray
    date = cftime.datetime(10,1,1).strftime()
    print(date)
    xarray.cftime_range(date, periods=3, freq='Y')

outputs

```
10-01-01 00:00:00

ValueError                                Traceback (most recent call last)
<ipython-input-70-a16c1fcab8d6> in <module>
      3 date = cftime.datetime(10,1,1).strftime()
      4 print(date)
----> 5 xarray.cftime_range(date, periods=3, freq='Y')

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftime_offsets.py in cftime_range(start, end, periods, freq, normalize, name, closed, calendar)
    963
    964     if start is not None:
--> 965         start = to_cftime_datetime(start, calendar)
    966         start = _maybe_normalize_date(start, normalize)
    967     if end is not None:

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftime_offsets.py in to_cftime_datetime(date_str_or_date, calendar)
    683                 "a calendar type must be provided"
    684             )
--> 685         date, _ = _parse_iso8601_with_reso(get_date_type(calendar), date_str_or_date)
    686         return date
    687     elif isinstance(date_str_or_date, cftime.datetime):

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftimeindex.py in _parse_iso8601_with_reso(date_type, timestr)
    101
    102     default = date_type(1, 1, 1)
--> 103     result = parse_iso8601(timestr)
    104     replace = {}
    105

/g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftimeindex.py in parse_iso8601(datetime_string)
     94         if match:
     95             return match.groupdict()
---> 96     raise ValueError("no ISO-8601 match for string: %s" % datetime_string)
     97
     98

ValueError: no ISO-8601 match for string: 10-01-01 00:00:00
```

Describe the solution you'd like

It would be good if xarray.cftime_range supported the default strftime format output from cftime.datetime objects. It is confusing that it uses this format with repr but explicitly does not support it.

Describe alternatives you've considered

Specifying an ISO-8601 compatible format (using the T separator) isn't a general solution: it doesn't work for years < 1000 because the year field is not zero-padded.

    import cftime
    import xarray
    date = cftime.datetime(10,1,1).strftime('%Y-%m-%dT%H:%M:%S')
    print('|{}|'.format(date))
    xarray.cftime_range(date, periods=3, freq='Y')

produces

    | 10-01-01T00:00:00|

and the error as above.

A work-around is to zero-pad manually:

    import cftime
    import xarray
    date = '{:0>19}'.format(cftime.datetime(10,1,1).strftime('%Y-%m-%dT%H:%M:%S').lstrip())
    print(date)
    xarray.cftime_range(date, periods=3, freq='Y')

produces

    0010-01-01T00:00:00
    CFTimeIndex([0010-12-31 00:00:00, 0011-12-31 00:00:00, 0012-12-31 00:00:00], dtype='object')
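
The same padding can also be produced from the integer date fields instead of post-processing the strftime output. This is a minimal sketch under stated assumptions: `iso_timestamp` and `ISO_LIKE` are hypothetical names, and the regular expression is a simplified stand-in for a four-digit-year ISO-8601 check, not the pattern xarray actually uses.

```python
import re

# Simplified pattern (an assumption, not xarray's actual regex): a
# four-digit year, then optional -MM, -DD and THH:MM:SS parts.
ISO_LIKE = re.compile(r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}:\d{2})?)?)?$")


def iso_timestamp(year, month=1, day=1, hour=0, minute=0, second=0):
    """Build a zero-padded ISO-8601 style string from integer fields."""
    return (
        f"{year:04d}-{month:02d}-{day:02d}"
        f"T{hour:02d}:{minute:02d}:{second:02d}"
    )


padded = iso_timestamp(10, 1, 1)   # year is padded to four digits
unpadded = "10-01-01T00:00:00"     # what an unpadded year looks like

padded_ok = ISO_LIKE.match(padded) is not None
unpadded_ok = ISO_LIKE.match(unpadded) is not None
```

Passing `iso_timestamp(10)` as the start argument would mirror the work-around above, since it yields the same string the manual zero-padding produces.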

Additional context

I think this is a relatively small addition to the codebase, but it would make it easier and less confusing to use the default format that is also used by the function itself. It is easy to support as it is consistent and uniform.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4337/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
481005183 MDExOlB1bGxSZXF1ZXN0MzA3NTkwNDYw 3220 BUG: Fixes GH3215 aidanheerdegen 6063709 closed 0     7 2019-08-15T05:55:36Z 2019-08-28T06:45:42Z 2019-08-28T06:45:35Z CONTRIBUTOR   0 pydata/xarray/pulls/3220

Explicit cast to numpy array to avoid np.ravel calling out to dask

  • [x] Closes #3215
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3220/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
480512400 MDU6SXNzdWU0ODA1MTI0MDA= 3215 decode_cf called on mfdataset throws error: 'Array' object has no attribute 'tolist' aidanheerdegen 6063709 closed 0     9 2019-08-14T06:56:35Z 2019-08-28T06:45:35Z 2019-08-28T06:45:35Z CONTRIBUTOR      

MCVE Code Sample

```python
import xarray

file = 'temp_048.nc'

# Works ok with open_dataset
ds = xarray.open_dataset(file, decode_cf=True)
ds = xarray.open_dataset(file, decode_cf=False)
ds = xarray.decode_cf(ds)

# Fails with open_mfdataset
ds = xarray.open_mfdataset(file, decode_cf=True)
ds = xarray.open_mfdataset(file, decode_cf=False)

# This line throws an exception
ds = xarray.decode_cf(ds)
```

Expected Output

Nothing

Problem Description

When data is opened with open_mfdataset, calling decode_cf as a separate step throws an error, although decoding works as part of the open_mfdataset call. The error is:

    Traceback (most recent call last):
      File "tmp.py", line 11, in <module>
        ds = xarray.decode_cf(ds)
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 479, in decode_cf
        decode_coords, drop_variables=drop_variables, use_cftime=use_cftime)
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 401, in decode_cf_variables
        stack_char_dim=stack_char_dim, use_cftime=use_cftime)
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 306, in decode_cf_variable
        var = coder.decode(var, name=name)
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 419, in decode
        self.use_cftime)
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 90, in _decode_cf_datetime_dtype
        last_item(values) or [0]])
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/core/formatting.py", line 99, in last_item
        return np.ravel(array[indexer]).tolist()
    AttributeError: 'Array' object has no attribute 'tolist'

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-957.21.3.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: C LOCALE: en_AU.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1 pandas: 0.25.0 numpy: 1.17.0 scipy: 1.2.1 netCDF4: 1.5.1.2 pydap: installed h5netcdf: 0.7.4 h5py: 2.9.0 Nio: 1.5.5 zarr: 2.3.2 cftime: 1.0.3.4 nc_time_axis: 1.2.0 PseudonetCDF: None rasterio: None cfgrib: 0.9.7.1 iris: 2.2.1dev0 bottleneck: 1.2.1 dask: 2.2.0 distributed: 2.2.0 matplotlib: 2.2.4 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 41.0.1 pip: 19.1.1 conda: installed pytest: 5.0.1 IPython: 7.7.0 sphinx: None

There is no error using an older version of numpy with the same xarray version:

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-957.21.3.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: C LOCALE: en_AU.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1 pandas: 0.24.2 numpy: 1.16.4 scipy: 1.2.1 netCDF4: 1.5.1.2 pydap: installed h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.0.3.4 nc_time_axis: 1.2.0 PseudonetCDF: None rasterio: None cfgrib: 0.9.7 iris: 2.2.1dev0 bottleneck: 1.2.1 dask: 1.2.2 distributed: 1.28.1 matplotlib: 2.2.3 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 41.0.1 pip: 19.1.1 conda: installed pytest: 4.6.3 IPython: 7.5.0 sphinx: None

It looks like the tolist() method has disappeared from something, but even in the debugger it isn't obvious to me exactly why this is happening. I can call list on np.ravel(array[indexer]) at the same point and it works.
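
The behaviour described above can be imitated without dask. NumPy 1.17 enabled the `__array_function__` protocol by default, which lets array-like objects intercept calls such as `np.ravel` and return their own type; whether that is the exact cause here is an assumption, though it would fit the difference between the two environments. In the sketch below `LazyArray` is a hypothetical stand-in for a dask array and both `last_item_*` helpers are illustrative, not xarray's code.

```python
import numpy as np


class LazyArray:
    """Hypothetical stand-in for a dask array (not a real dask class)."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Called by np.asarray to obtain a plain ndarray.
        return np.asarray(self._data, dtype=dtype)

    def __array_function__(self, func, types, args, kwargs):
        # Intercept NumPy functions and hand back another LazyArray,
        # which has no ``tolist`` method.
        plain = [a._data if isinstance(a, LazyArray) else a for a in args]
        return LazyArray(func(*plain, **kwargs))


def last_item_unsafe(array):
    # np.ravel is dispatched to LazyArray, so the result lacks .tolist()
    return np.ravel(array).tolist()


def last_item_safe(array):
    # Casting to ndarray first keeps np.ravel inside NumPy.
    return np.ravel(np.asarray(array)[-1:]).tolist()


lazy = LazyArray([1, 2, 3])
safe_result = last_item_safe(lazy)
```

Calling `last_item_unsafe(lazy)` raises an AttributeError much like the one in the traceback, while the explicit `np.asarray` cast avoids it.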

The netcdf file I am using can be recreated from this CDL dump
```
netcdf temp_048 {
dimensions:
    time = UNLIMITED ; // (5 currently)
    nv = 2 ;
variables:
    double average_T1(time) ;
        average_T1:long_name = "Start time for average period" ;
        average_T1:units = "days since 1958-01-01 00:00:00" ;
        average_T1:missing_value = 1.e+20 ;
        average_T1:_FillValue = 1.e+20 ;
    double time(time) ;
        time:long_name = "time" ;
        time:units = "days since 1958-01-01 00:00:00" ;
        time:cartesian_axis = "T" ;
        time:calendar_type = "GREGORIAN" ;
        time:calendar = "GREGORIAN" ;
        time:bounds = "time_bounds" ;
    double time_bounds(time, nv) ;
        time_bounds:long_name = "time axis boundaries" ;
        time_bounds:units = "days" ;
        time_bounds:missing_value = 1.e+20 ;
        time_bounds:_FillValue = 1.e+20 ;

// global attributes:
        :filename = "ocean.nc" ;
        :title = "MOM5" ;
        :grid_type = "mosaic" ;
        :grid_tile = "1" ;
        :history = "Wed Aug 14 16:38:53 2019: ncks -O -v average_T1 /g/data3/hh5/tmp/cosima/access-om2/1deg_jra55v13_iaf_spinup1_B1_lastcycle/output048/ocean/ocean.nc temp_048.nc" ;
        :NCO = "netCDF Operators version 4.7.7 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
data:

 average_T1 = 87659, 88024, 88389, 88754, 89119 ;

 time = 87841.5, 88206.5, 88571.5, 88936.5, 89301.5 ;

 time_bounds = 87659, 88024, 88024, 88389, 88389, 88754, 88754, 89119, 89119, 89484 ;
}
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3215/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
334778045 MDU6SXNzdWUzMzQ3NzgwNDU= 2244 Implement shift for CFTimeIndex aidanheerdegen 6063709 closed 0     3 2018-06-22T07:42:16Z 2018-10-02T14:44:30Z 2018-10-02T14:44:30Z CONTRIBUTOR      

Code Sample

```python
import numpy as np
import xarray as xr
import pandas as pd

from cftime import num2date, DatetimeNoLeap

times = num2date(np.arange(730), calendar='noleap', units='days since 0001-01-01')
da = xr.DataArray(np.arange(730), coords=[times], dims=['time'])
```

Problem description

I am trying to shift a time index as I need to align datasets to a common start point.

Directly incrementing one of the CFTimeIndex values works:
```python
>>> da.time.get_index('time')[0] + pd.Timedelta('365 days')
cftime.DatetimeNoLeap(2, 1, 1, 0, 0, 0, 0, -1, 1)
```
Trying to use `shift` does not:
```python
>>> da.time.get_index('time').shift(1,'Y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.04/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2629, in shift
    type(self).__name__)
NotImplementedError: Not supported for type CFTimeIndex
```

If I want to shift a time index, the only way currently is to loop over all the individual elements of the index and add a time offset to each.
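
The element-by-element approach can be sketched as follows. Standard-library datetimes stand in for the cftime objects so the snippet is self-contained; it assumes the cftime dates accept `+ timedelta` in the same way, as the single-element example above indicates. `shift_dates` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def shift_dates(dates, offset):
    """Return a new list with ``offset`` added to every element."""
    return [d + offset for d in dates]


# Stand-in for the first three entries of the daily time axis
# (assumption: cftime dates behave the same under ``+ timedelta``).
times = [datetime(1, 1, 1) + timedelta(days=n) for n in range(3)]
shifted = shift_dates(times, timedelta(days=365))
```

A fixed 365-day offset equals exactly one year only in a noleap calendar; in calendars with leap years a calendar-aware offset would be needed, which is what a `shift` method with a frequency argument would provide.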

Expected Output

I would expect to have CFTimeIndex shifted by the desired time delta.

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-693.17.1.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C LOCALE: None.None xarray: 0.10.7 pandas: 0.23.1 numpy: 1.14.5 scipy: 1.1.0 netCDF4: 1.3.1 h5netcdf: 0.5.1 h5py: 2.8.0 Nio: None zarr: 2.2.0 bottleneck: 1.2.1 cyordereddict: None dask: 0.17.5 distributed: 1.21.8 matplotlib: 1.5.3 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 39.2.0 pip: 9.0.3 conda: None pytest: 3.6.1 IPython: 6.4.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2244/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
102703065 MDU6SXNzdWUxMDI3MDMwNjU= 548 Support for netcdf4/hdf5 compression aidanheerdegen 6063709 closed 0     4 2015-08-24T04:22:07Z 2015-10-08T01:08:51Z 2015-10-08T01:08:51Z CONTRIBUTOR      

It would be great to be able to specify netCDF4 compression parameters when saving datasets.

If this is unlikely to be supported, can you suggest a reasonable work-around? I am assuming it would involve directly accessing a backend?
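
For readers on later xarray releases: `Dataset.to_netcdf` accepts a per-variable `encoding` argument, and with the netCDF4 backend keys such as `zlib`, `complevel` and `shuffle` control compression. The sketch below only builds that dictionary; `compression_encoding` and the variable names are hypothetical, and the write call is left as a comment because no dataset is defined here.

```python
def compression_encoding(var_names, complevel=4, shuffle=True):
    """Build a per-variable encoding dict requesting zlib compression."""
    settings = {"zlib": True, "complevel": complevel, "shuffle": shuffle}
    # Copy the settings so each variable gets an independent dict.
    return {name: dict(settings) for name in var_names}


# Hypothetical variable names, for illustration only.
encoding = compression_encoding(["temp", "salt"], complevel=5)

# With a real dataset ``ds`` (not defined here) this might be used as:
# ds.to_netcdf("compressed.nc", encoding=encoding)
```

Building the dictionary from a list of names keeps the settings uniform; passing `ds.data_vars` as `var_names` would apply them to every data variable.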

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/548/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 22.103ms · About: xarray-datasette