id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1120276279,I_kwDOAMm_X85Cxg83,6226,open_mfdataset fails with cftime index when using parallel and dask delayed client,6063709,closed,0,,,6,2022-02-01T06:14:07Z,2022-02-10T22:37:37Z,2022-02-10T22:37:37Z,CONTRIBUTOR,,,,"### What happened? A call to `open_mfdataset` with `parallel=true` fails when using a dask delayed client with newer version of `cftime` and `xarray`. This happens with `cftime==1.5.2` and `xarray==0.20.2` but not `cftime==1.5.1` and `xarray==0.20.2`. ### What did you expect to happen? I expected the call to `open_mfdataset` to work without error with `parallel=True` as it does with `parallel=False` and a previous version of `cftime` ### Minimal Complete Verifiable Example ```python import xarray as xr import numpy as np from dask.distributed import Client # Need a main routine for dask.distributed if run as script if __name__ == ""__main__"": client = Client(n_workers=1) t = xr.cftime_range('20010101','20010501', closed='left', calendar='noleap') x = np.arange(100) v = np.random.random((t.size,x.size)) da = xr.DataArray(v, coords=[('time',t), ('x',x)]) da.to_netcdf('sample.nc') # Works xr.open_mfdataset('sample.nc', parallel=False) # Throws TypeError exception xr.open_mfdataset('sample.nc', parallel=True) ``` ### Relevant log output ```python distributed.protocol.core - CRITICAL - Failed to deserialize [32/525] Traceback (most recent call last): File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/core.py"", line 111, in loads return msgpack.loads( File ""msgpack/_unpacker.pyx"", line 194, in msgpack._cmsgpack.unpackb File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/core.py"", line 103, in _decode_default 
return merge_and_deserialize( File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py"", line 488, in merge_and_deserialize return deserialize(header, merged_frames, deserializers=deserializers) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py"", line 417, in deserialize return loads(header, frames) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py"", line 96, in pickle_loads return pickle.loads(x, buffers=new) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/pickle.py"", line 75, in loads return pickle.loads(x) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/pandas/core/indexes/base.py"", line 255, in _new_Index return cls.__new__(cls, **d) TypeError: __new__() got an unexpected keyword argument 'dtype' Traceback (most recent call last): File ""/g/data/v45/aph502/notebooks/test_pickle.py"", line 21, in xr.open_mfdataset('sample.nc', parallel=True) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/xarray/backends/api.py"", line 916, in open_mfdataset datasets, closers = dask.compute(datasets, closers) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/dask/base.py"", line 571, in compute results = schedule(dsk, keys, **kwargs) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/client.py"", line 2746, in get results = self.gather(packed, asynchronous=asynchronous, direct=direct) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/client.py"", line 1946, in gather return self.sync( File 
""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils.py"", line 310, in sync return sync( File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils.py"", line 364, in sync raise exc.with_traceback(tb) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils.py"", line 349, in f result[0] = yield future File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/tornado/gen.py"", line 762, in run value = future.result() File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/client.py"", line 1840, in _gather response = await future File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/client.py"", line 1891, in _gather_remote response = await retry_operation(self.scheduler.gather, keys=keys) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils_comm.py"", line 385, in retry_operation return await retry( File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/utils_comm.py"", line 370, in retry return await coro() File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/core.py"", line 900, in send_recv_from_rpc return await send_recv(comm=comm, op=key, **kwargs) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/core.py"", line 669, in send_recv response = await comm.read(deserializers=deserializers) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/comm/tcp.py"", line 232, in read msg = await from_frames( File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/comm/utils.py"", line 
78, in from_frames res = _from_frames() File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/comm/utils.py"", line 61, in _from_frames return protocol.loads( File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/core.py"", line 111, in loads return msgpack.loads( File ""msgpack/_unpacker.pyx"", line 194, in msgpack._cmsgpack.unpackb File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/core.py"", line 103, in _decode_default return merge_and_deserialize( File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py"", line 488, in merge_and_deserialize return deserialize(header, merged_frames, deserializers=deserializers) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py"", line 417, in deserialize return loads(header, frames) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/serialize.py"", line 96, in pickle_loads return pickle.loads(x) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/distributed/protocol/pickle.py"", line 75, in loads return pickle.loads(x) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/site-packages/pandas/core/indexes/base.py"", line 255, in _new_Index return cls.__new__(cls, **d) TypeError: __new__() got an unexpected keyword argument 'dtype' ``` ### Anything else we need to know? It seems similar to previous issues with pickling https://github.com/pydata/xarray/issues/5686 which was fixed in `cftime` https://github.com/Unidata/cftime/pull/252 but the tests in previous issues still work, so it isn't *exactly* the same. 
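For what it's worth, the last frame of the traceback shows the unpickling helper forwarding a dict of keyword arguments to `cls.__new__`. A toy model of that failure mode is sketched below. `StrictIndex`, `TolerantIndex` and `rebuild` are hypothetical stand-ins, not the real pandas or xarray code, and the sketch makes no claim about which package introduced the extra key:

```python
def rebuild(cls, state):
    # Hypothetical stand-in for a helper that rebuilds an object from
    # a dict of keyword arguments, in the style of cls.__new__(cls, **d).
    return cls.__new__(cls, **state)


class StrictIndex:
    # Toy index whose __new__ accepts only 'data' and 'name'.
    def __new__(cls, data=None, name=None):
        obj = object.__new__(cls)
        obj.data = list(data) if data is not None else []
        obj.name = name
        return obj


class TolerantIndex(StrictIndex):
    # Same toy index, but __new__ also accepts (and ignores) 'dtype'.
    def __new__(cls, data=None, name=None, dtype=None):
        return super().__new__(cls, data=data, name=name)


# State that carries an extra 'dtype' key, as in the error message above.
state = {'data': [1, 2, 3], 'name': 'time', 'dtype': 'object'}

try:
    rebuild(StrictIndex, state)
    strict_error = None
except TypeError as err:
    strict_error = str(err)

tolerant = rebuild(TolerantIndex, state)
print(strict_error)
print(tolerant.data, tolerant.name)
```

In this toy model the strict class raises a `TypeError` that mentions `dtype`, while the tolerant class is rebuilt normally, which is at least consistent with the error only appearing for some combinations of installed versions.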
### Environment ``` INSTALLED VERSIONS ------------------ commit: None python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 4.18.0-348.2.1.el8.nci.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: en_AU.ISO8859-1 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.20.2 pandas: 1.4.0 numpy: 1.22.1 scipy: 1.7.3 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.13.1 h5py: 3.6.0 Nio: None zarr: 2.10.3 cftime: 1.5.2 nc_time_axis: 1.4.0 PseudoNetCDF: None rasterio: 1.2.6 cfgrib: 0.9.9.1 iris: 3.1.0 bottleneck: 1.3.2 dask: 2022.01.0 distributed: 2022.01.0 matplotlib: 3.5.1 cartopy: 0.19.0.post1 seaborn: 0.11.2 numbagg: None fsspec: 2022.01.0 cupy: 10.1.0 pint: 0.18 sparse: 0.13.0 setuptools: 59.8.0 pip: 21.3.1 conda: 4.11.0 pytest: 6.2.5 IPython: 8.0.1 sphinx: 4.4.0 ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6226/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 963688125,MDU6SXNzdWU5NjM2ODgxMjU=,5686,xindexes set incorrectly for mfdataset with dask client and parallel=True,6063709,closed,0,,,8,2021-08-09T06:29:41Z,2021-08-09T23:44:10Z,2021-08-09T22:36:53Z,CONTRIBUTOR,,,," **What happened**: Using `open_mfdataset` with `parallel=True` with a `dask.distributed` client active fails to set `.xindexes` correctly. **What you expected to happen**: The `indexes` should contain an index that can be printed correctly. 
When using `repr` the `.xindexes` fails with `TypeError: cannot compute the time difference between dates with different calendars` due to an error in `.asi8` **Minimal Complete Verifiable Example**: ```python import xarray as xr import numpy as np from dask.distributed import Client # Need a main routine for dask.distributed if run as script if __name__ == ""__main__"": client = Client(n_workers=1) # Create some synthetic data time_365_decade = xr.cftime_range(start=""2100"", periods=120, freq=""1MS"", calendar=""noleap"") ds = xr.Dataset( {""a"": (""time"", np.arange(time_365_decade.size))}, coords={""time"": time_365_decade}, ) index_microseconds = ds.xindexes['time'].array.asi8 # Save to a file per year years, datasets = zip(*ds.groupby(""time.year"")) xr.save_mfdataset(datasets, [f""{y}.nc"" for y in years]) # Open saved files, parallel=False and asi8 ok assert (index_microseconds == xr.open_mfdataset('2???.nc', parallel=False).xindexes['time'].array.asi8).all() # Open saved files, parallel=True and asi8 fails assert (index_microseconds == xr.open_mfdataset('2???.nc', parallel=True).xindexes['time'].array.asi8).all() ``` **Anything else we need to know?**: the `asi8` function fails https://github.com/pydata/xarray/blob/main/xarray/coding/cftimeindex.py#L677 because ```python epoch = self.date_type(1970, 1, 1) ``` returns a `cftime.datetime` with a calendar and `has_year_zero` attribute that do not match the index ``` (Pdb) p epoch cftime.datetime(1970, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False) ``` Previously reported this as https://github.com/pydata/xarray/issues/5677 **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-305.7.1.el8.nci.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: en_AU.ISO8859-1 LOCALE: ('en_AU', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.3.1 numpy: 1.21.1 scipy: 1.7.1 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 2.10.0 Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: 1.2.6 cfgrib: 0.9.9.0 iris: 3.0.4 bottleneck: 1.3.2 dask: 2021.07.2 distributed: 2021.07.2 matplotlib: 3.4.2 cartopy: 0.19.0.post1 seaborn: 0.11.1 numbagg: None pint: 0.17 setuptools: 52.0.0.post20210125 pip: 21.1.3 conda: 4.10.3 pytest: 6.2.4 IPython: 7.26.0 sphinx: 4.1.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5686/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 962467654,MDU6SXNzdWU5NjI0Njc2NTQ=,5677,sel slice fails with cftime index when using dask.distributed client,6063709,closed,0,,,2,2021-08-06T07:16:20Z,2021-08-09T06:30:26Z,2021-08-09T06:30:26Z,CONTRIBUTOR,,,," **What happened**: Tried to `.sel()` a time slice from a multi-file dataset when `dask.distributed` client active. Got this error: ```python --------------------------------------------------------------------------- KeyError Traceback (most recent call last) /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance) 3360 try: -> 3361 return self._engine.get_loc(casted_key) 3362 except KeyError as err: /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc() /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() KeyError: cftime.datetime(2086, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False) The above exception was the direct cause of the following exception: KeyError Traceback (most recent call last) /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind) 5801 try: -> 5802 slc = self.get_loc(label) 5803 except KeyError as err: /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in 
get_loc(self, key, method, tolerance) 465 else: --> 466 return pd.Index.get_loc(self, key, method=method, tolerance=tolerance) 467 /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance) 3362 except KeyError as err: -> 3363 raise KeyError(key) from err 3364 KeyError: cftime.datetime(2086, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False) During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) src/cftime/_cftime.pyx in cftime._cftime.datetime.__richcmp__() src/cftime/_cftime.pyx in cftime._cftime.datetime.change_calendar() ValueError: change_calendar only works for real-world calendars During handling of the above exception, another exception occurred: TypeError Traceback (most recent call last) /local/v45/aph502/tmp/ipykernel_108691/1049912036.py in ----> 1 u.sel(time=slice(start_time,end_time)) /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs) 1313 Dimensions without coordinates: points 1314 """""" -> 1315 ds = self._to_temp_dataset().sel( 1316 indexers=indexers, 1317 drop=drop, /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs) 2472 """""" 2473 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, ""sel"") -> 2474 pos_indexers, new_indexes = remap_label_indexers( 2475 self, indexers=indexers, method=method, tolerance=tolerance 2476 ) /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs) 419 } 420 --> 421 pos_indexers, new_indexes = indexing.remap_label_indexers( 422 obj, v_indexers, method=method, tolerance=tolerance 
423 ) /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance) 115 for dim, index in indexes.items(): 116 labels = grouped_indexers[dim] --> 117 idxr, new_idx = index.query(labels, method=method, tolerance=tolerance) 118 pos_indexers[dim] = idxr 119 if new_idx is not None: /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/indexes.py in query(self, labels, method, tolerance) 196 197 if isinstance(label, slice): --> 198 indexer = _query_slice(index, label, coord_name, method, tolerance) 199 elif is_dict_like(label): 200 raise ValueError( /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/xarray/core/indexes.py in _query_slice(index, label, coord_name, method, tolerance) 89 ""cannot use ``method`` argument if any indexers are slice objects"" 90 ) ---> 91 indexer = index.slice_indexer( 92 _sanitize_slice_element(label.start), 93 _sanitize_slice_element(label.stop), /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in slice_indexer(self, start, end, step, kind) 5684 slice(1, 3, None) 5685 """""" -> 5686 start_slice, end_slice = self.slice_locs(start, end, step=step) 5687 5688 # return a slice /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in slice_locs(self, start, end, step, kind) 5886 start_slice = None 5887 if start is not None: -> 5888 start_slice = self.get_slice_bound(start, ""left"") 5889 if start_slice is None: 5890 start_slice = 0 /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind) 5803 except KeyError as err: 5804 try: -> 5805 return self._searchsorted_monotonic(label, side) 5806 except ValueError: 5807 # raise the original KeyError 
/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/indexes/base.py in _searchsorted_monotonic(self, label, side) 5754 def _searchsorted_monotonic(self, label, side: str_t = ""left""): 5755 if self.is_monotonic_increasing: -> 5756 return self.searchsorted(label, side=side) 5757 elif self.is_monotonic_decreasing: 5758 # np.searchsorted expects ascending sort order, have to reverse /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/base.py in searchsorted(self, value, side, sorter) 1219 @doc(_shared_docs[""searchsorted""], klass=""Index"") 1220 def searchsorted(self, value, side=""left"", sorter=None) -> np.ndarray: -> 1221 return algorithms.searchsorted(self._values, value, side=side, sorter=sorter) 1222 1223 def drop_duplicates(self, keep=""first""): /g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/pandas/core/algorithms.py in searchsorted(arr, value, side, sorter) 1583 arr = ensure_wrapped_if_datetimelike(arr) 1584 -> 1585 return arr.searchsorted(value, side=side, sorter=sorter) 1586 1587 src/cftime/_cftime.pyx in cftime._cftime.datetime.__richcmp__() TypeError: cannot compare cftime.datetime(2086, 5, 16, 12, 0, 0, 0, calendar='noleap', has_year_zero=True) and cftime.datetime(2086, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False) ``` So the slice indexing has created a bounding value with the wrong calendar, should be `365_year` but is `gregorian`. ```python KeyError: cftime.datetime(2086, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False) ``` Note that this only happens when a `dask.distributed` client is loaded **What you expected to happen**: expected it to return the same slice it does without error if the client is not active. 
**Minimal Complete Verifiable Example**: I tried hard to create a synthetic example, but I couldn't make one that fails. Loading the `mfdataset` from disk, however, fails reliably; I have tested this multiple times. The dataset:
xarray.DataArray 'u' (time: 15, st_ocean: 75, yu_ocean: 2700, xu_ocean: 3600)
dask array: 40.74 GiB in total, 3.20 MiB per chunk; shape (15, 75, 2700, 3600), chunk shape (1, 7, 300, 400); 26735 tasks, 13365 chunks; type float32, numpy.ndarray
```python
# FWIW
start_time = '2086-01-01'
end_time = '2086-12-31'
u.sel(time=slice(start_time,end_time))
```

**Anything else we need to know?**: I tried following the code execution through with `pdb` and it seems to start going wrong here: https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/core/indexing.py#L55

By line 63, `data_obj.xindexes` is already in a bad state: https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/core/indexing.py#L63

```python
(Pdb) data_obj.xindexes
*** TypeError: cannot compute the time difference between dates with different calendars
```

It is called here https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/core/indexing.py#L106-L108 but it isn't obvious to me how that bad state is generated.

**Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-326.el8.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: en_US.UTF-8 LOCALE: ('en_AU', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.3.1 numpy: 1.21.1 scipy: 1.7.0 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 2.10.0 Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: 1.2.6 cfgrib: 0.9.9.0 iris: 3.0.4 bottleneck: 1.3.2 dask: 2021.07.2 distributed: 2021.07.2 matplotlib: 3.4.2 cartopy: 0.19.0.post1 seaborn: 0.11.1 numbagg: None pint: 0.17 setuptools: 52.0.0.post20210125 pip: 21.1.3 conda: 4.10.3 pytest: 6.2.4 IPython: 7.26.0 sphinx: 4.1.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5677/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 677307460,MDU6SXNzdWU2NzczMDc0NjA=,4337,cftime_range does not support default cftime.datetime formatted output strings,6063709,closed,0,,,5,2020-08-12T01:28:30Z,2020-08-17T23:27:07Z,2020-08-17T23:27:07Z,CONTRIBUTOR,,,," **Is your feature request related to a problem? Please describe.** The `xarray.cftime_range` does not support datetime strings that are the default output from `cftime.datetime.strftime()` which are the format which `cftime_range` itself uses internally. ```python import cftime import xarray date = cftime.datetime(10,1,1).strftime() print(date) xarray.cftime_range(date, periods=3, freq='Y') ``` outputs ``` 10-01-01 00:00:00 --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in 3 date = cftime.datetime(10,1,1).strftime() 4 print(date) ----> 5 xarray.cftime_range(date, periods=3, freq='Y') /g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftime_offsets.py in cftime_range(start, end, periods, freq, normalize, name, closed, calendar) 963 964 if start is not None: --> 965 start = to_cftime_datetime(start, calendar) 966 start = _maybe_normalize_date(start, normalize) 967 if end is not None: /g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftime_offsets.py in to_cftime_datetime(date_str_or_date, calendar) 683 ""a calendar type must be provided"" 684 ) --> 685 date, _ = _parse_iso8601_with_reso(get_date_type(calendar), date_str_or_date) 686 return date 687 elif isinstance(date_str_or_date, cftime.datetime): /g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftimeindex.py in 
_parse_iso8601_with_reso(date_type, timestr) 101 102 default = date_type(1, 1, 1) --> 103 result = parse_iso8601(timestr) 104 replace = {} 105 /g/data3/hh5/public/apps/miniconda3/envs/analysis3-20.07/lib/python3.7/site-packages/xarray/coding/cftimeindex.py in parse_iso8601(datetime_string) 94 if match: 95 return match.groupdict() ---> 96 raise ValueError(""no ISO-8601 match for string: %s"" % datetime_string) 97 98 ValueError: no ISO-8601 match for string: 10-01-01 00:00:00
```

**Describe the solution you'd like**

It would be good if `xarray.cftime_range` supported the default `strftime` format output from cftime.datetime objects. It is confusing that it uses this format with `repr` but explicitly does not support it.

**Describe alternatives you've considered**

Specifying an ISO-8601 compatible format (using `T` separator) isn't general as it doesn't work for years < 1000 because the year field is not zero padded.

```python
import cftime
import xarray

date = cftime.datetime(10,1,1).strftime('%Y-%m-%dT%H:%M:%S')
print('|{}|'.format(date))
xarray.cftime_range(date, periods=3, freq='Y')
```

produces

```
| 10-01-01T00:00:00|
```

and the error as above. A work-around is to zero-pad manually

```python
import cftime
import xarray

date = '{:0>19}'.format(cftime.datetime(10,1,1).strftime('%Y-%m-%dT%H:%M:%S').lstrip())
print(date)
xarray.cftime_range(date, periods=3, freq='Y')
```

produces

```
0010-01-01T00:00:00
CFTimeIndex([0010-12-31 00:00:00, 0011-12-31 00:00:00, 0012-12-31 00:00:00], dtype='object')
```

**Additional context**

I think this is a relatively small addition to the codebase but would make it easier and less confusing to use the default format that is also used by the function itself. It is easy to support as it is consistent and uniform.
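The manual zero-padding work-around can also be written as a small pure-Python helper. This is only a sketch: `pad_year` is a hypothetical name, not part of xarray or cftime, and it assumes the input looks like the default `strftime` output shown earlier (a year of one to four digits, possibly with leading spaces, followed by `-MM-DD HH:MM:SS` with either a space or `T` as the separator):

```python
import re

# Year of 1-4 digits (optionally preceded by whitespace), then the fixed-width
# month, day, hour, minute and second fields, separated by a space or 'T'.
_DATE_PATTERN = re.compile(
    r'^\s*(\d{1,4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2}):(\d{2})\s*$'
)


def pad_year(text):
    # Return an ISO-8601 style string with the year zero-padded to 4 digits.
    match = _DATE_PATTERN.match(text)
    if match is None:
        raise ValueError('unrecognised date string: %r' % (text,))
    year, month, day, hour, minute, second = match.groups()
    return '{:0>4}-{}-{}T{}:{}:{}'.format(year, month, day, hour, minute, second)


print(pad_year('  10-01-01 00:00:00'))  # 0010-01-01T00:00:00
```

The returned string could then be passed to `xarray.cftime_range` in place of the raw `strftime` output.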
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4337/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 480512400,MDU6SXNzdWU0ODA1MTI0MDA=,3215,decode_cf called on mfdataset throws error: 'Array' object has no attribute 'tolist',6063709,closed,0,,,9,2019-08-14T06:56:35Z,2019-08-28T06:45:35Z,2019-08-28T06:45:35Z,CONTRIBUTOR,,,,"#### MCVE Code Sample ```python import xarray file = 'temp_048.nc' # Works ok with open_dataset ds = xarray.open_dataset(file, decode_cf=True) ds = xarray.open_dataset(file, decode_cf=False) ds = xarray.decode_cf(ds) # Fails with open_mfdataset ds = xarray.open_mfdataset(file, decode_cf=True) ds = xarray.open_mfdataset(file, decode_cf=False) # This line throws an exception ds = xarray.decode_cf(ds) ``` #### Expected Output Nothing #### Problem Description When opening data with `open_mfdataset` calling `decode_cf` throws an error, when called as a separate step, but works as part of the `open_mfdataset` call. 
Error is: ``` Traceback (most recent call last): File ""tmp.py"", line 11, in ds = xarray.decode_cf(ds) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py"", line 479, in decode_cf decode_coords, drop_variables=drop_variables, use_cftime=use_cftime) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py"", line 401, in decode_cf_variables stack_char_dim=stack_char_dim, use_cftime=use_cftime) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py"", line 306, in decode_cf_variable var = coder.decode(var, name=name) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py"", line 419, in decode self.use_cftime) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py"", line 90, in _decode_cf_datetime_dtype last_item(values) or [0]]) File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/core/formatting.py"", line 99, in last_item return np.ravel(array[indexer]).tolist() AttributeError: 'Array' object has no attribute 'tolist' ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-957.21.3.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: C LOCALE: en_AU.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1 pandas: 0.25.0 numpy: 1.17.0 scipy: 1.2.1 netCDF4: 1.5.1.2 pydap: installed h5netcdf: 0.7.4 h5py: 2.9.0 Nio: 1.5.5 zarr: 2.3.2 cftime: 1.0.3.4 nc_time_axis: 1.2.0 PseudonetCDF: None rasterio: None cfgrib: 0.9.7.1 iris: 2.2.1dev0 bottleneck: 1.2.1 dask: 2.2.0 distributed: 2.2.0 matplotlib: 2.2.4 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 41.0.1 pip: 19.1.1 conda: installed pytest: 5.0.1 IPython: 7.7.0 sphinx: None
There is no error using an older version of `numpy` with the same `xarray` version:
INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-957.21.3.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_AU.utf8 LANG: C LOCALE: en_AU.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1 pandas: 0.24.2 numpy: 1.16.4 scipy: 1.2.1 netCDF4: 1.5.1.2 pydap: installed h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.0.3.4 nc_time_axis: 1.2.0 PseudonetCDF: None rasterio: None cfgrib: 0.9.7 iris: 2.2.1dev0 bottleneck: 1.2.1 dask: 1.2.2 distributed: 1.28.1 matplotlib: 2.2.3 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 41.0.1 pip: 19.1.1 conda: installed pytest: 4.6.3 IPython: 7.5.0 sphinx: None
Looks like the `tolist()` method has disappeared from something, but even in the debugger it isn't obvious to me exactly why this is happening. I can call `list` on `np.ravel(array[indexer])` at the same point and it works.

The netCDF file I am using can be recreated from this CDL dump:

```
netcdf temp_048 {
dimensions:
    time = UNLIMITED ; // (5 currently)
    nv = 2 ;
variables:
    double average_T1(time) ;
        average_T1:long_name = ""Start time for average period"" ;
        average_T1:units = ""days since 1958-01-01 00:00:00"" ;
        average_T1:missing_value = 1.e+20 ;
        average_T1:_FillValue = 1.e+20 ;
    double time(time) ;
        time:long_name = ""time"" ;
        time:units = ""days since 1958-01-01 00:00:00"" ;
        time:cartesian_axis = ""T"" ;
        time:calendar_type = ""GREGORIAN"" ;
        time:calendar = ""GREGORIAN"" ;
        time:bounds = ""time_bounds"" ;
    double time_bounds(time, nv) ;
        time_bounds:long_name = ""time axis boundaries"" ;
        time_bounds:units = ""days"" ;
        time_bounds:missing_value = 1.e+20 ;
        time_bounds:_FillValue = 1.e+20 ;

// global attributes:
        :filename = ""ocean.nc"" ;
        :title = ""MOM5"" ;
        :grid_type = ""mosaic"" ;
        :grid_tile = ""1"" ;
        :history = ""Wed Aug 14 16:38:53 2019: ncks -O -v average_T1 /g/data3/hh5/tmp/cosima/access-om2/1deg_jra55v13_iaf_spinup1_B1_lastcycle/output048/ocean/ocean.nc temp_048.nc"" ;
        :NCO = ""netCDF Operators version 4.7.7 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)"" ;
data:

 average_T1 = 87659, 88024, 88389, 88754, 89119 ;

 time = 87841.5, 88206.5, 88571.5, 88936.5, 89301.5 ;

 time_bounds = 87659, 88024, 88024, 88389, 88389, 88754, 88754, 89119, 89119, 89484 ;
}
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3215/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 334778045,MDU6SXNzdWUzMzQ3NzgwNDU=,2244,Implement shift for CFTimeIndex 
,6063709,closed,0,,,3,2018-06-22T07:42:16Z,2018-10-02T14:44:30Z,2018-10-02T14:44:30Z,CONTRIBUTOR,,,,"#### Code Sample

```python
import numpy as np
import xarray as xr
import pandas as pd
from cftime import num2date, DatetimeNoLeap

times = num2date(np.arange(730), calendar='noleap', units='days since 0001-01-01')
da = xr.DataArray(np.arange(730), coords=[times], dims=['time'])
```

#### Problem description

I am trying to shift a time index as I need to align datasets to a common start point. Directly incrementing one of the `CFTimeIndex` values works:

```python
>>> da.time.get_index('time')[0] + pd.Timedelta('365 days')
cftime.DatetimeNoLeap(2, 1, 1, 0, 0, 0, 0, -1, 1)
```

Trying to use `shift` does not:

```python
>>> da.time.get_index('time').shift(1,'Y')
Traceback (most recent call last):
  File """", line 1, in
  File ""/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.04/lib/python3.6/site-packages/pandas/core/indexes/base.py"", line 2629, in shift
    type(self).__name__)
NotImplementedError: Not supported for type CFTimeIndex
```

If I want to shift a time index, the only way currently is to loop over all the individual elements of the index and add a time offset to each.

#### Expected Output

I would expect to have CFTimeIndex shifted by the desired time delta.

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-693.17.1.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C LOCALE: None.None xarray: 0.10.7 pandas: 0.23.1 numpy: 1.14.5 scipy: 1.1.0 netCDF4: 1.3.1 h5netcdf: 0.5.1 h5py: 2.8.0 Nio: None zarr: 2.2.0 bottleneck: 1.2.1 cyordereddict: None dask: 0.17.5 distributed: 1.21.8 matplotlib: 1.5.3 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 39.2.0 pip: 9.0.3 conda: None pytest: 3.6.1 IPython: 6.4.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2244/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 102703065,MDU6SXNzdWUxMDI3MDMwNjU=,548,Support for netcdf4/hdf5 compression,6063709,closed,0,,,4,2015-08-24T04:22:07Z,2015-10-08T01:08:51Z,2015-10-08T01:08:51Z,CONTRIBUTOR,,,,"It would be great to be able to specify netCDF4 compression parameters when saving datasets. If this is unlikely to be supported, can you suggest a reasonable work-around? I am assuming it would involve directly accessing a backend? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/548/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue