
issues: 1063046540

id: 1063046540
node_id: I_kwDOAMm_X84_XM2M
number: 6026
title: Delaying open produces different type of `cftime` object
user: 42455466
state: closed
locked: 0
comments: 3
created_at: 2021-11-25T00:47:22Z
updated_at: 2022-01-13T13:49:27Z
closed_at: 2022-01-13T13:49:27Z
author_association: NONE
assignee, milestone, active_lock_reason, draft, pull_request: (empty)
body:

What happened: The task is to open a dataset (e.g. a netCDF or Zarr file) that has a time coordinate, using `use_cftime=True`. When the open is delayed with dask, the time coordinate is represented as `cftime.datetime` objects, whereas when it is not delayed, `cftime.Datetime<Calendar>` objects are used.

What you expected to happen: Consistent cftime objects to be used, regardless of whether the opening task is delayed or not.

Minimal Complete Verifiable Example:

```python
import dask
import numpy as np
import xarray as xr
from dask.distributed import LocalCluster, Client

cluster = LocalCluster()
client = Client(cluster)

# Write some data
var = np.random.random(4)
time = xr.cftime_range('2000-01-01', periods=4, calendar='julian')
ds = xr.Dataset(data_vars={'var': ('time', var)}, coords={'time': time})
ds.to_netcdf('test.nc', mode='w')

# Open written data
ds1 = xr.open_dataset('test.nc', use_cftime=True)
print(f'ds1: {ds1.time} \n')

# Delayed open written data
ds2 = dask.delayed(xr.open_dataset)('test.nc', use_cftime=True)
ds2 = dask.compute(ds2)[0]
print(f'ds2: {ds2.time} \n')

# Operations like xr.open_mfdataset which use dask.delayed internally
# when parallel=True (I think) produce the same result as ds2
ds3 = xr.open_mfdataset('test.nc', use_cftime=True, parallel=True)
print(f'ds3: {ds3.time}')
```

returns

```
ds1: <xarray.DataArray 'time' (time: 4)>
array([cftime.DatetimeJulian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeJulian(2000, 1, 2, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeJulian(2000, 1, 3, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeJulian(2000, 1, 4, 0, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00

ds2: <xarray.DataArray 'time' (time: 4)>
array([cftime.datetime(2000, 1, 1, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 2, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 3, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 4, 0, 0, 0, 0, calendar='julian', has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00

ds3: <xarray.DataArray 'time' (time: 4)>
array([cftime.datetime(2000, 1, 1, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 2, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 3, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 4, 0, 0, 0, 0, calendar='julian', has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00
```
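The difference between the outputs above can also be checked programmatically by counting the concrete classes of the time values. The helper below is a minimal sketch: `summarize_element_types` is a hypothetical name, not part of xarray or cftime, and the commented usage assumes the `ds1` and `ds2` objects from the example.

```python
from collections import Counter


def summarize_element_types(values):
    """Count the concrete Python types found in an iterable of objects.

    Keys are fully qualified class names, e.g. 'builtins.int'.
    """
    return Counter(
        f"{type(v).__module__}.{type(v).__qualname__}" for v in values
    )


# Hypothetical usage with the datasets from the example (not run here):
#   summarize_element_types(ds1.time.values)
#   summarize_element_types(ds2.time.values)
# Differing keys between the two results would confirm that the delayed
# open produced a different class of time object.
```

Because the helper only inspects `type(v)`, it works on any object array and does not depend on how the dataset was opened.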

Anything else we need to know?: I noticed this because the DatetimeAccessor `ceil`, `floor` and `round` methods raise errors for `cftime.datetime` objects (but not `cftime.Datetime<Calendar>` objects) for all calendar types other than 'gregorian'. For example, `ds3.time.dt.floor('D')` returns the following traceback:

```
TypeError                                 Traceback (most recent call last)
<ipython-input-10-613e63624953> in <module>
----> 1 ds3.time.dt.floor('D')

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in floor(self, freq)
    220         """
    221 
--> 222         return self._tslib_round_accessor("floor", freq)
    223 
    224     def ceil(self, freq):

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in _tslib_round_accessor(self, name, freq)
    202     def _tslib_round_accessor(self, name, freq):
    203         obj_type = type(self._obj)
--> 204         result = _round_field(self._obj.data, name, freq)
    205         return obj_type(result, name=name, coords=self._obj.coords, dims=self._obj.dims)
    206 

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in _round_field(values, name, freq)
    142         )
    143     else:
--> 144         return _round_through_series_or_index(values, name, freq)
    145 
    146 

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in _round_through_series_or_index(values, name, freq)
    110         method = getattr(values_as_cftimeindex, name)
    111 
--> 112     field_values = method(freq=freq).values
    113 
    114     return field_values.reshape(values.shape)

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in floor(self, freq)
    733         CFTimeIndex
    734         """
--> 735         return self._round_via_method(freq, _floor_int)
    736 
    737     def ceil(self, freq):

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in _round_via_method(self, freq, method)
    714 
    715         unit = _total_microseconds(offset.as_timedelta())
--> 716         values = self.asi8
    717         rounded = method(values, unit)
    718         return _cftimeindex_from_i8(rounded, self.date_type, self.name)

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in asi8(self)
    684         epoch = self.date_type(1970, 1, 1)
    685         return np.array(
--> 686             [
    687                 _total_microseconds(exact_cftime_datetime_difference(epoch, date))
    688                 for date in self.values

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in <listcomp>(.0)
    685         return np.array(
    686             [
--> 687                 _total_microseconds(exact_cftime_datetime_difference(epoch, date))
    688                 for date in self.values
    689             ],

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/resample_cftime.py in exact_cftime_datetime_difference(a, b)
    356     datetime.timedelta
    357     """
--> 358     seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
    359     seconds = int(round(seconds.total_seconds()))
    360     microseconds = b.microsecond - a.microsecond

src/cftime/_cftime.pyx in cftime._cftime.datetime.__sub__()

TypeError: cannot compute the time difference between dates with different calendars
```

My apologies for conflating two issues here. I'm happy to open a separate issue for this if that's preferred.
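Until the two behaviours are reconciled, one possible workaround is to rebuild each time value as an instance of the calendar-specific class before rounding. The sketch below is an assumption-laden illustration, not a confirmed fix: `rebuild_dates` and `StandInDatetime` are hypothetical names, the standard-library `datetime` stands in for cftime so the block runs without extra packages, and the commented xarray usage has not been tested against the environment reported here.

```python
import datetime


def rebuild_dates(values, date_type):
    """Re-create each date as an instance of ``date_type`` from its fields."""
    return [
        date_type(v.year, v.month, v.day, v.hour, v.minute, v.second, v.microsecond)
        for v in values
    ]


# Stand-in demonstration: a subclass plays the role of a calendar-specific
# class such as cftime.DatetimeJulian, and the base class plays the role
# of the generic cftime.datetime.
class StandInDatetime(datetime.datetime):
    pass


generic = [datetime.datetime(2000, 1, day) for day in range(1, 5)]
specific = rebuild_dates(generic, StandInDatetime)
print({type(d).__name__ for d in specific})

# Hypothetical usage with the dataset from the example (assumes cftime is
# imported and the calendar is 'julian'; untested):
#   new_time = rebuild_dates(ds3.time.values, cftime.DatetimeJulian)
#   ds3 = ds3.assign_coords(time=("time", new_time))
```

The helper copies only the year-to-microsecond fields, so any other attributes carried by the original objects would need to be passed explicitly.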

Environment:

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.19.1.el8.nci.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: 2.9.5
cftime: 1.5.0
nc_time_axis: 1.4.0
PseudoNetCDF: None
rasterio: 1.2.4
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.11.2
distributed: 2021.11.2
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: 2021.05.0
cupy: None
pint: 0.18
sparse: None
setuptools: 49.6.0.post20210108
pip: 21.1.2
conda: 4.10.1
pytest: None
IPython: 7.24.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6026/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
