html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7210#issuecomment-1292699390,https://api.github.com/repos/pydata/xarray/issues/7210,1292699390,IC_kwDOAMm_X85NDQb-,89428916,2022-10-26T21:56:38Z,2022-10-27T15:30:57Z,NONE,"I have two workarounds to share. Both use monkey patching to override functions in `xarray.coding.times`, so I don't recommend actually using them, but they may spur some discussion about whether and how xarray might adopt them.
# Option 1:
This avoids the problem entirely by ""fixing"" the timestamp in the `units` attribute so that pandas can parse it. It is probably a bit specific to this situation, but it works. It uses `cftime.num2date` to parse the `units` attribute and get its reference date, builds a new `units` string with an ISO-compliant date, and then calls the original `decode_cf_datetime`, which ends up using pandas.
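```python
import xarray as xr
import xarray.coding.times
import numpy as np
import cftime

# Keep a reference to the original so the patched version can delegate to it
orig_decode_cf_datetime = xarray.coding.times.decode_cf_datetime

def decode_cf_datetime(num_dates, units, calendar=None, use_cftime=None):
    # Let cftime parse the reference date in `units` (calendar may be None
    # when the attribute is missing, so fall back to 'standard'), then rebuild
    # `units` with an ISO-formatted date that pandas can parse
    reference_time = cftime.num2date(0, units, calendar or 'standard')
    units = f""{units.split('since')[0].strip()} since {reference_time}""
    return orig_decode_cf_datetime(num_dates, units, calendar, use_cftime)

xarray.coding.times.decode_cf_datetime = decode_cf_datetime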
fill_val = -99999.0
time_vals = np.random.randint(0, 1000, 10)
time_vals[1] = fill_val
data_vars = {
    'foo': (['x'], np.random.rand(10)),
    'time': (
        ['x'],
        time_vals,
        {
            'units': 'seconds since 2000-1-1 0:0:0 0',
            '_FillValue': fill_val,
            'scale_factor': 1.0,
            'add_offset': 0.0,
            'standard_name': 'time',
            'calendar': 'standard',
            'Axis': 'T',
            'coverage_content_type': 'coordinate',
        }
    ),
}
ds = xr.Dataset(
    data_vars=data_vars,
    coords={'x': (['x'], np.arange(10))}
)
nc_out_location = '/tmp/example.nc'
ds.to_netcdf(nc_out_location)
ds = xr.open_dataset(nc_out_location)
print(ds['time'])
```
Console Output
```
array(['2000-01-01T00:03:18.000000000', 'NaT',
'2000-01-01T00:06:58.000000000', '2000-01-01T00:07:32.000000000',
'2000-01-01T00:09:07.000000000', '2000-01-01T00:04:28.000000000',
'2000-01-01T00:11:04.000000000', '2000-01-01T00:12:42.000000000',
'2000-01-01T00:05:03.000000000', '2000-01-01T00:11:07.000000000'],
dtype='datetime64[ns]')
Coordinates:
* x (x) int64 0 1 2 3 4 5 6 7 8 9
Attributes:
standard_name: time
Axis: T
coverage_content_type: coordinate
```
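The key step in Option 1 is rewriting the reference date in `units` into a form pandas accepts. For illustration only, here is a stdlib-only sketch of that normalization that does not use cftime. `normalize_time_units` is a hypothetical helper, and it assumes the units follow the simple `UNIT since Y-M-D [H:M[:S]] [OFFSET]` shape used in this example, with a zero or UTC offset.

```python
import re
from datetime import datetime

# Hypothetical helper: a stdlib-only alternative to the cftime.num2date trick.
# It accepts only a numeric Y-M-D date, an optional H:M or H:M:S time,
# and an optional trailing time-zone token that must mean a zero offset.
_UNITS_RE = re.compile(
    r'^\s*(?P<unit>\w+)\s+since\s+'
    r'(?P<y>\d{1,4})-(?P<mo>\d{1,2})-(?P<d>\d{1,2})'
    r'(?:[ T](?P<h>\d{1,2}):(?P<mi>\d{1,2})(?::(?P<s>\d{1,2}))?)?'
    r'(?:\s+(?P<tz>\S+))?\s*$'
)
_ZERO_OFFSETS = {'0', '00', '0:00', '00:00', '+0', '-0', 'UTC', 'Z'}


def normalize_time_units(units):
    match = _UNITS_RE.match(units)
    if match is None:
        raise ValueError(f'unsupported units string: {units!r}')
    tz = match.group('tz')
    # A non-zero offset would require shifting the reference time,
    # which this sketch does not attempt, so it is rejected instead.
    if tz is not None and tz not in _ZERO_OFFSETS:
        raise ValueError(f'unsupported time-zone or trailing token: {tz!r}')
    reference = datetime(
        int(match.group('y')),
        int(match.group('mo')),
        int(match.group('d')),
        int(match.group('h') or 0),
        int(match.group('mi') or 0),
        int(match.group('s') or 0),
    )
    unit = match.group('unit')
    # isoformat(' ') yields a zero-padded date and time, e.g. 2000-01-01 00:00:00
    stamp = reference.isoformat(' ')
    return f'{unit} since {stamp}'
```

Unlike the cftime approach, this rejects anything it does not recognize (fractional seconds, non-zero offsets, other date layouts) instead of guessing. A helper like this could possibly avoid monkey patching altogether: open the file with `decode_times=False`, rewrite `ds['time'].attrs['units']`, then call `xr.decode_cf(ds)`, though that path is untested here.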
# Option 2:
This option works around the problem before handing the dates off to `cftime`: it replaces `NaN`s with `0` before calling `cftime.num2date`, and after the dates are converted it puts the `NaN`s back where they were.
```python
import xarray as xr
import xarray.coding.times
import numpy as np
import cftime

def _decode_datetime_with_cftime(num_dates, units, calendar):
    # Work on a float copy so the caller's array is not modified in place
    num_dates = np.asarray(num_dates, dtype=float)
    nan_mask = np.isnan(num_dates)
    # Replace NaN with 0 so cftime only sees valid numbers
    filled = np.where(nan_mask, 0, num_dates)
    as_dates = np.asarray(cftime.num2date(filled, units, calendar))
    # Put NaN back where the missing values were (boolean mask also works for N-D input)
    as_dates[nan_mask] = np.nan
    return as_dates

xarray.coding.times._decode_datetime_with_cftime = _decode_datetime_with_cftime
fill_val = -99999.0
time_vals = np.random.randint(0, 1000, 10)
time_vals[1] = fill_val
data_vars = {
    'foo': (['x'], np.random.rand(10)),
    'time': (
        ['x'],
        time_vals,
        {
            'units': 'seconds since 2000-1-1 0:0:0 0',
            '_FillValue': fill_val,
            'scale_factor': 1.0,
            'add_offset': 0.0,
            'standard_name': 'time',
            'calendar': 'standard',
            'Axis': 'T',
            'coverage_content_type': 'coordinate',
        }
    ),
}
ds = xr.Dataset(
    data_vars=data_vars,
    coords={'x': (['x'], np.arange(10))}
)
nc_out_location = '/tmp/example.nc'
ds.to_netcdf(nc_out_location)
ds = xr.open_dataset(nc_out_location, use_cftime=True)
print(ds['time'])
```
Console Output
```
array([cftime.DatetimeGregorian(2000, 1, 1, 0, 12, 33, 0, has_year_zero=False),
nan,
cftime.DatetimeGregorian(2000, 1, 1, 0, 10, 50, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 1, 1, 0, 6, 27, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 44, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 1, 1, 0, 11, 18, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 35, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 1, 1, 0, 12, 37, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 1, 1, 0, 14, 44, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 1, 1, 0, 11, 17, 0, has_year_zero=False)],
dtype=object)
Coordinates:
* x (x) int64 0 1 2 3 4 5 6 7 8 9
Attributes:
standard_name: time
Axis: T
coverage_content_type: coordinate
```
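The fill, convert, and restore idea in Option 2 does not depend on cftime or numpy. As an illustration, here is a minimal stdlib-only sketch of the same pattern. `convert_with_missing` and `strict_convert` are hypothetical names, and `strict_convert` is only a stand-in for `cftime.num2date` that rejects `NaN`, which is the failure Option 2 works around.

```python
import math
from datetime import datetime, timedelta


def strict_convert(values, reference):
    # Hypothetical stand-in for cftime.num2date: it refuses NaN,
    # modelling the failure that Option 2 works around.
    if any(math.isnan(v) for v in values):
        raise ValueError('NaN is not a valid time value')
    return [reference + timedelta(seconds=v) for v in values]


def convert_with_missing(values, converter, missing=math.nan):
    # Remember which positions hold NaN.
    nan_mask = [math.isnan(v) for v in values]
    # Replace NaN with 0 so the converter only sees valid numbers.
    filled = [0.0 if is_nan else v for v, is_nan in zip(values, nan_mask)]
    converted = list(converter(filled))
    # Put the missing marker back where the NaNs were.
    return [missing if is_nan else c for c, is_nan in zip(converted, nan_mask)]


reference = datetime(2000, 1, 1)
dates = convert_with_missing(
    [198.0, math.nan, 418.0],
    lambda vals: strict_convert(vals, reference),
)
```

Filling with `0` is safe here because `0` maps to the reference date itself, which is always a valid input; any other in-range value would work too, since the filled positions are overwritten afterwards.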
There is one more problem with this solution, though: this validation code (https://github.com/pydata/xarray/blob/076bd8e15f04878d7b97100fb29177697018138f/xarray/coding/times.py#L281-L296) runs after `_decode_datetime_with_cftime` is called, and it calls `cftime_to_nptime`, which does not handle `NaN`s. However, that validation can be skipped by explicitly setting `use_cftime=True`, as in the example above, which then works as expected.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1421718311