issue_comments: 1292699390

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/7210#issuecomment-1292699390 https://api.github.com/repos/pydata/xarray/issues/7210 1292699390 IC_kwDOAMm_X85NDQb- 89428916 2022-10-26T21:56:38Z 2022-10-27T15:30:57Z NONE

I have two workarounds to share. Both monkey patch functions in xarray.coding.times, so I don't recommend actually using them, but they may spur some conversation about whether and how xarray might adopt them.

Option 1:

This truly avoids the problem by "fixing" the timestamp in the units attribute so that pandas can read it. It is probably a bit specific to this situation, but it works. It uses cftime.num2date to parse the units attribute and recover the reference date, builds a new units string with an ISO-compliant date, and then calls the original decode_cf_datetime, which ends up using pandas.

```
import xarray as xr
import xarray.coding.times
import numpy as np
import cftime

# Keep a handle on the original so the patched version can delegate to it.
orig_decode_cf_datetime = xarray.coding.times.decode_cf_datetime


def decode_cf_datetime(num_dates, units, calendar=None, use_cftime=None):
    if cftime is not None:
        # Let cftime parse the reference date, then rebuild the units
        # string with a date that pandas can read.
        reference_time = cftime.num2date(0, units, calendar)
        units = f"{units.split('since')[0].strip()} since {reference_time}"

    return orig_decode_cf_datetime(num_dates, units, calendar, use_cftime)


xarray.coding.times.decode_cf_datetime = decode_cf_datetime

fill_val = -99999.0
time_vals = np.random.randint(0, 1000, 10)
time_vals[1] = fill_val

data_vars = {
    'foo': (['x'], np.random.rand(10)),
    'time': (
        ['x'],
        time_vals,
        {
            'units': 'seconds since 2000-1-1 0:0:0 0',
            '_FillValue': fill_val,
            'scale_factor': 1.0,
            'add_offset': 0.0,
            'standard_name': 'time',
            'calendar': 'standard',
            'Axis': 'T',
            'coverage_content_type': 'coordinate',
        }
    ),
}

ds = xr.Dataset(
    data_vars=data_vars,
    coords={'x': (['x'], np.arange(10))}
)

nc_out_location = '/tmp/example.nc'
ds.to_netcdf(nc_out_location)
ds = xr.open_dataset(nc_out_location)
print(ds['time'])
```

Console Output

```
<xarray.DataArray 'time' (x: 10)>
array(['2000-01-01T00:03:18.000000000', 'NaT',
       '2000-01-01T00:06:58.000000000', '2000-01-01T00:07:32.000000000',
       '2000-01-01T00:09:07.000000000', '2000-01-01T00:04:28.000000000',
       '2000-01-01T00:11:04.000000000', '2000-01-01T00:12:42.000000000',
       '2000-01-01T00:05:03.000000000', '2000-01-01T00:11:07.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9
Attributes:
    standard_name:          time
    Axis:                   T
    coverage_content_type:  coordinate
```
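The units rewrite in Option 1 can also be illustrated without cftime or xarray. The following is only a rough sketch using the standard library: `normalize_units` is a hypothetical helper (not part of xarray, cftime, or pandas), and it assumes the trailing `0` in the units string is a UTC offset of zero. Anything it does not recognise, including a non-zero offset, is returned unchanged.

```python
import re
from datetime import datetime

# Hypothetical pattern for "<unit> since Y-M-D [H:M:S] [offset]".
# It only covers the narrow shape seen in this issue, not the full CF grammar.
_UNITS_RE = re.compile(
    r"^\s*(?P<delta>\w+)\s+since\s+"
    r"(?P<y>\d{1,4})-(?P<m>\d{1,2})-(?P<d>\d{1,2})"
    r"(?:[ T](?P<H>\d{1,2}):(?P<M>\d{1,2}):(?P<S>\d{1,2}))?"
    r"(?:\s+(?P<tz>[+-]?\d{1,2}))?\s*$"
)


def normalize_units(units: str) -> str:
    """Return units with a zero-padded ISO reference date, if recognised."""
    match = _UNITS_RE.match(units)
    if match is None:
        return units  # unknown shape: leave it for the real parser
    parts = match.groupdict(default="0")
    if int(parts["tz"]) != 0:
        return units  # assumption: only a zero offset is safe to drop
    reference = datetime(
        int(parts["y"]), int(parts["m"]), int(parts["d"]),
        int(parts["H"]), int(parts["M"]), int(parts["S"]),
    )
    return f"{match['delta']} since {reference.isoformat()}"


print(normalize_units("seconds since 2000-1-1 0:0:0 0"))
```

A string-level fix like this would avoid the monkey patch if applied to the attribute before decoding, but it is far less general than letting cftime parse the reference date.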

Option 2:

This option tries to get smart before handing the dates off to cftime: it replaces each NaN with 0 before calling cftime.num2date, and after the dates are converted it puts the NaNs back where they were.

```
import xarray as xr
import xarray.coding.times
import numpy as np
import cftime


def _decode_datetime_with_cftime(num_dates, units, calendar):
    if cftime is None:
        raise ModuleNotFoundError("No module named 'cftime'")
    # Remember where the NaNs are. The argwhere result is used as flat
    # indices by put(), so this assumes num_dates is 1-D.
    indices_of_nan = np.argwhere(np.isnan(num_dates))
    # Swap in a placeholder that cftime can convert (modifies num_dates in place).
    num_dates.put(indices_of_nan, 0)
    as_dates = np.asarray(cftime.num2date(num_dates, units, calendar))
    # Restore the NaNs in the converted (object dtype) array.
    as_dates.put(indices_of_nan, np.nan)
    return as_dates


xarray.coding.times._decode_datetime_with_cftime = _decode_datetime_with_cftime

fill_val = -99999.0
time_vals = np.random.randint(0, 1000, 10)
time_vals[1] = fill_val

data_vars = {
    'foo': (['x'], np.random.rand(10)),
    'time': (
        ['x'],
        time_vals,
        {
            'units': 'seconds since 2000-1-1 0:0:0 0',
            '_FillValue': fill_val,
            'scale_factor': 1.0,
            'add_offset': 0.0,
            'standard_name': 'time',
            'calendar': 'standard',
            'Axis': 'T',
            'coverage_content_type': 'coordinate',
        }
    ),
}

ds = xr.Dataset(
    data_vars=data_vars,
    coords={'x': (['x'], np.arange(10))}
)

nc_out_location = '/tmp/example.nc'
ds.to_netcdf(nc_out_location)
ds = xr.open_dataset(nc_out_location, use_cftime=True)
print(ds['time'])
```

Console Output

```
<xarray.DataArray 'time' (x: 10)>
array([cftime.DatetimeGregorian(2000, 1, 1, 0, 12, 33, 0, has_year_zero=False),
       nan,
       cftime.DatetimeGregorian(2000, 1, 1, 0, 10, 50, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 1, 0, 6, 27, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 44, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 1, 0, 11, 18, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 35, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 1, 0, 12, 37, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 1, 0, 14, 44, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 1, 0, 11, 17, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9
Attributes:
    standard_name:          time
    Axis:                   T
    coverage_content_type:  coordinate
```
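The mask, substitute, and restore steps in Option 2 can be shown in isolation. This is a minimal standard-library sketch, not the xarray code path: `decode_seconds_with_gaps` is a hypothetical name, `timedelta` stands in for cftime.num2date as a converter that rejects NaN, and `None` stands in for the restored NaN.

```python
import math
from datetime import datetime, timedelta


def decode_seconds_with_gaps(values, reference):
    """Decode second offsets from `reference`, passing NaN through as None."""
    # 1. Remember which entries are missing.
    nan_mask = [math.isnan(v) for v in values]
    # 2. Substitute a harmless placeholder so the strict converter accepts them
    #    (timedelta(seconds=nan) raises ValueError).
    filled = [0.0 if is_nan else v for v, is_nan in zip(values, nan_mask)]
    decoded = [reference + timedelta(seconds=v) for v in filled]
    # 3. Put the missing markers back where the placeholders were decoded.
    return [None if is_nan else d for d, is_nan in zip(decoded, nan_mask)]


reference = datetime(2000, 1, 1)
for item in decode_seconds_with_gaps([198.0, float("nan"), 418.0], reference):
    print(item)
```

The placeholder value is never seen by the caller, so any value the converter accepts would do; 0 simply decodes to the reference date itself.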

There is one more problem with this solution, though: the validation code at https://github.com/pydata/xarray/blob/076bd8e15f04878d7b97100fb29177697018138f/xarray/coding/times.py#L281-L296 runs after _decode_datetime_with_cftime is called. That code calls cftime_to_nptime, which does not handle NaNs. The validation can be avoided by explicitly setting use_cftime=True, after which decoding works as expected.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1421718311