
issue_comments: 296483716


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/1252#issuecomment-296483716 https://api.github.com/repos/pydata/xarray/issues/1252 296483716 MDEyOklzc3VlQ29tbWVudDI5NjQ4MzcxNg== 6628425 2017-04-23T19:47:26Z 2017-04-23T19:47:26Z MEMBER

We'll want to save this for a major release (v0.10), since it will be backwards incompatible.

Right. To what extent do we want to preserve the current behavior? As I understand it, regardless of the calendar type, every effort is currently made to convert decoded datetimes into np.datetime64 objects. This means that one only gets netcdftime.datetime objects in an xarray object when either (or both) of the following hold:

  • A date present in an array does not exist in the standard calendar (e.g. say '2000-02-30' in a '360_day' calendar).
  • The dates do not fit into the range of years 1678 to 2262.
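As a rough illustration of these two conditions, here is a minimal sketch using only the standard library. The helper name fits_datetime64 and the (year, month, day) tuple representation are hypothetical and are not part of xarray or netcdftime; the year bounds are simply the ones quoted above.

```python
import datetime

# Year bounds quoted in the bullet above (assumed, not read from xarray).
MIN_YEAR, MAX_YEAR = 1678, 2262


def fits_datetime64(year, month, day):
    """Return True if the date could plausibly be held as np.datetime64.

    Hypothetical helper: it only checks the two conditions listed above.
    """
    # Condition 2: the year must fall inside the supported range.
    if not MIN_YEAR <= year <= MAX_YEAR:
        return False
    # Condition 1: the date must exist in the standard calendar.
    try:
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True


print(fits_datetime64(2000, 2, 30))  # valid in '360_day', not in standard
print(fits_datetime64(2000, 2, 29))
print(fits_datetime64(2300, 1, 1))
```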

The main advantage of doing this is that, wherever possible, 1D arrays of datetimes can be converted to DatetimeIndexes, with all the nice things that come with them:

  1. Field accessors (for use in groupby operations)
  2. Partial datetime string indexing
  3. Resample for downsampling only (accurate upsampling would need to be calendar-aware)
  4. "Not a time" support (i.e. support for missing values).
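For concreteness, here is a small sketch of these four features using plain pandas. The series and its values are made up for illustration and are not taken from any dataset discussed here.

```python
import numpy as np
import pandas as pd

# Sixty daily values starting on 2000-01-01 (all of January and February 2000).
index = pd.date_range("2000-01-01", periods=60, freq="D")
series = pd.Series(np.arange(60.0), index=index)

# 1. Field accessors, e.g. grouping by calendar month.
monthly_mean = series.groupby(series.index.month).mean()

# 2. Partial datetime string indexing: select all of January 2000.
january = series.loc["2000-01"]

# 3. Resample for downsampling: daily values to month-start means.
downsampled = series.resample("MS").mean()

# 4. "Not a time" support: a missing entry is stored as NaT.
with_missing = pd.DatetimeIndex(["2000-01-01", pd.NaT])

print(monthly_mean)
print(len(january))
print(downsampled)
print(with_missing.isna())
```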

This practice has one disadvantage -- the calendar type of the datetimes is not preserved. Therefore, if one tries to do timedelta arithmetic between values in the array (e.g. Mar. 1st, 2001 minus Feb. 1st, 2001), one might get an inaccurate answer depending on what the original calendar type was. (This was noted when this was originally implemented and, to be fair, I don't think timedelta arithmetic was even possible on netcdftime.datetime objects back then.)
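To make the timedelta discrepancy concrete, here is a minimal sketch of the Mar. 1st minus Feb. 1st example. The helper days_between_360_day is hypothetical (it is not a netcdftime function) and assumes a '360_day' calendar in which every month has exactly 30 days.

```python
import datetime


def days_between_360_day(start, end):
    """Days from start to end, given as (year, month, day) tuples.

    Hypothetical helper assuming a '360_day' calendar (12 months of 30 days).
    """
    def ordinal(date):
        year, month, day = date
        return year * 360 + (month - 1) * 30 + (day - 1)

    return ordinal(end) - ordinal(start)


# Standard calendar: February 2001 has 28 days.
standard_days = (datetime.date(2001, 3, 1) - datetime.date(2001, 2, 1)).days

# '360_day' calendar: every month, including February, has 30 days.
days_360 = days_between_360_day((2001, 2, 1), (2001, 3, 1))

print(standard_days, days_360)  # 28 30
```

So once the dates have been converted to np.datetime64, the difference comes out as 28 days even if the data were originally on a '360_day' calendar, where it should be 30.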

Options

I can think of two ways to proceed in integrating NetCDFTimeIndex into xarray:

  • Only use a NetCDFTimeIndex where a DatetimeIndex currently cannot be used
  • Always use netcdftime.datetime objects (and hence NetCDFTimeIndex) for representing dates with non-standard calendars

Tradeoffs

Only use a NetCDFTimeIndex where a DatetimeIndex currently cannot be used

This is perhaps the least dramatic change one could make. It would involve not modifying the decoding logic at all (i.e. continuing to be aggressive in attempting to convert netcdftime.datetime objects to np.datetime64 types) and only using a NetCDFTimeIndex when the dates are incompatible with a DatetimeIndex (in other words, only when an array of netcdftime.datetime objects arrives at utils.safe_cast_to_index under the existing decoding logic). The timedelta issue would remain, but this route would not change the behavior for time arrays with non-standard calendars that can be cast as ordinary DatetimeIndexes (so wherever you could use resample for non-standard calendars before, you still could, and wherever you couldn't before, you still couldn't).

Always use netcdftime.datetime objects (and hence NetCDFTimeIndex) for representing dates with non-standard calendars

As has been discussed, the current implementation of NetCDFTimeIndex enables (1) and (2) and eliminates the timedelta problem, but it does not enable (3) or (4). So, for instance, for those who used xarray to downsample arrays indexed by datetimes with a non-standard calendar, and whose dates met neither of the two bulleted conditions at the top, unilaterally using netcdftime.datetime objects for all non-standard calendar types would be a regression (though it would preserve the original calendar type).
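The difference between the two options can be sketched as a selection rule. This is only an illustration: choose_index, the option numbers, and the convertible flag are hypothetical, and the set of calendar names treated as "standard" is my assumption. I also assume that under either option, dates that cannot be converted to np.datetime64 end up in a NetCDFTimeIndex.

```python
# Calendar names treated as "standard" here (an assumption for this sketch).
STANDARD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}


def choose_index(calendar, convertible, option):
    """Return the name of the index type each option would pick.

    calendar:    calendar attribute of the time variable, e.g. '360_day'
    convertible: True if every date exists in the standard calendar and
                 falls within the supported range of years
    option:      1 = only use NetCDFTimeIndex where DatetimeIndex cannot be used
                 2 = always use NetCDFTimeIndex for non-standard calendars
    """
    if option == 1:
        # Existing decoding logic: convert whenever possible.
        return "DatetimeIndex" if convertible else "NetCDFTimeIndex"
    if option == 2:
        # The calendar type is preserved for every non-standard calendar.
        if calendar in STANDARD_CALENDARS and convertible:
            return "DatetimeIndex"
        return "NetCDFTimeIndex"
    raise ValueError("option must be 1 or 2")


# The case where the two options disagree: a non-standard calendar whose
# dates happen to be convertible.
print(choose_index("360_day", True, option=1))  # DatetimeIndex
print(choose_index("360_day", True, option=2))  # NetCDFTimeIndex
```

The two options only differ for non-standard calendars whose dates are convertible, which is exactly the case where (3) and (4) work today.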

I suppose this comes down to weighing the importance of addressing the timedelta issue (or, more generally, preserving calendar types) against preserving existing behavior that allows (3) and (4) for some cases with non-standard calendars.

Is this an accurate summary of the considerations we should make here? What are folks' opinions on these tradeoffs? What might be the preferred route to take?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  205473898