issue_comments: 265174968

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1084#issuecomment-265174968 https://api.github.com/repos/pydata/xarray/issues/1084 265174968 MDEyOklzc3VlQ29tbWVudDI2NTE3NDk2OA== 6628425 2016-12-06T15:13:41Z 2016-12-06T15:13:41Z MEMBER

@shoyer brings up a good point regarding partial datetime string indexing. For instance, in my basic example, indexing with truncated date strings of the form '2000-01-01' (versus the full specification, '2000-01-01 00:00:00') works because netcdftime._parse_date simply assumes that you meant '2000-01-01 00:00:00' when you wrote '2000-01-01'.

This would mean that the same string specification could have different behavior for DatetimeIndex objects versus NetCDFTimeIndex objects, which is probably not desirable.

For instance, using the current setup in my basic example with sub-daily resolution data, selecting a time using '2000-01-01' would give you just the value associated with '2000-01-01 00:00:00':

```
In [20] dates = [netcdftime.DatetimeAllLeap(2000, 1, 1, 0), netcdftime.DatetimeAllLeap(2000, 1, 1, 3)]
In [21] da = xr.DataArray(np.arange(2), coords=[NetCDFTimeIndex(dates)], dims=['time'])
In [22] da.sel(time='2000-01-01')

Out [22]
<xarray.DataArray ()>
array(0)
Coordinates:
    time     object 2000-01-01 00:00:00
```

but using a DatetimeIndex this would give you both values (because of partial datetime string selection):

```
In [23] from datetime import datetime
In [24] dates = [datetime(2000, 1, 1, 0), datetime(2000, 1, 1, 3)]
In [25] da = xr.DataArray(np.arange(2), coords=[dates], dims=['time'])
In [26] da.sel(time='2000-01-01')

Out [26]
<xarray.DataArray (time: 2)>
array([0, 1])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-01T03:00:00
```

I think if we were to include string-based indexing, it would be best if it were completely consistent with the DatetimeIndex version. I would love to be wrong, but I don't see a clean way of directly using existing code from pandas to enable this. At least in my (possibly naive) reading of the internals of DatetimeIndex, the functions associated with partial datetime string selection are somewhat tied to datetimes with standard calendars. (Somewhat in the weeds, but more specifically I'm looking at pandas.tslib.parse_datetime_string_with_reso and pandas.tseries.index.DatetimeIndex._parsed_string_to_bounds.) It could take a fair bit of work to adapt that code for our purposes and unhitch that dependence. Is that a fair assessment?
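To make the idea concrete, here is a rough sketch of one calendar-independent way to think about partial string selection: parse the string into only the fields it specifies, then keep every date whose leading fields match. This is not how pandas implements it (pandas converts the parsed string to bounds on a sorted index), and the names `parse_partial`, `matches_partial`, and `select_partial` are hypothetical. It only assumes the date objects expose `year` through `second` attributes, which standard `datetime` objects do; I use those here so the example runs without netcdftime.

```python
import re
from datetime import datetime

# Hypothetical helpers for illustration; not part of pandas, xarray, or netcdftime.
_FIELDS = ("year", "month", "day", "hour", "minute", "second")

# Each trailing component is optional, so '2000', '2000-01', '2000-01-01',
# '2000-01-01 03', ... are all accepted.
_PARTIAL = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:[ T](?P<hour>\d{2})"
    r"(?::(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2}))?)?)?)?)?$"
)


def parse_partial(string):
    """Return a tuple of only the fields that the string actually specifies."""
    match = _PARTIAL.match(string)
    if match is None:
        raise ValueError("could not parse datetime string: %r" % string)
    return tuple(
        int(match.group(name)) for name in _FIELDS if match.group(name) is not None
    )


def matches_partial(date, string):
    """True if the date's leading fields equal the fields given in the string."""
    parsed = parse_partial(string)
    leading = tuple(getattr(date, name) for name in _FIELDS[: len(parsed)])
    return leading == parsed


def select_partial(dates, values, string):
    """Linear scan that keeps every value whose date matches the partial string."""
    return [v for d, v in zip(dates, values) if matches_partial(d, string)]


dates = [datetime(2000, 1, 1, 0), datetime(2000, 1, 1, 3)]
print(select_partial(dates, [0, 1], "2000-01-01"))           # both values
print(select_partial(dates, [0, 1], "2000-01-01 00:00:00"))  # only the first
```

Because the comparison is on fields rather than on computed end-of-period bounds, it never needs to know how many days a month has in a given calendar. The cost is a linear scan instead of a search on a sorted index, so it is a way to reason about the desired behavior rather than a proposal for the real implementation.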

So ultimately this raises the question: would we want to add just the field accessors to enable group-by operations for now, and add string-based selection (and other features like resample) later, or should we put our heads down and work out a solution for partial datetime string-based selection using netcdftime datetime objects?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  187591179