home / github / issue_comments

Menu
  • Search all tables
  • GraphQL API

issue_comments: 455386887

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/2593#issuecomment-455386887 https://api.github.com/repos/pydata/xarray/issues/2593 455386887 MDEyOklzc3VlQ29tbWVudDQ1NTM4Njg4Nw== 8708062 2019-01-18T00:50:29Z 2019-01-18T00:50:29Z CONTRIBUTOR

Interesting, I'm not sure what's leading to the difference in behavior between platforms (I'm on a Mac). If you can distill the precision discrepancy to a minimal example, it might be worth reporting upstream to cftime. Though I agree, from xarray's perspective this is probably not a major concern.

Could this issue be related to the linear algebra library? A minimal example would be the earlier example you gave. For both cftime 1.0.0 and 1.0.3.4, I get "mismatch 60.0%".

The logic I wrote in CFTimeGrouper.first_items was in fact based in part on lib.generate_bins_dt64. Indeed I omitted these checks related to whether the time series values were all within the bin edges or not. I think it would actually be straightforward to add these in just before the call to np.searchsorted (in my code datetime_bins corresponds to binner in pandas and index corresponds to values). I would be inclined to do this rather than try to deviate from pandas' behavior here for now.

So I keep pandas' logic (the first bin has 1 day and 1 microsecond added to it) and raise a Value Error when either index[0] < datetime_bins[0] or index[lenidx - 1] > datetime_bins[lenbin - 1].

Which version of pandas are you testing against? I'm testing against the dev version and with your branch I can't seem to reproduce any failures that are not related to the "values falls before first bin" error in the XT case. I have not tried any of the 6AS_JUN cases yet.

I'm testing against the dev version, 11 commits behind. Could the errors for XT that I get but you don't be due to cftime/linear algebra library issue? There may be enough error accumulated for hourly frequencies over 140 years that cftime_range generates an extra bin compared to pandas date_range (haven't checked all manually but I believe the majority of the non-"values falls before first bin" errors are due to extra bin(s)). 6AS_JUN only has 8 failed tests all due to "x and y nan location mismatch".

Yeah that's too long :). I agree we'll eventually need to pare down the tests. A big reason for this is the length of the resampled time series in some of the cases (e.g. hourly frequencies over a 140 year period produce very long arrays, e.g. in the XT case). There's also probably a lot of overlap between the existing test cases as far as the logic they exercise (e.g. do we need to test against frequencies '2H', '5H', '7H', '12H', and '8001H' or would just '7H' suffice etc.?).

We probably don't. I forget the reason but early on in development, resampling tests failed for some time ranges when using purely odd frequencies while others failed with purely even ones. Resampling tests for12H/24H frequencies might not be needed now that `_adjust_bin_edges' is being used.

While it's definitely good to test directly against pandas for some examples, in place of some of those tests, we might want to consider writing some targeted unit tests for the methods in resample_cftime.py, like _get_time_bins, CFTimeGrouper.first_items, etc. With those it would probably be easier to write some tests that use minimal data but exercise all the conditional logic.

I'll look into writing targeted unit tests.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  387924616
Powered by Datasette · Queries took 156.84ms · About: xarray-datasette