issue_comments: 455386887

This data as json

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/pull/2593#issuecomment-455386887	https://api.github.com/repos/pydata/xarray/issues/2593	455386887	MDEyOklzc3VlQ29tbWVudDQ1NTM4Njg4Nw==	8708062	2019-01-18T00:50:29Z	2019-01-18T00:50:29Z	CONTRIBUTOR	Interesting, I'm not sure what's leading to the difference in behavior between platforms (I'm on a Mac). If you can distill the precision discrepancy to a minimal example, it might be worth reporting upstream to cftime. Though I agree, from xarray's perspective this is probably not a major concern. Could this issue be related to the linear algebra library? A minimal example would be the earlier example you gave. For both cftime 1.0.0 and 1.0.3.4, I get "mismatch 60.0%". The logic I wrote in `CFTimeGrouper.first_items` was in fact based in part on `lib.generate_bins_dt64`. Indeed I omitted these checks related to whether the time series values were all within the bin edges or not. I think it would actually be straightforward to add these in just before the call to `np.searchsorted` (in my code `datetime_bins` corresponds to `binner` in pandas and `index` corresponds to `values`). I would be inclined to do this rather than try to deviate from pandas' behavior here for now. So I keep pandas' logic (the first bin has 1 day and 1 microsecond added to it) and raise a Value Error when either `index[0] < datetime_bins[0]` or `index[lenidx - 1] > datetime_bins[lenbin - 1]`. Which version of pandas are you testing against? I'm testing against the dev version and with your branch I can't seem to reproduce any failures that are not related to the "values falls before first bin" error in the `XT` case. I have not tried any of the `6AS_JUN` cases yet. I'm testing against the dev version, 11 commits behind. Could the errors for `XT` that I get but you don't be due to cftime/linear algebra library issue? There may be enough error accumulated for hourly frequencies over 140 years that cftime_range generates an extra bin compared to pandas date_range (haven't checked all manually but I believe the majority of the non-"values falls before first bin" errors are due to extra bin(s)). `6AS_JUN` only has 8 failed tests all due to "x and y nan location mismatch". Yeah that's too long :). I agree we'll eventually need to pare down the tests. A big reason for this is the length of the resampled time series in some of the cases (e.g. hourly frequencies over a 140 year period produce very long arrays, e.g. in the `XT` case). There's also probably a lot of overlap between the existing test cases as far as the logic they exercise (e.g. do we need to test against frequencies '2H', '5H', '7H', '12H', and '8001H' or would just '7H' suffice etc.?). We probably don't. I forget the reason but early on in development, resampling tests failed for some time ranges when using purely odd frequencies while others failed with purely even ones. Resampling tests for12H/24H frequencies might not be needed now that `_adjust_bin_edges' is being used. While it's definitely good to test directly against pandas for some examples, in place of some of those tests, we might want to consider writing some targeted unit tests for the methods in `resample_cftime.py`, like `_get_time_bins`, `CFTimeGrouper.first_items`, etc. With those it would probably be easier to write some tests that use minimal data but exercise all the conditional logic. I'll look into writing targeted unit tests.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		387924616