issue_comments: 450533035

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/2593#issuecomment-450533035 https://api.github.com/repos/pydata/xarray/issues/2593 450533035 MDEyOklzc3VlQ29tbWVudDQ1MDUzMzAzNQ== 6628425 2018-12-30T01:31:12Z 2018-12-30T01:31:12Z MEMBER

So the crux of the problem now seems to be in generating first_items, the Series that xarray uses for both upsampling and downsampling a DataArray. For data indexed by a DatetimeIndex, generating this Series is straightforward: construct a simple Series from np.arange and the reference index, construct a pandas.Grouper object, and call groupby followed by the first method. In xarray, downsampling uses both the values (to define the groups) and the index (to define the labels) of this Series, while upsampling uses only the index.
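For concreteness, here is a minimal sketch of that DatetimeIndex construction. The dates and frequencies are made up for illustration and are not taken from the gist or the xarray test suite.

```python
import numpy as np
import pandas as pd

# Downsampling: a daily reference index grouped into 3-day bins.
index = pd.date_range("2000-01-01", periods=7, freq="D")
s = pd.Series(np.arange(index.size), index)
first_items = s.groupby(pd.Grouper(freq="3D")).first()
# The values are integer positions of the first element of each group;
# the index holds the resample labels.

# Upsampling: a 2-day reference index grouped into daily bins.
sparse_index = pd.date_range("2000-01-01", periods=3, freq="2D")
s_up = pd.Series(np.arange(sparse_index.size), sparse_index)
first_items_up = s_up.groupby(pd.Grouper(freq="D")).first()
# Empty bins show up as NaN, but the index still lists every label.
```

The same three steps are used in both cases; only the relationship between the reference index and the grouper frequency changes.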

For data indexed by a CFTimeIndex, we do not have the luxury of a formal Grouper object; however, if we can create this first_items Series accurately, I think all other results of resample in xarray should follow.

I've put together a gist that compares the first_items Series generated with pandas against the one generated by the cftime logic (the output of running the tests is also included). I've tried to use a fairly challenging set of initial time indexes and resample frequencies (different from those currently used in the tests). There appear to be many mismatches in the "upsampling" case, and a few errors also show up in the "downsampling" case; to some extent I think these are related to the omission of the _adjust_bin_edges method, which it turns out I do think we may need. In theory, though, because of how this first_items Series is created in the DatetimeIndex case, I don't think the way we create it in the CFTimeIndex case should depend on whether the reference index is longer or shorter than the resample labels; whether we upsample or downsample is determined instead by the resampling method used.

This inspired the alternative solution proposed in the second part of the gist (I've also added back in a call to a cftime version of the _adjust_bin_edges method); in this case there is no dependence on the relative lengths of the reference index and resample labels, and all of the test cases I've tried so far pass.
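To illustrate what "no dependence on the relative lengths" can look like, here is a rough sketch of one length-agnostic way to build such a Series from bin edges. This is not the code from the gist: the helper name first_items_from_edges is hypothetical, plain numbers stand in for cftime dates, and it assumes sorted times and left-closed bins.

```python
import numpy as np
import pandas as pd


def first_items_from_edges(times, bin_edges, labels):
    """Hypothetical helper: position of the first time in each left-closed bin."""
    times = np.asarray(times)
    # Where each bin edge would be inserted into the (sorted) times.
    positions = np.searchsorted(times, bin_edges, side="left")
    starts, stops = positions[:-1], positions[1:]
    # A bin is empty when no time falls in [edge_i, edge_{i+1}).
    values = np.where(stops > starts, starts, np.nan)
    return pd.Series(values, index=labels)


# More times than labels (the "downsampling" shape).
down = first_items_from_edges(np.arange(7), [0, 3, 6, 9], [0, 3, 6])

# Fewer times than labels (the "upsampling" shape), same code path.
up = first_items_from_edges([0, 2, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4])
```

Both calls go through the same logic; empty bins simply come out as NaN, which mirrors what the pandas groupby produces.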

Let me know if this alternative solution makes sense. Digging into the guts of the resample code in pandas/xarray is still fairly new for me too, so I could be missing something. In the gist I'm using this branch of xarray, the development version of pandas, and the latest version of cftime. Thanks again for your hard work on this!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  387924616