
issue_comments: 218653355


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/818#issuecomment-218653355 https://api.github.com/repos/pydata/xarray/issues/818 218653355 MDEyOklzc3VlQ29tbWVudDIxODY1MzM1NQ== 1217238 2016-05-12T03:54:09Z 2016-05-12T03:54:09Z MEMBER

@naught101

> I want to be able to run a scikit-learn model over a bunch of variables in a 3D (lat/lon/time) dataset, and return values for each coordinate point. Is something like this multi-dimensional groupby required (I'm thinking groupby(lat, lon) => 2D matrices that can be fed straight into scikit-learn), or is there already some other mechanism that could achieve something like this? Or is the best way at the moment just to create a null dataset, and loop over lat/lon and fill in the blanks as you go?

Can you clarify exactly what shape of data you want to put into scikit-learn to make predictions? What are the dimensions of your input? In principle, this is exactly the sort of thing that multi-dimensional groupby should solve, although we might also need support for multiple arguments to handle lat/lon (this should not be too difficult).
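
To make the shape question concrete, here is a minimal sketch of one possible reading: every lat/lon point becomes a row and every time step a column, which is the 2D (samples, features) layout scikit-learn estimators expect. It uses plain NumPy rather than any xarray groupby API; the array sizes, the variable names, and the mean used in place of a model's predictions are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical cube with dimensions (time, lat, lon); sizes are arbitrary.
n_time, n_lat, n_lon = 4, 2, 3
cube = np.arange(n_time * n_lat * n_lon, dtype=float).reshape(n_time, n_lat, n_lon)

# Move time to the last axis, then flatten lat/lon into one "point" axis:
# each row is one grid point, each column is one time step.
X = np.moveaxis(cube, 0, -1).reshape(n_lat * n_lon, n_time)

# Stand-in for a fitted model's predictions: one value per row (grid point).
per_point = X.mean(axis=1)

# Put the per-point results back on the (lat, lon) grid.
result = per_point.reshape(n_lat, n_lon)
```

If the model should instead be fit separately at each grid point, with time as the sample axis, the same reshape applies but each row would be passed to its own estimator.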


For the bins argument, I would suggest a separate DataArray/Dataset method for creating the GroupBy object. The resample method in xarray should be updated to return a GroupBy object (like the pandas method), and extending resample to numbers would be a natural fit. Something like Dataset.resample(longitude=10) could be a good way to spell this. (We would deprecate the how, freq and dim arguments, and ideally make all the remaining arguments keyword-only.)
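
As a rough illustration of what resampling over a numeric coordinate could compute, the sketch below groups values into fixed-width longitude bins and averages each bin. It is plain NumPy, not the proposed xarray API. Reading the 10 in resample(longitude=10) as a bin width, using left-closed bins, and the sample coordinates and values are all assumptions.

```python
import numpy as np

lon = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])    # hypothetical coordinate
values = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])    # one value per longitude

width = 10  # assumed meaning of the number in resample(longitude=10)

# Label each point with the left edge of the bin it falls into.
bin_left = np.floor(lon / width) * width

# Group by bin label and take the mean of each group.
labels, inverse = np.unique(bin_left, return_inverse=True)
sums = np.bincount(inverse, weights=values)
counts = np.bincount(inverse)
bin_means = sums / counts
```

Here labels holds the left edge of each bin and bin_means the corresponding averages. A GroupBy-style object would defer the choice of reduction instead of hard-coding the mean.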

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  146182176