html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/818#issuecomment-231256264,https://api.github.com/repos/pydata/xarray/issues/818,231256264,MDEyOklzc3VlQ29tbWVudDIzMTI1NjI2NA==,1217238,2016-07-08T01:50:30Z,2016-07-08T01:50:30Z,MEMBER,"OK, merging.....
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-230818687,https://api.github.com/repos/pydata/xarray/issues/818,230818687,MDEyOklzc3VlQ29tbWVudDIzMDgxODY4Nw==,1217238,2016-07-06T16:00:54Z,2016-07-06T16:00:54Z,MEMBER,"@rabernat I agree. I have a couple of minor style/pep8 issues, and we need an entry for ""what's new"", but let's merge this. I can then play around a little bit with potential fixes.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-224693231,https://api.github.com/repos/pydata/xarray/issues/818,224693231,MDEyOklzc3VlQ29tbWVudDIyNDY5MzIzMQ==,1217238,2016-06-08T18:58:45Z,2016-06-08T18:58:45Z,MEMBER,"Looks like I still have a bug (failing Travis builds). Let me see if I can
get that sorted out first.
On Wed, Jun 8, 2016 at 11:51 AM, Ryan Abernathey notifications@github.com
wrote:
> I think #875 https://github.com/pydata/xarray/pull/875 should fix the
> issue with concatenating index objects.
>
> Should I try to merge your branch with my branch...or wait for your branch
> to get merged into master?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-224484574,https://api.github.com/repos/pydata/xarray/issues/818,224484574,MDEyOklzc3VlQ29tbWVudDIyNDQ4NDU3NA==,1217238,2016-06-08T04:32:29Z,2016-06-08T04:32:29Z,MEMBER,"I think #875 should fix the issue with concatenating index objects.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-223999761,https://api.github.com/repos/pydata/xarray/issues/818,223999761,MDEyOklzc3VlQ29tbWVudDIyMzk5OTc2MQ==,1217238,2016-06-06T15:45:49Z,2016-06-06T15:45:49Z,MEMBER,"Empty groups should be straightforward -- we should be able to handle them.
Indices which don't belong to any group are indeed more problematic. I think we have three options here:
1. Raise an error when calling `.groupby_bins(...)`
2. Raise an error when calling `.groupby_bins(...).apply(...)`
3. Simply concatenate back together whatever items _were_ grouped, and give up on the guarantee that applying the identity function restores the original item.
I think my preference would be for option 3, though 1 or 2 could be reasonable workarounds for now (raising `NotImplementedError`), because 3 is likely to be a little tricky to implement.
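For reference, a minimal pandas sketch of the out-of-range case that option 3 would have to handle (bin edges here are made up): `pd.cut` simply marks values outside every bin as NaN, so those items would silently belong to no group:

```python
import numpy as np
import pandas as pd

values = np.array([0.5, 1.5, 5.0])

# 0.5 falls in (0, 1], 1.5 falls in (1, 2]; 5.0 is outside every bin
binned = pd.cut(values, bins=[0, 1, 2])

# the out-of-range value comes back as NaN rather than a bin label
print(bool(pd.isna(binned[2])))  # -> True
```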
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-223870991,https://api.github.com/repos/pydata/xarray/issues/818,223870991,MDEyOklzc3VlQ29tbWVudDIyMzg3MDk5MQ==,1217238,2016-06-06T05:23:24Z,2016-06-06T05:23:24Z,MEMBER,"I think I can fix this, by making concatenation work properly on index objects. Stay tuned...
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-219096410,https://api.github.com/repos/pydata/xarray/issues/818,219096410,MDEyOklzc3VlQ29tbWVudDIxOTA5NjQxMA==,1217238,2016-05-13T16:42:58Z,2016-05-13T16:42:58Z,MEMBER,"> Why? This was in fact my original idea, but you encouraged me to use pd.cut instead. One thing I like about cut is that it is very flexible and well documented, while digitize is somewhat obscure.
If you're not going to use the labels it produces, I'm not sure there's an advantage to `pd.cut`. Otherwise I thought they were pretty similar.
`groupby_bins` seems pretty reasonable.
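A rough sketch of the difference being discussed (bin edges made up for illustration): `np.digitize` returns integer bin indices, while `pd.cut` produces interval-valued labels and marks out-of-range values as NaN:

```python
import numpy as np
import pandas as pd

values = np.array([0.5, 1.5, 5.0])
edges = [0, 1, 2]

# np.digitize: integer index of the bin each value falls into
# (values beyond the last edge still get an index, here 3)
indices = np.digitize(values, edges)
print(indices)  # -> [1 2 3]

# pd.cut: interval labels; out-of-range values become NaN
labels = pd.cut(values, edges)
print(labels)
```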
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-218879360,https://api.github.com/repos/pydata/xarray/issues/818,218879360,MDEyOklzc3VlQ29tbWVudDIxODg3OTM2MA==,1217238,2016-05-12T20:41:18Z,2016-05-12T20:41:18Z,MEMBER,"@rabernat It's possibly a better idea to use [`np.digitize`](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.digitize.html) rather than `pd.cut`.
I would strongly suggest controlling labeling with a keyword argument, maybe similar to [diff](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.diff.html#xarray.Dataset.diff).
Again, rather than further overloading the user-facing API `.groupby()`, the binning is probably best expressed in a separate method. I would suggest a `.bin(bins)` method on Dataset/DataArray. Then you could just use a normal call to (multi-dimensional) groupby. So instead, we might have: `ds.sample_tsurf.assign(lat_bin=ds.TLAT.bin(lat_bins)).groupby('lat_bins').mean()`.
On second thought, this _is_ significantly more verbose, so maybe `bins` in the groupby call is OK.
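The `.bin(...)` method above is hypothetical, but the assign-then-groupby pattern it describes can be sketched with plain pandas (the data and names here are made up):

```python
import pandas as pd

df = pd.DataFrame({'lat': [-45.0, -10.0, 10.0, 45.0],
                   'tsurf': [1.0, 2.0, 3.0, 4.0]})

# bin the latitude coordinate, then group on the binned labels
lat_bins = [-90, 0, 90]
result = (df.assign(lat_bin=pd.cut(df['lat'], lat_bins))
            .groupby('lat_bin', observed=True)['tsurf']
            .mean())
print(result)  # one mean per latitude band
```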
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-218806328,https://api.github.com/repos/pydata/xarray/issues/818,218806328,MDEyOklzc3VlQ29tbWVudDIxODgwNjMyOA==,1217238,2016-05-12T16:10:04Z,2016-05-12T16:10:04Z,MEMBER,"Ah, of course -- forcing_data is a Dataset. You definitely want to pull out
the DataArray first. Then `.values` is what you want.
On Wed, May 11, 2016 at 11:54 PM, naught101 notifications@github.com
wrote:
> forcing_data.isel(lat=lat, lon=lon).values() returns a ValuesView, which
> scikit-learn doesn't like. However, forcing_data.isel(lat=lat,
> lon=lon).to_array().T seems to work..
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-218672116,https://api.github.com/repos/pydata/xarray/issues/818,218672116,MDEyOklzc3VlQ29tbWVudDIxODY3MjExNg==,1217238,2016-05-12T06:34:56Z,2016-05-12T06:34:56Z,MEMBER,"@naught101 I was mixing up how `to_dataframe()` works. Please ignore it! (I edited my earlier post.)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-218663446,https://api.github.com/repos/pydata/xarray/issues/818,218663446,MDEyOklzc3VlQ29tbWVudDIxODY2MzQ0Ng==,1217238,2016-05-12T05:27:11Z,2016-05-12T06:34:17Z,MEMBER,"@naught101 I would consider changing:
``` python
(forcing_data.isel(lat=lat, lon=lon)
 .to_dataframe()
 .drop(['lat', 'lon'], axis=1))
```
to just `forcing_data.isel(lat=lat, lon=lon).values`, because there's no point in creating a DataFrame with a bunch of variables you wouldn't use -- pandas will be pretty wasteful in allocating this.
Otherwise that looks pretty reasonable, given the limitations of current groupby support. Now, ideally you could instead write something like:
``` python
def make_prediction(forcing_data_time_series):
    predicted_values = model.predict(forcing_data_time_series.values)
    return xr.DataArray(predicted_values, [flux_vars, time])

forcing_data.groupby(['lat', 'lon']).dask_apply(make_prediction)
```
This would do the 2D groupby, and then apply the predict function in parallel with dask. Sadly we don't have this feature yet, though :).
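Since `dask_apply` above does not exist yet, the loop-and-fill approach mentioned earlier is a workable fallback. A plain-numpy sketch (the shapes and the `predict` stand-in are made up for illustration):

```python
import numpy as np

# hypothetical forcing data with dims (time, lat, lon)
rng = np.random.default_rng(0)
forcing = rng.random((8, 3, 4))

def predict(time_series):
    # stand-in for model.predict: two fake flux outputs per time series
    return np.array([time_series.mean(), time_series.std()])

# allocate the output and fill it in, one (lat, lon) point at a time
n_time, n_lat, n_lon = forcing.shape
out = np.empty((2, n_lat, n_lon))
for i in range(n_lat):
    for j in range(n_lon):
        out[:, i, j] = predict(forcing[:, i, j])

print(out.shape)  # -> (2, 3, 4)
```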
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-218654283,https://api.github.com/repos/pydata/xarray/issues/818,218654283,MDEyOklzc3VlQ29tbWVudDIxODY1NDI4Mw==,1217238,2016-05-12T03:58:48Z,2016-05-12T03:58:48Z,MEMBER,"@jhamman @rabernat I'm pretty sure there is a good reason for that check to verify monotonicity, although I can no longer remember exactly why!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-218653355,https://api.github.com/repos/pydata/xarray/issues/818,218653355,MDEyOklzc3VlQ29tbWVudDIxODY1MzM1NQ==,1217238,2016-05-12T03:54:09Z,2016-05-12T03:54:09Z,MEMBER,"@naught101
> I want to be able to run a scikit-learn model over a bunch of variables in a 3D (lat/lon/time) dataset, and return values for each coordinate point. Is something like this multi-dimensional groupby required (I'm thinking groupby(lat, lon) => 2D matrices that can be fed straight into scikit-learn), or is there already some other mechanism that could achieve something like this? Or is the best way at the moment just to create a null dataset, and loop over lat/lon and fill in the blanks as you go?
Can you clarify exactly what shape data you want to put into scikit-learn to make predictions? What are the dimensions of your input? In principle, this is exactly the sort of thing that multi-dimensional groupby should solve, although we might also need support for multiple arguments to handle `lat`/`lon` (this should not be too difficult).
---
For the `bins` argument, I would suggest a separate DataArray/Dataset method for creating the GroupBy object. The `resample` method in xarray should be updated to return a GroupBy object (like the pandas method), and extending resample to numbers would be a natural fit. Something like `Dataset.resample(longitude=10)` could be a good way to spell this. (We would deprecate the `how`, `freq` and `dim` arguments, and ideally make all the remaining arguments keyword-only.)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-207503695,https://api.github.com/repos/pydata/xarray/issues/818,207503695,MDEyOklzc3VlQ29tbWVudDIwNzUwMzY5NQ==,1217238,2016-04-08T16:29:58Z,2016-04-08T16:29:58Z,MEMBER,"@rabernat I'm not quite sure resample is the right place to put this, given that we aren't resampling on an axis. Just opened a pandas issue to discuss: https://github.com/pydata/pandas/issues/12828
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-207021028,https://api.github.com/repos/pydata/xarray/issues/818,207021028,MDEyOklzc3VlQ29tbWVudDIwNzAyMTAyOA==,1217238,2016-04-07T17:42:03Z,2016-04-07T17:42:26Z,MEMBER,"I think that if we unstack things properly (only once instead of on each applied example) we should get something like this, alleviating the need for the new group name:
```
array([[ 0. , -0.5],
       [ 0.5,  0. ]])
Coordinates:
* ny (ny) int64 0 1
* nx (nx) int64 0 1
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-206655187,https://api.github.com/repos/pydata/xarray/issues/818,206655187,MDEyOklzc3VlQ29tbWVudDIwNjY1NTE4Nw==,1217238,2016-04-07T01:48:01Z,2016-04-07T01:48:01Z,MEMBER,"@rabernat That looks like exactly the right place to me.
We only use variables for the concatenation in the `shortcut=True` path. With `shortcut=False`, we use `DataArray`/`Dataset` objects. For now, get it working with `shortcut=False` (hard code it if necessary) and I can help figure out how to extend it to `shortcut=True`.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-206445686,https://api.github.com/repos/pydata/xarray/issues/818,206445686,MDEyOklzc3VlQ29tbWVudDIwNjQ0NTY4Ng==,1217238,2016-04-06T16:13:01Z,2016-04-06T16:13:01Z,MEMBER,"(Oops, pressed the wrong button to close)
> Can you clarify what you mean by this? At what point should the unstack happen?
Consider `ds.groupby('latitude').apply(lambda x: x - x.mean())` or `ds.groupby('latitude') - ds.groupby('latitude').mean()` (these are two ways of writing the same thing). In each of these cases, the result of a groupby has the same dimensions as the original instead of replacing one or more of the original dimensions.
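The same shape-preserving behavior can be sketched with a pandas analogue, where `transform` broadcasts each group mean back to the original index (toy data):

```python
import pandas as pd

df = pd.DataFrame({'lat': [0, 0, 1, 1],
                   'temp': [1.0, 3.0, 2.0, 6.0]})

# subtract each group mean; the result keeps the original length,
# unlike a plain groupby-mean, which would collapse the groups
anom = df['temp'] - df.groupby('lat')['temp'].transform('mean')
print(anom.tolist())  # -> [-1.0, 1.0, -2.0, 2.0]
```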
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-206182013,https://api.github.com/repos/pydata/xarray/issues/818,206182013,MDEyOklzc3VlQ29tbWVudDIwNjE4MjAxMw==,1217238,2016-04-06T07:31:32Z,2016-04-06T07:31:32Z,MEMBER,"This will need to unstack to handle .apply. That will be nice for things like normalization.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-206165090,https://api.github.com/repos/pydata/xarray/issues/818,206165090,MDEyOklzc3VlQ29tbWVudDIwNjE2NTA5MA==,1217238,2016-04-06T07:05:05Z,2016-04-06T07:05:05Z,MEMBER,"Yes, this is awesome! I had a vague idea that `stack` could make something like this possible but hadn't really thought it through.
As for the specialized ""grouper"", I agree that that makes sense. It's basically an extension of `resample` from dates to floating point -- noting that pandas recently changed the resample API so it works a little more like groupby. `pandas.cut` could probably handle most of the logic here.
","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176