html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/818#issuecomment-230796165,https://api.github.com/repos/pydata/xarray/issues/818,230796165,MDEyOklzc3VlQ29tbWVudDIzMDc5NjE2NQ==,1197350,2016-07-06T14:50:42Z,2016-07-06T14:50:42Z,MEMBER,"I just rebased and updated this PR. I have not resolved all of the edge cases, such as what to do about non-reducing groupby_bins operations that don't span the entire coordinate. Unfortunately merging @shoyer's fix from #875 did not resolve this problem, at least not in a way that was obvious to me.
My feeling is that this PR in its current form introduces some very useful new features. For my part, I am eager to start using it for actual science projects. Multidimensional grouping is unfamiliar territory. I don't think every potential issue can be resolved by me right now via this PR--I don't have the necessary skills, nor can I anticipate every use case. I think that getting this merged and out in the wild will give us some valuable user feedback which will help figure out where to go next. Plus it would get exposed to developers with the skills to resolve some of the issues. By waiting much longer, we risk it going stale, since lots of other xarray elements are also in flux.
Please let me know what you think.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-224691235,https://api.github.com/repos/pydata/xarray/issues/818,224691235,MDEyOklzc3VlQ29tbWVudDIyNDY5MTIzNQ==,1197350,2016-06-08T18:51:37Z,2016-06-08T18:51:37Z,MEMBER,"> I think #875 should fix the issue with concatenating index objects.
Should I try to merge your branch with my branch...or wait for your branch to get merged into master?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-223934668,https://api.github.com/repos/pydata/xarray/issues/818,223934668,MDEyOklzc3VlQ29tbWVudDIyMzkzNDY2OA==,1197350,2016-06-06T11:36:02Z,2016-06-06T11:36:02Z,MEMBER,"@shoyer: I'm not sure this is as simple as a technical fix. It is a design question.
With regular `groupby`, the groups are guaranteed to span the original coordinates exactly, so you can always put the original dataarrays back together from the groupby object, i.e. `ds.groupby('dim_0').apply(lambda x: x)`.
With `groupby_bins`, the user specifies the bins and might do so in such a way that
- there are empty groups
- there are indices which don't belong to any group
In both cases, it is not obvious to me what should happen when calling `.apply(lambda x: x)`. Especially for the latter, I would probably want to raise an error informing the user that their bins are not sufficient to reconstitute the full index.
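To make the second case concrete, `pd.cut` (which `groupby_bins` delegates to) simply assigns NaN to values that fall outside the given bins, so those indices silently belong to no group. A minimal illustration (values chosen here for demonstration only):

``` python
import numpy as np
import pandas as pd

values = np.arange(4)             # 0, 1, 2, 3
cats = pd.cut(values, [1, 2, 3])  # bins cover neither 0 nor 1 (the left edge is open)
print(pd.isna(cats).sum())        # 2 -> two indices belong to no group
```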
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-223817102,https://api.github.com/repos/pydata/xarray/issues/818,223817102,MDEyOklzc3VlQ29tbWVudDIyMzgxNzEwMg==,1197350,2016-06-05T14:47:12Z,2016-06-05T14:47:12Z,MEMBER,"@shoyer, @jhamman, could you give me some feedback on one outstanding issue with this PR? I am stuck on a kind of obscure edge case, but I really want to get this finished.
Consider the following groupby operation, which creates bins which are _finer_ than the original coordinate. In other words, some bins are empty because there are too many bins.
``` python
import numpy as np
import xarray as xr

dat = xr.DataArray(np.arange(4))
dim_0_bins = np.arange(0,4.5,0.5)
gb = dat.groupby_bins('dim_0', dim_0_bins)
print(gb.groups)
```
gives
```
{'(0.5, 1]': [1], '(2.5, 3]': [3], '(1.5, 2]': [2]}
```
If I try a reducing apply operation, e.g. `gb.mean()`, it works fine. However, if I do
``` python
gb.apply(lambda x: x - x.mean())
```
I get an error on the concat step
```
--> 433 combined = self._concat(applied, shortcut=shortcut)
... [long stack trace]
IndexError: index 3 is out of bounds for axis 1 with size 3
```
I'm really not sure what the ""correct behavior"" should even be in this case. It is not even possible to reconstitute the original data array by doing `gb.apply(lambda x: x)`. The same problem arises when the groups do not span the entire coordinate (e.g. `dim_0_bins = [1,2,3]`).
Do you have any thoughts / suggestions? I'm not sure I can solve this issue right now, but I would at least like to have a more useful error message.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-221859813,https://api.github.com/repos/pydata/xarray/issues/818,221859813,MDEyOklzc3VlQ29tbWVudDIyMTg1OTgxMw==,1197350,2016-05-26T12:42:20Z,2016-05-26T12:42:20Z,MEMBER,"Just a little update--I realized that calling apply on multidimensional binned groups fails when the group is not reduced. For example
``` python
ds.groupby_bins('lat', lat_bins).apply(lambda x: x - x.mean())
```
raises errors because of conflicting coordinates when trying to concat the results. I only discovered this when making my tutorial notebook. I think I know how to fix it, but I haven't had time yet.
So it is moving along... I am excited about this feature and am confident it can make it into the next release.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-220859076,https://api.github.com/repos/pydata/xarray/issues/818,220859076,MDEyOklzc3VlQ29tbWVudDIyMDg1OTA3Ng==,1197350,2016-05-22T21:59:05Z,2016-05-22T21:59:05Z,MEMBER,"> The right thing for xarray to do is probably to throw an error when any 2d plot method is called with 2 coordinates that actually have higher dimensions.
I disagree. I don't want to use the default dimensions as the x and y coords for the plot. I want to use the true lat / lon coords, which are `xc` and `yc`. In this case, I think the plot broke because pcolormesh can't handle the way the coordinates wrap. It's not a problem with xarray. If I pass the plot through cartopy, it actually works great, because cartopy knows how to handle the 2D geographic coordinates a bit better.
``` python
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()
ds.Tair[0].plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree(), x='xc', y='yc')
ax.coastlines()
```

This would fail of course if you could only use 1d coords for plotting, so I definitely think we should keep the plot code as is for now (not raise an error).
I am happy with this example for now.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-220833788,https://api.github.com/repos/pydata/xarray/issues/818,220833788,MDEyOklzc3VlQ29tbWVudDIyMDgzMzc4OA==,1197350,2016-05-22T13:55:51Z,2016-05-22T13:55:51Z,MEMBER,"@jhamman, @clarkfitzg: I am working on an example notebook for multidimensional coordinates. In addition to the new groupby features, I wanted to include an example of a 2D pcolormesh using the `RASM_example_data.nc` dataset.
Just doing the simplest possible thing, i.e.
``` python
ds.Tair[0].plot.pcolormesh(x='xc', y='yc')
```
gives me a slightly mangled plot:

Am I missing something obvious here?
Seems somehow related to #781, #792.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-220065292,https://api.github.com/repos/pydata/xarray/issues/818,220065292,MDEyOklzc3VlQ29tbWVudDIyMDA2NTI5Mg==,1197350,2016-05-18T15:33:45Z,2016-05-18T15:33:45Z,MEMBER,"> A nice example for the docs.
There is indeed basic documentation, but not a detailed tutorial of what these features are good for. For that, the dataset from @jhamman with a non-uniform grid would actually be ideal. I think the [monthly-means](http://xarray.pydata.org/en/latest/examples/monthly-means.html) example contains a reference to a similar dataset.
How were the files in the doc/examples directory generated?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-220029256,https://api.github.com/repos/pydata/xarray/issues/818,220029256,MDEyOklzc3VlQ29tbWVudDIyMDAyOTI1Ng==,1197350,2016-05-18T13:41:47Z,2016-05-18T13:41:47Z,MEMBER,"> Allow specification of which dims to stack.
I think this should wait for a future PR. It is pretty complicated. I think it would be better to get the current features out in the wild first and play with it a bit before moving forward.
> I ran into the index is monotonic issue, it sounds like that was resolved. Do we cover that case in a test?
It is resolved, but not tested. I'll add a test.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-219847587,https://api.github.com/repos/pydata/xarray/issues/818,219847587,MDEyOklzc3VlQ29tbWVudDIxOTg0NzU4Nw==,1197350,2016-05-17T20:43:31Z,2016-05-17T20:43:31Z,MEMBER,"@shoyer, @jhamman: I'm pretty happy with where this is at. It's quite useful for a lots of things I want to do with xarray. Any more feedback?
One outstanding issue involves some buggy behavior with `shortcut` which I don't really understand.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-219262958,https://api.github.com/repos/pydata/xarray/issues/818,219262958,MDEyOklzc3VlQ29tbWVudDIxOTI2Mjk1OA==,1197350,2016-05-15T02:44:19Z,2016-05-15T02:44:19Z,MEMBER,"Just updated this to use the `groupby_bins` syntax, which now exposes all the arguments of `pd.cut` to the user.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-219231243,https://api.github.com/repos/pydata/xarray/issues/818,219231243,MDEyOklzc3VlQ29tbWVudDIxOTIzMTI0Mw==,1197350,2016-05-14T17:00:33Z,2016-05-14T17:00:33Z,MEMBER,"This is a good question, with a simple answer (stack), but it doesn't belong in the discussion for this PR. Please open a new issue or email your question to the mailing list.
> On May 14, 2016, at 12:56 PM, James Adams notifications@github.com wrote:
>
> I would also like to do what is described below but so far have had little
> success using xarray.
>
> I have time series data (x years of monthly values) at each lat/lon point
> of a grid (x*12 times, lons, lats). I want to apply a function f() against
> the time series to return a corresponding time series of values. I then
> write these values to an output NetCDF which corresponds to the input
> NetCDF in terms of dimensions and coordinate variables. So instead of
> looping over every lat and every lon I want to apply f() in a vectorized
> manner such as what's described for xarray's groupby (in order to gain the
> expected performance from using xarray for the split-apply-combine
> pattern), but it needs to work for more than a single dimension which is
> the current capability.
>
> Has anyone done what is described above using xarray? What sort of
> performance gains can be expected using your approach?
>
> Thanks in advance for any help with this topic. My apologies if there is a
> more appropriate forum for this sort of discussion (please redirect if so),
> as this may not be applicable to the original issue...
>
> --James
>
> On Wed, May 11, 2016 at 2:24 AM, naught101 notifications@github.com wrote:
>
> > I want to be able to run a scikit-learn model over a bunch of variables in
> > a 3D (lat/lon/time) dataset, and return values for each coordinate point.
> > Is something like this multi-dimensional groupby required (I'm thinking
> > groupby(lat, lon) => 2D matrices that can be fed straight into
> > scikit-learn), or is there already some other mechanism that could achieve
> > something like this? Or is the best way at the moment just to create a null
> > dataset, and loop over lat/lon and fill in the blanks as you go?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-219063079,https://api.github.com/repos/pydata/xarray/issues/818,219063079,MDEyOklzc3VlQ29tbWVudDIxOTA2MzA3OQ==,1197350,2016-05-13T14:41:43Z,2016-05-13T14:41:43Z,MEMBER,"> @rabernat It's possibly a better idea to use np.digitize rather than pd.cut.
Why? This was in fact my original idea, but you encouraged me to use `pd.cut` instead. One thing I like about cut is that it is very flexible and well documented, while digitize is somewhat obscure.
What about
`ds.groupby_bins('lat', bins=lat_bins, labels=lat_labels)`
?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-218756580,https://api.github.com/repos/pydata/xarray/issues/818,218756580,MDEyOklzc3VlQ29tbWVudDIxODc1NjU4MA==,1197350,2016-05-12T13:27:38Z,2016-05-12T13:27:38Z,MEMBER,"I suppose I should also add a test for non-monotonic multidimensional binning.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-218756391,https://api.github.com/repos/pydata/xarray/issues/818,218756391,MDEyOklzc3VlQ29tbWVudDIxODc1NjM5MQ==,1197350,2016-05-12T13:26:58Z,2016-05-12T13:26:58Z,MEMBER,"@jhamman: My latest commit followed @shoyer's suggestion to fix the ""non-monotonic"" error.
I successfully loaded your data and took a zonal average in 10-degree bins with the following code:
``` python
>>> ds = xr.open_dataset('sample_for_xarray_multigroupby.nc', decode_times=False)
>>> lat_bins = np.arange(20,90,10)
>>> t_mean = ds.sample_tsurf.groupby('TLAT', bins=lat_bins).mean()
>>> t_mean
array([ 27.05354874, 24.00267499, 15.74423768, 11.16990181,
6.45922212, 0.48820518])
Coordinates:
time float64 7.226e+05
z_t float64 250.0
* TLAT (TLAT) object '(20, 30]' '(30, 40]' '(40, 50]' '(50, 60]' ...
```
The only big remaining issue is the values of the new coordinate. Currently it is just using the labels output by `pd.cut`, which are strings. This means if I try `t_mean.plot()`, I get `TypeError: Plotting requires coordinates to be numeric or dates`.
We could either allow the user to specify labels by adding a `labels` keyword to `groupby`, or we could infer the labels automatically, e.g. by taking the centered mean of the bins:
``` python
bin_labels = 0.5 * (lat_bins[1:] + lat_bins[:-1])
```
Please weigh in if you have an opinion about that.
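For what it's worth, here is a sketch of the second option, computing midpoint labels and handing them to `pd.cut` (the sample latitudes are hypothetical):

``` python
import numpy as np
import pandas as pd

lat_bins = np.arange(20, 90, 10)
# numeric label at the midpoint of each bin, instead of pd.cut's string intervals
bin_labels = 0.5 * (lat_bins[1:] + lat_bins[:-1])
cats = pd.cut([22.0, 45.0, 67.0], lat_bins, labels=bin_labels)
print([float(c) for c in cats])   # [25.0, 45.0, 65.0]
```

With numeric labels like these, `t_mean.plot()` would have a numeric coordinate to work with.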
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-218450849,https://api.github.com/repos/pydata/xarray/issues/818,218450849,MDEyOklzc3VlQ29tbWVudDIxODQ1MDg0OQ==,1197350,2016-05-11T12:56:47Z,2016-05-11T12:56:47Z,MEMBER,"@jhamman: Could you post [a slice of] your dataset for me to try?
> It seems this is only an issue when I specify bins. I see that there is a TODO statement there so maybe that will fix this.
The TODO comment was there when I started working on this. The error is raised by these lines
``` python
index = safe_cast_to_index(group)
if not index.is_monotonic:
# TODO: sort instead of raising an error
raise ValueError('index must be monotonic for resampling')
```
I'm not sure this check is necessary for binning.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-208092684,https://api.github.com/repos/pydata/xarray/issues/818,208092684,MDEyOklzc3VlQ29tbWVudDIwODA5MjY4NA==,1197350,2016-04-10T23:39:29Z,2016-04-10T23:39:29Z,MEMBER,"@shoyer, @jhamman I think this is ready for a review
There are two distinct features added here:
1. `groupby` works with multidimensional coordinate variables. (See example at the top of the PR.)
2. `groupby` accepts a new keyword `bins`, which is passed to `pandas.cut` to digitize the groups (not yet documented because I could use some feedback on the API). For now, the coordinates are labeled with the category labels determined by `cut`. Using the example array above
``` python
>>> da.groupby('lat', bins=[0,15,20]).apply(lambda x : x.sum())
array([1, 5])
Coordinates:
* lat (lat) object '(0, 15]' '(15, 20]'
```
I'm not sure this is the ideal behavior, since the categories are hard to slice. For my purposes, I would rather assign an integer or float index to each bin using e.g. the central value of the bin.
_note:_ Both of these features have problems when used with `shortcut=True`.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-207983237,https://api.github.com/repos/pydata/xarray/issues/818,207983237,MDEyOklzc3VlQ29tbWVudDIwNzk4MzIzNw==,1197350,2016-04-10T13:15:49Z,2016-04-10T13:15:49Z,MEMBER,"So I tracked down the cause of the original array dimensions being overwritten. It happens within `_concat_shortcut` here:
https://github.com/pydata/xarray/blob/master/xarray/core/groupby.py#L325
``` python
result._coords[concat_dim.name] = as_variable(concat_dim, copy=True)
```
At this point, `self.obj` gets modified directly.
@shoyer should I just focus on the case where `shortcut==False`? Or should I try to debug the `_concat_shortcut` method? Your inline comments (""don't worry too much about maintaining this method"") suggest that it is not going to be around forever.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-207531654,https://api.github.com/repos/pydata/xarray/issues/818,207531654,MDEyOklzc3VlQ29tbWVudDIwNzUzMTY1NA==,1197350,2016-04-08T17:39:10Z,2016-04-08T18:07:11Z,MEMBER,"I have tried adding a new `bins` keyword argument to `groupby`, which should accomplish what I want and more. (It will also work on regular one-dimensional groupby operations.)
The way it works is like this:
``` python
>>> ar = xr.DataArray(np.arange(4), dims='dim_0')
>>> ar
array([0, 1, 2, 3])
Coordinates:
* dim_0 (dim_0) int64 0 1 2 3
>>> ar.groupby('dim_0', bins=[2,4]).sum()
array([1, 5])
Coordinates:
* dim_0 (dim_0) int64 2 4
```
The only problem is that it seems to overwrite the original dimension of the array! After calling groupby
``` python
>>> ar
array([0, 1, 2, 3])
Coordinates:
* dim_0 (dim_0) int64 2 4
```
I think that `resample` overcomes this issue by renaming the dimension:
https://github.com/pydata/xarray/blob/master/xarray/core/common.py#L437
I guess something similar should be possible here...
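For comparison, the renaming approach is what the `groupby_bins` syntax introduced later in this PR ends up doing in today's xarray: the binned result lives on a new `dim_0_bins` dimension, and the original coordinate is left untouched. A minimal sketch against current xarray:

``` python
import numpy as np
import xarray as xr

ar = xr.DataArray(np.arange(4), dims='dim_0',
                  coords={'dim_0': np.arange(4)})
# binning happens under a renamed dimension, so 'dim_0' is not clobbered
binned = ar.groupby_bins('dim_0', bins=[0, 2, 4]).sum()
print(ar['dim_0'].values)  # [0 1 2 3] -- unchanged
print(binned.dims)         # ('dim_0_bins',)
```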
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-207417668,https://api.github.com/repos/pydata/xarray/issues/818,207417668,MDEyOklzc3VlQ29tbWVudDIwNzQxNzY2OA==,1197350,2016-04-08T12:41:00Z,2016-04-08T12:41:00Z,MEMBER,"@shoyer regarding the binning, should I modify `resample` to allow for non-time dimensions? Or a new function?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-207077942,https://api.github.com/repos/pydata/xarray/issues/818,207077942,MDEyOklzc3VlQ29tbWVudDIwNzA3Nzk0Mg==,1197350,2016-04-07T20:34:53Z,2016-04-07T20:34:53Z,MEMBER,"The travis build failure is a [conda problem](https://travis-ci.org/pydata/xarray/jobs/121520763), not my commit.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-207068032,https://api.github.com/repos/pydata/xarray/issues/818,207068032,MDEyOklzc3VlQ29tbWVudDIwNzA2ODAzMg==,1197350,2016-04-07T20:03:48Z,2016-04-07T20:03:48Z,MEMBER,"I think I got it working.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-207000636,https://api.github.com/repos/pydata/xarray/issues/818,207000636,MDEyOklzc3VlQ29tbWVudDIwNzAwMDYzNg==,1197350,2016-04-07T17:14:55Z,2016-04-07T17:14:55Z,MEMBER,"My new commit supports unstacking in apply with `shortcut=True`. However, the behavior is kind of weird, in a way that is unique to the multidimensional case.
Consider the behavior of the test case:
``` python
>>> da = xr.DataArray([[0,1],[2,3]],
coords={'lon': (['ny','nx'], [[30,40],[40,50]] ),
'lat': (['ny','nx'], [[10,10],[20,20]] ),},
dims=['ny','nx'])
>>> da.groupby('lon').apply(lambda x : x - x.mean(), shortcut=False)
array([[[ 0. , nan],
[ nan, nan]],
[[ nan, -0.5],
[ 0.5, nan]],
[[ nan, nan],
[ nan, 0. ]]])
Coordinates:
* ny (ny) int64 0 1
* nx (nx) int64 0 1
lat (lon_groups, ny, nx) float64 10.0 nan nan nan nan 10.0 20.0 ...
lon (lon_groups, ny, nx) float64 30.0 nan nan nan nan 40.0 40.0 ...
* lon_groups (lon_groups) int64 30 40 50
```
When unstacking, the indices that are not part of the group get filled with nans. We are not able to put these arrays back together into a single array.
Note that if we do not rename the group name here:
https://github.com/pydata/xarray/pull/818/files#diff-96b65e0bfec9fd2b9d562483f53661f5R121
Then we get an error here:
https://github.com/pydata/xarray/pull/818/files#diff-96b65e0bfec9fd2b9d562483f53661f5R407
```
ValueError: the variable 'lon' has the same name as one of its dimensions ('lon', 'ny', 'nx'), but it is not 1-dimensional and thus it is not a valid index
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-206628737,https://api.github.com/repos/pydata/xarray/issues/818,206628737,MDEyOklzc3VlQ29tbWVudDIwNjYyODczNw==,1197350,2016-04-07T00:14:17Z,2016-04-07T00:14:17Z,MEMBER,"@shoyer I'm having a tough time figuring out where to put the unstacking logic...maybe you can give me some advice.
My first idea was to add a method to the GroupBy class called `_maybe_unstack_array` and make a call to it [here](https://github.com/pydata/xarray/blob/master/xarray/core/groupby.py#L382). The problem with that approach is that the group iteration happens over Variables, not full DataArrays, which makes unstacking harder: we would need to store lots of metadata about the stacked / unstacked dimension names, sizes, etc.
If you think that is the right approach, I will forge ahead. But maybe, as the author of both the groupby and stack / unstack logic, you can see an easier way.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-206418244,https://api.github.com/repos/pydata/xarray/issues/818,206418244,MDEyOklzc3VlQ29tbWVudDIwNjQxODI0NA==,1197350,2016-04-06T15:05:54Z,2016-04-06T15:05:54Z,MEMBER,"Let me try to clarify what I mean in item 2:
> Allow specification of which dims to stack.
Say you have the following dataset
``` python
>>> ds = xr.Dataset(
{'temperature': (['time','nx'], [[1,1,2,2],[2,2,3,3]] ),
'humidity': (['time','nx'], [[1,1,1,1],[1,1,1,1]] )})
```
Now imagine you want to average humidity in temperature coordinates. (This might sound like a bizarre operation, but it is actually the foundation of a sophisticated sort of [thermodynamic analysis](http://science.sciencemag.org/content/347/6221/540).)
Currently this works as follows
``` python
>>> ds = ds.set_coords('temperature')
>>> ds.humidity.groupby('temperature').sum()
array([2, 4, 2])
Coordinates:
* temperature (temperature) int64 1 2 3
```
However, this sums over all time. What if you wanted to preserve the time dependence but replace the `nx` dimension with `temperature`? I would like to be able to say
``` python
ds.humidity.groupby('temperature', group_over='nx').sum()
```
and get back a DataArray with dimensions `('time', 'temperature')`.
Maybe this is already possible with a sophisticated use of `apply`. But I don't see how to do it.
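As a rough workaround sketch (hypothetical data, and assuming the `groupby_bins` syntax from later in this PR), one can bin within each `time` slice and recombine along `time`, which approximates `group_over='nx'`:

``` python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {'temperature': (['time', 'nx'], [[1., 1., 2., 2.], [1., 2., 2., 1.]]),
     'humidity':    (['time', 'nx'], [[3., 1., 4., 1.], [5., 9., 2., 6.]])})
ds = ds.set_coords('temperature')

bins = [0, 1.5, 3]  # chosen so every time slice populates both bins
result = xr.concat(
    [ds.humidity.isel(time=i).groupby_bins('temperature', bins).sum()
     for i in range(ds.sizes['time'])],
    dim='time')
print(result.dims)  # ('time', 'temperature_bins')
```

This falls apart as soon as one slice leaves a bin empty, which is exactly why a built-in `group_over` would be nicer.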
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-206389664,https://api.github.com/repos/pydata/xarray/issues/818,206389664,MDEyOklzc3VlQ29tbWVudDIwNjM4OTY2NA==,1197350,2016-04-06T14:09:43Z,2016-04-06T14:09:43Z,MEMBER,"> As for the specialized ""grouper"", I agree that that makes sense. It's basically an extension of resample from dates to floating point -- noting that pandas recently changed the resample API so it works a little more like groupby. pandas.cut could probably handle most of the logic here.
I normally used `numpy.digitize` for this type of thing, but `pandas.cut` indeed seems like the obvious choice.
Should this go into a separate PR?
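For anyone comparing the two, the trade-off is roughly this (toy values):

``` python
import numpy as np
import pandas as pd

values = np.array([0.5, 1.5, 2.5])
bins = [0, 1, 2, 3]
# digitize gives bare integer bin indices...
print(np.digitize(values, bins))  # [1 2 3]
# ...while cut gives labelled, self-describing intervals
print(list(pd.cut(values, bins)))
```

`cut` also handles labels, open/closed edges, and out-of-range values in one call, which covers most of the logic needed here.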
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176
https://github.com/pydata/xarray/pull/818#issuecomment-206386864,https://api.github.com/repos/pydata/xarray/issues/818,206386864,MDEyOklzc3VlQ29tbWVudDIwNjM4Njg2NA==,1197350,2016-04-06T14:04:20Z,2016-04-06T14:04:20Z,MEMBER,"> This will need to unstack to handle .apply. That will be nice for things like normalization.
Can you clarify what you mean by this? At what point should the unstack happen?
With the current code, apply seems to work ok:
``` python
>>> da.groupby('lon').apply(lambda x : (x**2).sum())
array([0, 5, 9])
Coordinates:
* lon (lon) int64 30 40 50
```
But perhaps I am missing a certain use case you have in mind?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176