issue_comments
61 rows where issue = 146182176 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
231256264 | https://github.com/pydata/xarray/pull/818#issuecomment-231256264 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIzMTI1NjI2NA== | shoyer 1217238 | 2016-07-08T01:50:30Z | 2016-07-08T01:50:30Z | MEMBER | OK, merging..... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
230818687 | https://github.com/pydata/xarray/pull/818#issuecomment-230818687 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIzMDgxODY4Nw== | shoyer 1217238 | 2016-07-06T16:00:54Z | 2016-07-06T16:00:54Z | MEMBER | @rabernat I agree. I have a couple of minor style/pep8 issues, and we need an entry for "what's new", but let's merge this. I can then play around a little bit with potential fixes. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
230796165 | https://github.com/pydata/xarray/pull/818#issuecomment-230796165 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIzMDc5NjE2NQ== | rabernat 1197350 | 2016-07-06T14:50:42Z | 2016-07-06T14:50:42Z | MEMBER | I just rebased and updated this PR. I have not resolved all of the edge cases, such as what to do about non-reducing groupby_bins operations that don't span the entire coordinate. Unfortunately merging @shoyer's fix from #875 did not resolve this problem, at least not in a way that was obvious to me. My feeling is that this PR in its current form introduces some very useful new features. For my part, I am eager to start using it for actual science projects. Multidimensional grouping is unfamiliar territory. I don't think every potential issue can be resolved by me right now via this PR--I don't have the necessary skills, nor can I anticipate every use case. I think that getting this merged and out in the wild will give us some valuable user feedback which will help figure out where to go next. Plus it would get exposed to developers with the skills to resolve some of the issues. By waiting much longer, we risk it going stale, since lots of other xarray elements are also in flux. Please let me know what you think. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
224693231 | https://github.com/pydata/xarray/pull/818#issuecomment-224693231 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyNDY5MzIzMQ== | shoyer 1217238 | 2016-06-08T18:58:45Z | 2016-06-08T18:58:45Z | MEMBER | Looks like I still have a bug (failing Travis builds). Let me see if I can get that sorted out first.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
224691235 | https://github.com/pydata/xarray/pull/818#issuecomment-224691235 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyNDY5MTIzNQ== | rabernat 1197350 | 2016-06-08T18:51:37Z | 2016-06-08T18:51:37Z | MEMBER |
Should I try to merge your branch with my branch...or wait for your branch to get merged into master? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
224484574 | https://github.com/pydata/xarray/pull/818#issuecomment-224484574 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyNDQ4NDU3NA== | shoyer 1217238 | 2016-06-08T04:32:29Z | 2016-06-08T04:32:29Z | MEMBER | I think #875 should fix the issue with concatenating index objects. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
223999761 | https://github.com/pydata/xarray/pull/818#issuecomment-223999761 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMzk5OTc2MQ== | shoyer 1217238 | 2016-06-06T15:45:49Z | 2016-06-06T15:45:49Z | MEMBER | Empty groups should be straightforward -- we should be able to handle them. Indices which don't belong to any group are indeed more problematic. I think we have three options here:
1. Raise an error when calling I think my preference would be for option 3, though 1 or 2 could be reasonable work arounds for now (raising |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
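The two cases separated in the comment above (bins that end up empty, and points that fall in no bin at all) can be reproduced with `pandas.cut`, which xarray's binned groupby builds on. This is a minimal sketch with invented values and bin edges, not code from the PR; the names `values`, `bin_edges`, `empty_bins`, and `unassigned` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical 1D coordinate values and bin edges, chosen only for illustration.
values = np.array([0.0, 1.0, 2.0])
bin_edges = [-0.5, 0.25, 0.5, 0.75, 1.5]  # finer than the data, and ending below 2.0

binned = pd.cut(values, bin_edges)

# Integer code of the bin each value falls in; -1 marks "belongs to no bin".
codes = np.asarray(binned.codes)

# Count members per bin; bins that no value falls into get a count of zero.
counts = np.bincount(codes[codes >= 0], minlength=len(binned.categories))

empty_bins = [str(c) for c, n in zip(binned.categories, counts) if n == 0]
unassigned = values[codes == -1]

print("empty bins:", empty_bins)
print("values outside all bins:", unassigned)
```

Empty bins only affect the list of groups, whereas unassigned values leave gaps in the original index, which is why a non-reducing apply has nothing sensible to put back at those positions.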
223934668 | https://github.com/pydata/xarray/pull/818#issuecomment-223934668 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMzkzNDY2OA== | rabernat 1197350 | 2016-06-06T11:36:02Z | 2016-06-06T11:36:02Z | MEMBER | @shoyer: I'm not sure this is as simple as a technical fix. It is a design question. With regular With In both cases, it is not obvious to me what should happen when calling |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
223870991 | https://github.com/pydata/xarray/pull/818#issuecomment-223870991 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMzg3MDk5MQ== | shoyer 1217238 | 2016-06-06T05:23:24Z | 2016-06-06T05:23:24Z | MEMBER | I think I can fix this, by making concatenation work properly on index objects. Stay tuned... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
223817102 | https://github.com/pydata/xarray/pull/818#issuecomment-223817102 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMzgxNzEwMg== | rabernat 1197350 | 2016-06-05T14:47:12Z | 2016-06-05T14:47:12Z | MEMBER | @shoyer, @jhamman, could you give me some feedback on one outstanding issue with this PR? I am stuck on a kind of obscure edge case, but I really want to get this finished. Consider the following groupby operation, which creates bins which are finer than the original coordinate. In other words, some bins are empty because there are too many bins.
gives
If I try a reducing apply operation, e.g.
I get an error on the concat step
I'm really not sure what the "correct behavior" should even be in this case. It is not even possible to reconstitute the original data array by doing Do you have any thoughts / suggestions? I'm not sure I can solve this issue right now, but I would at least like to have a more useful error message. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
221859813 | https://github.com/pydata/xarray/pull/818#issuecomment-221859813 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMTg1OTgxMw== | rabernat 1197350 | 2016-05-26T12:42:20Z | 2016-05-26T12:42:20Z | MEMBER | Just a little update--I realized that calling apply on multidimensional binned groups fails when the group is not reduced. For example
raises errors because of conflicting coordinates when trying to concat the results. I only discovered this when making my tutorial notebook. I think I know how to fix it, but I haven't had time yet. So it is moving along... I am excited about this feature and am confident it can make it into the next release. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
220863225 | https://github.com/pydata/xarray/pull/818#issuecomment-220863225 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMDg2MzIyNQ== | clarkfitzg 5356122 | 2016-05-22T23:28:01Z | 2016-05-22T23:28:01Z | MEMBER | Ah, now I see what you were going for. More going on here than I realized. That's a nice plot :) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
220861371 | https://github.com/pydata/xarray/pull/818#issuecomment-220861371 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMDg2MTM3MQ== | jhamman 2443309 | 2016-05-22T22:47:39Z | 2016-05-22T22:47:39Z | MEMBER | @rabernat - I'm a bit late to the party here but it looks like you have gotten it straightened out. I would have suggested plotting the projected data using @clarkfitzg - this is the exact functionality we want with 2d plot coordinates and we definitely do not want to change it. It is a little annoying that |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
220859076 | https://github.com/pydata/xarray/pull/818#issuecomment-220859076 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMDg1OTA3Ng== | rabernat 1197350 | 2016-05-22T21:59:05Z | 2016-05-22T21:59:05Z | MEMBER |
I disagree. I don't want to use the default dimensions as the x and y coords for the plot. I want to use the true lat / lon coords, which are
This would fail of course if you could only use 1d coords for plotting, so I definitely think we should keep the plot code as is for now (not raise an error). I am happy with this example for now. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
220844145 | https://github.com/pydata/xarray/pull/818#issuecomment-220844145 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMDg0NDE0NQ== | clarkfitzg 5356122 | 2016-05-22T17:16:14Z | 2016-05-22T18:31:58Z | MEMBER | The problem is with the shape of these coordinates. ```
EDIT: just to be clear, it doesn't make sense to pass in 2d arrays for both x and y coordinates for a 2d plotting function. Run this: |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
220844279 | https://github.com/pydata/xarray/pull/818#issuecomment-220844279 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMDg0NDI3OQ== | clarkfitzg 5356122 | 2016-05-22T17:18:48Z | 2016-05-22T17:18:48Z | MEMBER | The right thing for |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
220833788 | https://github.com/pydata/xarray/pull/818#issuecomment-220833788 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMDgzMzc4OA== | rabernat 1197350 | 2016-05-22T13:55:51Z | 2016-05-22T13:55:51Z | MEMBER | @jhamman, @clarkfitzg: I am working on an example notebook for multidimensional coordinates. In addition to the new groupby features, I wanted to include an example of a 2D pcolormesh using the Just doing the simplest possible thing, i.e.
gives me a slightly mangled plot:
Am I missing something obvious here? Seems somehow related to #781, #792. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
220144452 | https://github.com/pydata/xarray/pull/818#issuecomment-220144452 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMDE0NDQ1Mg== | jhamman 2443309 | 2016-05-18T20:15:04Z | 2016-05-18T20:15:04Z | MEMBER | @rabernat - the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
220065292 | https://github.com/pydata/xarray/pull/818#issuecomment-220065292 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMDA2NTI5Mg== | rabernat 1197350 | 2016-05-18T15:33:45Z | 2016-05-18T15:33:45Z | MEMBER |
There is indeed basic documentation, but not a detailed tutorial of what these features are good for. For this, this dataset from @jhamman with a non-uniform grid would actually be ideal. The monthly-means example I think contains a reference to a similar dataset. How were the files in the doc/examples directory generated? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
220029256 | https://github.com/pydata/xarray/pull/818#issuecomment-220029256 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIyMDAyOTI1Ng== | rabernat 1197350 | 2016-05-18T13:41:47Z | 2016-05-18T13:41:47Z | MEMBER |
I think this should wait for a future PR. It is pretty complicated. I think it would be better to get the current features out in the wild first and play with it a bit before moving forward.
It is resolved, but not tested. I'll add a test. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
219875456 | https://github.com/pydata/xarray/pull/818#issuecomment-219875456 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxOTg3NTQ1Ng== | jhamman 2443309 | 2016-05-17T22:38:56Z | 2016-05-17T22:38:56Z | MEMBER | @rabernat - I just had a look through the code and it looks pretty good. I have a few broader questions though: 1. You have a few outstanding todo items from the first comment in your PR:
Where do we stand on these? You have some simple examples in the docs now but maybe you were thinking of more complete examples? 2. In https://github.com/pydata/xarray/pull/818#issuecomment-218358050, I ran into the index is monotonic issue, it sounds like that was resolved. Do we cover that case in a test? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
219847587 | https://github.com/pydata/xarray/pull/818#issuecomment-219847587 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxOTg0NzU4Nw== | rabernat 1197350 | 2016-05-17T20:43:31Z | 2016-05-17T20:43:31Z | MEMBER | @shoyer, @jhamman: I'm pretty happy with where this is at. It's quite useful for a lot of things I want to do with xarray. Any more feedback? One outstanding issue involves some buggy behavior with |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
219262958 | https://github.com/pydata/xarray/pull/818#issuecomment-219262958 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxOTI2Mjk1OA== | rabernat 1197350 | 2016-05-15T02:44:19Z | 2016-05-15T02:44:19Z | MEMBER | Just updated this to use the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
219231243 | https://github.com/pydata/xarray/pull/818#issuecomment-219231243 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxOTIzMTI0Mw== | rabernat 1197350 | 2016-05-14T17:00:33Z | 2016-05-14T17:00:33Z | MEMBER | This is a good question, with a simple answer (stack), but it doesn't belong in the discussion for this PR. Open a new issue or email your question to the mailing list.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
219231028 | https://github.com/pydata/xarray/pull/818#issuecomment-219231028 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxOTIzMTAyOA== | monocongo 1328158 | 2016-05-14T16:56:37Z | 2016-05-14T16:56:37Z | NONE | I would also like to do what is described below but so far have had little success using xarray. I have time series data (x years of monthly values) at each lat/lon point of a grid (x*12 times, lons, lats). I want to apply a function f() against the time series to return a corresponding time series of values. I then write these values to an output NetCDF which corresponds to the input NetCDF in terms of dimensions and coordinate variables. So instead of looping over every lat and every lon I want to apply f() in a vectorized manner such as what's described for xarray's groupby (in order to gain the expected performance from using xarray for the split-apply-combine pattern), but it needs to work for more than a single dimension which is the current capability. Has anyone done what is described above using xarray? What sort of performance gains can be expected using your approach? Thanks in advance for any help with this topic. My apologies if there is a more appropriate forum for this sort of discussion (please redirect if so), as this may not be applicable to the original issue... --James
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
219096410 | https://github.com/pydata/xarray/pull/818#issuecomment-219096410 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxOTA5NjQxMA== | shoyer 1217238 | 2016-05-13T16:42:58Z | 2016-05-13T16:42:58Z | MEMBER |
If you're not going to use the labels it produces I'm not sure there's an advantage to
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
219063079 | https://github.com/pydata/xarray/pull/818#issuecomment-219063079 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxOTA2MzA3OQ== | rabernat 1197350 | 2016-05-13T14:41:43Z | 2016-05-13T14:41:43Z | MEMBER |
Why? This was in fact my original idea, but you encouraged me to use What about
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218879360 | https://github.com/pydata/xarray/pull/818#issuecomment-218879360 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODg3OTM2MA== | shoyer 1217238 | 2016-05-12T20:41:18Z | 2016-05-12T20:41:18Z | MEMBER | @rabernat It's possibly a better idea to use I would strongly suggest controlling labeling with a keyword argument, maybe similar to diff. Again, rather than further overloading the user facing API On second thought, this is significantly more verbose, so maybe |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218806328 | https://github.com/pydata/xarray/pull/818#issuecomment-218806328 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODgwNjMyOA== | shoyer 1217238 | 2016-05-12T16:10:04Z | 2016-05-12T16:10:04Z | MEMBER | Ah, of course -- forcing_data is a Dataset. You definitely want to pull out the DataArray first. Then .values is what you want.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218756580 | https://github.com/pydata/xarray/pull/818#issuecomment-218756580 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODc1NjU4MA== | rabernat 1197350 | 2016-05-12T13:27:38Z | 2016-05-12T13:27:38Z | MEMBER | I suppose I should also add a test for non-monotonic multidimensional binning. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218756391 | https://github.com/pydata/xarray/pull/818#issuecomment-218756391 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODc1NjM5MQ== | rabernat 1197350 | 2016-05-12T13:26:58Z | 2016-05-12T13:26:58Z | MEMBER | @jhamman: My latest commit followed @shoyer's suggestion to fix the "non-monotonic" error. I successfully loaded your data and took a zonal average in 10-degree bins with the following code: ``` python
The only big remaining issue is the values of the new coordinate. Currently it is just using the labels output by We could either allow the user to specify labels by adding a
Please weigh in if you have an opinion about that. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
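The labeling question raised above can be illustrated with the `labels` argument that `groupby_bins` accepts in released xarray. This is a small sketch under stated assumptions: the array, its 2D `lat` coordinate, and the bin edges are invented for the example, and labeling by bin center is only one of the options discussed in the thread.

```python
import numpy as np
import xarray as xr

# Hypothetical 2D latitude coordinate on a tiny (ny, nx) grid.
lat = np.array([[5.0, 15.0], [15.0, 25.0]])
da = xr.DataArray(
    np.array([[1.0, 2.0], [4.0, 8.0]]),
    dims=("ny", "nx"),
    coords={"lat": (("ny", "nx"), lat)},
)

# 10-degree bins, labeled by their central value instead of the
# interval objects that pandas.cut produces by default.
bins = np.arange(0, 40, 10)              # edges: 0, 10, 20, 30
centers = 0.5 * (bins[:-1] + bins[1:])   # labels: 5, 15, 25

# Group every grid point by the latitude bin it falls in and average.
zonal_mean = da.groupby_bins("lat", bins, labels=centers).mean()

print(zonal_mean)
```

With numeric labels the new `lat_bins` coordinate holds plain numbers, which makes the binned result easier to slice and plot than a coordinate of interval labels.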
218675077 | https://github.com/pydata/xarray/pull/818#issuecomment-218675077 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODY3NTA3Nw== | naught101 167164 | 2016-05-12T06:54:53Z | 2016-05-12T06:54:53Z | NONE |
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218672116 | https://github.com/pydata/xarray/pull/818#issuecomment-218672116 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODY3MjExNg== | shoyer 1217238 | 2016-05-12T06:34:56Z | 2016-05-12T06:34:56Z | MEMBER | @naught101 I was mixing up how |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218663446 | https://github.com/pydata/xarray/pull/818#issuecomment-218663446 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODY2MzQ0Ng== | shoyer 1217238 | 2016-05-12T05:27:11Z | 2016-05-12T06:34:17Z | MEMBER | @naught101 I would consider changing:
to just Otherwise that looks pretty reasonable, given the limitations of current groupby support. Now, ideally you could write something like this instead:
``` python
def make_prediction(forcing_data_time_series):
    predicted_values = model.predict(forcing_data_time_series.values)
    return xr.DataArray(predicted_values, [flux_vars, time])

forcing_data.groupby(['lat', 'lon']).dask_apply(make_prediction)
```
This would do the 2D groupby, and then apply the predict function in parallel with dask. Sadly we don't have this feature yet, though :). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218667702 | https://github.com/pydata/xarray/pull/818#issuecomment-218667702 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODY2NzcwMg== | naught101 167164 | 2016-05-12T06:02:55Z | 2016-05-12T06:02:55Z | NONE | @shoyer: Where does |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218654978 | https://github.com/pydata/xarray/pull/818#issuecomment-218654978 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODY1NDk3OA== | naught101 167164 | 2016-05-12T04:02:43Z | 2016-05-12T04:03:01Z | NONE | Example forcing data:
Where there might be an arbitrary number of data variables, and the scikit-learn input would be time (rows) by data variables (columns). I'm currently doing this: ``` python def predict_gridded(model, forcing_data, flux_vars): """predict model results for gridded data
``` and I think it's working (still debugging, and it's pretty slow running) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218654283 | https://github.com/pydata/xarray/pull/818#issuecomment-218654283 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODY1NDI4Mw== | shoyer 1217238 | 2016-05-12T03:58:48Z | 2016-05-12T03:58:48Z | MEMBER | @jhamman @rabernat I'm pretty sure there is a good reason for that check to verify monotonicity, although I can no longer remember exactly why! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218653355 | https://github.com/pydata/xarray/pull/818#issuecomment-218653355 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODY1MzM1NQ== | shoyer 1217238 | 2016-05-12T03:54:09Z | 2016-05-12T03:54:09Z | MEMBER | @naught101
Can you clarify exactly what shape data you want to put into scikit-learn to make predictions? What are the dimensions of your input? In principle, this is exactly the sort of thing that multi-dimensional groupby should solve, although we might also need support for multiple arguments to handle For the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218510748 | https://github.com/pydata/xarray/pull/818#issuecomment-218510748 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODUxMDc0OA== | jhamman 2443309 | 2016-05-11T16:18:05Z | 2016-05-11T16:18:05Z | MEMBER | @rabernat - See link to 2d slice with coordinates below: sample_for_xarray_multigroupby.nc.zip As for the TODO, I see now that it was there before and I agree that we should be able to side step the sorted requirement. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218450849 | https://github.com/pydata/xarray/pull/818#issuecomment-218450849 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODQ1MDg0OQ== | rabernat 1197350 | 2016-05-11T12:56:47Z | 2016-05-11T12:56:47Z | MEMBER | @jhamman: Could you post [a slice of] your dataset for me to try?
The TODO comment was there when I started working on this. The error is raised by these lines
I'm not sure this check is necessary for binning. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218372591 | https://github.com/pydata/xarray/pull/818#issuecomment-218372591 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODM3MjU5MQ== | naught101 167164 | 2016-05-11T06:24:11Z | 2016-05-11T06:24:11Z | NONE | I want to be able to run a scikit-learn model over a bunch of variables in a 3D (lat/lon/time) dataset, and return values for each coordinate point. Is something like this multi-dimensional groupby required (I'm thinking groupby(lat, lon) => 2D matrices that can be fed straight into scikit-learn), or is there already some other mechanism that could achieve something like this? Or is the best way at the moment just to create a null dataset, and loop over lat/lon and fill in the blanks as you go? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
218358050 | https://github.com/pydata/xarray/pull/818#issuecomment-218358050 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIxODM1ODA1MA== | jhamman 2443309 | 2016-05-11T04:21:16Z | 2016-05-11T04:21:16Z | MEMBER | @rabernat - Sorry this took so long. Comments as I play around with the new feature...
1. I was getting some strange memory errors when trying this multidimensional groupby on a large 4d ocean dataset (nlat: 720, nlon: 1280, time: 424, z_t: 45). My IPython kernel just kept dying. Command was
``` pytb
----> 1 da.groupby('TLAT', bins=[50, 60, 70, 80, 90])

/Users/jhamman/Dropbox/src/xarray/xarray/core/common.py in groupby(self, group, squeeze, bins)
    352         if isinstance(group, basestring):
    353             group = self[group]
--> 354         return self.groupby_cls(self, group, squeeze=squeeze, bins=bins)
    355
    356     def rolling(self, min_periods=None, center=False, **windows):

/Users/jhamman/Dropbox/src/xarray/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins)
    141             if not index.is_monotonic:
    142                 # TODO: sort instead of raising an error
--> 143                 raise ValueError('index must be monotonic for resampling')
    144             s = pd.Series(np.arange(index.size), index)
    145         if grouper is not None:

ValueError: index must be monotonic for resampling
```
It seems this is only an issue when I specify Based on the datasets I have handy right now, I think number 2 in my list is a show stopper so I think we want to make sure that feature makes it into this PR. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
208631007 | https://github.com/pydata/xarray/pull/818#issuecomment-208631007 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwODYzMTAwNw== | jhamman 2443309 | 2016-04-12T00:08:27Z | 2016-04-12T00:08:27Z | MEMBER | This looks really promising. I've gone through the code for the first time and had just a few comments. I'll pull your branch down and give it a test drive on some real data. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
208092684 | https://github.com/pydata/xarray/pull/818#issuecomment-208092684 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwODA5MjY4NA== | rabernat 1197350 | 2016-04-10T23:39:29Z | 2016-04-10T23:39:29Z | MEMBER | @shoyer, @jhamman I think this is ready for a review. There are two distinct features added here:
1. ``` python
I'm not sure this is the ideal behavior, since the categories are hard to slice. For my purposes, I would rather assign an integer or float index to each bin using e.g. the central value of the bin. note: Both of these features have problems when used with |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
207983237 | https://github.com/pydata/xarray/pull/818#issuecomment-207983237 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNzk4MzIzNw== | rabernat 1197350 | 2016-04-10T13:15:49Z | 2016-04-10T13:15:49Z | MEMBER | So I tracked down the cause of the original array dimensions being overwritten. It happens within
At this point, @shoyer should I just focus on the case where |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
207531654 | https://github.com/pydata/xarray/pull/818#issuecomment-207531654 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNzUzMTY1NA== | rabernat 1197350 | 2016-04-08T17:39:10Z | 2016-04-08T18:07:11Z | MEMBER | I have tried adding a new keyword The way it works is like this: ``` python
The only problem is that it seems to overwrite the original dimension of the array! After calling groupby ``` python
I think that I guess something similar should be possible here... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
207503695 | https://github.com/pydata/xarray/pull/818#issuecomment-207503695 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNzUwMzY5NQ== | shoyer 1217238 | 2016-04-08T16:29:58Z | 2016-04-08T16:29:58Z | MEMBER | @rabernat I'm not quite sure resample is the right place to put this, given that we aren't resampling on an axis. Just opened a pandas issue to discuss: https://github.com/pydata/pandas/issues/12828 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
207417668 | https://github.com/pydata/xarray/pull/818#issuecomment-207417668 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNzQxNzY2OA== | rabernat 1197350 | 2016-04-08T12:41:00Z | 2016-04-08T12:41:00Z | MEMBER | @shoyer regarding the binning, should I modify |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
207077942 | https://github.com/pydata/xarray/pull/818#issuecomment-207077942 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNzA3Nzk0Mg== | rabernat 1197350 | 2016-04-07T20:34:53Z | 2016-04-07T20:34:53Z | MEMBER | The travis build failure is a conda problem, not my commit. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
207068032 | https://github.com/pydata/xarray/pull/818#issuecomment-207068032 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNzA2ODAzMg== | rabernat 1197350 | 2016-04-07T20:03:48Z | 2016-04-07T20:03:48Z | MEMBER | I think I got it working. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
207021028 | https://github.com/pydata/xarray/pull/818#issuecomment-207021028 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNzAyMTAyOA== | shoyer 1217238 | 2016-04-07T17:42:03Z | 2016-04-07T17:42:26Z | MEMBER | I think that if we unstack things properly (only once instead of on each applied example) we should get something like this, alleviating the need for the new group name:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
207000636 | https://github.com/pydata/xarray/pull/818#issuecomment-207000636 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNzAwMDYzNg== | rabernat 1197350 | 2016-04-07T17:14:55Z | 2016-04-07T17:14:55Z | MEMBER | My new commit supports unstacking in apply with Consider the behavior of the test case: ``` python
Coordinates: * ny (ny) int64 0 1 * nx (nx) int64 0 1 lat (lon_groups, ny, nx) float64 10.0 nan nan nan nan 10.0 20.0 ... lon (lon_groups, ny, nx) float64 30.0 nan nan nan nan 40.0 40.0 ... * lon_groups (lon_groups) int64 30 40 50 ``` When unstacking, the indices that are not part of the group get filled with nans. We are not able to put these arrays back together into a single array. Note that if we do not rename the group name here: https://github.com/pydata/xarray/pull/818/files#diff-96b65e0bfec9fd2b9d562483f53661f5R121 Then we get an error here: https://github.com/pydata/xarray/pull/818/files#diff-96b65e0bfec9fd2b9d562483f53661f5R407
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
206655187 | https://github.com/pydata/xarray/pull/818#issuecomment-206655187 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNjY1NTE4Nw== | shoyer 1217238 | 2016-04-07T01:48:01Z | 2016-04-07T01:48:01Z | MEMBER | @rabernat That looks like exactly the right place to me. We only use variables for the concatenation in the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
206628737 | https://github.com/pydata/xarray/pull/818#issuecomment-206628737 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNjYyODczNw== | rabernat 1197350 | 2016-04-07T00:14:17Z | 2016-04-07T00:14:17Z | MEMBER | @shoyer I'm having a tough time figuring out where to put the unstacking logic...maybe you can give me some advice. My first idea was to add a method to the GroupBy class called If you think that is the right approach, I will forge ahead. But maybe, as the author of both the groupby and stack / unstack logic, you can see an easier way. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
206468443 | https://github.com/pydata/xarray/pull/818#issuecomment-206468443 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNjQ2ODQ0Mw== | jhamman 2443309 | 2016-04-06T17:09:31Z | 2016-04-06T17:09:31Z | MEMBER | @rabernat - I don't have much to add right now but I'm very excited about this addition. Once you've filled in a few more of the features, ping me and I'll give it a full review and will test it out in some applications we have in house.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
206445686 | https://github.com/pydata/xarray/pull/818#issuecomment-206445686 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNjQ0NTY4Ng== | shoyer 1217238 | 2016-04-06T16:13:01Z | 2016-04-06T16:13:01Z | MEMBER | (Oops, pressed the wrong button to close)
Consider |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
206418244 | https://github.com/pydata/xarray/pull/818#issuecomment-206418244 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNjQxODI0NA== | rabernat 1197350 | 2016-04-06T15:05:54Z | 2016-04-06T15:05:54Z | MEMBER | Let me try to clarify what I mean in item 2:
Say you have the following dataset ``` python
Now imagine you want to average humidity in temperature coordinates. (This might sound like a bizarre operation, but it is actually the foundation of a sophisticated sort of thermodynamic analysis.) Currently this works as follows ``` python
However, this sums over all time. What if you wanted to preserve the time dependence, but replace the
and get back a DataArray with dimensions Maybe this is already possible with a sophisticated use of |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
206389664 | https://github.com/pydata/xarray/pull/818#issuecomment-206389664 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNjM4OTY2NA== | rabernat 1197350 | 2016-04-06T14:09:43Z | 2016-04-06T14:09:43Z | MEMBER |
I normally used Should this go into a separate PR? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
206386864 | https://github.com/pydata/xarray/pull/818#issuecomment-206386864 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNjM4Njg2NA== | rabernat 1197350 | 2016-04-06T14:04:20Z | 2016-04-06T14:04:20Z | MEMBER |
Can you clarify what you mean by this? At what point should the unstack happen? With the current code, apply seems to work ok: ``` python
But perhaps I am missing a certain use case you have in mind? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
206182013 | https://github.com/pydata/xarray/pull/818#issuecomment-206182013 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNjE4MjAxMw== | shoyer 1217238 | 2016-04-06T07:31:32Z | 2016-04-06T07:31:32Z | MEMBER | This will need to unstack to handle .apply. That will be nice for things like normalization. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 | |
206165090 | https://github.com/pydata/xarray/pull/818#issuecomment-206165090 | https://api.github.com/repos/pydata/xarray/issues/818 | MDEyOklzc3VlQ29tbWVudDIwNjE2NTA5MA== | shoyer 1217238 | 2016-04-06T07:05:05Z | 2016-04-06T07:05:05Z | MEMBER | Yes, this is awesome! I had a vague idea that As for the specialized "grouper", I agree that that makes sense. It's basically an extension of |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Multidimensional groupby 146182176 |
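The thread above turns on one idea: to group by a two-dimensional coordinate, stack the grid into one dimension, apply a function per group, then unstack once at the end rather than once per group. The following is a minimal pure-Python sketch of that idea, not xarray's implementation. The names `grouped_apply_2d` and `demean` are hypothetical, and the 2x2 `lon` and `data` values are illustrative assumptions rather than the PR's actual test data.

```python
from collections import defaultdict


def grouped_apply_2d(values, labels, func):
    """Apply func to each group of a 2D grid, grouped by a 2D label array.

    Hypothetical helper for illustration; not part of xarray.
    """
    ny, nx = len(values), len(values[0])

    # "Stack": walk the (ny, nx) grid as one flat sequence, recording which
    # positions belong to each label value.
    groups = defaultdict(list)
    for j in range(ny):
        for i in range(nx):
            groups[labels[j][i]].append((j, i))

    # Apply func per group and write the results into a single output grid.
    # Writing into one shared grid plays the role of unstacking once, so no
    # per-group array has to be padded with NaN and merged afterwards.
    out = [[None] * nx for _ in range(ny)]
    for positions in groups.values():
        result = func([values[j][i] for j, i in positions])
        for (j, i), r in zip(positions, result):
            out[j][i] = r
    return out


def demean(xs):
    """Subtract the group mean, a simple normalization."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


# Illustrative 2D coordinate: the value 40.0 appears at two grid points.
lon = [[30.0, 40.0], [40.0, 50.0]]
data = [[1.0, 2.0], [4.0, 8.0]]

anomalies = grouped_apply_2d(data, lon, demean)
print(anomalies)
```

Only the two points sharing `lon == 40.0` are changed relative to each other; the single-member groups reduce to zero anomaly. The output keeps the original `(ny, nx)` shape, which is the property the discussion about overwritten dimensions was trying to preserve.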
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
user 6