
issue_comments


19 rows where author_association = "MEMBER", issue = 146182176 and user = 1217238 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
231256264 https://github.com/pydata/xarray/pull/818#issuecomment-231256264 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIzMTI1NjI2NA== shoyer 1217238 2016-07-08T01:50:30Z 2016-07-08T01:50:30Z MEMBER

OK, merging.....

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
230818687 https://github.com/pydata/xarray/pull/818#issuecomment-230818687 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIzMDgxODY4Nw== shoyer 1217238 2016-07-06T16:00:54Z 2016-07-06T16:00:54Z MEMBER

@rabernat I agree. I have a couple of minor style/pep8 issues, and we need an entry for "what's new", but let's merge this. I can then play around a little bit with potential fixes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
224693231 https://github.com/pydata/xarray/pull/818#issuecomment-224693231 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIyNDY5MzIzMQ== shoyer 1217238 2016-06-08T18:58:45Z 2016-06-08T18:58:45Z MEMBER

Looks like I still have a bug (failing Travis builds). Let me see if I can get that sorted out first.

On Wed, Jun 8, 2016 at 11:51 AM, Ryan Abernathey notifications@github.com wrote:

I think #875 https://github.com/pydata/xarray/pull/875 should fix the issue with concatenating index objects.

Should I try to merge your branch with my branch...or wait for your branch to get merged into master?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/pull/818#issuecomment-224691235, or mute the thread https://github.com/notifications/unsubscribe/ABKS1oaZfZ0P384eSGKIQ8-0fbyH8KDWks5qJw86gaJpZM4IAuQH .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
224484574 https://github.com/pydata/xarray/pull/818#issuecomment-224484574 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIyNDQ4NDU3NA== shoyer 1217238 2016-06-08T04:32:29Z 2016-06-08T04:32:29Z MEMBER

I think #875 should fix the issue with concatenating index objects.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
223999761 https://github.com/pydata/xarray/pull/818#issuecomment-223999761 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIyMzk5OTc2MQ== shoyer 1217238 2016-06-06T15:45:49Z 2016-06-06T15:45:49Z MEMBER

Empty groups should be straightforward -- we should be able to handle them.

Indices which don't belong to any group are indeed more problematic. I think we have three options here:

1. Raise an error when calling .groupby_bins(...)
2. Raise an error when calling .groupby_bins(...).apply(...)
3. Simply concatenate back together whatever items were grouped, and give up on the guarantee that applying the identity function restores the original item.

I think my preference would be for option 3, though 1 or 2 could be reasonable workarounds for now (raising NotImplementedError), because 3 is likely to be a little tricky to implement.
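A minimal pandas sketch of the pitfall behind option 3 (assuming pandas; this mirrors the behaviour rather than calling xarray's groupby_bins itself): values falling outside the bins get NaN labels, and a groupby silently drops them, so concatenating the groups back together cannot restore the original data.

```python
import pandas as pd

# Values 2.5 and 3.5 fall outside the bins, so pd.cut labels them NaN;
# grouping over those labels silently drops them, which is why the
# identity-roundtrip guarantee breaks.
values = pd.Series([0.5, 1.5, 2.5, 3.5])
labels = pd.cut(values, bins=[0, 1, 2])   # intervals (0, 1] and (1, 2]

grouped_size = values.groupby(labels, observed=True).size().sum()
print(len(values), grouped_size)  # 4 values in, only 2 survive the groupby
```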

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
223870991 https://github.com/pydata/xarray/pull/818#issuecomment-223870991 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIyMzg3MDk5MQ== shoyer 1217238 2016-06-06T05:23:24Z 2016-06-06T05:23:24Z MEMBER

I think I can fix this, by making concatenation work properly on index objects. Stay tuned...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
219096410 https://github.com/pydata/xarray/pull/818#issuecomment-219096410 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxOTA5NjQxMA== shoyer 1217238 2016-05-13T16:42:58Z 2016-05-13T16:42:58Z MEMBER

Why? This was in fact my original idea, but you encouraged me to use pd.cut instead. One thing I like about cut is that it is very flexible and well documented, while digitize is somewhat obscure.

If you're not going to use the labels it produces, I'm not sure there's an advantage to pd.cut. Otherwise I thought they were pretty similar.

groupby_bins seems pretty reasonable.
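For reference, the two functions can be made to agree on bin assignment; a minimal comparison (assuming NumPy and pandas), noting that np.digitize is 1-based and pd.cut needs right=False to match digitize's default half-open convention:

```python
import numpy as np
import pandas as pd

# With right=False, pd.cut uses half-open [a, b) intervals, which lines
# up with np.digitize's default (bins[i-1] <= x < bins[i]); digitize is
# 1-based, so subtract 1 to compare bin indices.
x = np.array([0.5, 1.5, 2.5])
edges = [0, 1, 2, 3]

cut_codes = pd.cut(x, edges, labels=False, right=False)
dig_codes = np.digitize(x, edges) - 1
print(cut_codes, dig_codes)
```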

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
218879360 https://github.com/pydata/xarray/pull/818#issuecomment-218879360 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODg3OTM2MA== shoyer 1217238 2016-05-12T20:41:18Z 2016-05-12T20:41:18Z MEMBER

@rabernat It's possibly a better idea to use np.digitize rather than pd.cut.

I would strongly suggest controlling labeling with a keyword argument, maybe similar to diff.

Again, rather than further overloading the user-facing API .groupby(), the binning is probably best expressed in a separate method. I would suggest a .bin(bins) method on Dataset/DataArray. Then you could just use a normal call to (multi-dimensional) groupby. So instead, we might have: ds.sample_tsurf.assign(lat_bin=ds.TLAT.bin(lat_bins)).groupby('lat_bins').mean().

On second thought, this is significantly more verbose, so maybe bins in the groupby call is OK.
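The .bin(...) method above is a proposal that never shipped; a pandas sketch of the same assign-then-group idea (the names lat_bins and tsurf here are illustrative, not from the PR):

```python
import numpy as np
import pandas as pd

# Stand-in for ds.assign(lat_bin=...).groupby('lat_bin').mean():
# bin a latitude coordinate with pd.cut, then reduce another
# variable over the resulting groups.
lat = np.array([-60.0, -30.0, 10.0, 45.0, 80.0])
tsurf = np.array([250.0, 270.0, 295.0, 285.0, 255.0])
lat_bins = [-90, -45, 0, 45, 90]

df = pd.DataFrame({"lat_bin": pd.cut(lat, lat_bins), "tsurf": tsurf})
binned_mean = df.groupby("lat_bin", observed=True)["tsurf"].mean()
print(binned_mean)
```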

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
218806328 https://github.com/pydata/xarray/pull/818#issuecomment-218806328 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODgwNjMyOA== shoyer 1217238 2016-05-12T16:10:04Z 2016-05-12T16:10:04Z MEMBER

Ah, of course -- forcing_data is a Dataset. You definitely want to pull out the DataArray first. Then .values is what you want.

On Wed, May 11, 2016 at 11:54 PM, naught101 notifications@github.com wrote:

forcing_data.isel(lat=lat, lon=lon).values() returns a ValuesView, which scikit-learn doesn't like. However, forcing_data.isel(lat=lat, lon=lon).to_array().T seems to work..

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/pydata/xarray/pull/818#issuecomment-218675077

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
218672116 https://github.com/pydata/xarray/pull/818#issuecomment-218672116 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODY3MjExNg== shoyer 1217238 2016-05-12T06:34:56Z 2016-05-12T06:34:56Z MEMBER

@naught101 I was mixing up how to_dataframe() works. Please ignore it! (I edited my earlier post.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
218663446 https://github.com/pydata/xarray/pull/818#issuecomment-218663446 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODY2MzQ0Ng== shoyer 1217238 2016-05-12T05:27:11Z 2016-05-12T06:34:17Z MEMBER

@naught101 I would consider changing:

```python
(forcing_data.isel(lat=lat, lon=lon)
    .to_dataframe()
    .drop(['lat', 'lon'], axis=1))
```

to just forcing_data.isel(lat=lat, lon=lon).values, because there's no point in creating a DataFrame with a bunch of variables you wouldn't use -- pandas will be pretty wasteful in allocating this.

Otherwise that looks pretty reasonable, given the limitations of current groupby support. Now, ideally you could write something like instead:

```python
def make_prediction(forcing_data_time_series):
    predicted_values = model.predict(forcing_data_time_series.values)
    return xr.DataArray(predicted_values, [flux_vars, time])

forcing_data.groupby(['lat', 'lon']).dask_apply(make_prediction)
```

This would do the 2D groupby, and then apply the predict function in parallel with dask. Sadly we don't have this feature yet, though :).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
218654283 https://github.com/pydata/xarray/pull/818#issuecomment-218654283 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODY1NDI4Mw== shoyer 1217238 2016-05-12T03:58:48Z 2016-05-12T03:58:48Z MEMBER

@jhamman @rabernat I'm pretty sure there is a good reason for that check to verify monotonicity, although I can no longer remember exactly why!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
218653355 https://github.com/pydata/xarray/pull/818#issuecomment-218653355 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODY1MzM1NQ== shoyer 1217238 2016-05-12T03:54:09Z 2016-05-12T03:54:09Z MEMBER

@naught101

I want to be able to run a scikit-learn model over a bunch of variables in a 3D (lat/lon/time) dataset, and return values for each coordinate point. Is something like this multi-dimensional groupby required (I'm thinking groupby(lat, lon) => 2D matrices that can be fed straight into scikit-learn), or is there already some other mechanism that could achieve something like this? Or is the best way at the moment just to create a null dataset, and loop over lat/lon and fill in the blanks as you go?

Can you clarify exactly what shape data you want to put into scikit-learn to make predictions? What are the dimensions of your input? In principle, this is exactly the sort of thing that multi-dimensional groupby should solve, although we might also need support for multiple arguments to handle lat/lon (this should not be too difficult).
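As a stopgap for the quoted question, the "create a null dataset and loop over lat/lon" approach can be sketched in plain NumPy; predict here is a hypothetical stand-in for a fitted scikit-learn model's .predict method:

```python
import numpy as np

# Fallback sketch: loop over the lat/lon grid, feed each (time, feature)
# slice to a model, and fill a preallocated output array. `predict` is a
# hypothetical stand-in for a fitted model's .predict.
def predict(features_2d):          # shape (time, n_features) -> (time,)
    return features_2d.mean(axis=1)

nlat, nlon, ntime, nfeat = 3, 4, 5, 2
forcing = np.random.default_rng(0).normal(size=(nlat, nlon, ntime, nfeat))

out = np.empty((nlat, nlon, ntime))
for i in range(nlat):
    for j in range(nlon):
        out[i, j, :] = predict(forcing[i, j])

print(out.shape)  # (3, 4, 5)
```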


For the bins argument, I would suggest a separate DataArray/Dataset method for creating the GroupBy object. The resample method in xarray should be updated to return a GroupBy object (like the pandas method), and extending resample to numbers would be a natural fit. Something like Dataset.resample(longitude=10) could be a good way to spell this. (We would deprecate the how, freq and dim arguments, and ideally make all the remaining arguments keyword only.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
207503695 https://github.com/pydata/xarray/pull/818#issuecomment-207503695 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIwNzUwMzY5NQ== shoyer 1217238 2016-04-08T16:29:58Z 2016-04-08T16:29:58Z MEMBER

@rabernat I'm not quite sure resample is the right place to put this, given that we aren't resampling on an axis. Just opened a pandas issue to discuss: https://github.com/pydata/pandas/issues/12828

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
207021028 https://github.com/pydata/xarray/pull/818#issuecomment-207021028 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIwNzAyMTAyOA== shoyer 1217238 2016-04-07T17:42:03Z 2016-04-07T17:42:26Z MEMBER

I think that if we unstack things properly (only once instead of on each applied example) we should get something like this, alleviating the need for the new group name:

```
<xarray.DataArray (ny: 2, nx: 2)>
array([[ 0. , -0.5],
       [ 0.5,  0. ]])
Coordinates:
  * ny       (ny) int64 0 1
  * nx       (nx) int64 0 1
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
206655187 https://github.com/pydata/xarray/pull/818#issuecomment-206655187 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIwNjY1NTE4Nw== shoyer 1217238 2016-04-07T01:48:01Z 2016-04-07T01:48:01Z MEMBER

@rabernat That looks like exactly the right place to me.

We only use variables for the concatenation in the shortcut=True path. With shortcut=False, we use DataArray/Dataset objects. For now, get it working with shortcut=False (hard code it if necessary) and I can help figure out how to extend it to shortcut=True.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
206445686 https://github.com/pydata/xarray/pull/818#issuecomment-206445686 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIwNjQ0NTY4Ng== shoyer 1217238 2016-04-06T16:13:01Z 2016-04-06T16:13:01Z MEMBER

(Oops, pressed the wrong button to close)

Can you clarify what you mean by this? At what point should the unstack happen?

Consider ds.groupby('latitude').apply(lambda x: x - x.mean()) or ds.groupby('latitude') - ds.groupby('latitude').mean() (these are two ways of writing the same thing). In each of these cases, the result of a groupby has the same dimensions as the original instead of replacing one or more of the original dimensions.
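The shape-preserving behaviour described here has a direct pandas analogue in groupby(...).transform, which returns a result aligned with the original index (a sketch, not xarray code):

```python
import pandas as pd

# Demeaning within groups: the result keeps the original length/index,
# analogous to ds.groupby('latitude') - ds.groupby('latitude').mean().
s = pd.Series([1.0, 2.0, 10.0, 20.0])
latitude = ["north", "north", "south", "south"]

demeaned = s - s.groupby(latitude).transform("mean")
print(list(demeaned))  # [-0.5, 0.5, -5.0, 5.0]
```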

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
206182013 https://github.com/pydata/xarray/pull/818#issuecomment-206182013 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIwNjE4MjAxMw== shoyer 1217238 2016-04-06T07:31:32Z 2016-04-06T07:31:32Z MEMBER

This will need to unstack to handle .apply. That will be nice for things like normalization.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
206165090 https://github.com/pydata/xarray/pull/818#issuecomment-206165090 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIwNjE2NTA5MA== shoyer 1217238 2016-04-06T07:05:05Z 2016-04-06T07:05:05Z MEMBER

Yes, this is awesome! I had a vague idea that stack could make something like this possible but hadn't really thought it through.

As for the specialized "grouper", I agree that that makes sense. It's basically an extension of resample from dates to floating point -- noting that pandas recently changed the resample API so it works a little more like groupby. pandas.cut could probably handle most of the logic here.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
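The schema above can be exercised with Python's built-in sqlite3 module; this sketch recreates a simplified version (foreign-key clauses omitted for brevity, row values are made-up placeholders) and runs the filter described at the top of the page (author_association = "MEMBER", issue = 146182176, user = 1217238, sorted by updated_at descending):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issue_comments (
   html_url TEXT, issue_url TEXT, id INTEGER PRIMARY KEY, node_id TEXT,
   user INTEGER, created_at TEXT, updated_at TEXT, author_association TEXT,
   body TEXT, reactions TEXT, performed_via_github_app TEXT, issue INTEGER
);
CREATE INDEX idx_issue_comments_issue ON issue_comments (issue);
CREATE INDEX idx_issue_comments_user ON issue_comments (user);
""")

# Placeholder rows; only the filter columns matter for the query shape.
conn.executemany(
    "INSERT INTO issue_comments (id, user, author_association, issue, updated_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 1217238, "MEMBER", 146182176, "2016-07-08"),
     (2, 1217238, "MEMBER", 146182176, "2016-05-12"),
     (3, 9999, "NONE", 146182176, "2016-06-01")],
)

rows = conn.execute(
    "SELECT id FROM issue_comments "
    "WHERE author_association = 'MEMBER' AND issue = ? AND user = ? "
    "ORDER BY updated_at DESC",
    (146182176, 1217238),
).fetchall()
print(rows)  # [(1,), (2,)]
```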
Powered by Datasette · Queries took 245.043ms · About: xarray-datasette