issue_comments
21 rows where issue = 750985364 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1433726618 | https://github.com/pydata/xarray/issues/4610#issuecomment-1433726618 | https://api.github.com/repos/pydata/xarray/issues/4610 | IC_kwDOAMm_X85VdO6a | TomNicholas 35968931 | 2023-02-16T21:17:56Z | 2023-02-16T21:17:56Z | MEMBER |
Can we not just test the in-memory performance by |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
1433686861 | https://github.com/pydata/xarray/issues/4610#issuecomment-1433686861 | https://api.github.com/repos/pydata/xarray/issues/4610 | IC_kwDOAMm_X85VdFNN | Illviljan 14371165 | 2023-02-16T20:39:54Z | 2023-02-16T20:39:54Z | MEMBER | Nice, I was looking at the real example too, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
1433675446 | https://github.com/pydata/xarray/issues/4610#issuecomment-1433675446 | https://api.github.com/repos/pydata/xarray/issues/4610 | IC_kwDOAMm_X85VdCa2 | TomNicholas 35968931 | 2023-02-16T20:29:25Z | 2023-02-16T20:29:25Z | MEMBER |
I think I just timed the difference in the (unweighted) "real" example I gave in the notebook. (Not the weighted one because that didn't give the right answer with flox for some reason.)
Fair point, worth trying. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
1433670641 | https://github.com/pydata/xarray/issues/4610#issuecomment-1433670641 | https://api.github.com/repos/pydata/xarray/issues/4610 | IC_kwDOAMm_X85VdBPx | Illviljan 14371165 | 2023-02-16T20:24:51Z | 2023-02-16T20:25:36Z | MEMBER |
Could you show the example that's this slow, @TomNicholas ? So I can play around with it too. One thing I noticed in your notebook is that you haven't used |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
1425198851 | https://github.com/pydata/xarray/issues/4610#issuecomment-1425198851 | https://api.github.com/repos/pydata/xarray/issues/4610 | IC_kwDOAMm_X85U8s8D | dcherian 2448579 | 2023-02-10T05:38:13Z | 2023-02-10T05:38:31Z | MEMBER |
Nah, in my experience, the overhead is "factorizing" (pd.cut/np.digitize) or converting to integer bins, and then converting the nD problem to a 1D problem for bincount. numba doesn't really help. 3-4x is a lot bigger than I expected. I was hoping for under 2x because flox is more general. I think the problem is We could swap that out easily here: https://github.com/xarray-contrib/flox/blob/daebc868c13dad74a55d74f3e5d24e0f6bbbc118/flox/core.py#L473 I think the one special case to consider is binning datetimes, and that digitize and pd.cut have different defaults for
Ideally As a workaround, we could replace
Yup. unlikely to help here. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
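The steps described in the comment above (factorize values into integer bins, flatten the nD problem to 1D, then call bincount once) can be sketched with plain NumPy. This is only an illustration of the general technique, not the actual flox or xhistogram code; the function name `histogram_along_last_axis` is hypothetical.

```python
import numpy as np


def histogram_along_last_axis(data, bin_edges):
    """Histogram over the last axis, keeping all leading axes.

    Hypothetical helper for illustration; not taken from flox or xhistogram.
    """
    nbins = len(bin_edges) - 1

    # Step 1: "factorize" values into integer bin codes.
    # np.digitize returns 0 below the first edge and len(bin_edges) at or
    # above the last edge, so after shifting, valid codes are 0..nbins-1.
    codes = np.digitize(data, bin_edges) - 1
    valid = (codes >= 0) & (codes < nbins)

    # Step 2: convert the nD problem to a 1D one by giving every "row"
    # (each combination of leading indices) its own block of nbins integers.
    lead_shape = data.shape[:-1]
    nrows = int(np.prod(lead_shape))
    row_index = np.arange(nrows).reshape(lead_shape + (1,))
    flat_codes = (row_index * nbins + codes)[valid]

    # Step 3: a single bincount over the flattened codes.
    counts = np.bincount(flat_codes, minlength=nrows * nbins)
    return counts.reshape(lead_shape + (nbins,))


rng = np.random.default_rng(0)
example_data = rng.random((2, 3, 50))      # values in [0, 1)
example_edges = np.linspace(0.0, 1.0, 5)   # 4 bins
example_counts = histogram_along_last_axis(example_data, example_edges)
```

Note that, unlike `np.histogram`, this sketch treats the last bin as half-open, so a value exactly equal to the final edge is dropped.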
1423144049 | https://github.com/pydata/xarray/issues/4610#issuecomment-1423144049 | https://api.github.com/repos/pydata/xarray/issues/4610 | IC_kwDOAMm_X85U03Rx | TomNicholas 35968931 | 2023-02-08T19:40:58Z | 2023-02-08T20:25:04Z | MEMBER | Q: Use xhistogram approach or flox-powered approach? @dcherian recently showed how his flox package can perform histograms as groupby-like reductions. This raises the question of which approach would be better to use in a histogram function in xarray. (This is related to but better than what we had tried previously with xarray groupby and numpy_groupies.) Here's a WIP notebook comparing the two approaches. Both approaches can feasibly do: - Histograms which leave some dimensions excluded (broadcast over), - Multi-dimensional histograms (e.g. binning two different variables into one 2D bin), - Normalized histograms (return PDFs instead of counts), - Weighted histograms, - Multi-dimensional bins (as @aaronspring asks for above - but it requires work - see how to do it in flox, and my stalled PR to xhistogram). Pros of using flox-powered reductions:
Pros of using xhistogram's blockwise bincount approach:
Other thoughts:
xref https://github.com/xgcm/xhistogram/issues/60, https://github.com/xgcm/xhistogram/issues/28 |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
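As a rough illustration of three of the features listed in the comment above (multi-dimensional, weighted, and normalized histograms), here is a minimal NumPy-only sketch. It does not use xhistogram or flox; the variable names and the synthetic data are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two synthetic variables to bin jointly, plus per-sample weights.
# All names and values here are made up for illustration.
temperature = rng.normal(15.0, 5.0, size=1000)
salinity = rng.normal(35.0, 1.0, size=1000)
cell_area = rng.uniform(0.5, 1.5, size=1000)

# Each variable gets its own bin edges.
temp_edges = np.linspace(-10.0, 40.0, 11)  # 10 bins
salt_edges = np.linspace(30.0, 40.0, 6)    # 5 bins
bins = [temp_edges, salt_edges]

# Multi-dimensional histogram: two variables binned into one 2D grid.
counts, _, _ = np.histogram2d(temperature, salinity, bins=bins)

# Weighted histogram: each sample contributes its weight, not 1.
weighted, _, _ = np.histogram2d(temperature, salinity, bins=bins, weights=cell_area)

# Normalized histogram: a PDF whose integral over the binned region is 1.
pdf, _, _ = np.histogram2d(temperature, salinity, bins=bins, density=True)
bin_areas = np.outer(np.diff(temp_edges), np.diff(salt_edges))
pdf_integral = float(np.sum(pdf * bin_areas))
```

The remaining feature, leaving some dimensions excluded (broadcast over), is what distinguishes an axis-aware implementation from a plain call like this one.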
858693187 | https://github.com/pydata/xarray/issues/4610#issuecomment-858693187 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1ODY5MzE4Nw== | TomNicholas 35968931 | 2021-06-10T14:54:31Z | 2021-06-10T14:54:31Z | MEMBER |
Given the performance I found in https://github.com/xgcm/xhistogram/issues/60, I think we probably want to use the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
853143060 | https://github.com/pydata/xarray/issues/4610#issuecomment-853143060 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MzE0MzA2MA== | TomNicholas 35968931 | 2021-06-02T15:51:28Z | 2021-06-02T15:51:36Z | MEMBER | Okay great, thanks for the patient explanation @aaronspring ! Will tag you when this has progressed to the point that you can try it out. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
853138609 | https://github.com/pydata/xarray/issues/4610#issuecomment-853138609 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MzEzODYwOQ== | aaronspring 12237157 | 2021-06-02T15:45:45Z | 2021-06-02T15:45:45Z | CONTRIBUTOR |
agree.
looking forward to the PR |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
853123125 | https://github.com/pydata/xarray/issues/4610#issuecomment-853123125 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MzEyMzEyNQ== | TomNicholas 35968931 | 2021-06-02T15:26:06Z | 2021-06-02T15:26:06Z | MEMBER |
This makes sense, but it sounds like this suggestion (of accepting Datasets not just DataArrays) is mostly a convenience tool for applying histograms to particular variables across multiple datasets quickly. It's not fundamentally different to picking and choosing the variables you want from multiple datasets and feeding them in to I think we should focus on including features that enable analyses that would otherwise be difficult or impossible, for example ND bins: without allowing bins to be >1D at a low level internally then it would be fairly difficult to replicate the same functionality just by wrapping |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
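To make the "ND bins" idea in the comment above concrete: the bin edges themselves vary along a dimension of the data. A naive sketch, assuming 2D data where every row has its own edges, simply loops over rows. The helper name `histogram_with_per_row_bins` is hypothetical, and the loop is exactly the part that is hard to make efficient by wrapping a 1D-bins histogram from the outside.

```python
import numpy as np


def histogram_with_per_row_bins(data, bin_edges):
    """Histogram each row of `data` using that row's own bin edges.

    data:      array of shape (nrows, nsamples)
    bin_edges: array of shape (nrows, nbins + 1)
    Hypothetical helper for illustration only.
    """
    nrows, nedges = bin_edges.shape
    counts = np.empty((nrows, nedges - 1), dtype=np.int64)
    for i in range(nrows):
        # Bins differ per row, so each row needs a separate histogram call.
        counts[i], _ = np.histogram(data[i], bins=bin_edges[i])
    return counts


row_data = np.array([[0.1, 0.2, 0.9],
                     [1.0, 5.0, 9.0]])
row_edges = np.array([[0.0, 0.5, 1.0],
                      [0.0, 5.0, 10.0]])
row_counts = histogram_with_per_row_bins(row_data, row_edges)
```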
852944667 | https://github.com/pydata/xarray/issues/4610#issuecomment-852944667 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1Mjk0NDY2Nw== | aaronspring 12237157 | 2021-06-02T11:22:07Z | 2021-06-02T11:22:07Z | CONTRIBUTOR | I like your explanation of the two different inputs @dougiesquire and for multi-dim datasets these must be xr.datasets. my point about the |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
852836671 | https://github.com/pydata/xarray/issues/4610#issuecomment-852836671 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MjgzNjY3MQ== | dougiesquire 42455466 | 2021-06-02T08:12:06Z | 2021-06-02T08:12:06Z | NONE | We have a very thin wrapper of xhistogram in xskillscore for calculating histograms from Datasets. It simply calculates the histograms independently for all variables that exist in all Datasets. This makes sense in the context of calculating skill score where the first Dataset corresponds to observations and the second to forecasts, and we want to calculate the histograms between matched variables in each dataset. However, this might be quite a specific use case and is probably not what we'd want to do in the general case. I like @TomNicholas 's proposal for Dataset functionality. Is this what you're getting at @aaronspring ? Or am I misunderstanding? |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
852428526 | https://github.com/pydata/xarray/issues/4610#issuecomment-852428526 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MjQyODUyNg== | aaronspring 12237157 | 2021-06-01T20:36:25Z | 2021-06-01T20:36:25Z | CONTRIBUTOR | I am unsure about this and cannot manage to put my ideas down precisely. Calculating a contingency table for instance between two multivar inputs: `xhistogram(ds_observations_multivar, ds_forecast_multivar, bins=[ds_obs_multivar_edges, ds_forecast_multivar_edges])` maybe @dougiesquire can phrase this more precisely |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
852369352 | https://github.com/pydata/xarray/issues/4610#issuecomment-852369352 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MjM2OTM1Mg== | TomNicholas 35968931 | 2021-06-01T18:59:37Z | 2021-06-01T18:59:37Z | MEMBER | For each dataset in what? Do you mean for each input dataarray? I'm proposing an API in which you either pass multiple DataArrays as data (what xhistogram currently accepts), or you can call
If bins can be a list of multiple dataarrays then you can have this, right? i.e.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
852364749 | https://github.com/pydata/xarray/issues/4610#issuecomment-852364749 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MjM2NDc0OQ== | aaronspring 12237157 | 2021-06-01T18:51:47Z | 2021-06-01T18:51:47Z | CONTRIBUTOR |
with dataset bins I want to have different bin_edges for each dataset. If bins is only a dataArray, I cannot have this. Can I? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
852257864 | https://github.com/pydata/xarray/issues/4610#issuecomment-852257864 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MjI1Nzg2NA== | TomNicholas 35968931 | 2021-06-01T16:21:39Z | 2021-06-01T16:21:53Z | MEMBER |
#5400 is right now just a skeleton, it won't compute anything other than a
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
852231700 | https://github.com/pydata/xarray/issues/4610#issuecomment-852231700 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MjIzMTcwMA== | aaronspring 12237157 | 2021-06-01T15:45:00Z | 2021-06-01T15:45:00Z | CONTRIBUTOR | I tried to show in https://gist.github.com/aaronspring/251553f132202cc91aadde03f2a452f9 how I would like to use xr.Datasets. I also tried to show in the gist that it could be nice to allow xr.Datasets as bins if the inputs are xr.Datasets.
I cannot find this in #5400. I should check out and run the code locally. Yep, the example xskillscore code posted doesn't allow nd bins. Forgot that. Correct. In my head, thinking about the future, it does. https://github.com/xarray-contrib/xskillscore/blob/6f7be06098eefa1cdb90f7319f577c274621301c/xskillscore/core/probabilistic.py#L498 takes xr.Datasets as bins and in a previous version we used |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
852174014 | https://github.com/pydata/xarray/issues/4610#issuecomment-852174014 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MjE3NDAxNA== | TomNicholas 35968931 | 2021-06-01T14:31:01Z | 2021-06-01T14:31:01Z | MEMBER | @aaronspring I'm a bit confused by your comment. The (proposed) API in #5400 does have a That's not the same thing as using Datasets as bins though - but I'm not really sure I understand the use case for that or what that allows? You can already choose different bins to use for each input variable, are you saying it would be neater if you could assign bins to input variables via a dict-like dataset rather than the arguments being in the corresponding positions in a list? The example you linked doesn't pass datasets as bins either, it just loops over multiple input datasets and assumes you want to calculate joint histograms between those datasets. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
852120100 | https://github.com/pydata/xarray/issues/4610#issuecomment-852120100 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg1MjEyMDEwMA== | aaronspring 12237157 | 2021-06-01T13:22:25Z | 2021-06-01T13:24:19Z | CONTRIBUTOR | what about a list of @dougiesquire implemented this in https://github.com/xarray-contrib/xskillscore/blob/2217b58c536ec1b3d2c42265ed6689a740c2b3bf/xskillscore/core/utils.py#L133 EDIT: seeing now that this issue and #5400 aims to implement xr.DataArray.hist only. xr.Dataset would be also nice :) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
844322268 | https://github.com/pydata/xarray/issues/4610#issuecomment-844322268 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg0NDMyMjI2OA== | TomNicholas 35968931 | 2021-05-19T17:37:49Z | 2021-05-28T14:07:25Z | MEMBER | Update on this: in a PR to xhistogram we have a rough proof-of-principle for a dask-parallelized, axis-aware implementation of N-dimensional histogram calculations, suitable for eventually integrating into xarray. We still need to complete the work over in xhistogram, but for now I want to suggest what I think the eventual API should be for this functionality within xarray: Top-level function: xhistogram's xarray API is essentially one New methods: We could also add a The existing
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 | |
846418243 | https://github.com/pydata/xarray/issues/4610#issuecomment-846418243 | https://api.github.com/repos/pydata/xarray/issues/4610 | MDEyOklzc3VlQ29tbWVudDg0NjQxODI0Mw== | Illviljan 14371165 | 2021-05-22T14:46:13Z | 2021-05-22T14:46:13Z | MEMBER |
Should be fine I think. Matplotlib explains how to use Some reading if wanting to do the plot by hand: https://stackoverflow.com/questions/5328556/histogram-matplotlib https://stackoverflow.com/questions/33203645/how-to-plot-a-histogram-using-matplotlib-in-python-with-a-list-of-data |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
Add histogram method 750985364 |
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);