issue_comments
17 rows where author_association = "MEMBER" and issue = 252358450 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue
---|---|---|---|---|---|---|---|---|---|---|---
335316764 | https://github.com/pydata/xarray/pull/1517#issuecomment-335316764 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMzNTMxNjc2NA== | shoyer 1217238 | 2017-10-09T23:28:52Z | 2017-10-09T23:28:52Z | MEMBER | I'll start on my PR to expose this as public API -- hopefully will make some progress on my flight from NY to SF tonight. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
335316029 | https://github.com/pydata/xarray/pull/1517#issuecomment-335316029 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMzNTMxNjAyOQ== | jhamman 2443309 | 2017-10-09T23:23:45Z | 2017-10-09T23:23:45Z | MEMBER | Great. Go ahead and merge it then. I'm very excited about this feature. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
335315831 | https://github.com/pydata/xarray/pull/1517#issuecomment-335315831 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMzNTMxNTgzMQ== | shoyer 1217238 | 2017-10-09T23:22:33Z | 2017-10-09T23:22:33Z | MEMBER | I think this is ready. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
335315689 | https://github.com/pydata/xarray/pull/1517#issuecomment-335315689 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMzNTMxNTY4OQ== | jhamman 2443309 | 2017-10-09T23:21:32Z | 2017-10-09T23:21:32Z | MEMBER | @shoyer - anything left to do here? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
330709438 | https://github.com/pydata/xarray/pull/1517#issuecomment-330709438 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMzMDcwOTQzOA== | jhamman 2443309 | 2017-09-20T00:18:32Z | 2017-09-20T00:18:32Z | MEMBER | @shoyer - My vote is for something closer to #2. Your example scenario is something I run into frequently. In cases like this, I think it's better to tell the user that they are not providing an appropriate input rather than attempting to rechunk a dataset. (This is somewhat related to #1440) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
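For reference, the behavior xarray eventually shipped in the public `apply_ufunc` follows this preference: with `dask='parallelized'`, a core dimension split across multiple chunks raises an error rather than silently rechunking, with an explicit opt-in. A minimal sketch, assuming a recent xarray and dask (the `dask_gufunc_kwargs` spelling postdates this thread):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.random.rand(100), dims=["time"]).chunk({"time": 25})

def autocorr(x):
    # needs the whole "time" axis at once
    return np.correlate(x, x, mode="same")

# Raises a ValueError: "time" is a core dimension but spans multiple chunks.
try:
    xr.apply_ufunc(autocorr, arr,
                   input_core_dims=[["time"]], output_core_dims=[["time"]],
                   dask="parallelized", output_dtypes=[float])
except ValueError as err:
    print(err)

# Explicit opt-in to the rechunk instead:
result = xr.apply_ufunc(autocorr, arr,
                        input_core_dims=[["time"]], output_core_dims=[["time"]],
                        dask="parallelized", output_dtypes=[float],
                        dask_gufunc_kwargs={"allow_rechunk": True})
```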
330701921 | https://github.com/pydata/xarray/pull/1517#issuecomment-330701921 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMzMDcwMTkyMQ== | mrocklin 306380 | 2017-09-19T23:27:49Z | 2017-09-19T23:27:49Z | MEMBER | The heuristics we have are, I think, just of the form "did you make way more chunks than you had previously". I can imagine other heuristics of the form "some of your new chunks are several times larger than your previous chunks". In general these heuristics might be useful in several places. It might make sense to build them in a |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
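A hypothetical sketch of the kind of chunk-count heuristic described above (the helper name and threshold are illustrative, not dask API):

```python
import warnings
import dask.array as da

def cautious_rechunk(arr, chunks, factor=10):
    """Warn when a rechunk makes 'way more chunks than you had previously'."""
    out = arr.rechunk(chunks)  # lazy: only builds the graph
    if out.npartitions > factor * arr.npartitions:
        warnings.warn(f"rechunk inflates chunk count: "
                      f"{arr.npartitions} -> {out.npartitions}")
    return out

x = da.ones((1000, 1000), chunks=(1000, 1000))
y = cautious_rechunk(x, (10, 10))  # warns: 1 -> 10000 chunks
```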
330701517 | https://github.com/pydata/xarray/pull/1517#issuecomment-330701517 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMzMDcwMTUxNw== | shoyer 1217238 | 2017-09-19T23:25:08Z | 2017-09-19T23:25:08Z | MEMBER | I have a design question here: how should we handle cases where a core dimension exists in multiple chunks? For example, suppose you are applying a function that needs access to every point along the "time" axis at once (e.g., an auto-correlation function). Should we:

1. Automatically rechunk along "time" into a single chunk, or
2. Raise an error, and require the user to rechunk manually (xref https://github.com/dask/dask/issues/2689 for API on this)?

Currently we do behavior 1, but behavior 2 might be more user friendly. Otherwise it could be pretty easy to inadvertently pass in a dask array (e.g., one stored in small chunks along "time"). dask.array has some heuristics to protect against this in |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
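A small illustration of why behavior 1 can be surprising: rechunking a time-chunked array into a single chunk along that axis multiplies the per-chunk memory footprint (the array sizes below are assumptions for the example):

```python
import dask.array as da

# e.g., a year of daily fields, opened from disk in small chunks along time
x = da.ones((365, 1000, 1000), chunks=(1, 1000, 1000))  # ~8 MB per chunk
y = x.rechunk({0: -1})                                  # behavior 1
print(x.chunksize, "->", y.chunksize)  # (1, 1000, 1000) -> (365, 1000, 1000), ~2.9 GB per chunk
```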
330679808 | https://github.com/pydata/xarray/pull/1517#issuecomment-330679808 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMzMDY3OTgwOA== | spencerkclark 6628425 | 2017-09-19T21:32:00Z | 2017-09-19T21:32:00Z | MEMBER | I was not aware of dask's atop function before reading this PR (it looks pretty cool), so I defer to @nbren12 there. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
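For readers in the same position: `atop` (since renamed `blockwise` in newer dask) pairs matching block indices across inputs, which is the machinery this PR builds on. A minimal example under modern dask:

```python
import numpy as np
import dask.array as da

x = da.ones((4, 6), chunks=(2, 3))
y = da.full((4, 6), 2.0, chunks=(2, 3))

# apply np.add block-by-block, aligning blocks by their (i, j) indices
z = da.blockwise(np.add, "ij", x, "ij", y, "ij", dtype=x.dtype)
print(z.compute().sum())  # 72.0
```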
330022743 | https://github.com/pydata/xarray/pull/1517#issuecomment-330022743 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMzMDAyMjc0Mw== | shoyer 1217238 | 2017-09-17T05:39:38Z | 2017-09-17T05:39:38Z | MEMBER |
This seems like a reasonable option to me. Once we get this merged, want to make a PR? @jhamman could you give this a review? I have not included extensive documentation yet, but I am also reluctant to squeeze that into this PR before we make it public API. (Which I'd like to save for another one.) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
328341717 | https://github.com/pydata/xarray/pull/1517#issuecomment-328341717 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMyODM0MTcxNw== | spencerkclark 6628425 | 2017-09-10T13:09:40Z | 2017-09-10T13:09:40Z | MEMBER | @nbren12 for similar use cases I've had success writing a single function that does the ghosting, applies a function with `map_blocks`, and trims the result:

```python
def centered_diff(da, dim, spacing=1.):
    def apply_centered_diff(arr, spacing=1.):
        if isinstance(arr, np.ndarray):
            return centered_diff_numpy(arr, spacing=spacing)
        else:
            axis = len(arr.shape) - 1
            g = darray.ghost.ghost(arr, depth={axis: 1},
                                   boundary={axis: 'periodic'})
            result = darray.map_blocks(centered_diff_numpy, g,
                                       spacing=spacing)
            return darray.ghost.trim_internal(result, {axis: 1})
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
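The snippet above is cut off in this export before its final step; a self-contained sketch of the same ghost/map-blocks/trim pattern follows, using modern dask, where the `ghost` module has been renamed `overlap` and `map_overlap` wraps all three steps in one call. The `centered_diff_numpy` here is a hypothetical stand-in for the one referenced above:

```python
import numpy as np
import dask.array as darray

def centered_diff_numpy(arr, spacing=1.0):
    # second-order centered difference along the last axis
    return (np.roll(arr, -1, axis=-1) - np.roll(arr, 1, axis=-1)) / (2 * spacing)

x = darray.from_array(np.sin(np.linspace(0, 2 * np.pi, 100)), chunks=25)

# ghost -> map_blocks -> trim_internal, in one call
result = darray.map_overlap(centered_diff_numpy, x,
                            depth={0: 1}, boundary={0: "periodic"},
                            spacing=2 * np.pi / 99)
print(result.compute()[:3])
```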
328299112 | https://github.com/pydata/xarray/pull/1517#issuecomment-328299112 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMyODI5OTExMg== | shoyer 1217238 | 2017-09-09T19:38:09Z | 2017-09-09T19:38:09Z | MEMBER | @nbren12 Probably the best way to do ghosting with the current interface is to write a function that acts on dask array objects to apply the ghosting, and then apply it using `apply_ufunc` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
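A hedged sketch of that approach with today's API, assuming `dask='allowed'` (under which `apply_ufunc` passes the underlying dask arrays straight to the function); the helper name is illustrative:

```python
import numpy as np
import xarray as xr
import dask.array as darray

def ghosted_gradient(arr):
    # operates on the dask array directly, handling its own ghost cells
    return darray.map_overlap(np.gradient, arr, depth=1, boundary="periodic")

da_ = xr.DataArray(np.arange(16.0), dims=["x"]).chunk({"x": 4})
out = xr.apply_ufunc(ghosted_gradient, da_, dask="allowed")
print(out.compute().values[:4])
```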
324734974 | https://github.com/pydata/xarray/pull/1517#issuecomment-324734974 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMyNDczNDk3NA== | shoyer 1217238 | 2017-08-24T19:34:49Z | 2017-08-24T19:34:49Z | MEMBER | @mrocklin I split that discussion off to #1525. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
324732814 | https://github.com/pydata/xarray/pull/1517#issuecomment-324732814 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMyNDczMjgxNA== | mrocklin 306380 | 2017-08-24T19:25:32Z | 2017-08-24T19:25:32Z | MEMBER | Yes, if you don't care strongly about deduplication. The following will be slower:
In current operation this will be optimized to
So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
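An illustration of the deduplication trade-off being discussed, assuming current dask: by default `from_array` hashes the data to derive the graph key, so wrapping the same array twice deduplicates; `name=False` skips the hash (faster) at the cost of distinct keys:

```python
import numpy as np
import dask.array as da

x = np.arange(1_000_000)

a = da.from_array(x, chunks=100_000)              # key derived by hashing contents
b = da.from_array(x, chunks=100_000)
print(a.name == b.name)   # True: shared keys, work deduplicated

c = da.from_array(x, chunks=100_000, name=False)  # random key, no hashing cost
d = da.from_array(x, chunks=100_000, name=False)
print(c.name == d.name)   # False: (c + d) would load x twice
```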
324732200 | https://github.com/pydata/xarray/pull/1517#issuecomment-324732200 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMyNDczMjIwMA== | shoyer 1217238 | 2017-08-24T19:22:56Z | 2017-08-24T19:22:56Z | MEMBER | @mrocklin Yes, that took a few seconds (due to hashing the array contents). Would you suggest setting |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
324722153 | https://github.com/pydata/xarray/pull/1517#issuecomment-324722153 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMyNDcyMjE1Mw== | mrocklin 306380 | 2017-08-24T18:43:30Z | 2017-08-24T18:43:30Z | MEMBER | I'm curious, how long does this line take:
Have you considered setting |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
324705244 | https://github.com/pydata/xarray/pull/1517#issuecomment-324705244 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMyNDcwNTI0NA== | shoyer 1217238 | 2017-08-24T17:38:29Z | 2017-08-24T17:38:29Z | MEMBER |
Oops, fixed.
We already have some tips here: http://xarray.pydata.org/en/stable/dask.html#chunking-and-performance
Yes, this would be great.
I agree with both! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 | |
324692881 | https://github.com/pydata/xarray/pull/1517#issuecomment-324692881 | https://api.github.com/repos/pydata/xarray/issues/1517 | MDEyOklzc3VlQ29tbWVudDMyNDY5Mjg4MQ== | clarkfitzg 5356122 | 2017-08-24T16:50:45Z | 2017-08-24T16:50:45Z | MEMBER | Wow, this is great stuff! What's

When this makes it into the public-facing API it would be nice to include some guidance on how the chunking scheme affects the run time. Imagine a plot with run time plotted as a function of chunk size or number of chunks. Of course it also depends on the data size and the number of cores available.

To say it in a different way,

More ambitiously I could imagine an API such as |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Automatic parallelization for dask arrays in apply_ufunc 252358450 |
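A hypothetical micro-benchmark of the kind of guidance suggested here: time the same parallelized operation across several chunk sizes (all sizes and names below are illustrative):

```python
import time
import numpy as np
import xarray as xr

data = xr.DataArray(np.random.rand(4000, 4000), dims=["x", "y"])

for n in (100, 500, 2000):
    chunked = data.chunk({"x": n, "y": n})
    start = time.perf_counter()
    xr.apply_ufunc(np.square, chunked, dask="parallelized",
                   output_dtypes=[float]).compute()
    print(f"chunk size {n}: {time.perf_counter() - start:.3f}s")
```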
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);