issue_comments
22 rows where issue = 252358450 sorted by updated_at descending
issue: Automatic parallelization for dask arrays in apply_ufunc (252358450) · 22 comments
Each comment is shown as a metadata line (id | user | author_association | created_at | html_url, with an edit timestamp when updated_at differs), followed by the comment body and any nonzero reactions.
335316764 | shoyer 1217238 | MEMBER | 2017-10-09T23:28:52Z | https://github.com/pydata/xarray/pull/1517#issuecomment-335316764
I'll start on my PR to expose this as public API -- hopefully will make some progress on my flight from NY to SF tonight.
reactions: +1 × 1
335316029 | jhamman 2443309 | MEMBER | 2017-10-09T23:23:45Z | https://github.com/pydata/xarray/pull/1517#issuecomment-335316029
Great. Go ahead and merge it then. I'm very excited about this feature.
335315831 | shoyer 1217238 | MEMBER | 2017-10-09T23:22:33Z | https://github.com/pydata/xarray/pull/1517#issuecomment-335315831
I think this is ready.
335315689 | jhamman 2443309 | MEMBER | 2017-10-09T23:21:32Z | https://github.com/pydata/xarray/pull/1517#issuecomment-335315689
@shoyer - anything left to do here?
330709438 | jhamman 2443309 | MEMBER | 2017-09-20T00:18:32Z | https://github.com/pydata/xarray/pull/1517#issuecomment-330709438
@shoyer - My vote is for something closer to #2. Your example scenario is something I run into frequently. In cases like this, I think it's better to tell the user that they are not providing an appropriate input rather than attempting to rechunk a dataset. (This is somewhat related to #1440.)
330701921 | mrocklin 306380 | MEMBER | 2017-09-19T23:27:49Z | https://github.com/pydata/xarray/pull/1517#issuecomment-330701921
The heuristics we have are, I think, just of the form "did you make way more chunks than you had previously?". I can imagine other heuristics of the form "some of your new chunks are several times larger than your previous chunks". In general these heuristics might be useful in several places. It might make sense to build them in a […]
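A minimal sketch of the second heuristic described here, using a hypothetical `guarded_rechunk` helper (not an existing dask or xarray API): compare the largest block before and after a rechunk and warn when it grows past a threshold.

```python
import warnings
import numpy as np
import dask.array as da

def guarded_rechunk(arr, new_chunks, max_ratio=4):
    # Largest block size = product of the largest chunk along each axis.
    old_max = np.prod([max(c) for c in arr.chunks])
    out = arr.rechunk(new_chunks)
    new_max = np.prod([max(c) for c in out.chunks])
    if new_max > max_ratio * old_max:
        warnings.warn(
            f"rechunk creates blocks {new_max / old_max:.1f}x larger "
            "than the largest input block")
    return out

x = da.ones((1000, 1000), chunks=(10, 1000))
y = guarded_rechunk(x, {0: -1})  # warns: blocks become 100x larger
```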
330701517 | shoyer 1217238 | MEMBER | 2017-09-19T23:25:08Z | https://github.com/pydata/xarray/pull/1517#issuecomment-330701517
I have a design question here: how should we handle cases where a core dimension exists in multiple chunks? For example, suppose you are applying a function that needs access to every point along the "time" axis at once (e.g., an auto-correlation function). Should we:
1. Automatically rechunk along "time" into a single chunk, or
2. Raise an error, and require the user to rechunk manually (xref https://github.com/dask/dask/issues/2689 for API on this)?
Currently we do behavior 1, but behavior 2 might be more user friendly. Otherwise it could be pretty easy to inadvertently pass in a dask array (e.g., in small chunks along […]). dask.array has some heuristics to protect against this in […]
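A sketch of the two behaviors under discussion, using a hypothetical `prepare_core_dims` helper (not the PR's actual code): behavior 1 silently rechunks each multi-chunk core dimension into one chunk; behavior 2 raises and asks the user to rechunk explicitly.

```python
import dask.array as da

def prepare_core_dims(arr, core_axes, on_multichunk="rechunk"):
    # Core dimensions that span more than one chunk.
    bad = [ax for ax in core_axes if len(arr.chunks[ax]) > 1]
    if not bad:
        return arr
    if on_multichunk == "rechunk":
        # Behavior 1: merge each offending axis into a single chunk.
        return arr.rechunk({ax: -1 for ax in bad})
    # Behavior 2: refuse and make the user rechunk manually.
    raise ValueError(
        f"core dimensions on axes {bad} span multiple chunks; "
        "rechunk them into a single chunk before applying this function")

x = da.random.random((100, 365), chunks=(10, 50))
prepare_core_dims(x, core_axes=[1])                         # behavior 1
prepare_core_dims(x, core_axes=[1], on_multichunk="raise")  # behavior 2 -> ValueError
```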
330679808 | spencerkclark 6628425 | MEMBER | 2017-09-19T21:32:00Z | https://github.com/pydata/xarray/pull/1517#issuecomment-330679808
I was not aware of dask's atop function before reading this PR (it looks pretty cool), so I defer to @nbren12 there.
330023238 | nbren12 1386642 | CONTRIBUTOR | 2017-09-17T05:56:12Z | https://github.com/pydata/xarray/pull/1517#issuecomment-330023238
Sure. I'd be happy to make a PR once this gets merged.
330022743 | shoyer 1217238 | MEMBER | 2017-09-17T05:39:38Z | https://github.com/pydata/xarray/pull/1517#issuecomment-330022743
This seems like a reasonable option to me. Once we get this merged, want to make a PR?
@jhamman could you give this a review? I have not included extensive documentation yet, but I am also reluctant to squeeze that into this PR before we make it public API. (Which I'd like to save for another one.)
328358162 | nbren12 1386642 | CONTRIBUTOR | 2017-09-10T17:30:19Z (edited 2017-09-10T17:31:18Z) | https://github.com/pydata/xarray/pull/1517#issuecomment-328358162
I guess the key issue here is that some computations (e.g. finite differences) cannot be boiled down to passing one numpy function to `apply_ufunc`. @shoyer Would it be unreasonably complicated to add some sort of […]?
328356133 | nbren12 1386642 | CONTRIBUTOR | 2017-09-10T16:59:46Z | https://github.com/pydata/xarray/pull/1517#issuecomment-328356133
Hey Spencer! Thanks. That makes much more sense. I have written nearly identical code for centered differencing, but did not know about […]
328341717 | spencerkclark 6628425 | MEMBER | 2017-09-10T13:09:40Z | https://github.com/pydata/xarray/pull/1517#issuecomment-328341717
@nbren12 for similar use cases I've had success writing a single function that does the ghosting, applies a function with `map_blocks`, and […]:

```python
def centered_diff(da, dim, spacing=1.):
    def apply_centered_diff(arr, spacing=1.):
        if isinstance(arr, np.ndarray):
            return centered_diff_numpy(arr, spacing=spacing)
        else:
            axis = len(arr.shape) - 1
            g = darray.ghost.ghost(arr, depth={axis: 1},
                                   boundary={axis: 'periodic'})
            result = darray.map_blocks(centered_diff_numpy, g,
                                       spacing=spacing)
            return darray.ghost.trim_internal(result, {axis: 1})
    ...
```
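The snippet above is truncated before the body of the outer wrapper. A hedged reconstruction of the complete pattern, assuming a `np.roll`-based periodic kernel for `centered_diff_numpy` and the `dask='allowed'` mode this PR adds to `apply_ufunc` (the `darray.ghost` functions are the dask API of that era, later renamed to `dask.array.overlap`):

```python
import numpy as np
import xarray as xr
import dask.array as darray

def centered_diff_numpy(arr, spacing=1.):
    # Same-shape centered difference along the last axis. Block-edge values
    # are wrong within a chunk, but the ghost/trim round trip discards them;
    # on a plain ndarray this is exactly the periodic centered difference.
    return (np.roll(arr, -1, axis=-1) - np.roll(arr, 1, axis=-1)) / (2. * spacing)

def apply_centered_diff(arr, spacing=1.):
    if isinstance(arr, np.ndarray):
        return centered_diff_numpy(arr, spacing=spacing)
    axis = arr.ndim - 1
    g = darray.ghost.ghost(arr, depth={axis: 1}, boundary={axis: 'periodic'})
    result = darray.map_blocks(centered_diff_numpy, g, spacing=spacing)
    return darray.ghost.trim_internal(result, {axis: 1})

def centered_diff(da, dim, spacing=1.):
    # apply_ufunc moves `dim` to the last axis, so the kernel can always
    # differentiate along axis=-1; dask='allowed' because the function
    # handles dask arrays itself.
    return xr.apply_ufunc(
        apply_centered_diff, da,
        input_core_dims=[[dim]], output_core_dims=[[dim]],
        kwargs={'spacing': spacing}, dask='allowed')
```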
328326916 | nbren12 1386642 | CONTRIBUTOR | 2017-09-10T08:10:32Z (edited 2017-09-10T08:11:19Z) | https://github.com/pydata/xarray/pull/1517#issuecomment-328326916
Ok, thanks. Just so I understand you correctly, are you recommending something like this: […]
Wouldn't xarray complain because the ghosted axes' data would have a different size than the corresponding coordinates?
328299112 | shoyer 1217238 | MEMBER | 2017-09-09T19:38:09Z | https://github.com/pydata/xarray/pull/1517#issuecomment-328299112
@nbren12 Probably the best way to do ghosting with the current interface is to write a function that acts on dask array objects to apply the ghosting, and then apply it using `apply_ufunc` with `dask='allowed'`.
328275147 | nbren12 1386642 | CONTRIBUTOR | 2017-09-09T12:45:21Z | https://github.com/pydata/xarray/pull/1517#issuecomment-328275147
This looks great! I am not sure if this is the right place to bring this up, but is there any way to add ghost cell functionality to `apply_ufunc`?
324734974 | shoyer 1217238 | MEMBER | 2017-08-24T19:34:49Z | https://github.com/pydata/xarray/pull/1517#issuecomment-324734974
@mrocklin I split that discussion off to #1525.
324732814 | mrocklin 306380 | MEMBER | 2017-08-24T19:25:32Z | https://github.com/pydata/xarray/pull/1517#issuecomment-324732814
Yes, if you don't care strongly about deduplication. The following will be slower: […]
In current operation this will be optimized to […]
So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare.
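A sketch of the trade-off, assuming the option under discussion is `from_array`'s `name` argument: by default dask hashes the array contents to build a deterministic key name, which deduplicates repeated wraps of the same data; `name=False` skips the (slow) hashing but loses that deduplication.

```python
import numpy as np
import dask.array as da

x = np.random.random((4000, 4000))

# Default: the data is hashed into a deterministic name, so wrapping the
# same ndarray twice yields identical keys and (a + b) reads each block once.
a = da.from_array(x, chunks=1000)
b = da.from_array(x, chunks=1000)
assert a.name == b.name

# name=False: no hashing, but each wrap gets a unique name, so the
# deduplication above is lost and blocks would be loaded twice.
c = da.from_array(x, chunks=1000, name=False)
d = da.from_array(x, chunks=1000, name=False)
assert c.name != d.name
```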
324732200 | shoyer 1217238 | MEMBER | 2017-08-24T19:22:56Z | https://github.com/pydata/xarray/pull/1517#issuecomment-324732200
@mrocklin Yes, that took a few seconds (due to hashing the array contents). Would you suggest setting `name=False`?
324722153 | mrocklin 306380 | MEMBER | 2017-08-24T18:43:30Z | https://github.com/pydata/xarray/pull/1517#issuecomment-324722153
I'm curious, how long does this line take: […]
Have you considered setting `name=False`?
324705244 | shoyer 1217238 | MEMBER | 2017-08-24T17:38:29Z | https://github.com/pydata/xarray/pull/1517#issuecomment-324705244
- Oops, fixed.
- We already have some tips here: http://xarray.pydata.org/en/stable/dask.html#chunking-and-performance
- Yes, this would be great.
- I agree with both!
324692881 | clarkfitzg 5356122 | MEMBER | 2017-08-24T16:50:45Z | https://github.com/pydata/xarray/pull/1517#issuecomment-324692881
Wow, this is great stuff! What's […]?
When this makes it into the public-facing API it would be nice to include some guidance on how the chunking scheme affects the run time. Imagine a plot with run time plotted as a function of chunk size or number of chunks. Of course it also depends on the data size and the number of cores available. To say it in a different way, […]
More ambitiously, I could imagine an API such as […]
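A rough, hypothetical version of the experiment suggested here: time one reduction over a range of chunk sizes to see where per-task overhead versus parallelism wins (the array shape and the reduction are arbitrary choices):

```python
import time
import numpy as np
import dask.array as da

x = np.random.random((8192, 8192))

for n in (256, 512, 1024, 2048, 4096, 8192):
    arr = da.from_array(x, chunks=n, name=False)  # name=False: skip hashing
    t0 = time.perf_counter()
    arr.mean().compute()
    elapsed = time.perf_counter() - t0
    print(f"chunks={n:5d}  blocks={arr.npartitions:4d}  time={elapsed:.3f}s")
```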