github: issue_comments: 22 rows where issue = 252358450 sorted by updated

22 rows where issue = 252358450 sorted by updated_at descending

Search:

descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
335316764	https://github.com/pydata/xarray/pull/1517#issuecomment-335316764	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzNTMxNjc2NA==	shoyer 1217238	2017-10-09T23:28:52Z	2017-10-09T23:28:52Z	MEMBER	I'll start on my PR to expose this as public API -- hopefully will make some progress on my flight from NY to SF tonight.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
335316029	https://github.com/pydata/xarray/pull/1517#issuecomment-335316029	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzNTMxNjAyOQ==	jhamman 2443309	2017-10-09T23:23:45Z	2017-10-09T23:23:45Z	MEMBER	Great. Go ahead and merge it then. I'm very excited about this feature.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
335315831	https://github.com/pydata/xarray/pull/1517#issuecomment-335315831	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzNTMxNTgzMQ==	shoyer 1217238	2017-10-09T23:22:33Z	2017-10-09T23:22:33Z	MEMBER	I think this is ready.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
335315689	https://github.com/pydata/xarray/pull/1517#issuecomment-335315689	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzNTMxNTY4OQ==	jhamman 2443309	2017-10-09T23:21:32Z	2017-10-09T23:21:32Z	MEMBER	@shoyer - anything left to do here?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
330709438	https://github.com/pydata/xarray/pull/1517#issuecomment-330709438	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzMDcwOTQzOA==	jhamman 2443309	2017-09-20T00:18:32Z	2017-09-20T00:18:32Z	MEMBER	@shoyer - My vote is for something closer to #2. Your example scenario is something I run into frequently. In cases like this, I think its better to tell the user that they are not providing an appropriate input rather than attempting to rechunk a dataset. (This is somewhat related to #1440)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
330701921	https://github.com/pydata/xarray/pull/1517#issuecomment-330701921	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzMDcwMTkyMQ==	mrocklin 306380	2017-09-19T23:27:49Z	2017-09-19T23:27:49Z	MEMBER	The heuristics we have are I think just of the form "did you make way more chunks than you had previously". I can imagine other heuristics of the form "some of your new chunks are several times larger than your previous chunks". In general these heuristics might be useful in several places. It might make sense to build them in a `dask/array/utils.py` file.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
330701517	https://github.com/pydata/xarray/pull/1517#issuecomment-330701517	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzMDcwMTUxNw==	shoyer 1217238	2017-09-19T23:25:08Z	2017-09-19T23:25:08Z	MEMBER	I have a design question here: how should we handle cases where a core dimension exists in multiple chunks? For example, suppose you are applying a function that needs access to every point along the "time" axis at once (e.g., an auto-correlation function). Should we: 1. Automatically rechunk along "time" into a single chunk, or 2. Raise an error, and require the user to rechunk manually (xref https://github.com/dask/dask/issues/2689 for API on this) Currently we do behavior 1, but behavior 2 might be more user friendly. Otherwise it could be pretty easy to inadvertently pass in a dask array (e.g., in small chunks along `time`) that `apply_ufunc` would load into memory by putting in a single chunk. dask.array has some heuristics to protect against this in `rechunk()` but I'm not sure they are effective enough to catch this. (@mrocklin?)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
330679808	https://github.com/pydata/xarray/pull/1517#issuecomment-330679808	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzMDY3OTgwOA==	spencerkclark 6628425	2017-09-19T21:32:00Z	2017-09-19T21:32:00Z	MEMBER	I was not aware of dask's atop function before reading this PR (it looks pretty cool), so I defer to @nbren12 there.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
330023238	https://github.com/pydata/xarray/pull/1517#issuecomment-330023238	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzMDAyMzIzOA==	nbren12 1386642	2017-09-17T05:56:12Z	2017-09-17T05:56:12Z	CONTRIBUTOR	Sure. I'd be happy to make a PR once this gets merged. On Sat, Sep 16, 2017 at 10:39 PM Stephan Hoyer notifications@github.com wrote: Alternatively apply_ufunc could see if the func object has a pre_dask_atop method, and apply it if it does. This seems like a reasonable option to me. Once we get this merged, want to make a PR? @jhamman https://github.com/jhamman could you give this a review? I have not included extensive documentation yet, but I am also reluctant to squeeze that into this PR before we make it public API. (Which I'd like to save for another one.) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/pull/1517#issuecomment-330022743, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUoksuo9P3AJIzemncQQJZ3D5Ga2Opsks5sjLCdgaJpZM4PAViG .	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
330022743	https://github.com/pydata/xarray/pull/1517#issuecomment-330022743	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMzMDAyMjc0Mw==	shoyer 1217238	2017-09-17T05:39:38Z	2017-09-17T05:39:38Z	MEMBER	Alternatively apply_ufunc could see if the func object has a pre_dask_atop method, and apply it if it does. This seems like a reasonable option to me. Once we get this merged, want to make a PR? @jhamman could you give this a review? I have not included extensive documentation yet, but I am also reluctant to squeeze that into this PR before we make it public API. (Which I'd like to save for another one.)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
328358162	https://github.com/pydata/xarray/pull/1517#issuecomment-328358162	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyODM1ODE2Mg==	nbren12 1386642	2017-09-10T17:30:19Z	2017-09-10T17:31:18Z	CONTRIBUTOR	I guess the key issue here is that some computations (E.g. finite differences) cannot be boiled down to passing one numpy function to `atop`. Instead, these calculations consist of three steps: 1) some dask calls, 2) a call to atop, map_blocks, etc, and 3) some more dask calls. In spencer's example, step 1 would be a the call to `da.ghost.ghost` and step 3 would be the call to `da.ghost.trim_internal`. While many dask functions have analogies in XArray, there are others that do not, such as `da.ghost.ghost`, so it won't always be possible to support this three step class of calculations by replacing the dask calls before and after atop with xarray calls. @shoyer Would it unreasonably complicated to add some sort of `pre_dask_atop` and `post_dask_atop` arguments to `apply_ufunc`. Alternatively `apply_ufunc` could see if the `func` object has a `pre_dask_atop` method, and apply it if it does.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
328356133	https://github.com/pydata/xarray/pull/1517#issuecomment-328356133	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyODM1NjEzMw==	nbren12 1386642	2017-09-10T16:59:46Z	2017-09-10T16:59:46Z	CONTRIBUTOR	Hey Spencer! Thanks. That makes much more sense. I have written nearly identical code for centered differencing, but did not know about `apply_ufunc`, so I had a couple manual calls to `transpose` and `xr.DataArray`. The thing I was hoping for an official way to pass `centerd_diff_numpy` directly to `apply_ufunc`, which would avoid some of the boiler-plate in your `centered_diff` function, which basically rewrites much of the code in this PRs version `apply_ufunc`.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
328341717	https://github.com/pydata/xarray/pull/1517#issuecomment-328341717	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyODM0MTcxNw==	spencerkclark 6628425	2017-09-10T13:09:40Z	2017-09-10T13:09:40Z	MEMBER	@nbren12 for similar use cases I've had success writing a single function that does the ghosting, applies a function with `map_blocks`, and trims the edges. Then I apply that single function on a DataArray with `apply_ufunc` (so a single call to `apply_ufunc` rather than three). As an example, a simple centered difference on an array with periodic boundaries might be accomplished with: ```python def centered_diff_numpy(arr, axis=-1, spacing=1.): return (np.roll(arr, -1, axis=axis) - np.roll(arr, 1, axis=axis)) / (2. * spacing) def centered_diff(da, dim, spacing=1.): def apply_centered_diff(arr, spacing=1.): if isinstance(arr, np.ndarray): return centered_diff_numpy(arr, spacing=spacing) else: axis = len(arr.shape) - 1 g = darray.ghost.ghost(arr, depth={axis: 1}, boundary={axis: 'periodic'}) result = darray.map_blocks(centered_diff_numpy, g, spacing=spacing) return darray.ghost.trim_internal(result, {axis: 1}) `return computation.apply_ufunc( apply_centered_diff, da, input_core_dims=[[dim]], output_core_dims=[[dim]], dask_array='allowed', kwargs={'spacing': spacing})` Depending on your use case, you might also consider `dask.ghost.map_overlap` to do all of those three steps in one line, i.e. replace `apply_centered_diff` with the following:python def apply_centered_diff(arr, spacing=1.): if isinstance(arr, np.ndarray): return centered_diff_numpy(arr, spacing=spacing) else: axis = len(arr.shape) - 1 return darray.ghost.map_overlap( arr, centered_diff_numpy, depth={axis: 1}, boundary={axis: 'periodic'}, spacing=spacing) ``` (Not sure if this is what @shoyer had in mind, but just offering an example)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
328326916	https://github.com/pydata/xarray/pull/1517#issuecomment-328326916	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyODMyNjkxNg==	nbren12 1386642	2017-09-10T08:10:32Z	2017-09-10T08:11:19Z	CONTRIBUTOR	Ok thanks. just so I understand you correctly, are you recommending something like this: `xarr_ghosted = apply_ufunc(partial(da.ghost.ghost, ...), xarr, dask='allowed',...) fd_ghosted = apply_ufunc(finite_difference_func, xarr_ghosted, dask='parallelized',...) fd = apply_ufunc(partial(da.ghost.trim_internal, ...), fd_ghosted, dask='allowed')` Wouldn't xarray complain because the ghosted axes data would have different size than the corresponding coordinates?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
328299112	https://github.com/pydata/xarray/pull/1517#issuecomment-328299112	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyODI5OTExMg==	shoyer 1217238	2017-09-09T19:38:09Z	2017-09-09T19:38:09Z	MEMBER	@nbren12 Probably the best way to do ghosting with the current interface is to write a function that acts on dask array objects to apply the ghosting, and then apply it using `apply_ufunc`. I don't see an easy way to incorporate it into the current interface, which is already getting pretty complicated.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
328275147	https://github.com/pydata/xarray/pull/1517#issuecomment-328275147	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyODI3NTE0Nw==	nbren12 1386642	2017-09-09T12:45:21Z	2017-09-09T12:45:21Z	CONTRIBUTOR	This looks great! I am not sure if this is the right place to bring this up, but is there any way to add ghost cell functionality to `apply_ufunc` or perhaps to create an `apply_ufunc_ghosted` function. This would be very handy for performing finite differences and filters. I have successfully used `dask.ghost.map_blocks`as well as `dask.array.ghost.ghost`+`dask.array.core.atop` for this, so I doubtt this would be too hard.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
324734974	https://github.com/pydata/xarray/pull/1517#issuecomment-324734974	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyNDczNDk3NA==	shoyer 1217238	2017-08-24T19:34:49Z	2017-08-24T19:34:49Z	MEMBER	@mrocklin I split that discussion off to #1525.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
324732814	https://github.com/pydata/xarray/pull/1517#issuecomment-324732814	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyNDczMjgxNA==	mrocklin 306380	2017-08-24T19:25:32Z	2017-08-24T19:25:32Z	MEMBER	Yes if you don't care strongly about deduplication. The following will be slower: `b = (a.chunk(...) + 1) + (a.chunk(...) + 1)` In current operation this will be optimized to `tmp = a.chunk(...) + 1 b = tmp + tmp` So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
324732200	https://github.com/pydata/xarray/pull/1517#issuecomment-324732200	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyNDczMjIwMA==	shoyer 1217238	2017-08-24T19:22:56Z	2017-08-24T19:22:56Z	MEMBER	@mrocklin Yes, that took a few seconds (due to hashing the array contents). Would you suggest setting `name=False` by default for xarray's `chunk()` method?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
324722153	https://github.com/pydata/xarray/pull/1517#issuecomment-324722153	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyNDcyMjE1Mw==	mrocklin 306380	2017-08-24T18:43:30Z	2017-08-24T18:43:30Z	MEMBER	I'm curious, how long does this line take: `r = spearman_correlation(array1.chunk({'place': 10}), array2.chunk({'place': 10}), 'time')` Have you consider setting `name=False` in your from_array call by default when doing this? I often avoid creating deterministic names when going back and forth rapidly between dask.array and numpy.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
324705244	https://github.com/pydata/xarray/pull/1517#issuecomment-324705244	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyNDcwNTI0NA==	shoyer 1217238	2017-08-24T17:38:29Z	2017-08-24T17:38:29Z	MEMBER	What's `rs.randn()`? Oops, fixed. When this makes it into the public facing API it would be nice to include some guidance on how the chunking scheme affects the run time. We already have some tips here: http://xarray.pydata.org/en/stable/dask.html#chunking-and-performance More ambitiously I could imagine an API such as array1.chunk('place') or array1.chunk('auto') meaning to figure out a reasonable chunking scheme only once .compute() is called so that all the compute steps are known. Yes, this would be great. Maybe this is more specific to dask than xarray. I believe it would also be difficult. I agree with both!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450
324692881	https://github.com/pydata/xarray/pull/1517#issuecomment-324692881	https://api.github.com/repos/pydata/xarray/issues/1517	MDEyOklzc3VlQ29tbWVudDMyNDY5Mjg4MQ==	clarkfitzg 5356122	2017-08-24T16:50:45Z	2017-08-24T16:50:45Z	MEMBER	Wow, this is great stuff! What's `rs.randn()`? When this makes it into the public facing API it would be nice to include some guidance on how the chunking scheme affects the run time. Imagine a plot with run time plotted as a function of chunk size or number of chunks. Of course it also depends on the data size and the number of cores available. To say it in a different way, `array1.chunk({'place': 10})` is a performance tuning parameter, semantically no different than `array1`. More ambitiously I could imagine an API such as `array1.chunk('place')` or `array1.chunk('auto')` meaning to figure out a reasonable chunking scheme only once `.compute()` is called so that all the compute steps are known. Maybe this is more specific to dask than xarray. I believe it would also be difficult.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Automatic parallelization for dask arrays in apply_ufunc 252358450

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);