html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/1517#issuecomment-335316764,https://api.github.com/repos/pydata/xarray/issues/1517,335316764,MDEyOklzc3VlQ29tbWVudDMzNTMxNjc2NA==,1217238,2017-10-09T23:28:52Z,2017-10-09T23:28:52Z,MEMBER,I'll start on my PR to expose this as public API -- hopefully will make some progress on my flight from NY to SF tonight.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-335316029,https://api.github.com/repos/pydata/xarray/issues/1517,335316029,MDEyOklzc3VlQ29tbWVudDMzNTMxNjAyOQ==,2443309,2017-10-09T23:23:45Z,2017-10-09T23:23:45Z,MEMBER,Great. Go ahead and merge it then. I'm very excited about this feature. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-335315831,https://api.github.com/repos/pydata/xarray/issues/1517,335315831,MDEyOklzc3VlQ29tbWVudDMzNTMxNTgzMQ==,1217238,2017-10-09T23:22:33Z,2017-10-09T23:22:33Z,MEMBER,I think this is ready.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-335315689,https://api.github.com/repos/pydata/xarray/issues/1517,335315689,MDEyOklzc3VlQ29tbWVudDMzNTMxNTY4OQ==,2443309,2017-10-09T23:21:32Z,2017-10-09T23:21:32Z,MEMBER,@shoyer - anything left to do here?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-330709438,https://api.github.com/repos/pydata/xarray/issues/1517,330709438,MDEyOklzc3VlQ29tbWVudDMzMDcwOTQzOA==,2443309,2017-09-20T00:18:32Z,2017-09-20T00:18:32Z,MEMBER,"@shoyer - My vote is for something closer to #2.
Your example scenario is something I run into frequently. In cases like this, I think it's better to tell the user that they are not providing an appropriate input rather than attempting to rechunk a dataset.
(This is somewhat related to #1440)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-330701921,https://api.github.com/repos/pydata/xarray/issues/1517,330701921,MDEyOklzc3VlQ29tbWVudDMzMDcwMTkyMQ==,306380,2017-09-19T23:27:49Z,2017-09-19T23:27:49Z,MEMBER,"The heuristics we have are, I think, just of the form ""did you make way more chunks than you had previously"". I can imagine other heuristics of the form ""some of your new chunks are several times larger than your previous chunks"". In general these heuristics might be useful in several places. It might make sense to build them into a `dask/array/utils.py` file.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-330701517,https://api.github.com/repos/pydata/xarray/issues/1517,330701517,MDEyOklzc3VlQ29tbWVudDMzMDcwMTUxNw==,1217238,2017-09-19T23:25:08Z,2017-09-19T23:25:08Z,MEMBER,"I have a design question here: how should we handle cases where a core dimension exists in multiple chunks? For example, suppose you are applying a function that needs access to every point along the ""time"" axis at once (e.g., an auto-correlation function).
Should we:
1. Automatically rechunk along ""time"" into a single chunk, or
2. Raise an error, and require the user to rechunk manually (xref https://github.com/dask/dask/issues/2689 for API on this)
Currently we do behavior 1, but behavior 2 might be more user-friendly. Otherwise it could be pretty easy to inadvertently pass in a dask array (e.g., in small chunks along `time`) that `apply_ufunc` would load into memory by putting it in a single chunk.
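To make the two options concrete, here is a small sketch with plain dask.array (the shapes and chunk sizes are illustrative, not from this PR):
```python
import numpy as np
import dask.array as da

# A hypothetical core dimension split across several chunks.
arr = da.from_array(np.arange(100.0), chunks=25)

# Option 1: silently rechunk the core dimension into a single chunk,
# which may load the whole axis into memory at once.
single = arr.rechunk({0: arr.shape[0]})
assert single.chunks == ((100,),)

# Option 2: refuse, and require an explicit rechunk from the user.
if len(arr.chunks[0]) > 1:
    pass  # under option 2, apply_ufunc would raise an error here instead
```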
dask.array has some heuristics to protect against this in `rechunk()` but I'm not sure they are effective enough to catch this. (@mrocklin?)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-330679808,https://api.github.com/repos/pydata/xarray/issues/1517,330679808,MDEyOklzc3VlQ29tbWVudDMzMDY3OTgwOA==,6628425,2017-09-19T21:32:00Z,2017-09-19T21:32:00Z,MEMBER,"I was not aware of dask's atop function before reading this PR (it looks pretty cool), so I defer to @nbren12 there.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-330022743,https://api.github.com/repos/pydata/xarray/issues/1517,330022743,MDEyOklzc3VlQ29tbWVudDMzMDAyMjc0Mw==,1217238,2017-09-17T05:39:38Z,2017-09-17T05:39:38Z,MEMBER,"> Alternatively apply_ufunc could see if the func object has a pre_dask_atop method, and apply it if it does.
This seems like a reasonable option to me. Once we get this merged, want to make a PR?
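A minimal sketch of that idea (both the `pre_dask_atop` attribute and the helper below are hypothetical, mirroring the suggestion rather than any existing API):
```python
def maybe_pre_atop(func, *arrays):
    # If func carries a pre_dask_atop hook, apply it to each input
    # (e.g. for ghosting) before handing the arrays to dask's atop;
    # otherwise pass the inputs through unchanged.
    hook = getattr(func, 'pre_dask_atop', None)
    if hook is not None:
        arrays = tuple(hook(a) for a in arrays)
    return arrays


def identity(x):
    return x

identity.pre_dask_atop = lambda a: a * 2
assert maybe_pre_atop(identity, 3) == (6,)
```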
@jhamman could you give this a review? I have not included extensive documentation yet, but I am also reluctant to squeeze that into this PR before we make it public API. (Which I'd like to save for another one.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-328341717,https://api.github.com/repos/pydata/xarray/issues/1517,328341717,MDEyOklzc3VlQ29tbWVudDMyODM0MTcxNw==,6628425,2017-09-10T13:09:40Z,2017-09-10T13:09:40Z,MEMBER,"@nbren12 for similar use cases I've had success writing a single function that does the ghosting, applies a function with `map_blocks`, and trims the edges. Then I apply that single function on a DataArray with `apply_ufunc` (so a single call to `apply_ufunc` rather than three). As an example, a simple centered difference on an array with periodic boundaries might be accomplished with:
```python
import numpy as np
import dask.array as darray
from xarray.core import computation


def centered_diff_numpy(arr, axis=-1, spacing=1.):
    return (np.roll(arr, -1, axis=axis) - np.roll(arr, 1, axis=axis)) / (2. * spacing)


def centered_diff(da, dim, spacing=1.):
    def apply_centered_diff(arr, spacing=1.):
        if isinstance(arr, np.ndarray):
            return centered_diff_numpy(arr, spacing=spacing)
        else:
            axis = len(arr.shape) - 1
            # ghost the chunk boundaries, apply blockwise, then trim the overlap
            g = darray.ghost.ghost(arr, depth={axis: 1}, boundary={axis: 'periodic'})
            result = darray.map_blocks(centered_diff_numpy, g, spacing=spacing)
            return darray.ghost.trim_internal(result, {axis: 1})
    return computation.apply_ufunc(
        apply_centered_diff, da, input_core_dims=[[dim]],
        output_core_dims=[[dim]], dask_array='allowed', kwargs={'spacing': spacing})
```
Depending on your use case, you might also consider `dask.array.ghost.map_overlap` to do all three of those steps in one call, i.e. replace `apply_centered_diff` with the following:
```python
def apply_centered_diff(arr, spacing=1.):
    if isinstance(arr, np.ndarray):
        return centered_diff_numpy(arr, spacing=spacing)
    else:
        axis = len(arr.shape) - 1
        return darray.ghost.map_overlap(
            arr, centered_diff_numpy, depth={axis: 1}, boundary={axis: 'periodic'},
            spacing=spacing)
```
(Not sure if this is what @shoyer had in mind, but just offering an example)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-328299112,https://api.github.com/repos/pydata/xarray/issues/1517,328299112,MDEyOklzc3VlQ29tbWVudDMyODI5OTExMg==,1217238,2017-09-09T19:38:09Z,2017-09-09T19:38:09Z,MEMBER,"@nbren12 Probably the best way to do ghosting with the current interface is to write a function that acts on dask array objects to apply the ghosting, and then apply it using `apply_ufunc`. I don't see an easy way to incorporate it into the current interface, which is already getting pretty complicated.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-324734974,https://api.github.com/repos/pydata/xarray/issues/1517,324734974,MDEyOklzc3VlQ29tbWVudDMyNDczNDk3NA==,1217238,2017-08-24T19:34:49Z,2017-08-24T19:34:49Z,MEMBER,@mrocklin I split that discussion off to #1525.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-324732814,https://api.github.com/repos/pydata/xarray/issues/1517,324732814,MDEyOklzc3VlQ29tbWVudDMyNDczMjgxNA==,306380,2017-08-24T19:25:32Z,2017-08-24T19:25:32Z,MEMBER,"Yes, if you don't care strongly about deduplication. The following will be slower:
```python
b = (a.chunk(...) + 1) + (a.chunk(...) + 1)
```
Under the current behavior this will be optimized to
```python
tmp = a.chunk(...) + 1
b = tmp + tmp
```
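The effect of deterministic names on deduplication can be seen directly on the array names (a small sketch with `da.from_array`; the data here is illustrative):
```python
import numpy as np
import dask.array as da

x = np.ones(4)

# Deterministic names: chunking the same data twice produces the same
# dask key, so the shared subexpression is deduplicated in the graph.
a1 = da.from_array(x, chunks=2)
a2 = da.from_array(x, chunks=2)
assert a1.name == a2.name

# name=False skips hashing and generates a fresh name per call, so the
# two arrays are treated as distinct inputs and deduplication is lost.
b1 = da.from_array(x, chunks=2, name=False)
b2 = da.from_array(x, chunks=2, name=False)
assert b1.name != b2.name
```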
So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-324732200,https://api.github.com/repos/pydata/xarray/issues/1517,324732200,MDEyOklzc3VlQ29tbWVudDMyNDczMjIwMA==,1217238,2017-08-24T19:22:56Z,2017-08-24T19:22:56Z,MEMBER,"@mrocklin Yes, that took a few seconds (due to hashing the array contents). Would you suggest setting `name=False` by default for xarray's `chunk()` method?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-324722153,https://api.github.com/repos/pydata/xarray/issues/1517,324722153,MDEyOklzc3VlQ29tbWVudDMyNDcyMjE1Mw==,306380,2017-08-24T18:43:30Z,2017-08-24T18:43:30Z,MEMBER,"I'm curious, how long does this line take:
```python
r = spearman_correlation(array1.chunk({'place': 10}), array2.chunk({'place': 10}), 'time')
```
Have you considered setting `name=False` in your from_array call by default when doing this? I often avoid creating deterministic names when going back and forth rapidly between dask.array and numpy.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-324705244,https://api.github.com/repos/pydata/xarray/issues/1517,324705244,MDEyOklzc3VlQ29tbWVudDMyNDcwNTI0NA==,1217238,2017-08-24T17:38:29Z,2017-08-24T17:38:29Z,MEMBER,"> What's `rs.randn()`?
Oops, fixed.
> When this makes it into the public facing API it would be nice to include some guidance on how the chunking scheme affects the run time.
We already have some tips here:
http://xarray.pydata.org/en/stable/dask.html#chunking-and-performance
> More ambitiously I could imagine an API such as array1.chunk('place') or array1.chunk('auto') meaning to figure out a reasonable chunking scheme only once .compute() is called so that all the compute steps are known.
Yes, this would be great.
> Maybe this is more specific to dask than xarray. I believe it would also be difficult.
I agree with both!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450
https://github.com/pydata/xarray/pull/1517#issuecomment-324692881,https://api.github.com/repos/pydata/xarray/issues/1517,324692881,MDEyOklzc3VlQ29tbWVudDMyNDY5Mjg4MQ==,5356122,2017-08-24T16:50:45Z,2017-08-24T16:50:45Z,MEMBER,"Wow, this is great stuff!
What's `rs.randn()`?
When this makes it into the public facing API it would be nice to include some guidance on how the chunking scheme affects the run time. Imagine a plot with run time plotted as a function of chunk size or number of chunks. Of course it also depends on the data size and the number of cores available.
To say it in a different way, `array1.chunk({'place': 10})` is a performance tuning parameter, semantically no different from `array1`.
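That equivalence is easy to check with a minimal sketch in plain dask.array (the array and chunk sizes are illustrative):
```python
import numpy as np
import dask.array as da

x = np.arange(12.0).reshape(3, 4)
chunked = da.from_array(x, chunks=(3, 2))

# Chunking only changes how the work is scheduled; the computed
# result is identical to the plain numpy one.
assert np.array_equal((chunked + 1).compute(), x + 1)
```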
More ambitiously I could imagine an API such as `array1.chunk('place')` or `array1.chunk('auto')` meaning to figure out a reasonable chunking scheme only once `.compute()` is called so that all the compute steps are known. Maybe this is more specific to dask than xarray. I believe it would also be difficult.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252358450