
issue_comments


33 rows where issue = 170779798 sorted by updated_at descending


user 7

  • shoyer 14
  • max-sixty 9
  • pwolfram 3
  • crusaderky 3
  • chris-b1 2
  • rabernat 1
  • jhamman 1

author_association 2

  • MEMBER 30
  • CONTRIBUTOR 3

issue 1

  • New function for applying vectorized functions for unlabeled arrays to xarray objects · 33
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
277028203 https://github.com/pydata/xarray/pull/964#issuecomment-277028203 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI3NzAyODIwMw== shoyer 1217238 2017-02-02T17:41:48Z 2017-02-02T17:41:48Z MEMBER

#1245 replaces the unintuitive `signature` argument with separate `input_core_dims` and `output_core_dims`.
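For context, the `input_core_dims`/`output_core_dims` spelling described here is what eventually became part of xarray's public `apply_ufunc`. A minimal sketch, assuming a current xarray install:

```python
import numpy as np
import xarray as xr

def mean_over(obj, dim):
    # apply_ufunc moves each core dimension to the end, so axis=-1
    # addresses `dim` inside the plain-NumPy function.
    return xr.apply_ufunc(
        lambda x: np.mean(x, axis=-1), obj,
        input_core_dims=[[dim]],
    )

da = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))
mean_over(da, "y")  # DataArray with dims ("x",) and values [1., 4.]
```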

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
270863277 https://github.com/pydata/xarray/pull/964#issuecomment-270863277 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI3MDg2MzI3Nw== max-sixty 5635139 2017-01-06T09:20:08Z 2017-01-06T09:20:08Z MEMBER

FWIW the bn.push example still has some unanswered questions - would be interested to know if there's an easier way of doing that. Particularly if it's just a 'dim for axis' swap

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
270863083 https://github.com/pydata/xarray/pull/964#issuecomment-270863083 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI3MDg2MzA4Mw== max-sixty 5635139 2017-01-06T09:18:47Z 2017-01-06T09:18:47Z MEMBER

Congrats!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
270799379 https://github.com/pydata/xarray/pull/964#issuecomment-270799379 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI3MDc5OTM3OQ== shoyer 1217238 2017-01-06T00:36:21Z 2017-01-06T00:36:21Z MEMBER

OK, in it goes. Once again, there's no public API exposed yet.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
270429069 https://github.com/pydata/xarray/pull/964#issuecomment-270429069 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI3MDQyOTA2OQ== shoyer 1217238 2017-01-04T17:19:10Z 2017-01-04T17:20:02Z MEMBER

I removed the public facing API and renamed the (now private) apply function back to apply_ufunc. I also removed the new_coords argument, in favor of encouraging using .coords or .assign_coords.

As discussed above, the current API with signature is difficult to use, but this is probably fine for an internal function and we can revisit the public facing API later. Any objections to merging this?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
269878903 https://github.com/pydata/xarray/pull/964#issuecomment-269878903 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI2OTg3ODkwMw== shoyer 1217238 2016-12-31T19:29:37Z 2016-12-31T19:30:16Z MEMBER

@crusaderky

any plans to add dask support as suggested above?

Yes, in fact I have a branch with some basic support for this that I was working on a few months ago. I haven't written tests yet but I can potentially push that WIP to another PR after merging this.

There are a couple of recent feature additions to dask.array.atop (https://github.com/dask/dask/pull/1612 and https://github.com/dask/dask/pull/1716) that should make this easier and more powerful. I have not built anything on top of these yet, so my prior work is somewhat outdated.

@jhamman

do we want to get this into 0.9 as a private api function and aim to complete it for the public api by 0.10 or so?

Yes, this seems like a good goal. I'll take another look over this next week when I have the chance, to remove any work-in-progress bits that have snuck in and remove the public facing API.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
269859010 https://github.com/pydata/xarray/pull/964#issuecomment-269859010 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI2OTg1OTAxMA== crusaderky 6213168 2016-12-31T10:23:57Z 2016-12-31T10:23:57Z MEMBER

@shoyer - any plans to add dask support as suggested above?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
269799097 https://github.com/pydata/xarray/pull/964#issuecomment-269799097 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI2OTc5OTA5Nw== jhamman 2443309 2016-12-30T17:34:18Z 2016-12-30T17:34:18Z MEMBER

@shoyer - do we want to get this into 0.9 as a private api function and aim to complete it for the public api by 0.10 or so?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
268355105 https://github.com/pydata/xarray/pull/964#issuecomment-268355105 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI2ODM1NTEwNQ== max-sixty 5635139 2016-12-20T20:46:55Z 2016-12-20T20:46:55Z MEMBER

Gave this a quick spin for filling. A few questions:

  • Is there an easy way of merely translating dims into axes? Maybe that already exists?
  • Is there an easy way to keep a dimension? Or should it be in the signature and a new_dim?

```python
da = xr.DataArray(np.random.rand(10, 3), dims=('x', 'y'))
da = da.where(da > 0.5)

In [43]: da
Out[43]:
<xarray.DataArray (x: 10, y: 3)>
array([[        nan,  0.57243305,  0.84363016],
       [        nan,  0.90788156,         nan],
       [        nan,  0.50739189,  0.93701278],
       [        nan,         nan,  0.86804167],
       [        nan,  0.50883914,         nan],
       [        nan,         nan,         nan],
       [        nan,  0.91547763,         nan],
       [ 0.72920182,         nan,  0.6982745 ],
       [ 0.73033449,  0.950719  ,  0.73077113],
       [        nan,         nan,  0.72463932]])

In [44]: xr.apply(bn.push, da)  # already better than bn.push(da)!
Out[44]:
<xarray.DataArray (x: 10, y: 3)>
array([[        nan,  0.57243305,  0.84363016],
       [        nan,  0.90788156,  0.90788156],
       [        nan,  0.50739189,  0.93701278],
       [        nan,         nan,  0.86804167],
       [        nan,  0.50883914,  0.50883914],
       [        nan,         nan,         nan],
       [        nan,  0.91547763,  0.91547763],
       [ 0.72920182,  0.72920182,  0.6982745 ],
       [ 0.73033449,  0.950719  ,  0.73077113],
       [        nan,         nan,  0.72463932]])
```

but changing the axis is verbose and transposes the array - are there existing tools for this?

```python
In [48]: xr.apply(bn.push, da, signature='(x)->(x)', new_coords=[dict(x=da.x)])
Out[48]:
<xarray.DataArray (y: 3, x: 10)>
array([[        nan,         nan,         nan,         nan,         nan,
                nan,         nan,  0.72920182,  0.73033449,  0.73033449],
       [ 0.57243305,  0.90788156,  0.50739189,  0.50739189,  0.50883914,
         0.50883914,  0.91547763,  0.91547763,  0.950719  ,  0.950719  ],
       [ 0.84363016,  0.84363016,  0.93701278,  0.86804167,  0.86804167,
         0.86804167,  0.86804167,  0.6982745 ,  0.73077113,  0.72463932]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9
  o y        (y) -
```

  • The triple nested signature is pretty tough to write! Two kwargs?
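On the first question, a "dim for axis" swap can be written with `DataArray.get_axis_num`. A minimal sketch, using `np.cumsum` as a stand-in for `bn.push` so that bottleneck is not assumed installed:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12.0).reshape(4, 3), dims=("x", "y"))

# Translate the dimension name into the positional axis that a
# NumPy/bottleneck-style function expects:
axis = da.get_axis_num("x")  # 0

# Any axis-taking function works here; bn.push would be used the same way.
filled = xr.DataArray(np.cumsum(da.values, axis=axis),
                      dims=da.dims, coords=da.coords)
```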
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
264020777 https://github.com/pydata/xarray/pull/964#issuecomment-264020777 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI2NDAyMDc3Nw== shoyer 1217238 2016-11-30T22:45:07Z 2016-11-30T22:45:07Z MEMBER

Surprisingly, I can't actually find something like this out there. The pandas code is good but highly 1-2 dimension specific.

Let me know if I'm missing (pun intended - long day) something. Is there a library of these sorts of functions over n-dims somewhere else (even R / Julia)? Or are we really the first people in the world to be doing this?

Usually I check numpy and bottleneck. It actually looks like bottleneck.push is what you're looking for. I think this is a recent addition to bottleneck, though.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
263988895 https://github.com/pydata/xarray/pull/964#issuecomment-263988895 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI2Mzk4ODg5NQ== max-sixty 5635139 2016-11-30T20:41:14Z 2016-11-30T20:41:14Z MEMBER

Either way, the first step is probably to write a function backfill(values, axis) that acts on NumPy arrays.

Right. Surprisingly, I can't actually find something like this out there. The pandas code is good but highly 1-2 dimension specific.

Let me know if I'm missing (pun intended - long day) something. Is there a library of these sorts of functions over n-dims somewhere else (even R / Julia)? Or are we really the first people in the world to be doing this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
263944098 https://github.com/pydata/xarray/pull/964#issuecomment-263944098 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI2Mzk0NDA5OA== shoyer 1217238 2016-11-30T17:51:33Z 2016-11-30T17:51:33Z MEMBER

I'm thinking through how difficult it would be to add back-fill method to DataArray (that could be an argument to fillna or a bfill method - that's a separate discussion).

Would this PR help? I'm trying to wrap my head around the options. Thanks

Yes, quite likely. In the current state, it would depend on if you want to back-fill all variables or just data variables (only the latter is currently supported).

Either way, the first step is probably to write a function backfill(values, axis) that acts on NumPy arrays.
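For illustration, one possible NumPy-only `backfill(values, axis)` along these lines (a sketch, not xarray's eventual implementation), using the maximum-accumulate-of-indices trick on a reversed view:

```python
import numpy as np

def backfill(values, axis=-1):
    """Fill NaNs with the next valid value along `axis` (NumPy-only sketch)."""
    # Reverse along the axis so back-fill becomes forward-fill.
    arr = np.flip(values, axis=axis)
    mask = np.isnan(arr)
    # Index of each position along the axis, broadcast to the full shape.
    idx_shape = [1] * arr.ndim
    idx_shape[axis] = arr.shape[axis]
    idx = np.broadcast_to(np.arange(arr.shape[axis]).reshape(idx_shape),
                          arr.shape)
    # Running index of the most recent valid entry; NaN slots fall back to 0,
    # which itself holds NaN when there is no earlier valid value.
    filled_idx = np.maximum.accumulate(np.where(mask, 0, idx), axis=axis)
    out = np.take_along_axis(arr, filled_idx, axis=axis)
    return np.flip(out, axis=axis)
```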

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
263942583 https://github.com/pydata/xarray/pull/964#issuecomment-263942583 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI2Mzk0MjU4Mw== max-sixty 5635139 2016-11-30T17:45:43Z 2016-11-30T17:45:43Z MEMBER

I'm thinking through how difficult it would be to add back-fill method to DataArray (that could be an argument to fillna or a bfill method - that's a separate discussion).

Would this PR help? I'm trying to wrap my head around the options. Thanks

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
256814663 https://github.com/pydata/xarray/pull/964#issuecomment-256814663 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI1NjgxNDY2Mw== shoyer 1217238 2016-10-28T01:31:02Z 2016-10-28T01:31:02Z MEMBER

I'm thinking about making a few tweaks and merging this, but not exposing it to users yet as part of the public API. The public API is not quite there yet, but even as is I think it would be a useful building block for internal functionality (e.g., for #1065), and then other people could start to build on this as well.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
250908588 https://github.com/pydata/xarray/pull/964#issuecomment-250908588 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI1MDkwODU4OA== crusaderky 6213168 2016-10-01T11:57:38Z 2016-10-01T11:57:38Z MEMBER

I worked around the limitation. It would be nice if apply() did the below automatically!

```python
from itertools import chain
from functools import wraps
import dask.array

def dask_kernel(func):
    """Invoke dask.array.map_blocks(func, *args, **kwds) if at least one of
    the arguments is a dask array; else invoke func(*args, **kwds)
    """
    @wraps(func)
    def wrapper(*args, **kwds):
        if any(isinstance(a, dask.array.Array)
               for a in chain(args, kwds.values())):
            return dask.array.map_blocks(func, *args, **kwds)
        else:
            return func(*args, **kwds)
    return wrapper
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
250907376 https://github.com/pydata/xarray/pull/964#issuecomment-250907376 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI1MDkwNzM3Ng== crusaderky 6213168 2016-10-01T11:27:37Z 2016-10-01T11:49:42Z MEMBER

Any hope to get dask support? Even with the limitation of having 1:1 matching between input and output chunks, it would already be tremendously useful

In other words, it should be easy to automatically call dask.array.map_blocks

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
249118534 https://github.com/pydata/xarray/pull/964#issuecomment-249118534 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0OTExODUzNA== shoyer 1217238 2016-09-23T07:07:09Z 2016-09-23T07:07:09Z MEMBER

One of the tricky things with apply is that there are a lot of similar but distinct use cases to disambiguate. I'll outline a few of these below.

I'd appreciate feedback on which cases are most essential and which can wait until later (this PR is already getting pretty big).

Also, I'd appreciate ideas for how to make the API more easily understood. We will have extensive docs either way, but xarray.apply is probably already in the realm of "too many arguments for one function". The last thing I want to do is to make a swiss army knife so flexible (like [numpy.nditer](http://docs.scipy.org/doc/numpy/reference/generated/numpy.nditer.html)) that nobody uses it because they don't understand how it works.

How func vectorizes

There are two main cases here:

1. Functions already written to vectorize their arguments:
   1. Scalar functions built out of NumPy primitives (e.g., `a + b + c`). These work by default.
   2. Functions that use a core dimension referred to by `axis` (e.g., `np.mean`). These work if you set `axis=-1` and put the dimension in the signature, but the API is kind of awkward. You'd rather that the wrapper just converted an argument like `dim='time'` automatically into `axis=2`. Transposing these core dimensions to the end also feels unnecessary, though maybe not a serious concern given that transposing NumPy arrays involves no memory copies.
   3. Functions that work mostly like gufuncs, but aren't actually (e.g., `np.svd`). This is pretty common, because NumPy ufuncs have some serious limitations (e.g., they can't handle non-vectorized arguments). These work about as well as we could hope, modulo possible improvements to the signature spec.
   4. True gufuncs, most likely written with `numba.guvectorize`. For these functions, we'd like a way to extract/use the signature automatically.
2. Functions for which you only have the inner loop (e.g., `np.polyfit` or `scipy.stats.pearsonr`). Running these is going to entail large Python overhead, but often that's acceptable.

One option for these is to wrap them into something that broadcasts like a gufunc, e.g., via a new function numpy.guvectorize (https://github.com/numpy/numpy/pull/8054). But as a user, this is a lot of wrappers to write. You'd rather just add something like vectorize=True and let xarray handle all the automatic broadcasting, e.g.,

```python
def poly_fit(x, y, dim='time', deg=1):
    return xr.apply(np.polyfit, x, y,
                    signature=([(dim,), (dim,)], [('poly_order',)]),
                    new_coords={'poly_order': range(deg + 1)},
                    kwargs={'deg': deg}, vectorize=True)
```
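The `vectorize=True` idea corresponds to NumPy's own `np.vectorize` with a gufunc-style signature (available since NumPy 1.12). A self-contained sketch of case 2 above, wrapping a scalar-output inner loop so it broadcasts over outer dimensions:

```python
import numpy as np

# A Pearson-correlation-like inner loop that only handles 1-D inputs;
# np.vectorize with a signature loops it over the leading dimensions.
pearson_like = np.vectorize(
    lambda x, y: float(np.corrcoef(x, y)[0, 1]),
    signature="(n),(n)->()",
)

rng = np.random.default_rng(0)
a = rng.random((5, 10))
b = rng.random((5, 10))
r = pearson_like(a, b)  # one correlation per row, shape (5,)
```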

Whether func applies to "data only" or "everything"

Most "computation" functions/methods in xarray (e.g., arithmetic and reduce methods) follow the rule of merging coordinates, and only applying the core function to data variables. Coordinates that are no longer valid with new dimensions are dropped. This is currently what we do in apply.

On the other hand, there are also function/methods that we might refer to as "organizing" (e.g., indexing methods, concat, stack/unstack, transpose), which generally apply to every variable, including coordinates. It seems like there are definitely use cases for applying these sorts of functions, too, e.g., to wrap Cartopy's add_cyclic_point utility (#1005). So, I think we might need another option to toggle what happens to coordinates (e.g., variables='data' vs variables='all').

How to handle mismatched core dimensions

Xarray methods often have fallbacks to handle data with different dimensions. For example, if you write `ds.mean(['x', 'y'])`, it matches on core dimensions to apply four different possible functions to each data variable:

  • mean over `('x', 'y')`: for variables with both dimensions
  • mean over `'x'`: for variables with only `'x'`
  • mean over `'y'`: for variables with only `'y'`
  • identity: for variables with neither `'x'` nor `'y'`

Indexing is another example -- it applies to both data and coordinates, but only to matching dimensions for each variable. If you don't have the dimensions, we ignore the variable.

Writing something like mean with a single call to apply would entail the need for something like a dispatching system to pick which function to use, e.g., instead of a singular func/signature pair you pass a dispatcher function that chooses func/signature based on the core dimensions of passed variable. This feels like serious over engineering.

Instead, we might support a few pre-canned options for how to deal with mismatched dimensions. For example:

  • `missing_core_dims='drop'`: silently drop these variables in the output(s)
  • `missing_core_dims='error'`: raise an error. This is the current default behavior, which is probably only useful with `variables='data'` -- otherwise some coordinate variables would always error.
  • `missing_core_dims='keep'`: keep these variables unchanged in the output(s) (use `merge_variables` to check for conflicts)
  • `missing_core_dims='broadcast'`: broadcast all inputs to have the necessary core dimensions if they don't have them already

Another option would be to consolidate this with the `variables` option to allow only two modes of operation:

  • `variables='data'`: For "computation" functions. Apply only to data variables, and error if any data variables are missing a core dimension. Merge coordinates on the output(s), silently dropping conflicts for variables that don't label a dimension.
  • `variables='matching'`: For "organizing" functions. Apply to every variable with matching core dimensions. Merge everything else on the output(s), silently dropping conflicts for variables that don't label a dimension.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
248957341 https://github.com/pydata/xarray/pull/964#issuecomment-248957341 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0ODk1NzM0MQ== chris-b1 1924092 2016-09-22T16:34:44Z 2016-09-22T16:34:44Z MEMBER

@shoyer - I agree on 3) that it might be too much to pack into xr.apply. As one possibility, here's a half-implemented (probably buggy!) wrapper that would allow this:

```python
@xarray_gufunc
@numba.guvectorize(['void(f8[:], f8[:])'], '(n)->()')
def std_gufunc(arr, out):
    out[0] = np.std(arr)

std_gufunc(arr, dims=('x',))
```

https://gist.github.com/chris-b1/d28c6b8e78bf65ef7eb97e1095bc87f2

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
248938413 https://github.com/pydata/xarray/pull/964#issuecomment-248938413 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0ODkzODQxMw== rabernat 1197350 2016-09-22T15:30:35Z 2016-09-22T15:30:35Z MEMBER

Of course I think this is a fantastic feature which will change the way we use xarray.

I gave it a test run for a problem we come across a lot on the mailing list: estimating a linear trend along one dimension of a dataarray. A short example notebook is here: https://gist.github.com/rabernat/a0ec6a7e947f2d928615a30f5cb91ee9

Overall it worked as I hoped, but there were a few bumps I had to overcome. My feedback is from a user perspective, regarding the api and documentation:

  • In the documentation, I would not assume that the user is familiar with NumPy generalized universal functions. A more explicit explanation of the syntax and meaning of the signature in the docstring would be very helpful. It took me lots of trial and error to find the signature that worked.
  • The function I wanted to apply, np.polyfit, works on the first axis of the array, not the last. This required an extra swap-axis step inside a wrapper function.
  • This would not work if the data ndim were > 2, because np.polyfit expects a 2D array. So an additional stacking step would also be required.

Perhaps I am not using this as designed, but this was the most obvious example application I could think of.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
248798053 https://github.com/pydata/xarray/pull/964#issuecomment-248798053 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0ODc5ODA1Mw== shoyer 1217238 2016-09-22T02:57:29Z 2016-09-22T02:57:29Z MEMBER

@chris-b1 thanks for giving this is a try! Using this with numba's guvectorize is exactly what I had in mind.

1) Yes, we need a better error here.

2) Agreed, it's really hard to parse a triply nested list. I don't like encouraging writing signature strings though because that artificially restricts dimensions to use strings (and it's weird to program in strings).

Maybe separate arguments for input_core_dims and output_core_dims would make more sense, e.g., xr.apply(std_gufunc, arr, input_core_dims=[('x',)])?

3) Yes, agreed. The main issue with xr.apply(std_gufunc, arr, dims=('x',)) is that it's not clear what that means if std_gufunc is not a gufunc. For example, one might expect it to generate axis arguments.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
248772865 https://github.com/pydata/xarray/pull/964#issuecomment-248772865 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0ODc3Mjg2NQ== chris-b1 1924092 2016-09-21T23:28:55Z 2016-09-21T23:28:55Z MEMBER

A few pieces of feedback trying this out. I'm basically learning xarray as I go (I ran into this right away), so weight appropriately.

Usecase - I have a numba gufunc I want to apply to DataArray, e.g. a reduction like this (std is just for sake of example)

```python
@numba.guvectorize(['void(f8[:], f8[:])'], '(n)->()')
def std_gufunc(arr, out):
    out[0] = np.std(arr)

arr = xr.DataArray(np.random.randn(100, 100, 100), dims=('x', 'y', 'z'))
```

1) The "obvious" thing doesn't work - maybe catch and show a nicer error message here

```python
xr.apply(std_gufunc, arr)
# ValueError: dimensions ('x', 'y', 'z') must have the same length as the number of data dimensions, ndim=2
```

2) I personally found the non-string version of signature really difficult to wrap my mind around (the below took several tries to get right). I don't have a concrete suggestion though, maybe only the string form is meant to really be the public api?

```python
xr.apply(std_gufunc, arr, signature=([('x',)], [()]))
xr.apply(std_gufunc, arr, signature='(x)->()')
```

3) It would be nice to take advantage of the existing gufunc signature in some way. Maybe this is a wrapper built on top of xr.apply, or expand the api to allow something like this:

```python
xr.apply(std_gufunc, arr, dims=('x',))
xr.apply(std_gufunc, arr, dims={'n': 'x'})
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
248403503 https://github.com/pydata/xarray/pull/964#issuecomment-248403503 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0ODQwMzUwMw== pwolfram 4295853 2016-09-20T19:16:57Z 2016-09-20T19:16:57Z CONTRIBUTOR

@shoyer, I decided to go with the more robust approach and do the PR properly: #791.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
248349007 https://github.com/pydata/xarray/pull/964#issuecomment-248349007 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0ODM0OTAwNw== shoyer 1217238 2016-09-20T16:06:40Z 2016-09-20T16:06:40Z MEMBER

And yes, py.test should work on master (not 100% sure for stable, which is only used for pushing doc updates). That's what we test on Travis-CI: https://travis-ci.org/pydata/xarray/builds. I'm happy to help debug, but let's open another issue for that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
248348585 https://github.com/pydata/xarray/pull/964#issuecomment-248348585 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0ODM0ODU4NQ== shoyer 1217238 2016-09-20T16:05:13Z 2016-09-20T16:05:13Z MEMBER

@pwolfram If you don't need dask support, then I would suggest simply trying this PR. I think a slight variation of my mean example at the top would work.

If you do need dask support, either just use da.nancumsum directly with this PR (along with dask_array='allowed') or we can use core.ops._dask_or_eager_func (e.g., as in your PR) to make a version of nancumsum that works on both dask and numpy arrays, and then use that version along with this PR.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
248322967 https://github.com/pydata/xarray/pull/964#issuecomment-248322967 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0ODMyMjk2Nw== pwolfram 4295853 2016-09-20T14:44:48Z 2016-09-20T14:44:48Z CONTRIBUTOR

Also, should py.test work for stable and master? I was getting various errors and this seems strange to me-- I suspect it is a configuration issue on my end.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
248322071 https://github.com/pydata/xarray/pull/964#issuecomment-248322071 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0ODMyMjA3MQ== pwolfram 4295853 2016-09-20T14:41:52Z 2016-09-20T14:41:52Z CONTRIBUTOR

@shoyer and others, quick question-- as it turns out I need functionality from #812 for my work. Is it best to build off that issue (with a half-baked branch) or this one? I can form a hacky stop-gap solution in the meantime but it is clear I need to finish this off for the long term. Thanks in advance for the advice.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
246796164 https://github.com/pydata/xarray/pull/964#issuecomment-246796164 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0Njc5NjE2NA== max-sixty 5635139 2016-09-13T19:28:58Z 2016-09-13T19:28:58Z MEMBER

Would it be possible to write something like np.einsum with xarray named dimensions?

I think it's possible, by supplying the dimensions to sum over, and broadcasting the others. Similar to the inner_product example, but taking `*dims` rather than `dims`. Is that right?
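A named-dimension inner product along these lines can be sketched with the `input_core_dims` spelling that eventually landed in xarray's public `apply_ufunc` (the `*dims` varargs generalization is left out here):

```python
import numpy as np
import xarray as xr

def inner_product(a, b, dim):
    # Sum over the single named core dimension; all other dims broadcast.
    return xr.apply_ufunc(
        lambda x, y: (x * y).sum(axis=-1), a, b,
        input_core_dims=[[dim], [dim]],
    )

a = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))
b = xr.DataArray(np.ones(3), dims="y")
inner_product(a, b, "y")  # dims ("x",), values [3., 12.]
```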

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
246731502 https://github.com/pydata/xarray/pull/964#issuecomment-246731502 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0NjczMTUwMg== shoyer 1217238 2016-09-13T15:59:28Z 2016-09-13T15:59:28Z MEMBER

CC @jhamman @rabernat @crusaderky @pwolfram @spencerahill @ajdawson

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
246551799 https://github.com/pydata/xarray/pull/964#issuecomment-246551799 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDI0NjU1MTc5OQ== shoyer 1217238 2016-09-13T02:04:38Z 2016-09-13T02:04:38Z MEMBER

This is now tested and ready for review. The API could particularly use feedback -- please take a look at the docstring and examples in the first comment. Long desired operations, like a fill value for where (#576) and cumsum (#791) should now be writable in only a few lines.

I have not yet hooked this up to the rest of xarray's code base, both because the set of changes we will be able to do with this is quite large, and because I'd like to give other contributors a chance to help/test. Note that the general version of apply_ufunc can include some significant overhead for doing the dispatch. For binary operations, we will probably want to use the pre-specialized versions (e.g., apply_dataset_ufunc).

Finally, given the generality of this operation, I'm considering renaming it from xr.apply_ufunc to simply xr.apply.
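A rough illustration of why those operations become short: the numpy core of each is already a one-liner, and apply_ufunc's job is only the metadata handling around it. This is a hedged numpy-only sketch (the wrapping into xarray objects, and the translation of a dimension name into an axis number, are omitted):

```python
import numpy as np

# Core of a where with a fill value (#576): keep values where cond is
# True, otherwise substitute `other`.
def where_with_fill(cond, data, other):
    return np.where(cond, data, other)

# Core of cumsum (#791): apply_ufunc would translate a dimension name
# into this axis number before calling it.
def cumsum_core(data, axis=-1):
    return np.cumsum(data, axis=axis)

data = np.array([[1.0, 2.0], [3.0, 4.0]])
filled = where_with_fill(data > 2, data, 0.0)
summed = cumsum_core(data, axis=0)
```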

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
239556426 https://github.com/pydata/xarray/pull/964#issuecomment-239556426 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDIzOTU1NjQyNg== max-sixty 5635139 2016-08-12T20:50:24Z 2016-08-12T20:50:32Z MEMBER

Thanks for thinking through these

This suggests maybe ds[bool_array] -> da.where(bool_array, drop=True).

I think that makes sense. drop=False would be too confusing

Maybe something like: left_join(da, inner_join(bool_array, other))?

The way I was thinking about it: both other and bool_array need a value for every value in da. So they both need to be subsets. So something like:

``` python
assert set(other.dims) <= set(da.dims)
assert set(bool_array.dims) <= set(da.dims)

other, _ = xr.broadcast(other, da)
bool_array, _ = xr.broadcast(bool_array, da)

da.where(bool_array, other)
```

Is that consistent with the joins you were thinking of?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
239506907 https://github.com/pydata/xarray/pull/964#issuecomment-239506907 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDIzOTUwNjkwNw== shoyer 1217238 2016-08-12T17:21:25Z 2016-08-12T17:22:32Z MEMBER

@MaximilianR Two issues come to mind with remapping da[condition] = other -> da = da.where(bool_array, other):

1. What should da[bool_array] return? For consistency, I think we need to support both. We could alias that to where, too (da[bool_array] -> da.where(bool_array)), but this would be inconsistent with the existing behavior of da[bool_array] if da and bool_array are one-dimensional. This suggests maybe ds[bool_array] -> da.where(bool_array, drop=True).
2. What if other is not a scalar value, but a DataArray? Currently, I don't think we align when indexing, but in this case, by necessity we would. For __getitem__ indexing, it's pretty obvious that alignment should preserve the indexes of the object being indexed. For __setitem__ indexing, I'm not sure. At the least they would be a little different from the defaults for where (which does an inner join like most xarray operations by default). Maybe something like: left_join(da, inner_join(bool_array, other))?

We need to make sure that every xarray assignment like obj[key] or obj[key] = value (and also .loc and .sel and so on) works sanely with these alignment rules.
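For point 1, the existing one-dimensional behavior being referred to is plain boolean indexing, which drops the False entries rather than masking them. A quick NumPy illustration of the inconsistency (not xarray code):

```python
import numpy as np

da = np.array([1.0, 2.0, 3.0, 4.0])
mask = da > 2

# 1-d boolean __getitem__ drops the False positions entirely,
# changing the shape of the result...
selected = da[mask]

# ...whereas a where() without drop keeps the shape and masks with NaN:
masked = np.where(mask, da, np.nan)
```

This is why aliasing da[bool_array] to da.where(bool_array, drop=True), which also drops, lines up better with the 1-d behavior than the masking form would.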

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
239469432 https://github.com/pydata/xarray/pull/964#issuecomment-239469432 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDIzOTQ2OTQzMg== max-sixty 5635139 2016-08-12T14:58:15Z 2016-08-12T14:58:15Z MEMBER

When this is done & we can do where, I wonder whether

``` python
da[bool_array] = 5
```

...could be sugar for...

``` python
da.where(bool_array, 5)
```

i.e. do we get multidimensional indexing for free?
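At the plain NumPy level the two spellings do line up (modulo where returning a new array rather than mutating in place), which is what makes the sugar plausible. A small sketch — note that np.where's argument order differs from DataArray.where, so the condition picks the fill value here:

```python
import numpy as np

da = np.arange(6, dtype=float).reshape(2, 3)
mask = da > 2

# In-place boolean assignment...
assigned = da.copy()
assigned[mask] = 5

# ...matches an out-of-place where(): 5 where mask is True,
# the original values elsewhere.
via_where = np.where(mask, 5, da)
```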

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798
239347755 https://github.com/pydata/xarray/pull/964#issuecomment-239347755 https://api.github.com/repos/pydata/xarray/issues/964 MDEyOklzc3VlQ29tbWVudDIzOTM0Nzc1NQ== max-sixty 5635139 2016-08-12T02:34:12Z 2016-08-12T02:34:12Z MEMBER

This looks awesome! Would simplify a lot of the existing op stuff!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New function for applying vectorized functions for unlabeled arrays to xarray objects 170779798

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);