html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/988#issuecomment-413732471,https://api.github.com/repos/pydata/xarray/issues/988,413732471,MDEyOklzc3VlQ29tbWVudDQxMzczMjQ3MQ==,1217238,2018-08-17T01:39:30Z,2018-08-17T01:39:30Z,MEMBER,"> xarray wrapping a pint array wrapping a dask array
Yep, this is pretty much what I was thinking of.
> I like composition, but that level of wrapping...feels wrong to me for some reason. Is there some elegance I'm missing here? (Other than array-like things playing together.)
The virtue of this approach vs setting a global ""attribute handler"" (as suggested here) is that everything is controlled locally. For example, suppose people want to plug in two separate unit systems into xarray (e.g., pint and unyt). If the unit handling is determined by the specific arrays, then libraries relying on both approaches internally can happily co-exist and even call each other.
In principle, this could be done safely with global handlers if you always know exactly when to switch back and forth, but that requires explicitly switching on handlers for even basic arithmetic. I doubt most users are going to bother, which is going to make using multiple tools that make use of this feature really hard.
The other big advantage is that you only have to write the bulk of the unit system once, e.g., to define operations on NumPy arrays.
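To make the locality argument concrete, here is a rough pure-Python sketch. `Quantity` and `Labeled` are hypothetical toy classes standing in for a unit-aware array and an xarray object; they are not pint, unyt, or xarray APIs. The container delegates arithmetic to whatever it wraps, so the unit rules travel with the data and no global switch is involved.
```python
class Quantity:
    # Toy unit-aware value (hypothetical stand-in for a pint or unyt array).
    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def __add__(self, other):
        if self.units != other.units:
            raise ValueError('unit mismatch: %r vs %r' % (self.units, other.units))
        return Quantity(self.magnitude + other.magnitude, self.units)


class Labeled:
    # Toy container (hypothetical stand-in for a DataArray); it knows
    # nothing about units and simply defers to the wrapped object.
    def __init__(self, data, dims):
        self.data = data
        self.dims = dims

    def __add__(self, other):
        return Labeled(self.data + other.data, self.dims)


a = Labeled(Quantity(1.0, 'm'), dims=('x',))
b = Labeled(Quantity(2.5, 'm'), dims=('x',))
total = a + b
print(total.data.magnitude, total.data.units)
```
A second unit library could ship its own `Quantity`-like class and be wrapped the same way, without either one having to know about the other.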
> And then I still need hooks in xarray so that when pint does a calculation, it can update the metadata in xarray; so it feels like we're back here anyway.
Rather than struggling to keep `attrs` up to date, I think it would be more consistent with the rest of xarray (e.g., our handling of time units) to include units explicitly in the data model.
We could still do some work on the xarray side to make this easy to use. Specifically:
- The `DataArray.units` property could forward to `DataArray.data.units`.
- A `DataArray.to` or `DataArray.convert` method could call the relevant method on data and re-wrap it in a DataArray.
- A minimal layer on top of xarray's netCDF IO could handle unit attributes by wrapping/unwrapping arrays with pint.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,173612265
https://github.com/pydata/xarray/issues/988#issuecomment-413409482,https://api.github.com/repos/pydata/xarray/issues/988,413409482,MDEyOklzc3VlQ29tbWVudDQxMzQwOTQ4Mg==,1217238,2018-08-16T03:02:53Z,2018-08-16T03:02:53Z,MEMBER,"`__array_ufunc__` is now available in recent NumPy releases, and recently, I've been pushing on a NumPy enhancement proposal for `__array_function__` (which is near final approval now): http://www.numpy.org/neps/nep-0018-array-function-protocol.html
I think these overloads are a much more maintainable way to add features like unit handling into xarray, as outlined in our [development roadmap](http://xarray.pydata.org/en/latest/roadmap.html). It's not a complete system for overloading attribute handling in `attrs`, but I think it could meet most of what users would need and is better than writing a separate hooks system for xarray only.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,173612265
https://github.com/pydata/xarray/issues/988#issuecomment-284314734,https://api.github.com/repos/pydata/xarray/issues/988,284314734,MDEyOklzc3VlQ29tbWVudDI4NDMxNDczNA==,1217238,2017-03-06T06:39:38Z,2017-03-06T06:39:38Z,MEMBER,"There's some chance that `__numpy_ufunc__` (or more likely `__array_ufunc__`) will finally arrive in time for the next release of NumPy, which could make it easier to handle this overloading (e.g., to allow for wrapping non-NumPy arrays inside xarray objects).
In general, this is a pretty tough design problem, which explains why it hasn't been solved yet :). But I was pretty happy with the way our `__numpy_ufunc__` discussions were going.
Speaking of PEP 472, if someone has energy to push on that, it would be really awesome to see that happen.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,173612265
https://github.com/pydata/xarray/issues/988#issuecomment-282082423,https://api.github.com/repos/pydata/xarray/issues/988,282082423,MDEyOklzc3VlQ29tbWVudDI4MjA4MjQyMw==,1217238,2017-02-23T18:44:43Z,2017-02-23T18:44:43Z,MEMBER,"> Is it not? The documentation says it's new in numpy 1.11 and we're at 1.12 now.
Definitely not, I'm afraid. It's gone back and forth several times on master but hasn't landed yet.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,173612265
https://github.com/pydata/xarray/issues/988#issuecomment-282075293,https://api.github.com/repos/pydata/xarray/issues/988,282075293,MDEyOklzc3VlQ29tbWVudDI4MjA3NTI5Mw==,1217238,2017-02-23T18:18:36Z,2017-02-23T18:18:36Z,MEMBER,"`__numpy_ufunc__` hasn't been implemented yet in a released version of NumPy, and when it lands it will probably be renamed `__array_ufunc__` (https://github.com/numpy/numpy/issues/5986). See [recent discussion on this](https://mail.scipy.org/pipermail/numpy-discussion/2017-February/076502.html).
We currently have the binary arithmetic logic in `_binary_op`, but [`xarray.core.computation.apply_ufunc`](https://github.com/pydata/xarray/blob/1cafb14cb4726da14abfb8976d22e6e2b5f3ae24/xarray/core/computation.py#L537) has more comprehensive logic, and that's where we should extend things going forward. (The `_binary_op` logic should really be replaced by calls to `apply_ufunc`.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,173612265
https://github.com/pydata/xarray/issues/988#issuecomment-280574774,https://api.github.com/repos/pydata/xarray/issues/988,280574774,MDEyOklzc3VlQ29tbWVudDI4MDU3NDc3NA==,1217238,2017-02-17T07:25:52Z,2017-02-17T07:25:52Z,MEMBER,Some related discussion that may be of interest to participants here is going on over in #1271.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,173612265
https://github.com/pydata/xarray/issues/988#issuecomment-243980885,https://api.github.com/repos/pydata/xarray/issues/988,243980885,MDEyOklzc3VlQ29tbWVudDI0Mzk4MDg4NQ==,1217238,2016-09-01T05:37:08Z,2016-09-01T05:37:08Z,MEMBER,"So I guess `set_options` it is, then, with a big warning in the docs discouraging library authors from setting it unilaterally.
I guess we can also start with the attrs-only hooks for now and add the others later if/as necessary.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,173612265
https://github.com/pydata/xarray/issues/988#issuecomment-243289800,https://api.github.com/repos/pydata/xarray/issues/988,243289800,MDEyOklzc3VlQ29tbWVudDI0MzI4OTgwMA==,1217238,2016-08-29T23:35:35Z,2016-08-29T23:35:35Z,MEMBER,"I agree that end users are likely to set this flag unilaterally, especially for interactive use. That's fine. This could even be OK in a higher-level library, though I would encourage requiring an explicit opt-in in application code.
One thing to consider is whether to allow multiple attribute handlers to be registered simultaneously or not. I kind of like a `set_options` interface that requires all handlers to be registered at once (as opposed to adding handlers incrementally), because that ensures conflicts cannot arise inadvertently.
Either way, I don't think the performance penalty here would be significant in most cases, given how much of Python's dynamic nature xarray already uses.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,173612265
https://github.com/pydata/xarray/issues/988#issuecomment-242950070,https://api.github.com/repos/pydata/xarray/issues/988,242950070,MDEyOklzc3VlQ29tbWVudDI0Mjk1MDA3MA==,1217238,2016-08-28T01:15:52Z,2016-08-28T01:16:13Z,MEMBER,"Let me give concrete examples of what this interface could look like.
To implement units:
``` python
from typing import List, Optional  # optional Python 3.5 type annotations

import xarray

@xarray.register_ufunc_variables_attrs_handler
def propagate_units(results: List[xarray.Variable],
                    context: xarray.UFuncContext) -> Optional[List[dict]]:
    if context.func.__name__ in ['add', 'sub']:
        units_set = set(getattr(arg, 'attrs', {}).get('units') for arg in context.args)
        if len(units_set) > 1:
            raise ValueError('not all input units the same: %r' % units_set)
        units, = units_set
        return [{'units': units}]
    else:
        # one empty attrs dict per output, i.e., nothing to add
        return [{}] * len(results)
        # or equivalently, don't return anything at all
```
Or to (partially) handle `cell_methods`:
``` python
@xarray.register_ufunc_variables_attrs_handler
def add_cell_methods(results, context):
    if context.func.__name__ in ['mean', 'median', 'sum', 'min', 'max', 'std']:
        # dimensions that were reduced away by the operation
        dims = set(context.args[0].dims) - set(results[0].dims)
        cell_methods = ': '.join(dims) + ': ' + context.func.__name__
        return [{'cell_methods': cell_methods}]
```
Or to implement `keep_attrs=True` if a function only has one input:
``` python
@xarray.register_ufunc_variables_attrs_handler
def always_keep_attrs(results, context):
    if len(context.args) == 1:
        return [context.args[0].attrs] * len(results)
```
Every time xarray does an operation, we would call all of these registered `ufunc_variables_attrs_handlers` to get a list of attributes to add to the result `Variable`. `attrs` on the resulting object (or objects if the ufunc has multiple outputs) would be accumulated by calling the handlers in arbitrary order and merging the resulting dicts. Repeated keys in the `attrs` dicts returned by different handlers would result in an error.
`xarray.UFuncContext` itself would be a simple struct-like class with a few attributes:
- `func`: the function being applied. Typically from NumPy or dask.array, but also could be an arbitrary callable if a user calls `xarray.apply_ufunc` directly.
- `args`: positional arguments passed into the function. Possibly xarray `Variable` objects, `numpy.ndarray` or scalars.
- `kwargs`: additional dict of keyword arguments.
Similarly, we would have `register_ufunc_dataset_attrs_handler` for updating `Dataset` attrs.
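The call-and-merge step described above could be sketched roughly as follows. This is plain Python with hypothetical names (`merge_handler_attrs`, and a namedtuple standing in for the proposed `UFuncContext`), not existing xarray code; it only shows handlers being called in turn, their dicts merged per output, and repeated keys raising an error.
```python
from collections import namedtuple

# Hypothetical stand-in for the proposed xarray.UFuncContext struct.
UFuncContext = namedtuple('UFuncContext', ['func', 'args', 'kwargs'])


def merge_handler_attrs(handlers, results, context):
    # One attrs dict per output of the operation.
    merged = [{} for _ in results]
    for handler in handlers:
        returned = handler(results, context)
        if not returned:
            # None or an empty list: this handler has nothing to add.
            continue
        for target, attrs in zip(merged, returned):
            repeated = set(target) & set(attrs)
            if repeated:
                raise ValueError('repeated attrs keys: %r' % sorted(repeated))
            target.update(attrs)
    return merged


# Two toy handlers that never touch the same key.
def units_handler(results, context):
    return [{'units': 'm'}] * len(results)


def name_handler(results, context):
    return [{'long_name': context.func.__name__}] * len(results)


context = UFuncContext(func=sum, args=([1, 2, 3],), kwargs={})
attrs = merge_handler_attrs([units_handler, name_handler], results=[6], context=context)
print(attrs)
```
Because repeated keys are rejected rather than silently overwritten, the order in which handlers run does not affect the merged result.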
The downside of this approach is that unlike the way NumPy handles things, this doesn't handle conflicting implementations well. If you try to use two different libraries that register their own global attribute handlers instead of using the context manager (e.g., two different units implementations), things will break, even if the unrelated code paths do not touch.
So as an alternative to the registration system, we could support/encourage using a context manager, e.g.,
``` python
with xarray.ufunc_variables_attrs_handlers([always_keep_attrs, add_cell_methods]):
    # either augment or ignore other attrs handlers, possibly depending
    # on the choice of a keyword argument to ufunc_variables_attrs_handlers
    result = ds.mean()
```
It's kind of verbose, but certainly useful for libraries that want to be cautious about breaking other code. In general, it's poor behavior for libraries to unilaterally change unrelated code without an explicit opt-in. So perhaps the best approach is to encourage users to _always_ use a context manager, e.g.,
``` python
import contextlib

@contextlib.contextmanager
def my_attrs_context():
    with xarray.ufunc_variables_attrs_handlers(
            [always_keep_attrs, add_cell_methods, ...]):
        yield

with my_attrs_context():
    result = ds.mean() - 0.5 * (ds.max() - ds.min())
```
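For reference, a context manager like `ufunc_variables_attrs_handlers` could be built along these lines. This is only a sketch under assumptions: the module-level `_state`, the `active_handlers` helper, and the `augment` keyword are all hypothetical names, chosen to mirror the augment-or-ignore choice mentioned in the comments above.
```python
import contextlib

# Hypothetical module-level state holding the handlers currently in effect.
_state = {'handlers': []}


def active_handlers():
    # Return a copy so callers cannot mutate the shared state.
    return list(_state['handlers'])


@contextlib.contextmanager
def attrs_handlers(handlers, augment=False):
    previous = _state['handlers']
    # Either extend the handlers already in effect or replace them outright.
    _state['handlers'] = (previous + list(handlers)) if augment else list(handlers)
    try:
        yield
    finally:
        # Restore the outer handlers even if the block raised.
        _state['handlers'] = previous


def keep(results, context):
    return None


with attrs_handlers([keep]):
    inside = active_handlers()
outside = active_handlers()
print(len(inside), len(outside))
```
Restoring the previous handlers in `finally` is what keeps the change local: code outside the `with` block never sees handlers it did not opt into.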
So maybe a subclass based implementation (with a custom attribute like `__xarray_attrs_handler__`) is the cleanest way to handle this, after all.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,173612265