html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/2176#issuecomment-392138523,https://api.github.com/repos/pydata/xarray/issues/2176,392138523,MDEyOklzc3VlQ29tbWVudDM5MjEzODUyMw==,12307589,2018-05-25T18:11:55Z,2018-05-25T18:11:55Z,CONTRIBUTOR,"Thank you for the input everyone, this discussion has been very useful! Closing this issue.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,325810810 https://github.com/pydata/xarray/issues/2176#issuecomment-391800114,https://api.github.com/repos/pydata/xarray/issues/2176,391800114,MDEyOklzc3VlQ29tbWVudDM5MTgwMDExNA==,12307589,2018-05-24T17:41:27Z,2018-05-24T17:41:27Z,CONTRIBUTOR,"@dopplershift That's a good point. It's pretty trivial to create a `sympl.DataArray` from an `xarray.DataArray`, so perhaps I should be using a decorator that will convert `xarray.DataArray` to `sympl.DataArray` whenever one is passed into a `sympl` call. This would be similarly easy to do in `metpy`. One could also write a function to convert `Dataset` into one that contains unit-aware `DataArray` objects, or an `open_dataset` that calls `xarray.open_dataset` and then does such a conversion, though I'd wonder if certain `Dataset` calls (e.g. `mean`) might undo such a conversion. In `sympl` our main concerns are unit checking at the boundary of components and in properly converting units when time stepping or adding outputs of components together. Maybe `sympl` should only be using this `DataArray` subclass internally, with type conversions or wrapping when taking `DataArray`s into and out of its methods? That would solve a lot of our problems. `unyt` may be a better choice than `pint` for MetPy. Like I said in Sympl we don't use `pint` for unit information storage, only for conversion and arithmetic, so whether it uses an ndarray subclass doesn't apply for us.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,325810810 https://github.com/pydata/xarray/issues/2176#issuecomment-391783509,https://api.github.com/repos/pydata/xarray/issues/2176,391783509,MDEyOklzc3VlQ29tbWVudDM5MTc4MzUwOQ==,221526,2018-05-24T16:47:20Z,2018-05-24T16:47:20Z,CONTRIBUTOR,"My problem with custom classes (subclasses or composition) is that you will never get those from a `Dataset` that you get from `open_dataset()`. That’s a problem for me when I’m trying to make things work for users who want to do simple scripted data analysis. I’m much more interested in #1938 (or #1118). I’m also looking at moving from pint to [unyt](https://github.com/yy-project/unyt), which is Yt’s unit support, brought into a standalone package. Beyond some performance benefits (I’ve heard), it has the benefit of being a ndarray subclass, which means `asanyarray` will leave it alone. That would seem to make it easier to cram into a `DataArray`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,325810810 https://github.com/pydata/xarray/issues/2176#issuecomment-391596424,https://api.github.com/repos/pydata/xarray/issues/2176,391596424,MDEyOklzc3VlQ29tbWVudDM5MTU5NjQyNA==,11411331,2018-05-24T05:50:26Z,2018-05-24T05:50:26Z,CONTRIBUTOR,"Well, I'm certainly not trying to argue that the `PhysArray` implementation is _the_ solution. It's just _a_ solution, and a solution for a much smaller problem. I understand your concerns about `cf_units`. I've heard many people make the same argument for `pint`, and I appreciate it. I went with `cf_units` because of its dependence on UDUNITS, which is declared as the ""standard"" in the [CF Conventions](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html). I never had the time (nor do I foresee ever having the time) to see if `pint` was fully compliant with UDUNITS, so I went with something I knew was. I personally think it's fine to discuss this here, unless other people would like to see this go offline. To address some of your issues: > 1) As a container rather than subclass, it does not implement many of the methods of DataArray Yes. I agree. In fact, my first implementation _was_ a subclass of `DataArray`, but after reading the Xarray documentation [here](http://xarray.pydata.org/en/stable/internals.html#extending-xarray) and the Issue #706, I decided on composition for this implementation. > 2) There are a few design choices I don't understand, like why calendar is always a property of a `PhysArray` even when it isn't storing a time I personally didn't see much of a benefit to a check like `hasattr(obj, 'calendar')` versus `obj.calendar is None` (other than the fact that these two checks are opposite). So, for my code, I felt that `obj.calendar is None` was a reasonable check to see if the units were ""calendared time"" units. > why `cftime` objects aren't used instead of units to manage time As I mentioned in my post, this is just my personal preference. I think that calendared time units are...well...just units, and that the same mechanism for dealing with all other units should deal with calendared time units. It's just an aesthetic that I prefer. (That said, I'm extremely happy that someone finally dealt with non-standard calendars with `cftime`.) > why the `positive` attribute is important enough for PhysArray to manage (I've never seen it in any data I've used, and it's easy to check if a DataArray is all positive or negative with a function call) The CF Conventions define the `positive` attribute to indicate that the field represents the vertical component of a vector, and the value of `positive` indicates what direction the vertical component points (`up` or `down`) if the data is positive. So, if you add field `X` and field `Y`, and `X.positive == 'up'` but `Y.positive == 'down'`, then you need to actually subtract the fields. It's an annoying attribute that I've had to deal with when preparing data for CMIP6. In the end, I'm happy doing whatever the community wants with this code. I can pull it out into it's own repo (along with the unit tests, which are also in the PyConform repo). And then, after that, I have no problem with people taking it in a different direction (e.g., using `pint`, injecting properties based on the value of the attributes, etc.). I'm also happy with a subclass option, as my C++ experiences in the past have made me very comfortable with inheritance. I take guidance easily. I just don't usually have a lot of time to devote to development any more these days. :-(","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,325810810 https://github.com/pydata/xarray/issues/2176#issuecomment-391542922,https://api.github.com/repos/pydata/xarray/issues/2176,391542922,MDEyOklzc3VlQ29tbWVudDM5MTU0MjkyMg==,12307589,2018-05-24T00:10:29Z,2018-05-24T00:10:29Z,CONTRIBUTOR,"@shoyer that notation might work, thanks for pointing it out! Maybe we can think of a more natural name for the accessor (""with_units""? ""keep_units""? ""uarray""? ""u""?). I find the ""storage"" of units as a string in attrs to be much cleaner than any other implementation I've seen so far (like implementations that have a unit container over an underlying array, or an array of unit-aware objects). It has the added benefit that this is how units are conventionally stored in netCDF files. I don't think it's necessary to use a class other than ndarray for data storage. @kmpaul the main reason I stayed away from `cf_units` is that I had bad experiences trying to get it to build with its dependencies in the past. Particularly it's an issue that it depends on the Udunits C library, which requires MinGW to install on Windows and has generally been a headache for me. I'd much prefer a pure Python unit handling implementation. For Sympl, we don't care so much about time units, because time is stored using `datetime` objects (potentially from the `cftime` package for alternate calendars). This is also the way that time units are conventionally stored in `xarray`, once decoded. It may make sense for us to use some kind of stand-alone unit-aware DataArray implementation. I'd just need to be convinced that yours is well-designed, thoroughly tested, and easy to install with pip. The main things concerning me about `PhysArray` are 1) As a container rather than subclass, it does not implement many of the methods of `DataArray` and 2) There are a few design choices I don't understand, like why `calendar` is always a property of a `PhysArray` even when it isn't storing a time, why `cftime` objects aren't used instead of units to manage time, and why the `positive` attribute is important enough for `PhysArray` to manage (I've never seen it in any data I've used, and it's easy to check if a `DataArray` is all positive or negative with a function call). We could discuss these by e-mail if you like (it's my username at uw.edu). Other possibilities are that I'll take the implementation we come up with and give it its own package, or that we'll collaborate on such a package.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,325810810 https://github.com/pydata/xarray/issues/2176#issuecomment-391481382,https://api.github.com/repos/pydata/xarray/issues/2176,391481382,MDEyOklzc3VlQ29tbWVudDM5MTQ4MTM4Mg==,1217238,2018-05-23T20:14:26Z,2018-05-23T20:14:26Z,MEMBER,"If you implement arithmetic special methods like `__add__` and `__radd__` on your accessor, you could write things like: ``` dt = DataArray(timestep.total_seconds(), attrs={'units': 's'}) for key in tendencies_list[0].keys(): return_state[key] = state[key].sympl + dt.sympl * ( 1.5 * tendencies_list[-1][key].sympl - 0.5 * tendencies_list[-2][key].sympl ) ``` It's awkward but you can still use the infix notation. Subclassing or writing a DataArray wrapper like what @kmpaul did in physarrays seems like the cleanest way to do this currently. The main challenge is to avoid relying on non-public parts of the DataArray API, but I think overwriting arithmetic would be pretty safe. In the long term, I think the best solution here is probably an extensible interface for `data` in xarray objects that would let you put an array with units into an xarray object, similar to the hooks we want for sparse arrays (https://github.com/pydata/xarray/issues/1938).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,325810810 https://github.com/pydata/xarray/issues/2176#issuecomment-391470885,https://api.github.com/repos/pydata/xarray/issues/2176,391470885,MDEyOklzc3VlQ29tbWVudDM5MTQ3MDg4NQ==,11411331,2018-05-23T19:37:26Z,2018-05-23T19:37:26Z,CONTRIBUTOR,"Thanks, @jhamman, for the reference. I have, indeed, been thinking about this a lot. I've hit the same problem that you have @mcgibbon and came to the same conclusion that subclassing was the only way to go. I'm not sure I'm happy with my solution, but I wrote a (partial) implementation of a DataArray subclass to deal with units. @jhamman suggested that I pull this out into a separate repo, and I hope to do that one day... However, if you are curious, now, you can view it here: https://github.com/NCAR/PyConform/tree/rewrite/source/pyconform/physarrays Note that this implementation deals with time as just another unit, rather than requiring any special dealing with time. This works if you units package can deal with calendared times, too. I am currently using `cf_units` to deal with the units, not `pint`, but the result is similar. However, `cf_units` does not deal with calendar conversion, and it doesn't always deal with non-standard calendars well. In the end, though, if I could voice my own 2 cents, I believe that calendared times should be dealt with using the same mechanics as any other unit. Just my bias. Also, note that this implementation also deals with the `positive` attribute of the data, in addition to the `units` and `calendar` attributes. The `positive` attribute can be tricky, but if the values don't match between two arrays, then you need to convert before doing any math. Let me know what you think about this. If it's a priority, I can pull this out into its own repository. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,325810810 https://github.com/pydata/xarray/issues/2176#issuecomment-391443258,https://api.github.com/repos/pydata/xarray/issues/2176,391443258,MDEyOklzc3VlQ29tbWVudDM5MTQ0MzI1OA==,12307589,2018-05-23T18:03:27Z,2018-05-23T18:03:27Z,CONTRIBUTOR,"For reference, here are some of the sort of methods I've been adding that aren't currently in `sympl`: def multiply(self, other): if isinstance(other, xr.DataArray): result = self._dataarray * other result.attrs['units'] = multiply_units(self._dataarray.attrs['units'], other.attrs['units']) else: result = self._dataarray * other result.attrs['units'] = self._dataarray.attrs['units'] return result def divide(self, other): print(self._dataarray, other) if isinstance(other, xr.DataArray): result = self._dataarray / other result.attrs['units'] = divide_units(self._dataarray.attrs['units'], other.attrs['units']) else: result = self._dataarray / other result.attrs['units'] = self._dataarray.attrs['units'] return result def add(self, other): result = self._dataarray + other.sympl.to_units(self._dataarray.attrs['units']) result.attrs['units'] = self._dataarray.attrs['units'] return result def subtract(self, other): result = self._dataarray - other.sympl.to_units(self._dataarray.attrs['units']) result.attrs['units'] = self._dataarray.attrs['units'] return result","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,325810810 https://github.com/pydata/xarray/issues/2176#issuecomment-391441974,https://api.github.com/repos/pydata/xarray/issues/2176,391441974,MDEyOklzc3VlQ29tbWVudDM5MTQ0MTk3NA==,2443309,2018-05-23T17:59:32Z,2018-05-23T17:59:32Z,MEMBER,cc @KMPaul who has also been working on this sort of thing recently. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,325810810