home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

9 rows where issue = 325810810 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 5

  • mcgibbon 4
  • kmpaul 2
  • dopplershift 1
  • shoyer 1
  • jhamman 1

author_association 2

  • CONTRIBUTOR 7
  • MEMBER 2

issue 1

  • Advice on unit-aware arithmetic · 9 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
392138523 https://github.com/pydata/xarray/issues/2176#issuecomment-392138523 https://api.github.com/repos/pydata/xarray/issues/2176 MDEyOklzc3VlQ29tbWVudDM5MjEzODUyMw== mcgibbon 12307589 2018-05-25T18:11:55Z 2018-05-25T18:11:55Z CONTRIBUTOR

Thank you for the input everyone, this discussion has been very useful! Closing this issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Advice on unit-aware arithmetic 325810810
391800114 https://github.com/pydata/xarray/issues/2176#issuecomment-391800114 https://api.github.com/repos/pydata/xarray/issues/2176 MDEyOklzc3VlQ29tbWVudDM5MTgwMDExNA== mcgibbon 12307589 2018-05-24T17:41:27Z 2018-05-24T17:41:27Z CONTRIBUTOR

@dopplershift That's a good point. It's pretty trivial to create a sympl.DataArray from an xarray.DataArray, so perhaps I should be using a decorator that will convert xarray.DataArray to sympl.DataArray whenever one is passed into a sympl call. This would be similarly easy to do in metpy. One could also write a function to convert Dataset into one that contains unit-aware DataArray objects, or an open_dataset that calls xarray.open_dataset and then does such a conversion, though I'd wonder if certain Dataset calls (e.g. mean) might undo such a conversion.

In sympl our main concerns are unit checking at the boundary of components and in properly converting units when time stepping or adding outputs of components together. Maybe sympl should only be using this DataArray subclass internally, with type conversions or wrapping when taking DataArrays into and out of its methods? That would solve a lot of our problems.

unyt may be a better choice than pint for MetPy. Like I said in Sympl we don't use pint for unit information storage, only for conversion and arithmetic, so whether it uses an ndarray subclass doesn't apply for us.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Advice on unit-aware arithmetic 325810810
391783509 https://github.com/pydata/xarray/issues/2176#issuecomment-391783509 https://api.github.com/repos/pydata/xarray/issues/2176 MDEyOklzc3VlQ29tbWVudDM5MTc4MzUwOQ== dopplershift 221526 2018-05-24T16:47:20Z 2018-05-24T16:47:20Z CONTRIBUTOR

My problem with custom classes (subclasses or composition) is that you will never get those from a Dataset that you get from open_dataset(). That’s a problem for me when I’m trying to make things work for users who want to do simple scripted data analysis. I’m much more interested in #1938 (or #1118).

I’m also looking at moving from pint to unyt, which is Yt’s unit support, brought into a standalone package. Beyond some performance benefits (I’ve heard), it has the benefit of being a ndarray subclass, which means asanyarray will leave it alone. That would seem to make it easier to cram into a DataArray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Advice on unit-aware arithmetic 325810810
391596424 https://github.com/pydata/xarray/issues/2176#issuecomment-391596424 https://api.github.com/repos/pydata/xarray/issues/2176 MDEyOklzc3VlQ29tbWVudDM5MTU5NjQyNA== kmpaul 11411331 2018-05-24T05:50:26Z 2018-05-24T05:50:26Z CONTRIBUTOR

Well, I'm certainly not trying to argue that the PhysArray implementation is the solution. It's just a solution, and a solution for a much smaller problem.

I understand your concerns about cf_units. I've heard many people make the same argument for pint, and I appreciate it. I went with cf_units because of its dependence on UDUNITS, which is declared as the "standard" in the CF Conventions. I never had the time (nor do I foresee ever having the time) to see if pint was fully compliant with UDUNITS, so I went with something I knew was.

I personally think it's fine to discuss this here, unless other people would like to see this go offline. To address some of your issues:

1) As a container rather than subclass, it does not implement many of the methods of DataArray

Yes. I agree. In fact, my first implementation was a subclass of DataArray, but after reading the Xarray documentation here and the Issue #706, I decided on composition for this implementation.

2) There are a few design choices I don't understand, like why calendar is always a property of a PhysArray even when it isn't storing a time

I personally didn't see much of a benefit to a check like hasattr(obj, 'calendar') versus obj.calendar is None (other than the fact that these two checks are opposite). So, for my code, I felt that obj.calendar is None was a reasonable check to see if the units were "calendared time" units.

why cftime objects aren't used instead of units to manage time

As I mentioned in my post, this is just my personal preference. I think that calendared time units are...well...just units, and that the same mechanism for dealing with all other units should deal with calendared time units. It's just an aesthetic that I prefer. (That said, I'm extremely happy that someone finally dealt with non-standard calendars with cftime.)

why the positive attribute is important enough for PhysArray to manage (I've never seen it in any data I've used, and it's easy to check if a DataArray is all positive or negative with a function call)

The CF Conventions define the positive attribute to indicate that the field represents the vertical component of a vector, and the value of positive indicates what direction the vertical component points (up or down) if the data is positive. So, if you add field X and field Y, and X.positive == 'up' but Y.positive == 'down', then you need to actually subtract the fields. It's an annoying attribute that I've had to deal with when preparing data for CMIP6.

In the end, I'm happy doing whatever the community wants with this code. I can pull it out into it's own repo (along with the unit tests, which are also in the PyConform repo). And then, after that, I have no problem with people taking it in a different direction (e.g., using pint, injecting properties based on the value of the attributes, etc.). I'm also happy with a subclass option, as my C++ experiences in the past have made me very comfortable with inheritance. I take guidance easily. I just don't usually have a lot of time to devote to development any more these days. :-(

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Advice on unit-aware arithmetic 325810810
391542922 https://github.com/pydata/xarray/issues/2176#issuecomment-391542922 https://api.github.com/repos/pydata/xarray/issues/2176 MDEyOklzc3VlQ29tbWVudDM5MTU0MjkyMg== mcgibbon 12307589 2018-05-24T00:10:29Z 2018-05-24T00:10:29Z CONTRIBUTOR

@shoyer that notation might work, thanks for pointing it out! Maybe we can think of a more natural name for the accessor ("with_units"? "keep_units"? "uarray"? "u"?). I find the "storage" of units as a string in attrs to be much cleaner than any other implementation I've seen so far (like implementations that have a unit container over an underlying array, or an array of unit-aware objects). It has the added benefit that this is how units are conventionally stored in netCDF files. I don't think it's necessary to use a class other than ndarray for data storage.

@kmpaul the main reason I stayed away from cf_units is that I had bad experiences trying to get it to build with its dependencies in the past. Particularly it's an issue that it depends on the Udunits C library, which requires MinGW to install on Windows and has generally been a headache for me. I'd much prefer a pure Python unit handling implementation. For Sympl, we don't care so much about time units, because time is stored using datetime objects (potentially from the cftime package for alternate calendars). This is also the way that time units are conventionally stored in xarray, once decoded.

It may make sense for us to use some kind of stand-alone unit-aware DataArray implementation. I'd just need to be convinced that yours is well-designed, thoroughly tested, and easy to install with pip. The main things concerning me about PhysArray are 1) As a container rather than subclass, it does not implement many of the methods of DataArray and 2) There are a few design choices I don't understand, like why calendar is always a property of a PhysArray even when it isn't storing a time, why cftime objects aren't used instead of units to manage time, and why the positive attribute is important enough for PhysArray to manage (I've never seen it in any data I've used, and it's easy to check if a DataArray is all positive or negative with a function call). We could discuss these by e-mail if you like (it's my username at uw.edu). Other possibilities are that I'll take the implementation we come up with and give it its own package, or that we'll collaborate on such a package.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Advice on unit-aware arithmetic 325810810
391481382 https://github.com/pydata/xarray/issues/2176#issuecomment-391481382 https://api.github.com/repos/pydata/xarray/issues/2176 MDEyOklzc3VlQ29tbWVudDM5MTQ4MTM4Mg== shoyer 1217238 2018-05-23T20:14:26Z 2018-05-23T20:14:26Z MEMBER

If you implement arithmetic special methods like __add__ and __radd__ on your accessor, you could write things like: dt = DataArray(timestep.total_seconds(), attrs={'units': 's'}) for key in tendencies_list[0].keys(): return_state[key] = state[key].sympl + dt.sympl * ( 1.5 * tendencies_list[-1][key].sympl - 0.5 * tendencies_list[-2][key].sympl ) It's awkward but you can still use the infix notation.

Subclassing or writing a DataArray wrapper like what @kmpaul did in physarrays seems like the cleanest way to do this currently. The main challenge is to avoid relying on non-public parts of the DataArray API, but I think overwriting arithmetic would be pretty safe.

In the long term, I think the best solution here is probably an extensible interface for data in xarray objects that would let you put an array with units into an xarray object, similar to the hooks we want for sparse arrays (https://github.com/pydata/xarray/issues/1938).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Advice on unit-aware arithmetic 325810810
391470885 https://github.com/pydata/xarray/issues/2176#issuecomment-391470885 https://api.github.com/repos/pydata/xarray/issues/2176 MDEyOklzc3VlQ29tbWVudDM5MTQ3MDg4NQ== kmpaul 11411331 2018-05-23T19:37:26Z 2018-05-23T19:37:26Z CONTRIBUTOR

Thanks, @jhamman, for the reference.

I have, indeed, been thinking about this a lot. I've hit the same problem that you have @mcgibbon and came to the same conclusion that subclassing was the only way to go. I'm not sure I'm happy with my solution, but I wrote a (partial) implementation of a DataArray subclass to deal with units. @jhamman suggested that I pull this out into a separate repo, and I hope to do that one day... However, if you are curious, now, you can view it here:

https://github.com/NCAR/PyConform/tree/rewrite/source/pyconform/physarrays

Note that this implementation deals with time as just another unit, rather than requiring any special dealing with time. This works if you units package can deal with calendared times, too. I am currently using cf_units to deal with the units, not pint, but the result is similar. However, cf_units does not deal with calendar conversion, and it doesn't always deal with non-standard calendars well. In the end, though, if I could voice my own 2 cents, I believe that calendared times should be dealt with using the same mechanics as any other unit. Just my bias.

Also, note that this implementation also deals with the positive attribute of the data, in addition to the units and calendar attributes. The positive attribute can be tricky, but if the values don't match between two arrays, then you need to convert before doing any math.

Let me know what you think about this. If it's a priority, I can pull this out into its own repository.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Advice on unit-aware arithmetic 325810810
391443258 https://github.com/pydata/xarray/issues/2176#issuecomment-391443258 https://api.github.com/repos/pydata/xarray/issues/2176 MDEyOklzc3VlQ29tbWVudDM5MTQ0MzI1OA== mcgibbon 12307589 2018-05-23T18:03:27Z 2018-05-23T18:03:27Z CONTRIBUTOR

For reference, here are some of the sort of methods I've been adding that aren't currently in sympl:

def multiply(self, other):
    if isinstance(other, xr.DataArray):
        result = self._dataarray * other
        result.attrs['units'] = multiply_units(self._dataarray.attrs['units'], other.attrs['units'])
    else:
        result = self._dataarray * other
        result.attrs['units'] = self._dataarray.attrs['units']
    return result

def divide(self, other):
    print(self._dataarray, other)
    if isinstance(other, xr.DataArray):
        result = self._dataarray / other
        result.attrs['units'] = divide_units(self._dataarray.attrs['units'], other.attrs['units'])
    else:
        result = self._dataarray / other
        result.attrs['units'] = self._dataarray.attrs['units']
    return result

def add(self, other):
    result = self._dataarray + other.sympl.to_units(self._dataarray.attrs['units'])
    result.attrs['units'] = self._dataarray.attrs['units']
    return result

def subtract(self, other):
    result = self._dataarray - other.sympl.to_units(self._dataarray.attrs['units'])
    result.attrs['units'] = self._dataarray.attrs['units']
    return result
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Advice on unit-aware arithmetic 325810810
391441974 https://github.com/pydata/xarray/issues/2176#issuecomment-391441974 https://api.github.com/repos/pydata/xarray/issues/2176 MDEyOklzc3VlQ29tbWVudDM5MTQ0MTk3NA== jhamman 2443309 2018-05-23T17:59:32Z 2018-05-23T17:59:32Z MEMBER

cc @KMPaul who has also been working on this sort of thing recently.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Advice on unit-aware arithmetic 325810810

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.901ms · About: xarray-datasette