home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

23 rows where issue = 100295585 and user = 1217238 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 1

  • shoyer · 23 ✖

issue 1

  • support for units · 23 ✖

author_association 1

  • MEMBER 23
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
531603838 https://github.com/pydata/xarray/issues/525#issuecomment-531603838 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUzMTYwMzgzOA== shoyer 1217238 2019-09-15T22:13:31Z 2019-09-15T22:13:31Z MEMBER

Would it help for this integration to have pint Quanitites implement the dask custom collections interface for when it wraps a dask array? ... Then, instead of xarray checking for isinstance(dask_array_type), it could for check for "duck dask arrays" (e.g., those with both __array_function__ and __dask_graph__)?

Yes, great idea!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524572226 https://github.com/pydata/xarray/issues/525#issuecomment-524572226 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU3MjIyNg== shoyer 1217238 2019-08-24T18:41:13Z 2019-08-24T18:41:13Z MEMBER

The other virtue of the separate package is a faster release cycle. We can (and should!) still put a full example in the xarray docs.

For IO integration, I think the simplest choice would be to write utility functions for going to pint arrays from unit-free arrays with string “units” in attrs (and back). That will be easily composable with xarray’s existing IO functionality.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524569964 https://github.com/pydata/xarray/issues/525#issuecomment-524569964 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2OTk2NA== shoyer 1217238 2019-08-24T18:03:24Z 2019-08-24T18:03:24Z MEMBER

That said, I agree that the dedicated accessor is a pretty good interface, especially if you want more methods/property than units and to. Even then array.pint.units and array.pint.to() is pretty readable. This could make sense if we are concerned about different interfaces for different unit libraries.

The accessor route is also definitely more conservative than putting first-class unit support in xarray proper, which I like.

As for where to put it, that's sort of up to you. I think it's probably going to get complicated enough that it should into a library that can be installed, rather than being a boilerplate example in xarray's docs. It could be in xarray if it's going to be very minimal (e.g., only one method + one property) but if you want more than that -- and judging by pint-pandas I would guess you do -- then it should probably go into pint or a dedicated package, e.g., pint-xarray.

I would not write a general purpose xarray+units library unless you're particularly excited about doing that for some reason -- it's much better to start with a particular integration that works well than to make a half-hearted effort at something general purpose without well-motivated use-cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524568953 https://github.com/pydata/xarray/issues/525#issuecomment-524568953 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2ODk1Mw== shoyer 1217238 2019-08-24T17:47:22Z 2019-08-24T17:47:22Z MEMBER

With the units attribute, I was presuming based on the past comments that DataArray.units would be a new property; I forgot that DataArray.<attrname> passes along to DataArray.attrs.<attrname>, so that implementing something new for DataArray.units would be a breaking change!

I think the new property is still an option, even if we want to preserve accessing "units" from attrs, e.g., python @property def units(self): if hasattr(self.data, 'units'): # data is an array with units return self.data.units elif 'units' in self.attrs: # consider issuing a FutureWarning here? return self.attrs['units'] else: raise AttributeError('units')

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
514880353 https://github.com/pydata/xarray/issues/525#issuecomment-514880353 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDg4MDM1Mw== shoyer 1217238 2019-07-25T03:25:47Z 2019-08-24T05:02:39Z MEMBER

I think we could do basic indexes with units after steps (1) and (2) in the big index refactor plan: https://github.com/pydata/xarray/issues/1603#issuecomment-511126208

At that point, indexes will be something that are propagated entirely separately from arrays. So even if the index will get cast into a pandas.index, the corresponding coordinate array will stick around.

The next level of support would be "unit array indexing", e.g., ds.sel(x=1000*u.meters). This will require an API for letting you define your own index classes -- something that we definitely want in the long term but will take more work to realize.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524520588 https://github.com/pydata/xarray/issues/525#issuecomment-524520588 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDUyMDU4OA== shoyer 1217238 2019-08-24T05:01:54Z 2019-08-24T05:01:54Z MEMBER

For the general xarray method, I think we would probably want something like DataArray.units_convert or DataArray.units_to. Or potentially this could use an accessor, e.g., DataArray.pint.to (which will always be a fallback option).

For the Dataset repr, it would probably be nice to print the units along with some of the array values. So yes, this could probably use some custom logic for recognizing quantity types, among other duck array types. If the number of distinct array types starts to get burdensomely large, we could expose an interface for registering new ones, e.g., xarray.register_inline_array_repr(array_type, inline_repr).

For rolling out a new units attribute and/or IO integration, we will need to be careful to preserve backwards compatibility for now (at least with a warning). I’m sure there is lots of code that expects array.attrs to be a string attribute today, so we should consider our options carefully before breaking all that code. The conservative choice would be to keep existing uses working for now unchanged (as a fallback), and save breaking changes for later once we are confident we know the right solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483814686 https://github.com/pydata/xarray/issues/525#issuecomment-483814686 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgxNDY4Ng== shoyer 1217238 2019-04-16T19:39:23Z 2019-04-16T19:39:23Z MEMBER

Would __array_function__ solve the problem with operator precedence? I thought they are separate issues because __mul__ and __rmul__ need not call any numpy functions, and will therefore not necessary dispatch to __array_function__.

Let me try to answer this more clearly: these are independent examples of the same problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483812516 https://github.com/pydata/xarray/issues/525#issuecomment-483812516 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgxMjUxNg== shoyer 1217238 2019-04-16T19:32:28Z 2019-04-16T19:32:28Z MEMBER

There's a whole section in NEP 13 about this: http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#type-casting-hierarchy

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483812113 https://github.com/pydata/xarray/issues/525#issuecomment-483812113 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgxMjExMw== shoyer 1217238 2019-04-16T19:31:13Z 2019-04-16T19:31:13Z MEMBER

__array_function__ and __array_ufunc__ have the exact same operator precedence issues as __mul__/__rmul__. In all cases, properly written methods should return NotImplemented in some cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483803067 https://github.com/pydata/xarray/issues/525#issuecomment-483803067 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgwMzA2Nw== shoyer 1217238 2019-04-16T19:03:43Z 2019-04-16T19:03:43Z MEMBER

Another place to get started with this would be implementing __array_function__ in pint: https://github.com/hgrecco/pint/issues/790

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483802699 https://github.com/pydata/xarray/issues/525#issuecomment-483802699 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgwMjY5OQ== shoyer 1217238 2019-04-16T19:02:37Z 2019-04-16T19:02:37Z MEMBER

Your first example dispatches to dask.array.__mul__, which doesn't know anything about pint and doesn't know how to compose its operations because there are no hooks--the pint array just gets coerced to a numpy array. The second goes to pint.Quantity.__mul__, which assumes it can wrap the dask.array (because it duck typing) and seems to succeed in doing so.

Unfortunately I don't think either dask or pint handle this properly right now.

There is a protocol for Python's * operator, which involves calling __mul__ and __rmul__ methods. But if both dask and pint always returns the result instead of NotImplemented, it is impossible to ensure a consistent result for a * b and b * a if a and b are different types. (This exact same issue exists for __array_function__, too, because the dispatching protocol is copied from Python.)

Dask and pint need some system -- either opt-in or opt-out -- for recognizing that they cannot handle operations with some argument types.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482651181 https://github.com/pydata/xarray/issues/525#issuecomment-482651181 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY1MTE4MQ== shoyer 1217238 2019-04-12T17:09:10Z 2019-04-12T17:09:10Z MEMBER

This would be a good issue to fix upstream, by resolving whether dask should wrap pint or pint should wrap dask :).

For what it's worth, I suspect the optimal wrapping order is: xarrays > pint > dask > numpy. This is because it's useful to get unit compatibility errors at "graph construction time" rather than "runtime".

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482650523 https://github.com/pydata/xarray/issues/525#issuecomment-482650523 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY1MDUyMw== shoyer 1217238 2019-04-12T17:06:49Z 2019-04-12T17:06:49Z MEMBER

One additional issue. It seems like pint has some odd behavior with dask. Multiplication (and I assume addition) is not commutative:

This would be a good issue to fix upstream, by resolving whether dask should wrap pint or pint should wrap dask :).

I would really like to see units support in xarray, and I'm just wondering what the barrier to contribution to this issue is? Like is this a "leave @shoyer to it" kind of task? Or is it something which less experienced developers (such as myself) can help with?

I don't think there's a huge barrier to entry here and I would encourage someone else to dive into this. We could start by adding an experimental flag to xarray to disable coercing everything to numpy arrays, and do some experiments to see what works.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482643387 https://github.com/pydata/xarray/issues/525#issuecomment-482643387 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY0MzM4Nw== shoyer 1217238 2019-04-12T16:44:16Z 2019-04-12T16:44:16Z MEMBER

(I just added a third bullet to my list above)

  1. once our minimum required numpy version is 1.17

@shoyer - what would be an approximate time frame for this?

First, we'll need to wait for NumPy 1.17 to be released :). But more seriously, if we do a breaking release of xarray we can probably with bumping the required NumPy version significantly.

It's definitely a smoother experience for users if we allow at least slightly older versions of NumPy (e.g., so they can use newer xarray with a version of NumPy pre-packaged with their system), but if keeping existing things working with the current version of NumPy is a pain, then it may be worth upgrading the minimum required version.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482641808 https://github.com/pydata/xarray/issues/525#issuecomment-482641808 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY0MTgwOA== shoyer 1217238 2019-04-12T16:39:13Z 2019-04-12T16:40:47Z MEMBER

Three things will need to change internally in xarray:

  1. .data is currently required to return a NumPy or dask array. This will need to be relaxed to include "any duck array type". (For now, we can store an explicit list of these types.)
  2. We need to rewrite Xarray's internal array operations, found in xarray/core/duck_array_ops.py, to use NumPy's API when __array_function__ is enabled instead of our ad-hoc checks. Eventually (once our minimum required numpy version is 1.17), we should be able to delete most of duck_array_ops entirely!
  3. We should figure out what a minimal "units layer" would look like in xarray, exposing a few attributes or methods that call out to underlying unit implementations, e.g., DataArray.units should be redirected to pull out DataArray.data.units
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
457871736 https://github.com/pydata/xarray/issues/525#issuecomment-457871736 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ1Nzg3MTczNg== shoyer 1217238 2019-01-26T22:35:19Z 2019-01-26T22:35:26Z MEMBER

This is still relevant. Hopefully the advent of __array_function__ in NumPy will make this easier/possible.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
248081325 https://github.com/pydata/xarray/issues/525#issuecomment-248081325 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDI0ODA4MTMyNQ== shoyer 1217238 2016-09-19T18:35:14Z 2016-09-19T18:35:14Z MEMBER

@burnpanck Take a look at the approach described in #988 and let me know if you think that sounds viable.

NumPy subclasses inside xarray objects would probably mostly work, if we changed some internal uses of np.asarray to np.asanyarray. But it's also a pretty big rabbit hole. I'm still not sure there are any good ways to do operations like concatenate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
242937257 https://github.com/pydata/xarray/issues/525#issuecomment-242937257 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDI0MjkzNzI1Nw== shoyer 1217238 2016-08-27T19:50:22Z 2016-08-27T19:50:22Z MEMBER

988 describes a possible approach for allowing third-party libraries to add units to xarray. It's not as comprehensive as a custom dtype, but might be enough to be useful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
182744774 https://github.com/pydata/xarray/issues/525#issuecomment-182744774 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE4Mjc0NDc3NA== shoyer 1217238 2016-02-11T07:19:19Z 2016-02-11T07:19:19Z MEMBER

If anyone is excited about working on the NumPy improvement we need to make this more feasible (namely, custom dtypes and duck typing) at BIDS, you should talk to @njsmith.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
181938629 https://github.com/pydata/xarray/issues/525#issuecomment-181938629 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE4MTkzODYyOQ== shoyer 1217238 2016-02-09T16:21:27Z 2016-02-09T16:21:27Z MEMBER

@hgrecco Are you suggesting that pint could wrap xarray objects, or that xarray could wrap pint? Either is certainly possible, though I'm a bit pessimistic that we can come up with a complete solution without the numpy fixes we've been discussing.

Also, just to note, xarray contains a Dataset type for representing collections of variables that often have different units (e.g., temperature and pressure). That suggests to me that it could make more sense to put pint and/or astropy.Quantity objects inside xarray arrays rather than the other way around.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
141853078 https://github.com/pydata/xarray/issues/525#issuecomment-141853078 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE0MTg1MzA3OA== shoyer 1217238 2015-09-21T01:28:18Z 2016-02-09T16:16:38Z MEMBER

@mhvk It would certainly be possible to extend dask.array to handle units, in either of the ways you suggest.

Although you could allow Quantity objects inside dask.arrays, I don't like that approach, because static checks like units really should be done only once when arrays are constructed (akin to dtype checks) rather than at evaluation time, and for every chunk. This suggests that tagging on the outside is the better approach.

So far, so good -- but with the current state of duck array typing in NumPy, it's really hard to be happy with this. Until __numpy_ufunc__ lands, we can't override operations like np.sqrt in a way that is remotely feasible for dask.arrays (we can't afford to load big arrays into memory). Likewise, we need overrides for standard numpy array utility functions like concatenate. But the worst part is that the lack of standard interfaces means that we lose the possibility of composing different arrays backends with your Quantity type -- it will only be able to wrap dask or numpy arrays, not sparse matrices or bolt arrays or some other type yet to be invented.

Once we have all that duck-array stuff, then yes, you certainly could write all a duck-array Quantity type that can wrap generic duck-arrays. But something like Quantity really only needs to override compute operations so that they can propagate dtypes -- there shouldn't be a need to override methods like concatenate. If you had an actual (parametric) dtype for units (e.g., Quantity[float64, 'meters']), then you would get all those dtype agnostic methods for free, which would make your life as an implementer much easier. Hence why I think custom dtypes would really be the ideal solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
131876284 https://github.com/pydata/xarray/issues/525#issuecomment-131876284 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDEzMTg3NjI4NA== shoyer 1217238 2015-08-17T16:14:56Z 2015-08-17T16:14:56Z MEMBER

Unfortunately, the astropy approach uses a numpy.ndarray subclass, which means it's mutually exclusive with dask.array. Otherwise, it does look very nice, though.

On Mon, Aug 17, 2015 at 8:38 AM, Ryan Abernathey notifications@github.com wrote:

Astropy has pretty good units support: http://astropy.readthedocs.org/en/latest/units/

Would it be possible to copy what they do?

Reply to this email directly or view it on GitHub: https://github.com/xray/xray/issues/525#issuecomment-131866203

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
130134807 https://github.com/pydata/xarray/issues/525#issuecomment-130134807 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDEzMDEzNDgwNw== shoyer 1217238 2015-08-12T02:01:01Z 2015-08-12T02:01:01Z MEMBER

Eventually, I hope so!

Unfortunately, doing this in a feasible and maintainable way will probably require upstream fixes in NumPy. In particular, better support for duck-array types (https://github.com/numpy/numpy/issues/4164) and/or the ability to write units as a custom NumPy dtypes. Both of these are on the NumPy roadmap, though they don't have a timeframe for when that will happen.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 153.708ms · About: xarray-datasette