home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

62 rows where issue = 100295585 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 17

  • shoyer 23
  • jthielen 7
  • mhvk 6
  • burnpanck 4
  • keewis 4
  • rabernat 3
  • nbren12 3
  • dopplershift 2
  • hgrecco 2
  • arsenovic 1
  • ngoldbaum 1
  • spencerahill 1
  • StanczakDominik 1
  • amine-aboufirass 1
  • stale[bot] 1
  • TomNicholas 1
  • riley-brady 1

author_association 3

  • MEMBER 31
  • CONTRIBUTOR 18
  • NONE 13

issue 1

  • support for units · 62 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1270731971 https://github.com/pydata/xarray/issues/525#issuecomment-1270731971 https://api.github.com/repos/pydata/xarray/issues/525 IC_kwDOAMm_X85LvdTD riley-brady 82663402 2022-10-06T21:46:38Z 2022-10-06T21:46:38Z NONE

I believe this can be closed via https://github.com/pydata/xarray/pull/5734 @andersy005?

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  support for units 100295585
763119626 https://github.com/pydata/xarray/issues/525#issuecomment-763119626 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDc2MzExOTYyNg== keewis 14808389 2021-01-19T20:33:47Z 2021-01-26T00:16:46Z MEMBER

I would expect astropy quantities to work just fine as long as they are duck arrays

actually, that turns out to be wrong. Since isinstance(data, np.ndarray) returns true for astropy.units.Quantity, it is cast to ndarray using np.asarray: https://github.com/pydata/xarray/blob/7dbbdcafed7f796ab77039ff797bcd31d9185903/xarray/core/variable.py#L231-L245

Adding ~or issubclass(type(data), np.ndarray)~ or type(data) != np.ndarray does allow wrapping a astropy.units quantity in Dataset / DataArray objects ~but it breaks a few tests~. Also, unless we modify the testsuite in xarray/tests/test_units.py to run with astropy.units instead of pint I can't really tell which features of xarray strip the units (in addition to the ones documented for pint). For that, we probably need to somehow create a generalization of the tests for duck arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
733754394 https://github.com/pydata/xarray/issues/525#issuecomment-733754394 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDczMzc1NDM5NA== keewis 14808389 2020-11-25T14:52:58Z 2020-11-25T16:07:57Z MEMBER

I saw the discussion above about Quantities being more problematic

I would expect astropy quantities to work just fine as long as they are duck arrays. Also, the only limitation I would expect them to have is that they can't "wrap" anything other than numpy arrays. dask arrays are an exception since they can wrap quantities using the _meta attribute (which is something we try to avoid with pint, though).

For reference, the currently remaining issues for all duck arrays (except obviously dask) are documented here

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
733629234 https://github.com/pydata/xarray/issues/525#issuecomment-733629234 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDczMzYyOTIzNA== StanczakDominik 11289391 2020-11-25T10:48:57Z 2020-11-25T10:48:57Z CONTRIBUTOR

Hi! I'm just popping in as a very interested user of both xarray and unit packages to ask: since there's been some awesome progress made here and pint-xarray is now enough of A Thing to have documentation, though obviously experimental - how much work would you expect a corresponding package for astropy's Quantities to take, given the current state of things?

Are there any limitations that would prevent that? I saw the discussion above about Quantities being more problematic due to taking the subclass-from-numpy-arrays route, but I'm not sure how much of a roadblock that still is. I would suspect the API could be shared with pint-xarray (which, obviously, is experimental for now).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
562966170 https://github.com/pydata/xarray/issues/525#issuecomment-562966170 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDU2Mjk2NjE3MA== keewis 14808389 2019-12-08T16:25:58Z 2019-12-08T23:24:23Z MEMBER

sorry for the late reply, @ngoldbaum. For xarray to support unyt, it would have to implement NEP-18 (i.e. __array_function__) which I think it does not yet? I would expect to have unyt support come for free once pint support works so I would focus on that first (see #3594 for the current progress). Extending the tests to also test unyt would be possible but I'm thinking it would be better to put that into the accessor package (as discussed above for a possible pint accessor)?

There were a few unresolved design questions with how unit libraries should implement certain numpy functions (e.g. how where should behave when receiving array(nan) and array(0) which xarray uses to implement nanops, or which unit the result of full_like should have): see hgrecco/pint#905. Any supported unit library would have to behave the same way so I think it would be good to coordinate that. Thoughts?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
553644722 https://github.com/pydata/xarray/issues/525#issuecomment-553644722 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDU1MzY0NDcyMg== ngoldbaum 3126246 2019-11-13T23:03:23Z 2019-11-13T23:03:23Z NONE

Hi all, I was just pointed at this by someone who went to the NumFOCUS summit. I'm the lead developer for unyt, which I see has come up a little bit. If anyone wants to chat with me about adding support for unyt along with Pint I'd be happy to discuss.

{
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
531603838 https://github.com/pydata/xarray/issues/525#issuecomment-531603838 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUzMTYwMzgzOA== shoyer 1217238 2019-09-15T22:13:31Z 2019-09-15T22:13:31Z MEMBER

Would it help for this integration to have pint Quanitites implement the dask custom collections interface for when it wraps a dask array? ... Then, instead of xarray checking for isinstance(dask_array_type), it could for check for "duck dask arrays" (e.g., those with both __array_function__ and __dask_graph__)?

Yes, great idea!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
531603357 https://github.com/pydata/xarray/issues/525#issuecomment-531603357 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUzMTYwMzM1Nw== jthielen 3460034 2019-09-15T22:04:39Z 2019-09-15T22:04:39Z CONTRIBUTOR

Based the points raised by @crusaderky in https://github.com/hgrecco/pint/issues/878#issue-492678605 about how much special case handling xarray has for dask arrays, I was thinking recently about what it might take for the xarray > pint > dask.array wrapping discussed here and elsewhere to work as fluidly as xarray > dask.array currently does.

Would it help for this integration to have pint Quanitites implement the dask custom collections interface for when it wraps a dask array? I would think that this would allow a pint Quanitity to behave in a "dask-array-like" way rather than just an "array-like" way. Then, instead of xarray checking for isinstance(dask_array_type), it could for check for "duck dask arrays" (e.g., those with both __array_function__ and __dask_graph__)? There are almost certainly some subtle implementation details that would need to be worked out, but I'm guessing that this could take care of the bulk of the integration.

Also, if I'm incorrect with this line of thought, or there is a better way forward for implementing this wrapping pattern, please do let me know!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524572226 https://github.com/pydata/xarray/issues/525#issuecomment-524572226 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU3MjIyNg== shoyer 1217238 2019-08-24T18:41:13Z 2019-08-24T18:41:13Z MEMBER

The other virtue of the separate package is a faster release cycle. We can (and should!) still put a full example in the xarray docs.

For IO integration, I think the simplest choice would be to write utility functions for going to pint arrays from unit-free arrays with string “units” in attrs (and back). That will be easily composable with xarray’s existing IO functionality.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524570522 https://github.com/pydata/xarray/issues/525#issuecomment-524570522 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU3MDUyMg== jthielen 3460034 2019-08-24T18:12:55Z 2019-08-24T18:39:49Z CONTRIBUTOR

@shoyer I agree, the accessor interface makes a lot of sense for this: it's more conservative on the xarray side, while also giving the most flexibility for the pint + xarray integration.

Based on your feedback and what I'd hope to see out of the pint + xarray integration, I'm thinking a pint-adjacent package like pint-xarray may be the best route forward. ~~I'll create an issue on pint to inquire about that possibility.~~ See https://github.com/hgrecco/pint/issues/849.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524569964 https://github.com/pydata/xarray/issues/525#issuecomment-524569964 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2OTk2NA== shoyer 1217238 2019-08-24T18:03:24Z 2019-08-24T18:03:24Z MEMBER

That said, I agree that the dedicated accessor is a pretty good interface, especially if you want more methods/property than units and to. Even then array.pint.units and array.pint.to() is pretty readable. This could make sense if we are concerned about different interfaces for different unit libraries.

The accessor route is also definitely more conservative than putting first-class unit support in xarray proper, which I like.

As for where to put it, that's sort of up to you. I think it's probably going to get complicated enough that it should into a library that can be installed, rather than being a boilerplate example in xarray's docs. It could be in xarray if it's going to be very minimal (e.g., only one method + one property) but if you want more than that -- and judging by pint-pandas I would guess you do -- then it should probably go into pint or a dedicated package, e.g., pint-xarray.

I would not write a general purpose xarray+units library unless you're particularly excited about doing that for some reason -- it's much better to start with a particular integration that works well than to make a half-hearted effort at something general purpose without well-motivated use-cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524569782 https://github.com/pydata/xarray/issues/525#issuecomment-524569782 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2OTc4Mg== jthielen 3460034 2019-08-24T18:00:37Z 2019-08-24T18:01:11Z CONTRIBUTOR

Oh, okay, having the fallback like that was how I thought about implementing it. (I'm sorry that I didn't describe that in my initial comment.)

So would the way forward be to implement DataArray.units_convert()/DataArray.units_to() and DataArray.units as you described right now, but wait for and/or delegate IO integration? Or, should there also be a fully-backwards-compatible IO integration implemented now (such as an optional kwarg on open_dataset and to_netcdf)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524568953 https://github.com/pydata/xarray/issues/525#issuecomment-524568953 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2ODk1Mw== shoyer 1217238 2019-08-24T17:47:22Z 2019-08-24T17:47:22Z MEMBER

With the units attribute, I was presuming based on the past comments that DataArray.units would be a new property; I forgot that DataArray.<attrname> passes along to DataArray.attrs.<attrname>, so that implementing something new for DataArray.units would be a breaking change!

I think the new property is still an option, even if we want to preserve accessing "units" from attrs, e.g., python @property def units(self): if hasattr(self.data, 'units'): # data is an array with units return self.data.units elif 'units' in self.attrs: # consider issuing a FutureWarning here? return self.attrs['units'] else: raise AttributeError('units')

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524568112 https://github.com/pydata/xarray/issues/525#issuecomment-524568112 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2ODExMg== jthielen 3460034 2019-08-24T17:34:31Z 2019-08-24T17:34:31Z CONTRIBUTOR

@shoyer Thank you for the reply!

That sounds good about the repr custom logic.

With the units attribute, I was presuming based on the past comments that DataArray.units would be a new property; I forgot that DataArray.<attrname> passes along to DataArray.attrs.<attrname>, so that implementing something new for DataArray.units would be a breaking change! In trying to avoid such a change, though, I think it would be confusing to have a DataArray-level DataArray.units_convert method and not a corresponding DataArray-level way of getting at the units. So, would it be okay to just implement this unit interface (unit access, unit conversion, and IO) through an accessor, and start out with just a pint accessor? If so, where should it be implemented?

Possible ideas I had:

  • As a boilerplate example in the xarray documentation that downstream libraries or end-users can implement?
  • In xarray itself?
  • In pint or a new pint-adjacent package (similar to pint-pandas)?
  • A new xarray-adjacent package for general-purpose unit compatibility?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
514880353 https://github.com/pydata/xarray/issues/525#issuecomment-514880353 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDg4MDM1Mw== shoyer 1217238 2019-07-25T03:25:47Z 2019-08-24T05:02:39Z MEMBER

I think we could do basic indexes with units after steps (1) and (2) in the big index refactor plan: https://github.com/pydata/xarray/issues/1603#issuecomment-511126208

At that point, indexes will be something that are propagated entirely separately from arrays. So even if the index will get cast into a pandas.index, the corresponding coordinate array will stick around.

The next level of support would be "unit array indexing", e.g., ds.sel(x=1000*u.meters). This will require an API for letting you define your own index classes -- something that we definitely want in the long term but will take more work to realize.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524520588 https://github.com/pydata/xarray/issues/525#issuecomment-524520588 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDUyMDU4OA== shoyer 1217238 2019-08-24T05:01:54Z 2019-08-24T05:01:54Z MEMBER

For the general xarray method, I think we would probably want something like DataArray.units_convert or DataArray.units_to. Or potentially this could use an accessor, e.g., DataArray.pint.to (which will always be a fallback option).

For the Dataset repr, it would probably be nice to print the units along with some of the array values. So yes, this could probably use some custom logic for recognizing quantity types, among other duck array types. If the number of distinct array types starts to get burdensomely large, we could expose an interface for registering new ones, e.g., xarray.register_inline_array_repr(array_type, inline_repr).

For rolling out a new units attribute and/or IO integration, we will need to be careful to preserve backwards compatibility for now (at least with a warning). I’m sure there is lots of code that expects array.attrs to be a string attribute today, so we should consider our options carefully before breaking all that code. The conservative choice would be to keep existing uses working for now unchanged (as a fallback), and save breaking changes for later once we are confident we know the right solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524518305 https://github.com/pydata/xarray/issues/525#issuecomment-524518305 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDUxODMwNQ== jthielen 3460034 2019-08-24T04:17:54Z 2019-08-24T04:17:54Z CONTRIBUTOR

With the progress being made with https://github.com/pydata/xarray/pull/2956, https://github.com/pydata/xarray/pull/3238, and https://github.com/hgrecco/pint/pull/764, I was thinking that now might be a good time to work out the details of the "minimal units layer" mentioned by @shoyer in https://github.com/pydata/xarray/issues/525#issuecomment-482641808 and https://github.com/pydata/xarray/issues/988#issuecomment-413732471?

I'd be glad to try putting together a PR that could follow up on https://github.com/pydata/xarray/pull/3238 for it, but I would want to ask for some guidance:

(For reference, below is the action list from https://github.com/pydata/xarray/issues/988#issuecomment-413732471)

  • The DataArray.units property could forward to DataArray.data.units.
  • A DataArray.to or DataArray.convert method could call the relevant method on data and re-wrap it in a DataArray.
  • A minimal layer on top of xarray's netCDF IO could handle unit attributes by wrapping/unwrapping arrays with pint.

DataArray.units

Having DataArray.units forward to DataArray.data.units should work for pint, unyt, and quantities, but should a fallback to DataArray.data.unit be added for astropy.units? Also, how should DataArray.units behave if DataArray.data does not have a "units" or "unit" attribute, but DataArray.attrs['units'] exists?

DataArray.to()/DataArray.convert()

DataArray.to() would be consistent with the methods for pint, unyt, and astropy.units (the relevant method for quantities looks to be .rescale()), however, it is very similar to the numerous output-related DataArray.to_*() methods. Is this okay, or would DataArray.convert() or some other method name be better to avoid confusion?

Units and IO

While wrapping and unwrapping arrays with pint itself should be straightforward, I really don't know what the best API for it should be, especially for input.

Some possibilities that came to mind (by no means an exhaustive list):

  • Leave open_dataset as it is now, but provide examples in the documentation for how to reconstruct a new Dataset with unit arrays (perhaps provide a boilerplate function or accessor)
  • Add a kwarg like "wrap_units" to open_dataset() that accepts a quantity constructor (like ureg.Quantity in pint) that is applied within each variable
  • Devise some generalized system for specifying the internal array structure in the opened dataset (to handle other duck array types, not just unit arrays)

With any of these, tests for lazy-loading would be crucial (I don't know yet how pint will handle that).

Output may be easier: I was thinking that unwrapping could be done implicitly by automatically putting str(DataArray.units) as the "units" attribute and replacing the unit array with its magnitude/value?

Extra questions based on sparse implementation

__repr__

Will a set of repr functions for each unit array type need to be added like they were for sparse in https://github.com/pydata/xarray/pull/3211? Or should there be some more general system implemented because of all of the possible combinations that would arise with other duck array types?

to_dense()/.to_numpy_data()/.to_numpy()

What is the expected behavior with unit arrays with regards to this soon-to-be-implemented conversion method?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
514877824 https://github.com/pydata/xarray/issues/525#issuecomment-514877824 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDg3NzgyNA== jthielen 3460034 2019-07-25T03:11:20Z 2019-07-25T03:11:20Z CONTRIBUTOR

Thank you for the insight!

So if I'm understanding things correctly as they stand now, dimension coordinates store their values internally as a pandas.Index, which would mean, to implement this directly, this becomes an upstream issue in pandas to allow a ndarray-like unit array inside a pandas.Index? Based on what I've seen on the pandas side, this looks far from straightforward.

With that in mind, would "dimension coordinates with units" (or more generally "dimension coordinates with __array_function__ implementers") be another use case that best falls under flexible indices (#1603)?

(In the mean time, I would guess that the best workaround is using an accessor interface to handle unit-related operations on coordinates, since the attrs are preserved.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
514805244 https://github.com/pydata/xarray/issues/525#issuecomment-514805244 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDgwNTI0NA== keewis 14808389 2019-07-24T21:22:33Z 2019-07-24T21:22:33Z MEMBER

In that branch I left it as xfail because I came to the conclusion that there was nothing I could do (directly at least): when creating a DataArray, the coords get * passed through as_variable(), which puts the coord array in Variable and for dimensions at least * calls Variable.to_index_variable(). In there the variable is converted to an IndexVariable and the array is * wrapped by PandasIndexAdapter where the array is * first passed through np.asarray (this would probably have to be removed/changed) and then to pandas.Index, which is where the units get stripped -- which can be verified by directly passing a unit array to it.

The units of coordinates that are not dimensions are not stripped: ```python

ureg = pint.UnitRegistry() v = np.arange(10 * 20).reshape(10, 20) * ureg.m / ureg.s d = np.arange(10) * ureg.m d2 = d.to(ureg.cm) t = np.arange(20) * ureg.s array = xr.DataArray(data=v, dims=('d', 't'), coords={'d': d, 'd2': ('d', d2), 't': t}) array.d.data array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) array.d2.data <Quantity([ 0. 100. 200. 300. 400. 500. 600. 700. 800. 900.], 'centimeter')> ``` However, that branch is a quick hack, and I would suspect that supporting duck arrays has a similar effect.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
514452182 https://github.com/pydata/xarray/issues/525#issuecomment-514452182 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDQ1MjE4Mg== jthielen 3460034 2019-07-24T02:19:08Z 2019-07-24T02:19:08Z CONTRIBUTOR

In light of the recent activity with __array_function__ in #3117, I took a quick look to see if it worked with Pint as modified in https://github.com/hgrecco/pint/pull/764. The basics of sticking a Pint Quantity in a DataArray seem to work well, and the perhaps the greatest issues are on Pint's end...right now https://github.com/hgrecco/pint/pull/764 is limited in the functions it handles through __array_function__, and there are some quirks with operator precedence.

However, the other main problem was that coordinates did not work with Quantity's. Looking again at https://github.com/pydata/xarray/issues/1938#issuecomment-510953379 and #2956, this is not surprising. I'm curious though about what it would take to let indexing work with Pint (or other unit arrays)? For most of my use cases (meteorological analysis as in MetPy), having units with coordinates is just as important as having units with the data itself. I'd be interested in helping implement it, but I would greatly appreciate some initial direction, since I'm new to that part of the xarray codebase.

Also, cc @keewis, since I saw in #2956 you have a unit-support branch that looks like it attempts to extend NumpyIndexingAdapter to work with unit arrays, but still has the coordinates-with-units tests marked as xfail.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
514099076 https://github.com/pydata/xarray/issues/525#issuecomment-514099076 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDA5OTA3Ng== amine-aboufirass 24267968 2019-07-23T08:00:20Z 2019-07-23T08:00:50Z NONE

By defining __array_prepare__ and __array_wraps__ most numpy functions and array attributes work as expected without monkey patching or having a specialized math module.

@hgrecco "Most numpy functions" is a bit of an overstatement. Really important functions like np.dot do not work with Pint.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483835929 https://github.com/pydata/xarray/issues/525#issuecomment-483835929 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgzNTkyOQ== mhvk 2789820 2019-04-16T20:44:32Z 2019-04-16T20:44:32Z NONE

Indeed, all of us over-riders have to start to return NotImplemented if we don't know what the other class is - and I write this having recently realized that in astropy's Quantity I have caused similar problems by trying to make a guess (despite being admonished to refuse to do that). Of course, numpy's ndarray is the worst culprit here, just coercing everything to array.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483814686 https://github.com/pydata/xarray/issues/525#issuecomment-483814686 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgxNDY4Ng== shoyer 1217238 2019-04-16T19:39:23Z 2019-04-16T19:39:23Z MEMBER

Would __array_function__ solve the problem with operator precedence? I thought they are separate issues because __mul__ and __rmul__ need not call any numpy functions, and will therefore not necessary dispatch to __array_function__.

Let me try to answer this more clearly: these are independent examples of the same problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483812516 https://github.com/pydata/xarray/issues/525#issuecomment-483812516 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgxMjUxNg== shoyer 1217238 2019-04-16T19:32:28Z 2019-04-16T19:32:28Z MEMBER

There's a whole section in NEP 13 about this: http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#type-casting-hierarchy

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483812113 https://github.com/pydata/xarray/issues/525#issuecomment-483812113 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgxMjExMw== shoyer 1217238 2019-04-16T19:31:13Z 2019-04-16T19:31:13Z MEMBER

__array_function__ and __array_ufunc__ have the exact same operator precedence issues as __mul__/__rmul__. In all cases, properly written methods should return NotImplemented in some cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483807351 https://github.com/pydata/xarray/issues/525#issuecomment-483807351 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgwNzM1MQ== nbren12 1386642 2019-04-16T19:16:19Z 2019-04-16T19:16:19Z CONTRIBUTOR

Would __array_function__ solve the problem with operator precedence? I thought they are separate issues because __mul__ and __rmul__ need not call any numpy functions, and will therefore not necessary dispatch to __array_function__.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483803067 https://github.com/pydata/xarray/issues/525#issuecomment-483803067 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgwMzA2Nw== shoyer 1217238 2019-04-16T19:03:43Z 2019-04-16T19:03:43Z MEMBER

Another place to get started with this would be implementing __array_function__ in pint: https://github.com/hgrecco/pint/issues/790

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483802699 https://github.com/pydata/xarray/issues/525#issuecomment-483802699 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MzgwMjY5OQ== shoyer 1217238 2019-04-16T19:02:37Z 2019-04-16T19:02:37Z MEMBER

Your first example dispatches to dask.array.__mul__, which doesn't know anything about pint and doesn't know how to compose its operations because there are no hooks--the pint array just gets coerced to a numpy array. The second goes to pint.Quantity.__mul__, which assumes it can wrap the dask.array (because it duck typing) and seems to succeed in doing so.

Unfortunately I don't think either dask or pint handle this properly right now.

There is a protocol for Python's * operator, which involves calling __mul__ and __rmul__ methods. But if both dask and pint always returns the result instead of NotImplemented, it is impossible to ensure a consistent result for a * b and b * a if a and b are different types. (This exact same issue exists for __array_function__, too, because the dispatching protocol is copied from Python.)

Dask and pint need some system -- either opt-in or opt-out -- for recognizing that they cannot handle operations with some argument types.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
483799967 https://github.com/pydata/xarray/issues/525#issuecomment-483799967 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4Mzc5OTk2Nw== dopplershift 221526 2019-04-16T18:54:37Z 2019-04-16T18:54:37Z CONTRIBUTOR

@shoyer I agree with that wrapping order. I think I'd also be in favor of starting with an experiment to disable coercing to arrays.

@nbren12 The non-communicative multiplication is a consequence of operator dispatch in Python, and the reason why we want __array_function__ from numpy. Your first example dispatches to dask.array.__mul__, which doesn't know anything about pint and doesn't know how to compose its operations because there are no hooks--the pint array just gets coerced to a numpy array. The second goes to pint.Quantity.__mul__, which assumes it can wrap the dask.array (because it duck typing) and seems to succeed in doing so.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482651181 https://github.com/pydata/xarray/issues/525#issuecomment-482651181 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY1MTE4MQ== shoyer 1217238 2019-04-12T17:09:10Z 2019-04-12T17:09:10Z MEMBER

This would be a good issue to fix upstream, by resolving whether dask should wrap pint or pint should wrap dask :).

For what it's worth, I suspect the optimal wrapping order is: xarrays > pint > dask > numpy. This is because it's useful to get unit compatibility errors at "graph construction time" rather than "runtime".

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482650523 https://github.com/pydata/xarray/issues/525#issuecomment-482650523 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY1MDUyMw== shoyer 1217238 2019-04-12T17:06:49Z 2019-04-12T17:06:49Z MEMBER

One additional issue. It seems like pint has some odd behavior with dask. Multiplication (and I assume addition) is not commutative:

This would be a good issue to fix upstream, by resolving whether dask should wrap pint or pint should wrap dask :).

I would really like to see units support in xarray, and I'm just wondering what the barrier to contribution to this issue is? Like is this a "leave @shoyer to it" kind of task? Or is it something which less experienced developers (such as myself) can help with?

I don't think there's a huge barrier to entry here and I would encourage someone else to dive into this. We could start by adding an experimental flag to xarray to disable coercing everything to numpy arrays, and do some experiments to see what works.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482647686 https://github.com/pydata/xarray/issues/525#issuecomment-482647686 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY0NzY4Ng== TomNicholas 35968931 2019-04-12T16:57:43Z 2019-04-12T16:57:43Z MEMBER

I would really like to see units support in xarray, and I'm just wondering what the barrier to contribution to this issue is? Like is this a "leave @shoyer to it" kind of task? Or is it something which less experienced developers (such as myself) can help with?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482643700 https://github.com/pydata/xarray/issues/525#issuecomment-482643700 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY0MzcwMA== nbren12 1386642 2019-04-12T16:45:17Z 2019-04-12T16:45:17Z CONTRIBUTOR

One additional issue. It seems like pint has some odd behavior with dask. Multiplication (and I assume addition) is not commutative: ``` In [42]: da.ones((10,)) * ureg.m Out[42]: dask.array<mul, shape=(10,), dtype=float64, chunksize=(10,)>

In [43]: ureg.m * da.ones((10,)) Out[43]: dask.array<mul, shape=(10,), dtype=float64, chunksize=(10,)> <Unit('meter')> ```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482643387 https://github.com/pydata/xarray/issues/525#issuecomment-482643387 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY0MzM4Nw== shoyer 1217238 2019-04-12T16:44:16Z 2019-04-12T16:44:16Z MEMBER

(I just added a third bullet to my list above)

  1. once our minimum required numpy version is 1.17

@shoyer - what would be an approximate time frame for this?

First, we'll need to wait for NumPy 1.17 to be released :). But more seriously, if we do a breaking release of xarray we can probably with bumping the required NumPy version significantly.

It's definitely a smoother experience for users if we allow at least slightly older versions of NumPy (e.g., so they can use newer xarray with a version of NumPy pre-packaged with their system), but if keeping existing things working with the current version of NumPy is a pain, then it may be worth upgrading the minimum required version.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482641808 https://github.com/pydata/xarray/issues/525#issuecomment-482641808 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY0MTgwOA== shoyer 1217238 2019-04-12T16:39:13Z 2019-04-12T16:40:47Z MEMBER

Three things will need to change internally in xarray:

  1. .data is currently required to return a NumPy or dask array. This will need to be relaxed to include "any duck array type". (For now, we can store an explicit list of these types.)
  2. We need to rewrite Xarray's internal array operations, found in xarray/core/duck_array_ops.py, to use NumPy's API when __array_function__ is enabled instead of our ad-hoc checks. Eventually (once our minimum required numpy version is 1.17), we should be able to delete most of duck_array_ops entirely!
  3. We should figure out what a minimal "units layer" would look like in xarray, exposing a few attributes or methods that call out to underlying unit implementations, e.g., DataArray.units should be redirected to pull out DataArray.data.units
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482642261 https://github.com/pydata/xarray/issues/525#issuecomment-482642261 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY0MjI2MQ== rabernat 1197350 2019-04-12T16:40:42Z 2019-04-12T16:40:42Z MEMBER

2. once our minimum required numpy version is 1.17

@shoyer - what would be an approximate time frame for this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482641944 https://github.com/pydata/xarray/issues/525#issuecomment-482641944 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjY0MTk0NA== rabernat 1197350 2019-04-12T16:39:40Z 2019-04-12T16:39:40Z MEMBER

Probably worth pinging @dopplershift again. He has wrestled with this a lot.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
482639629 https://github.com/pydata/xarray/issues/525#issuecomment-482639629 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ4MjYzOTYyOQ== nbren12 1386642 2019-04-12T16:32:25Z 2019-04-12T16:32:25Z CONTRIBUTOR

@rabernat recent post inspired me to check out this issue. What would this issue entail now that __array_function__ is in numpy? Is there some reason this is more complicated than adding an appropriate __array_function__ to pint's quantity class?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
457871736 https://github.com/pydata/xarray/issues/525#issuecomment-457871736 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ1Nzg3MTczNg== shoyer 1217238 2019-01-26T22:35:19Z 2019-01-26T22:35:26Z MEMBER

This is still relevant. Hopefully the advent of __array_function__ in NumPy will make this easier/possible.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
457830371 https://github.com/pydata/xarray/issues/525#issuecomment-457830371 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDQ1NzgzMDM3MQ== stale[bot] 26384082 2019-01-26T13:17:09Z 2019-01-26T13:17:09Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
248468400 https://github.com/pydata/xarray/issues/525#issuecomment-248468400 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDI0ODQ2ODQwMA== mhvk 2789820 2016-09-20T23:39:29Z 2016-09-20T23:39:29Z NONE

@burnpanck - thanks for the very well-posed description of why units are so useful not as some meta-data, but as an integral property. Of course, this is also why making them part of a new dtype is a great idea! But failing that, I'd agree that it has to be part of something like an ndarray subclass; this is indeed what we do in astropy.units.Quantity (and concatenate does not work for us either...).

Now, off-topic but still: what is a little less wonderful is that there seem to be quite a few independent units implementations around (even just in astronomy, there is that of amuse; ours is based on things initially developped by pynbody). It may well be hard to merge them at this stage, but it would be good to think how we could at least interoperate...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
248255299 https://github.com/pydata/xarray/issues/525#issuecomment-248255299 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDI0ODI1NTI5OQ== burnpanck 1310437 2016-09-20T09:49:23Z 2016-09-20T09:51:30Z CONTRIBUTOR

Or another way to put it: While typical metadata/attributes are only relevant if you eventually read them (which is where you will notice if they were lost on the way), units are different: They work silently behind the scene at all times, even if you do not explicitly look for them. You want an addition to fail if units don't match, without having to explicitly first test if the operands have units. So what should the ufunc_hook do if it finds two Variables that don't seem to carry units, raise an exception? Most probably not, as that would prevent to use xarray at the same time without units. So if the units are lost on the way, you might never notice, but end up with wrong data. To me, that is just not unlikely enough to happen given the damage it can do (e.g. the time it takes to find out what's going on once you realise you get wrong data).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
248255426 https://github.com/pydata/xarray/issues/525#issuecomment-248255426 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDI0ODI1NTQyNg== burnpanck 1310437 2016-09-20T09:50:00Z 2016-09-20T09:50:00Z CONTRIBUTOR

So for now, I'm hunting for np.asarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
248252494 https://github.com/pydata/xarray/issues/525#issuecomment-248252494 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDI0ODI1MjQ5NA== burnpanck 1310437 2016-09-20T09:36:24Z 2016-09-20T09:36:24Z CONTRIBUTOR

988 would certainly allow to me to implement unit functionality on xarray, probably by leveraging an existing units package.

What I don't like with that approach is the fact that I essentially end up with a separate distinct implementation of units. I am afraid that I will either have to re-implement many of the helpers that I wrote to work with physical quantities to be xarray aware. Furthermore, one important aspect of units packages is that it prevents you from doing conversion mistakes. But that only works as long as you don't forget to carry the units with you. Having units just as attributes to xarray makes it as simple as forgetting to read the attributes when accessing the data to lose the units. The units inside xarray approach would have the advantage that whenever you end up accessing the data inside xarray, you automatically have the units with you. From a conceptual point of view, the units are really an integral part of the data, so they should sit right there with the data. Whenever you do something with the data, you have to deal with the units. That is true no matter if it is implemented as an attribute handler or directly on the data array. My fear is, attributes leave the impression of "optional" metadata which are too easily lost. E.g. xarray doesn't call it's ufunc_hook for some operation where it should, and you silently lose units. My hope is that with nested arrays that carry units, you would instead fail verbosely. Of course, np.concatenate is precisely one of these cases where unit packages struggle with to get their hook in (and where units on dtypes would help). So they fight the same problem. Nonetheless, these problems are known and solved as well as possible in the units packages, but in xarray, one would have to deal with them all over again.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
248081325 https://github.com/pydata/xarray/issues/525#issuecomment-248081325 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDI0ODA4MTMyNQ== shoyer 1217238 2016-09-19T18:35:14Z 2016-09-19T18:35:14Z MEMBER

@burnpanck Take a look at the approach described in #988 and let me know if you think that sounds viable.

NumPy subclasses inside xarray objects would probably mostly work, if we changed some internal uses of np.asarray to np.asanyarray. But it's also a pretty big rabbit hole. I'm still not sure there are any good ways to do operations like concatenate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
248059952 https://github.com/pydata/xarray/issues/525#issuecomment-248059952 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDI0ODA1OTk1Mg== burnpanck 1310437 2016-09-19T17:24:21Z 2016-09-19T17:24:21Z CONTRIBUTOR

+1 for units support. I agree, parametrised dtypes would be the preferred solution, but I don't want to wait that long (I would be willing to contribute to that end, but I'm afraid that would exceed my knowledge of numpy).

I have never used dask. I understand that the support for dask arrays is a central feature for xarray. However, the way I see it, if one would put a (unit-aware) ndarray subclass into an xarray, then units should work out of the box. As you discussed, this seems not so easy to make work together with dask (particularly in a generic way). However, shouldn't that be an issue that the dask community anyway has to solve (i.e.: currently there is no way to use any units package together with dask, right)? In that sense, allowing such arrays inside xarrays would force users to choose between dask and units, which is something they have to do anyway. But for a big part of users, that would be a very quick way to units!

Or am I missing something here? I'll just try to monkeypatch xarray to that end, and see how far I get...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
242937257 https://github.com/pydata/xarray/issues/525#issuecomment-242937257 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDI0MjkzNzI1Nw== shoyer 1217238 2016-08-27T19:50:22Z 2016-08-27T19:50:22Z MEMBER

988 describes a possible approach for allowing third-party libraries to add units to xarray. It's not as comprehensive as a custom dtype, but might be enough to be useful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
229421229 https://github.com/pydata/xarray/issues/525#issuecomment-229421229 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDIyOTQyMTIyOQ== dopplershift 221526 2016-06-29T17:02:28Z 2016-06-29T17:02:28Z CONTRIBUTOR

I agree that custom dtypes is the right solution (and I'll go dig some more there). In the meantime, I'm not sure why you couldn't wrap an xarray DataArray in one of pint's Quantity instances. With the exception of also wanting units on coordinates, this seems like a straightforward way to get at least some unit functionality.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
182744774 https://github.com/pydata/xarray/issues/525#issuecomment-182744774 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE4Mjc0NDc3NA== shoyer 1217238 2016-02-11T07:19:19Z 2016-02-11T07:19:19Z MEMBER

If anyone is excited about working on the NumPy improvement we need to make this more feasible (namely, custom dtypes and duck typing) at BIDS, you should talk to @njsmith.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
182195886 https://github.com/pydata/xarray/issues/525#issuecomment-182195886 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE4MjE5NTg4Ng== spencerahill 6200806 2016-02-10T04:45:17Z 2016-02-10T04:45:17Z CONTRIBUTOR

Not to be pedantic, but just one more :+1: on ultimately implementing units support within xarray -- that would be huge.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
182170778 https://github.com/pydata/xarray/issues/525#issuecomment-182170778 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE4MjE3MDc3OA== hgrecco 278566 2016-02-10T02:22:07Z 2016-02-10T02:22:07Z NONE

@shoyer When we prototyped Pint we tried putting Quantity objects inside numpy array. It was was working fine but the performance and memory hit was too large. We were convinced that our current design was right when we wrote the first code using it. The case might be different with xarray. It would be nice to see some code using xarray and units (as if this was an already implemented feature).

@mhvk I do agree with your views. We also mention these limitations in the Pint documentation. Wrapping (instead of subclassing) adds another issue: some Numpy functions do not recognize a Quantity object as an array. Therefore any function that call numpy.asanyarray will erase the information that this is a quantity (See my issue here numpy/numpy#4072).

In any case, as was mentioned before in the thread Custom dtypes and Duck typing will be great for this.

In spite of this limitations, we chose wrapping because we want to support quantities even if NumPy is not installed. It has worked really nice for us, working in most of the common cases even for numpy arrays.

Regarding interoperating, it will be great. It will be even better if we can move into one, blessed, solution under the pydata umbrella (or similar).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
181944878 https://github.com/pydata/xarray/issues/525#issuecomment-181944878 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE4MTk0NDg3OA== arsenovic 1228240 2016-02-09T16:32:32Z 2016-02-09T16:32:32Z NONE

id just like to chime in and say that this feature would really be sweet. i always find myself doing a lot work to handle/convert different units. it seems that adding units to labeled axes does a lot to describe a set of data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
181938629 https://github.com/pydata/xarray/issues/525#issuecomment-181938629 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE4MTkzODYyOQ== shoyer 1217238 2016-02-09T16:21:27Z 2016-02-09T16:21:27Z MEMBER

@hgrecco Are you suggesting that pint could wrap xarray objects, or that xarray could wrap pint? Either is certainly possible, though I'm a bit pessimistic that we can come up with a complete solution without the numpy fixes we've been discussing.

Also, just to note, xarray contains a Dataset type for representing collections of variables that often have different units (e.g., temperature and pressure). That suggests to me that it could make more sense to put pint and/or astropy.Quantity objects inside xarray arrays rather than the other way around.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
141853078 https://github.com/pydata/xarray/issues/525#issuecomment-141853078 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE0MTg1MzA3OA== shoyer 1217238 2015-09-21T01:28:18Z 2016-02-09T16:16:38Z MEMBER

@mhvk It would certainly be possible to extend dask.array to handle units, in either of the ways you suggest.

Although you could allow Quantity objects inside dask.arrays, I don't like that approach, because static checks like units really should be done only once when arrays are constructed (akin to dtype checks) rather than at evaluation time, and for every chunk. This suggests that tagging on the outside is the better approach.

So far, so good -- but with the current state of duck array typing in NumPy, it's really hard to be happy with this. Until __numpy_ufunc__ lands, we can't override operations like np.sqrt in a way that is remotely feasible for dask.arrays (we can't afford to load big arrays into memory). Likewise, we need overrides for standard numpy array utility functions like concatenate. But the worst part is that the lack of standard interfaces means that we lose the possibility of composing different arrays backends with your Quantity type -- it will only be able to wrap dask or numpy arrays, not sparse matrices or bolt arrays or some other type yet to be invented.

Once we have all that duck-array stuff, then yes, you certainly could write all a duck-array Quantity type that can wrap generic duck-arrays. But something like Quantity really only needs to override compute operations so that they can propagate dtypes -- there shouldn't be a need to override methods like concatenate. If you had an actual (parametric) dtype for units (e.g., Quantity[float64, 'meters']), then you would get all those dtype agnostic methods for free, which would make your life as an implementer much easier. Hence why I think custom dtypes would really be the ideal solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
181916808 https://github.com/pydata/xarray/issues/525#issuecomment-181916808 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE4MTkxNjgwOA== mhvk 2789820 2016-02-09T15:35:37Z 2016-02-09T15:35:37Z NONE

@hgrecco - for astropy's Quantity, we currently also rely on __array_prepare__ and __array_wrap__. The main annoyances are (1) one cannot change the input before a numpy ufunc is called, and therefore often has no choice but to let a wrong calculation proceed; (2) proper recognition in non-ufunc functions is sparse (e.g., np.dot, etc.; see http://docs.astropy.org/en/latest/known_issues.html#quantity-issues)

Aside: at some point I'd hope to get the various implementations of units to talk together: it would be good to have an API that works such that units are inter-operable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
181631334 https://github.com/pydata/xarray/issues/525#issuecomment-181631334 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE4MTYzMTMzNA== hgrecco 278566 2016-02-08T23:59:37Z 2016-02-08T23:59:37Z NONE

I am one of the authors of Pint and I was just pointed here by @arsenovic

Pint does not subclass ndarray, it rathers wrap any numerical type dispatching to the wrapped value any attribute access that does not understand. By defining __array_prepare__ and __array_wraps__ most numpy functions and array attributes work as expected without monkey patching or having a specialized math module. For example:

``` python

import numpy as np import pint ureg = pint.UnitRegistry() [1., 4., 9.] * ureg.meter # a list is interpreted as an ndarray <Quantity([1. 4. 9.], 'meter')> np.sqrt() <Quantity([ 1. 2. 3.], 'meter ** 0.5')> .sum() <Quantity(6.0, 'meter ** 0.5')> ```

I think something similar can be done for xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
141999146 https://github.com/pydata/xarray/issues/525#issuecomment-141999146 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE0MTk5OTE0Ng== mhvk 2789820 2015-09-21T14:29:47Z 2015-09-21T14:29:47Z NONE

p.s. For concatenate, you need unit conversion as well, so sadly Quantity does need to override that too (and currently cannot, which is rather annoying).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
141997335 https://github.com/pydata/xarray/issues/525#issuecomment-141997335 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE0MTk5NzMzNQ== mhvk 2789820 2015-09-21T14:22:33Z 2015-09-21T14:22:33Z NONE

@shoyer - fair enough, and sad we don't have __numpy_ufunc__ yet... I agree that with Quantity inside, one would end up duplicating work for every chunk, which makes it less than ideal even though it would probably be the easier approach to implement.

For the outside method, from the dask perspective, it would indeed be easiest if units were done as a dtype, since then you can punt all the decisions to helper routines. My guess, though, is that it will be a while before numpy will include what is required to tell, e.g., that if I add something in m to something in cm, the second argument has to be multiplied by 0.01. But astropy does provide something just like that: quantity_helpers exposes a dict keyed by operation, which holds functions that return the required converters given the units. E.g., in the above example, internally what happens is

``` converters, result_unit = UFUNC_HELPERSnp.add result_unit

Unit("m")

converters[0]

None

converters[1]

<function astropy.units.quantity_helper.get_converter.\<locals>.\<lambda>>

converters1

0.01

```

In dask, you could run the converters on your individual chunks, though obviously I don't know how easy it is to add an extra step like this without slowing down other aspects too much.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
141697319 https://github.com/pydata/xarray/issues/525#issuecomment-141697319 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDE0MTY5NzMxOQ== mhvk 2789820 2015-09-19T18:53:43Z 2015-09-19T18:53:43Z NONE

@shoyer - as one who thinks unit support is probably the single best thing astropy has (and is co-maintainer of astropy.units), I thought I'd pipe in: why would it be a problem that astropy's Quantity is an ndarray subclass? I must admit not having used dask arrays, but since they use ndarray internally for the pieces, shouldn't the fact that Quantity has the same interface/methods, make it relatively easy to swap ndarray for Quantity internally? I'd be quite happy to help think about this (surely it cannot be as bad as it is for MaskedArray ;-).

Alternatively, maybe it is easier to tag on the outside rather than the inside. This would also not seem to be that hard, given that astropy's Quantity is really just a wrapper around ndarray that carries a Unit instance. I think the parts that truly wrap might be separated from those that override ndarray methods, and would be willing to implement that if there is a good reason (like making dask quantities possible...). It may be that in this case one would not use Quantity proper, but rather just the parts of units where the real magic happens: in the Unit class (which does the unit conversion) and in quantity_helpers.py (which tells what unit conversion is necessary for a given operation/function).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
131876284 https://github.com/pydata/xarray/issues/525#issuecomment-131876284 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDEzMTg3NjI4NA== shoyer 1217238 2015-08-17T16:14:56Z 2015-08-17T16:14:56Z MEMBER

Unfortunately, the astropy approach uses a numpy.ndarray subclass, which means it's mutually exclusive with dask.array. Otherwise, it does look very nice, though.

On Mon, Aug 17, 2015 at 8:38 AM, Ryan Abernathey notifications@github.com wrote:

Astropy has pretty good units support: http://astropy.readthedocs.org/en/latest/units/

Would it be possible to copy what they do?

Reply to this email directly or view it on GitHub: https://github.com/xray/xray/issues/525#issuecomment-131866203

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
131866203 https://github.com/pydata/xarray/issues/525#issuecomment-131866203 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDEzMTg2NjIwMw== rabernat 1197350 2015-08-17T15:38:14Z 2015-08-17T15:38:14Z MEMBER

Astropy has pretty good units support: http://astropy.readthedocs.org/en/latest/units/ Would it be possible to copy what they do?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
130134807 https://github.com/pydata/xarray/issues/525#issuecomment-130134807 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDEzMDEzNDgwNw== shoyer 1217238 2015-08-12T02:01:01Z 2015-08-12T02:01:01Z MEMBER

Eventually, I hope so!

Unfortunately, doing this in a feasible and maintainable way will probably require upstream fixes in NumPy. In particular, better support for duck-array types (https://github.com/numpy/numpy/issues/4164) and/or the ability to write units as a custom NumPy dtypes. Both of these are on the NumPy roadmap, though they don't have a timeframe for when that will happen.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 1762.987ms · About: xarray-datasette