html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/525#issuecomment-531603357,https://api.github.com/repos/pydata/xarray/issues/525,531603357,MDEyOklzc3VlQ29tbWVudDUzMTYwMzM1Nw==,3460034,2019-09-15T22:04:39Z,2019-09-15T22:04:39Z,CONTRIBUTOR,"Based the points raised by @crusaderky in https://github.com/hgrecco/pint/issues/878#issue-492678605 about how much special case handling xarray has for dask arrays, I was thinking recently about what it might take for the xarray > pint > dask.array wrapping discussed here and elsewhere to work as fluidly as xarray > dask.array currently does.
Would it help for this integration to have pint Quanitites implement the [dask custom collections interface](https://docs.dask.org/en/latest/custom-collections.html) for when it wraps a dask array? I would think that this would allow a pint Quanitity to behave in a ""dask-array-like"" way rather than just an ""array-like"" way. Then, instead of xarray checking for `isinstance(dask_array_type)`, it could for check for ""duck dask arrays"" (e.g., those with both `__array_function__` and `__dask_graph__`)? There are almost certainly some subtle implementation details that would need to be worked out, but I'm guessing that this could take care of the bulk of the integration.
Also, if I'm incorrect with this line of thought, or there is a better way forward for implementing this wrapping pattern, please do let me know!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,100295585
https://github.com/pydata/xarray/issues/525#issuecomment-524570522,https://api.github.com/repos/pydata/xarray/issues/525,524570522,MDEyOklzc3VlQ29tbWVudDUyNDU3MDUyMg==,3460034,2019-08-24T18:12:55Z,2019-08-24T18:39:49Z,CONTRIBUTOR,"@shoyer I agree, the accessor interface makes a lot of sense for this: it's more conservative on the xarray side, while also giving the most flexibility for the pint + xarray integration.
Based on your feedback and what I'd hope to see out of the pint + xarray integration, I'm thinking a pint-adjacent package like pint-xarray may be the best route forward. ~~I'll create an issue on pint to inquire about that possibility.~~ See https://github.com/hgrecco/pint/issues/849.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,100295585
https://github.com/pydata/xarray/issues/525#issuecomment-524569782,https://api.github.com/repos/pydata/xarray/issues/525,524569782,MDEyOklzc3VlQ29tbWVudDUyNDU2OTc4Mg==,3460034,2019-08-24T18:00:37Z,2019-08-24T18:01:11Z,CONTRIBUTOR,"Oh, okay, having the fallback like that was how I thought about implementing it. (I'm sorry that I didn't describe that in my initial comment.)
So would the way forward be to implement `DataArray.units_convert()`/`DataArray.units_to()` and `DataArray.units` as you described right now, but wait for and/or delegate IO integration? Or, should there also be a fully-backwards-compatible IO integration implemented now (such as an optional kwarg on `open_dataset` and `to_netcdf`)?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,100295585
https://github.com/pydata/xarray/issues/525#issuecomment-524568112,https://api.github.com/repos/pydata/xarray/issues/525,524568112,MDEyOklzc3VlQ29tbWVudDUyNDU2ODExMg==,3460034,2019-08-24T17:34:31Z,2019-08-24T17:34:31Z,CONTRIBUTOR,"@shoyer Thank you for the reply!
That sounds good about the repr custom logic.
With the units attribute, I was presuming based on the past comments that `DataArray.units` would be a new property; I forgot that `DataArray.` passes along to `DataArray.attrs.`, so that implementing something new for `DataArray.units` would be a breaking change! In trying to avoid such a change, though, I think it would be confusing to have a DataArray-level `DataArray.units_convert` method and not a corresponding DataArray-level way of getting at the units. So, would it be okay to just implement this unit interface (unit access, unit conversion, and IO) through an accessor, and start out with just a pint accessor? If so, where should it be implemented?
Possible ideas I had:
- As a boilerplate example in the xarray documentation that downstream libraries or end-users can implement?
- In xarray itself?
- In pint or a new pint-adjacent package (similar to [pint-pandas](https://github.com/hgrecco/pint-pandas))?
- A new xarray-adjacent package for general-purpose unit compatibility?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,100295585
https://github.com/pydata/xarray/issues/525#issuecomment-524518305,https://api.github.com/repos/pydata/xarray/issues/525,524518305,MDEyOklzc3VlQ29tbWVudDUyNDUxODMwNQ==,3460034,2019-08-24T04:17:54Z,2019-08-24T04:17:54Z,CONTRIBUTOR,"With the progress being made with https://github.com/pydata/xarray/pull/2956, https://github.com/pydata/xarray/pull/3238, and https://github.com/hgrecco/pint/pull/764, I was thinking that now might be a good time to work out the details of the ""minimal units layer"" mentioned by @shoyer in https://github.com/pydata/xarray/issues/525#issuecomment-482641808 and https://github.com/pydata/xarray/issues/988#issuecomment-413732471?
I'd be glad to try putting together a PR that could follow up on https://github.com/pydata/xarray/pull/3238 for it, but I would want to ask for some guidance:
(For reference, below is the action list from https://github.com/pydata/xarray/issues/988#issuecomment-413732471)
> - The `DataArray.units` property could forward to `DataArray.data.units`.
> - A `DataArray.to` or `DataArray.convert` method could call the relevant method on data and re-wrap it in a DataArray.
> - A minimal layer on top of xarray's netCDF IO could handle unit attributes by wrapping/unwrapping arrays with pint.
**`DataArray.units`**
Having `DataArray.units` forward to `DataArray.data.units` should work for `pint`, `unyt`, and `quantities`, but should a fallback to `DataArray.data.unit` be added for `astropy.units`? Also, how should `DataArray.units` behave if `DataArray.data` does not have a ""units"" or ""unit"" attribute, but `DataArray.attrs['units']` exists?
**`DataArray.to()`/`DataArray.convert()`**
`DataArray.to()` would be consistent with the methods for `pint`, `unyt`, and `astropy.units` (the relevant method for `quantities` looks to be `.rescale()`), however, it is very similar to the numerous output-related `DataArray.to_*()` methods. Is this okay, or would `DataArray.convert()` or some other method name be better to avoid confusion?
**Units and IO**
While wrapping and unwrapping arrays with `pint` itself should be straightforward, I really don't know what the best API for it should be, especially for input.
Some possibilities that came to mind (by no means an exhaustive list):
- Leave open_dataset as it is now, but provide examples in the documentation for how to reconstruct a new Dataset with unit arrays (perhaps provide a boilerplate function or accessor)
- Add a kwarg like ""wrap_units"" to `open_dataset()` that accepts a quantity constructor (like `ureg.Quantity` in pint) that is applied within each variable
- Devise some generalized system for specifying the internal array structure in the opened dataset (to handle other duck array types, not just unit arrays)
With any of these, tests for lazy-loading would be crucial (I don't know yet how pint will handle that).
Output may be easier: I was thinking that unwrapping could be done implicitly by automatically putting `str(DataArray.units)` as the ""units"" attribute and replacing the unit array with its magnitude/value?
**Extra questions based on sparse implementation**
[`__repr__`](https://github.com/pydata/xarray/pull/3211)
Will a set of repr functions for each unit array type need to be added like they were for sparse in https://github.com/pydata/xarray/pull/3211? Or should there be some more general system implemented because of all of the possible combinations that would arise with other duck array types?
[`to_dense()`/`.to_numpy_data()`/`.to_numpy()`](https://github.com/pydata/xarray/issues/3245)
What is the expected behavior with unit arrays with regards to this soon-to-be-implemented conversion method?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,100295585
https://github.com/pydata/xarray/issues/525#issuecomment-514877824,https://api.github.com/repos/pydata/xarray/issues/525,514877824,MDEyOklzc3VlQ29tbWVudDUxNDg3NzgyNA==,3460034,2019-07-25T03:11:20Z,2019-07-25T03:11:20Z,CONTRIBUTOR,"Thank you for the insight!
So if I'm understanding things correctly as they stand now, dimension coordinates store their values internally as a `pandas.Index`, which would mean, to implement this directly, this becomes an upstream issue in pandas to allow a ndarray-like unit array inside a `pandas.Index`? Based on what I've seen on the pandas side, this looks far from straightforward.
With that in mind, would ""dimension coordinates with units"" (or more generally ""dimension coordinates with `__array_function__` implementers"") be another use case that best falls under flexible indices (#1603)?
(In the mean time, I would guess that the best workaround is using an accessor interface to handle unit-related operations on coordinates, since the `attrs` are preserved.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,100295585
https://github.com/pydata/xarray/issues/525#issuecomment-514452182,https://api.github.com/repos/pydata/xarray/issues/525,514452182,MDEyOklzc3VlQ29tbWVudDUxNDQ1MjE4Mg==,3460034,2019-07-24T02:19:08Z,2019-07-24T02:19:08Z,CONTRIBUTOR,"In light of the recent activity with `__array_function__` in #3117, I took a quick look to see if it worked with Pint as modified in https://github.com/hgrecco/pint/pull/764. The basics of sticking a Pint `Quantity` in a `DataArray` seem to work well, and the perhaps the greatest issues are on Pint's end...right now https://github.com/hgrecco/pint/pull/764 is limited in the functions it handles through `__array_function__`, and there are some quirks with operator precedence.
However, the other main problem was that coordinates did not work with `Quantity`'s. Looking again at https://github.com/pydata/xarray/issues/1938#issuecomment-510953379 and #2956, this is not surprising. I'm curious though about what it would take to let indexing work with Pint (or other unit arrays)? For most of my use cases (meteorological analysis as in MetPy), having units with coordinates is just as important as having units with the data itself. I'd be interested in helping implement it, but I would greatly appreciate some initial direction, since I'm new to that part of the xarray codebase.
Also, cc @keewis, since I saw in #2956 you have a [`unit-support`](https://github.com/keewis/xarray/tree/unit-support) branch that looks like it attempts to extend `NumpyIndexingAdapter` to work with unit arrays, but still has the coordinates-with-units tests marked as xfail.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,100295585