issue_comments: 524518305

html_url: https://github.com/pydata/xarray/issues/525#issuecomment-524518305
issue_url: https://api.github.com/repos/pydata/xarray/issues/525
id: 524518305
node_id: MDEyOklzc3VlQ29tbWVudDUyNDUxODMwNQ==
user: 3460034
created_at: 2019-08-24T04:17:54Z
updated_at: 2019-08-24T04:17:54Z
author_association: CONTRIBUTOR

With the progress being made in https://github.com/pydata/xarray/pull/2956, https://github.com/pydata/xarray/pull/3238, and https://github.com/hgrecco/pint/pull/764, I think now might be a good time to work out the details of the "minimal units layer" mentioned by @shoyer in https://github.com/pydata/xarray/issues/525#issuecomment-482641808 and https://github.com/pydata/xarray/issues/988#issuecomment-413732471.

I'd be glad to try putting together a PR for it as a follow-up to https://github.com/pydata/xarray/pull/3238, but I'd like to ask for some guidance first:

(For reference, below is the action list from https://github.com/pydata/xarray/issues/988#issuecomment-413732471)

  • The DataArray.units property could forward to DataArray.data.units.
  • A DataArray.to or DataArray.convert method could call the relevant method on data and re-wrap it in a DataArray.
  • A minimal layer on top of xarray's netCDF IO could handle unit attributes by wrapping/unwrapping arrays with pint.

DataArray.units

Having DataArray.units forward to DataArray.data.units should work for pint, unyt, and quantities, but should a fallback to DataArray.data.unit be added for astropy.units? Also, how should DataArray.units behave if DataArray.data does not have a "units" or "unit" attribute, but DataArray.attrs['units'] exists?
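A minimal sketch of one possible lookup order, assuming the wrapped array exposes units (pint, unyt, quantities) or unit (astropy.units), with attrs['units'] as a last resort. The helper name resolve_units is hypothetical, and SimpleNamespace objects stand in for DataArrays so the example runs without xarray or any units library.

```python
from types import SimpleNamespace


def resolve_units(data_array):
    """Return the units of a DataArray-like object (hypothetical helper).

    Assumed lookup order, not settled xarray behavior:
    1. data.units      (pint, unyt, quantities)
    2. data.unit       (astropy.units)
    3. attrs["units"]  (plain arrays carrying a units attribute)
    Returns None if none of these exist.
    """
    data = data_array.data
    for name in ("units", "unit"):
        units = getattr(data, name, None)
        if units is not None:
            return units
    return data_array.attrs.get("units")


# Stand-ins: unit-aware arrays wrapped in a "DataArray", and a plain one.
pint_like = SimpleNamespace(data=SimpleNamespace(units="meter"), attrs={})
astropy_like = SimpleNamespace(data=SimpleNamespace(unit="s"), attrs={})
plain = SimpleNamespace(data=[1.0, 2.0], attrs={"units": "K"})

print(resolve_units(pint_like))     # meter
print(resolve_units(astropy_like))  # s
print(resolve_units(plain))         # K
```

Checking the wrapped array before attrs means a stale "units" attribute cannot shadow the array's real units; whether the attrs fallback should exist at all is the open question here.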

DataArray.to()/DataArray.convert()

DataArray.to() would be consistent with the methods in pint, unyt, and astropy.units (the relevant method in quantities appears to be .rescale()); however, the name is very similar to the numerous output-related DataArray.to_*() methods. Is this okay, or would DataArray.convert() or some other name be better to avoid confusion?
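Whatever the method ends up being called, it could dispatch to whichever conversion method the wrapped array provides. The sketch below assumes a .to(units) method (pint, unyt, astropy.units style) or a .rescale(units) method (quantities style); convert_data, ToQuantity, and RescaleQuantity are hypothetical stand-ins with a toy conversion table. In xarray the result would then be re-wrapped with the original dims, coords, and attrs, for example via DataArray.copy(data=...).

```python
# Toy conversion factors to metres, for this example only.
_TO_METRE = {"m": 1.0, "km": 1000.0, "cm": 0.01}


class ToQuantity:
    """Stand-in for a pint/unyt/astropy-style array with a .to() method."""

    def __init__(self, magnitude, units):
        self.magnitude, self.units = magnitude, units

    def to(self, units):
        factor = _TO_METRE[self.units] / _TO_METRE[units]
        return ToQuantity(self.magnitude * factor, units)


class RescaleQuantity:
    """Stand-in for a quantities-style array with a .rescale() method."""

    def __init__(self, magnitude, units):
        self.magnitude, self.units = magnitude, units

    def rescale(self, units):
        factor = _TO_METRE[self.units] / _TO_METRE[units]
        return RescaleQuantity(self.magnitude * factor, units)


def convert_data(data, units):
    """Convert a wrapped unit array, trying .to() first, then .rescale()."""
    for method_name in ("to", "rescale"):
        method = getattr(data, method_name, None)
        if callable(method):
            return method(units)
    raise TypeError(f"{type(data).__name__} does not support unit conversion")


print(convert_data(ToQuantity(2.5, "km"), "m").magnitude)         # 2500.0
print(convert_data(RescaleQuantity(150.0, "cm"), "m").magnitude)  # 1.5
```

Raising TypeError for arrays without either method (such as a plain list or NumPy array) keeps the failure explicit instead of silently returning unconverted data.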

Units and IO

While wrapping and unwrapping arrays with pint itself should be straightforward, I'm not sure what the best API for it would be, especially for input.

Some possibilities that came to mind (by no means an exhaustive list):

  • Leave open_dataset as it is now, but provide examples in the documentation for how to reconstruct a new Dataset with unit arrays (perhaps provide a boilerplate function or accessor)
  • Add a kwarg like "wrap_units" to open_dataset() that accepts a quantity constructor (like ureg.Quantity in pint) and applies it to each variable
  • Devise some generalized system for specifying the internal array structure in the opened dataset (to handle other duck array types, not just unit arrays)
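To make the second option concrete, here is a sketch of what a wrap_units step could do to each decoded variable, assuming the constructor is called as constructor(values, units_string), which is how ureg.Quantity in pint can be called. The wrap_units name and the plain-dict representation of variables are hypothetical, and a namedtuple stands in for the quantity constructor so the example runs without pint.

```python
from collections import namedtuple

# Stand-in for a quantity constructor such as pint's ureg.Quantity(values, units).
Quantity = namedtuple("Quantity", ["magnitude", "units"])


def wrap_units(variables, constructor):
    """Wrap every variable that has a "units" attribute (hypothetical helper).

    `variables` maps name -> (values, attrs). The "units" attribute is moved
    into the wrapped array so the units are not stored in two places.
    """
    wrapped = {}
    for name, (values, attrs) in variables.items():
        attrs = dict(attrs)  # copy so the caller's attrs are not mutated
        units = attrs.pop("units", None)
        if units is not None:
            values = constructor(values, units)
        wrapped[name] = (values, attrs)
    return wrapped


decoded = {
    "temperature": ([280.0, 281.5], {"units": "K", "long_name": "air temperature"}),
    "station": (["a", "b"], {}),
}
result = wrap_units(decoded, Quantity)
print(result["temperature"])
# (Quantity(magnitude=[280.0, 281.5], units='K'), {'long_name': 'air temperature'})
print(result["station"])  # (['a', 'b'], {})
```

Whether a real constructor can wrap lazily loaded variables without forcing them into memory is what the lazy-loading tests would need to cover.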

With any of these, tests for lazy-loading would be crucial (I don't know yet how pint will handle that).

Output may be easier: could unwrapping be done implicitly, by storing str(DataArray.units) in the "units" attribute and replacing the unit array with its magnitude/value?
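The implicit unwrapping described above could look roughly like this before variables are handed to the netCDF backend. This is a sketch assuming pint-style .magnitude/.units attributes; unwrap_units and the dict-of-(values, attrs) representation are hypothetical, and a namedtuple stands in for a pint Quantity.

```python
from collections import namedtuple

# Stand-in for a pint-style unit array exposing .magnitude and .units.
Quantity = namedtuple("Quantity", ["magnitude", "units"])


def unwrap_units(variables):
    """Replace unit arrays with their magnitudes and record str(units) in attrs.

    `variables` maps name -> (values, attrs). Only the pint-style
    .magnitude/.units pair is handled here; astropy.units would need
    .value/.unit instead.
    """
    unwrapped = {}
    for name, (values, attrs) in variables.items():
        attrs = dict(attrs)  # copy so the caller's attrs are not mutated
        units = getattr(values, "units", None)
        if units is not None:
            attrs["units"] = str(units)
            values = values.magnitude
        unwrapped[name] = (values, attrs)
    return unwrapped


wrapped_vars = {
    "pressure": (Quantity([1013.0, 1009.5], "hPa"), {"long_name": "surface pressure"}),
}
print(unwrap_units(wrapped_vars)["pressure"])
# ([1013.0, 1009.5], {'long_name': 'surface pressure', 'units': 'hPa'})
```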

Extra questions based on the sparse implementation

__repr__

Will a set of repr functions need to be added for each unit array type, as was done for sparse in https://github.com/pydata/xarray/pull/3211? Or should a more general system be implemented, given all the possible combinations that would arise with other duck array types?
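One shape a more general system could take is single dispatch on the array type: each wrapper type formats itself and delegates to whatever it wraps, so nested combinations (for example, a unit array wrapping a sparse array) compose without a dedicated function per pair. The sketch uses Python's functools.singledispatch; inline_repr, UnitArray, and SparseArray are hypothetical stand-ins, not xarray's formatting internals.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class SparseArray:
    """Stand-in for a sparse array: only tracks shape and number of nonzeros."""
    shape: tuple
    nnz: int


@dataclass
class UnitArray:
    """Stand-in for a unit array that can wrap any other array type."""
    magnitude: object
    units: str


@singledispatch
def inline_repr(data):
    """Fallback: plain repr for types without a registered formatter."""
    return repr(data)


@inline_repr.register
def _(data: SparseArray):
    return f"<sparse shape={data.shape} nnz={data.nnz}>"


@inline_repr.register
def _(data: UnitArray):
    # Delegate to whatever the unit array wraps, then append the units.
    return f"{inline_repr(data.magnitude)} {data.units}"


print(inline_repr(UnitArray([1.0, 2.0], "m")))
# [1.0, 2.0] m
print(inline_repr(UnitArray(SparseArray((100, 100), 42), "m/s")))
# <sparse shape=(100, 100) nnz=42> m/s
```

Each new duck array type then only registers its own formatter, rather than one per combination.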

to_dense()/to_numpy_data()/to_numpy()

What is the expected behavior of unit arrays with regard to this soon-to-be-implemented conversion method?
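As one possible answer, sketched under the assumption that the conversion should silently drop units, the method could peel off wrapper layers until a plain array is left, regardless of nesting order. to_plain, UnitArray, and SparseArray are hypothetical stand-ins; the duck-typed checks for .magnitude and .todense() are assumptions about how the layers would be recognized.

```python
from dataclasses import dataclass


@dataclass
class UnitArray:
    """Stand-in for a pint-style unit array."""
    magnitude: object
    units: str


@dataclass
class SparseArray:
    """Stand-in for a 1-D sparse array storing only its nonzero entries."""
    size: int
    nonzero: dict

    def todense(self):
        return [self.nonzero.get(i, 0.0) for i in range(self.size)]


def to_plain(data):
    """Peel off wrapper layers until a plain array is left (hypothetical)."""
    while True:
        if hasattr(data, "magnitude"):  # unit layer: drop units, keep values
            data = data.magnitude
        elif hasattr(data, "todense"):  # sparse layer: densify
            data = data.todense()
        else:
            return data


print(to_plain(UnitArray(SparseArray(4, {1: 2.5}), "m")))  # [0.0, 2.5, 0.0, 0.0]
print(to_plain(UnitArray([1.0, 2.0], "K")))                # [1.0, 2.0]
```

Dropping units silently is only one option; the method could instead warn, or refuse to strip units unless the caller opts in.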

issue: 100295585