
issue_comments


7 rows where issue = 100295585 and user = 3460034 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
531603357 https://github.com/pydata/xarray/issues/525#issuecomment-531603357 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUzMTYwMzM1Nw== jthielen 3460034 2019-09-15T22:04:39Z 2019-09-15T22:04:39Z CONTRIBUTOR

Based on the points raised by @crusaderky in https://github.com/hgrecco/pint/issues/878#issue-492678605 about how much special-case handling xarray has for dask arrays, I was thinking recently about what it might take for the xarray > pint > dask.array wrapping discussed here and elsewhere to work as fluidly as xarray > dask.array currently does.

Would it help for this integration to have pint Quantities implement the dask custom collections interface for when they wrap a dask array? I would think that this would allow a pint Quantity to behave in a "dask-array-like" way rather than just an "array-like" way. Then, instead of xarray checking isinstance(dask_array_type), it could check for "duck dask arrays" (e.g., those with both __array_function__ and __dask_graph__). There are almost certainly some subtle implementation details that would need to be worked out, but I'm guessing that this could take care of the bulk of the integration.
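The duck-typed check described above could be sketched as follows; `is_duck_dask_array` and the stand-in classes are illustrative only, not xarray's actual implementation.

```python
# A minimal sketch of checking for a "duck dask array": an object exposing
# both the NumPy dispatch protocol (__array_function__) and the dask
# collections interface (__dask_graph__), rather than an isinstance check.
def is_duck_dask_array(x):
    return hasattr(x, "__array_function__") and hasattr(x, "__dask_graph__")


class DuckDaskArray:
    """Stand-in for a pint Quantity wrapping a dask array."""

    def __array_function__(self, func, types, args, kwargs):
        pass  # would dispatch to the wrapped implementation

    def __dask_graph__(self):
        return {}  # would return the wrapped array's task graph


class PlainArrayLike:
    """Array-like that is not a dask collection."""

    def __array_function__(self, func, types, args, kwargs):
        pass
```

With this kind of check, anything that satisfies both protocols gets dask's special-case handling, whatever library it comes from.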

Also, if I'm incorrect with this line of thought, or there is a better way forward for implementing this wrapping pattern, please do let me know!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524570522 https://github.com/pydata/xarray/issues/525#issuecomment-524570522 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU3MDUyMg== jthielen 3460034 2019-08-24T18:12:55Z 2019-08-24T18:39:49Z CONTRIBUTOR

@shoyer I agree, the accessor interface makes a lot of sense for this: it's more conservative on the xarray side, while also giving the most flexibility for the pint + xarray integration.

Based on your feedback and what I'd hope to see out of the pint + xarray integration, I'm thinking a pint-adjacent package like pint-xarray may be the best route forward. ~~I'll create an issue on pint to inquire about that possibility.~~ See https://github.com/hgrecco/pint/issues/849.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524569782 https://github.com/pydata/xarray/issues/525#issuecomment-524569782 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2OTc4Mg== jthielen 3460034 2019-08-24T18:00:37Z 2019-08-24T18:01:11Z CONTRIBUTOR

Oh, okay, having the fallback like that was how I thought about implementing it. (I'm sorry that I didn't describe that in my initial comment.)

So would the way forward be to implement DataArray.units_convert()/DataArray.units_to() and DataArray.units as you described right now, but wait for and/or delegate IO integration? Or, should there also be a fully-backwards-compatible IO integration implemented now (such as an optional kwarg on open_dataset and to_netcdf)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524568112 https://github.com/pydata/xarray/issues/525#issuecomment-524568112 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2ODExMg== jthielen 3460034 2019-08-24T17:34:31Z 2019-08-24T17:34:31Z CONTRIBUTOR

@shoyer Thank you for the reply!

That sounds good about the repr custom logic.

With the units attribute, I was presuming based on the past comments that DataArray.units would be a new property; I forgot that DataArray.<attrname> passes along to DataArray.attrs.<attrname>, so that implementing something new for DataArray.units would be a breaking change! In trying to avoid such a change, though, I think it would be confusing to have a DataArray-level DataArray.units_convert method and not a corresponding DataArray-level way of getting at the units. So, would it be okay to just implement this unit interface (unit access, unit conversion, and IO) through an accessor, and start out with just a pint accessor? If so, where should it be implemented?

Possible ideas I had:

  • As a boilerplate example in the xarray documentation that downstream libraries or end-users can implement?
  • In xarray itself?
  • In pint or a new pint-adjacent package (similar to pint-pandas)?
  • A new xarray-adjacent package for general-purpose unit compatibility?
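The accessor idea in the first bullet could be sketched roughly like this; `register_accessor`, `FakeDataArray`, and `FakeQuantity` are simplified stand-ins for xarray's `register_dataarray_accessor` machinery and a pint Quantity, not actual library API.

```python
# Hypothetical boilerplate: register a units accessor on a DataArray-like
# class via a cached property-style decorator.
def register_accessor(name, cls):
    def decorator(accessor_cls):
        setattr(cls, name, property(lambda self: accessor_cls(self)))
        return accessor_cls
    return decorator


class FakeDataArray:
    def __init__(self, data, attrs=None):
        self.data = data
        self.attrs = attrs or {}


class FakeQuantity:
    """Stand-in for a pint Quantity: carries units with its data."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units


@register_accessor("units_accessor", FakeDataArray)
class UnitsAccessor:
    def __init__(self, da):
        self._da = da

    @property
    def units(self):
        # Prefer units carried by the data; fall back to the attrs entry.
        return getattr(self._da.data, "units", self._da.attrs.get("units"))
```

Usage would look like `da.units_accessor.units`, keeping all unit logic out of DataArray itself.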
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524518305 https://github.com/pydata/xarray/issues/525#issuecomment-524518305 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDUxODMwNQ== jthielen 3460034 2019-08-24T04:17:54Z 2019-08-24T04:17:54Z CONTRIBUTOR

With the progress being made with https://github.com/pydata/xarray/pull/2956, https://github.com/pydata/xarray/pull/3238, and https://github.com/hgrecco/pint/pull/764, I was thinking that now might be a good time to work out the details of the "minimal units layer" mentioned by @shoyer in https://github.com/pydata/xarray/issues/525#issuecomment-482641808 and https://github.com/pydata/xarray/issues/988#issuecomment-413732471.

I'd be glad to try putting together a PR that could follow up on https://github.com/pydata/xarray/pull/3238 for it, but I would want to ask for some guidance:

(For reference, below is the action list from https://github.com/pydata/xarray/issues/988#issuecomment-413732471)

  • The DataArray.units property could forward to DataArray.data.units.
  • A DataArray.to or DataArray.convert method could call the relevant method on data and re-wrap it in a DataArray.
  • A minimal layer on top of xarray's netCDF IO could handle unit attributes by wrapping/unwrapping arrays with pint.

DataArray.units

Having DataArray.units forward to DataArray.data.units should work for pint, unyt, and quantities, but should a fallback to DataArray.data.unit be added for astropy.units? Also, how should DataArray.units behave if DataArray.data does not have a "units" or "unit" attribute, but DataArray.attrs['units'] exists?

DataArray.to()/DataArray.convert()

DataArray.to() would be consistent with the methods for pint, unyt, and astropy.units (the relevant method for quantities looks to be .rescale()); however, it is very similar to the numerous output-related DataArray.to_*() methods. Is this okay, or would DataArray.convert() or some other method name be better to avoid confusion?

Units and IO

While wrapping and unwrapping arrays with pint itself should be straightforward, I really don't know what the best API for it should be, especially for input.

Some possibilities that came to mind (by no means an exhaustive list):

  • Leave open_dataset as it is now, but provide examples in the documentation for how to reconstruct a new Dataset with unit arrays (perhaps provide a boilerplate function or accessor)
  • Add a kwarg like "wrap_units" to open_dataset() that accepts a quantity constructor (like ureg.Quantity in pint) that is applied within each variable
  • Devise some generalized system for specifying the internal array structure in the opened dataset (to handle other duck array types, not just unit arrays)

With any of these, tests for lazy-loading would be crucial (I don't know yet how pint will handle that).
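The second bullet (a wrap_units-style kwarg) could be sketched like this; the dict of (data, attrs) pairs stands in for a Dataset's variables, and `wrap_variables` and the kwarg name are assumptions, not actual xarray API.

```python
# Illustrative sketch: apply a quantity constructor (like ureg.Quantity in
# pint) to each variable that carries a "units" attribute, leaving
# unitless variables untouched.
def wrap_variables(variables, quantity_constructor):
    wrapped = {}
    for name, (data, attrs) in variables.items():
        units = attrs.get("units")
        if units is not None:
            data = quantity_constructor(data, units)
        wrapped[name] = (data, attrs)
    return wrapped
```

Whether the "units" attribute should then be dropped from attrs (to avoid it going stale after conversions) is exactly the kind of detail that would need deciding.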

Output may be easier: I was thinking that unwrapping could be done implicitly by putting str(DataArray.units) in the "units" attribute and replacing the unit array with its magnitude/value?
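That implicit unwrapping could be sketched as below; "quantity-like" here just means an object with .units and .magnitude (as in pint), and `unwrap_for_output` and `FakeQuantity` are illustrative names, not real API.

```python
# Sketch: before writing, store str(units) in the attrs and keep only the
# magnitude. Plain arrays pass through unchanged.
class FakeQuantity:
    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units


def unwrap_for_output(data, attrs):
    attrs = dict(attrs)  # copy so the in-memory object is untouched
    if hasattr(data, "units") and hasattr(data, "magnitude"):
        attrs["units"] = str(data.units)
        data = data.magnitude
    return data, attrs
```
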

Extra questions based on sparse implementation

__repr__

Will a set of repr functions for each unit array type need to be added like they were for sparse in https://github.com/pydata/xarray/pull/3211? Or should there be some more general system implemented because of all of the possible combinations that would arise with other duck array types?

.to_dense()/.to_numpy_data()/.to_numpy()

What is the expected behavior with unit arrays with regards to this soon-to-be-implemented conversion method?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
514877824 https://github.com/pydata/xarray/issues/525#issuecomment-514877824 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDg3NzgyNA== jthielen 3460034 2019-07-25T03:11:20Z 2019-07-25T03:11:20Z CONTRIBUTOR

Thank you for the insight!

So if I'm understanding things correctly as they stand now, dimension coordinates store their values internally as a pandas.Index, which means that implementing this directly becomes an upstream issue in pandas: allowing an ndarray-like unit array inside a pandas.Index. Based on what I've seen on the pandas side, this looks far from straightforward.

With that in mind, would "dimension coordinates with units" (or more generally "dimension coordinates with __array_function__ implementers") be another use case that best falls under flexible indices (#1603)?

(In the meantime, I would guess that the best workaround is using an accessor interface to handle unit-related operations on coordinates, since the attrs are preserved.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
514452182 https://github.com/pydata/xarray/issues/525#issuecomment-514452182 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDQ1MjE4Mg== jthielen 3460034 2019-07-24T02:19:08Z 2019-07-24T02:19:08Z CONTRIBUTOR

In light of the recent activity with __array_function__ in #3117, I took a quick look to see if it worked with Pint as modified in https://github.com/hgrecco/pint/pull/764. The basics of sticking a Pint Quantity in a DataArray seem to work well, and perhaps the greatest issues are on Pint's end: right now https://github.com/hgrecco/pint/pull/764 is limited in the functions it handles through __array_function__, and there are some quirks with operator precedence.

However, the other main problem was that coordinates did not work with Quantities. Looking again at https://github.com/pydata/xarray/issues/1938#issuecomment-510953379 and #2956, this is not surprising. I'm curious, though, about what it would take to let indexing work with Pint (or other unit arrays). For most of my use cases (meteorological analysis as in MetPy), having units with coordinates is just as important as having units with the data itself. I'd be interested in helping implement it, but I would greatly appreciate some initial direction, since I'm new to that part of the xarray codebase.

Also, cc @keewis, since I saw in #2956 you have a unit-support branch that looks like it attempts to extend NumpyIndexingAdapter to work with unit arrays, but still has the coordinates-with-units tests marked as xfail.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.686ms · About: xarray-datasette