
issue_comments


67 rows where user = 3460034 sorted by updated_at descending



id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
890165626 https://github.com/pydata/xarray/issues/5648#issuecomment-890165626 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X841Dtl6 jthielen 3460034 2021-07-30T21:34:32Z 2021-07-30T21:41:41Z CONTRIBUTOR

Count me in for the meeting!


Here are a few suggestions about possible topics to add to the agenda (based on linked issues/discussions), if we can fit it all in:

  • Canonical/minimal API of a "duck array" and how to detect it (though may be superseded by NEPs 30 and 47 among others)
  • Consistency of type deferral (i.e., between construction, binary ops, __array_ufunc__, __array_function__, and array modules...for example, these are uniform in Pint, but construction and array module functions are deliberately different from the others for Dask arrays)
  • API for inter-type casting and changing what types are used in a nested array (e.g. #3245 and #5568)
  • How to handle unknown duck arrays
  • Nested array reprs (both short and full)
  • Best practices for "carrying through" operations belonging to wrapped types (i.e., doing Dask-related things to a Pint Quantity or xarray DataArray that contains a Dask array), even if multiple layers deep

Also, tagging a few other array type libraries and maintainers/contributors who may be interested (please ping the relevant folks if you know them):

  • unyt (@ngoldbaum)
  • astropy.units (??)
  • NumPy masked arrays (??)
  • scipp (@SimonHeybrock, xref https://github.com/pydata/xarray/issues/3509)

(interesting side note is the first three of these are all ndarray subclasses right now...perhaps discussing the interplay between array subclassing and wrapping is in order too?)

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
872441814 https://github.com/pydata/xarray/issues/5559#issuecomment-872441814 https://api.github.com/repos/pydata/xarray/issues/5559 MDEyOklzc3VlQ29tbWVudDg3MjQ0MTgxNA== jthielen 3460034 2021-07-01T17:57:32Z 2021-07-01T18:05:19Z CONTRIBUTOR

Is it correct that xarray ends up calling dask.array.mean() on the pint.Quantity(dask.Array) object inside the DataArray? I took a guess at that since I can replicate what is happening inside the DataArray with

```python
import dask.array as da
import xarray as xr

arr = xr.DataArray([1, 2, 3], attrs={'units': 'metres'})

chunked = arr.chunk(1).pint.quantify()

da.mean(chunked.variable._data)
```

Also, the Dask warning "Passing an object to dask.array.from_array which is already a Dask collection. This can lead to unexpected behavior." is a big red flag that the Pint Quantity is making its way into Dask internals where it should not end up.

If so, I think this gets into a thorny issue with duck array handling in Dask. It was decided in https://github.com/dask/dask/pull/6393 that deliberately calling Dask array operations like elemwise (so, presumably by extension, blockwise and the reductions in dask.array.reductions like mean()) on a non-Dask array implies that the user wants to turn that array into a dask array. This gets problematic, however, for upcast types like Pint Quantities that wrap Dask Arrays, since then you can get dask.Array(pint.Quantity(dask.Array)), which is what I think is going on here?

If this all checks out, I believe this becomes a Dask issue to improve upcast type/duck Dask array handling.
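The nesting problem described above can be sketched with plain-Python stand-ins (MockDaskArray, MockQuantity, and the guarded from_array variant are all hypothetical; they only mimic the `__dask_graph__` protocol that `dask.base.is_dask_collection` checks):

```python
# Hypothetical stand-ins illustrating why blindly wrapping an upcast type
# like a Pint Quantity produces dask.Array(pint.Quantity(dask.Array)).
class MockDaskArray:
    def __init__(self, data):
        self.data = data

    def __dask_graph__(self):
        return {}  # returning non-None marks this object as a dask collection


class MockQuantity:
    """An 'upcast type' that wraps whatever array it is given."""

    def __init__(self, magnitude):
        self.magnitude = magnitude

    def __dask_graph__(self):
        # A Quantity is a dask collection iff its magnitude is one.
        inner = getattr(self.magnitude, "__dask_graph__", None)
        return inner() if inner is not None else None


def is_dask_collection(x):
    # Mirrors dask.base.is_dask_collection: __dask_graph__ returns non-None.
    graph = getattr(x, "__dask_graph__", None)
    return graph is not None and graph() is not None


def from_array_naive(x):
    # Wraps anything that is not literally a dask array (the behavior
    # decided in dask#6393 for deliberate calls on non-Dask arrays).
    return x if isinstance(x, MockDaskArray) else MockDaskArray(x)


def from_array_guarded(x):
    # Improved handling: refuse to re-wrap any existing dask collection.
    return x if is_dask_collection(x) else MockDaskArray(x)


q = MockQuantity(MockDaskArray([1, 2, 3]))
assert isinstance(from_array_naive(q), MockDaskArray)   # double nesting
assert isinstance(from_array_guarded(q), MockQuantity)  # wrapper preserved
```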

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  UserWarning when wrapping pint & dask arrays together 935062144
793204639 https://github.com/pydata/xarray/issues/5012#issuecomment-793204639 https://api.github.com/repos/pydata/xarray/issues/5012 MDEyOklzc3VlQ29tbWVudDc5MzIwNDYzOQ== jthielen 3460034 2021-03-09T00:38:37Z 2021-03-09T00:39:02Z CONTRIBUTOR

I'll leave the exact issue up to someone more acquainted with string and slice handling on TimedeltaIndex, but in case it is useful, here are two things I noticed (to hopefully speed along the troubleshooting):

1) Using pandas Timedeltas instead of strings (i.e., ds.sel(time=slice(pd.Timedelta("01:00:00"), pd.Timedelta("01:15:00")))) works, giving

```
<xarray.Dataset>
Dimensions:  (time: 39, yh: 1, zh: 1)
Coordinates:
    xh       float32 ...
  * yh       (yh) float32 0.0
  * zh       (zh) float32 0.0
  * time     (time) timedelta64[ns] 01:00:00 01:07:30 ... 01:14:30 01:15:00
Data variables: (12/26)
    x        (time) float32 ...
    y        (time) float32 ...
    z        (time) float32 ...
    u        (time) float32 ...
    v        (time) float32 ...
    w        (time) float32 ...
    ...       ...
    b        (time) float32 ...
    vpg      (time) float32 ...
    zvort    (time) float32 ...
    rho      (time) float32 ...
    qsl      (time) float32 ...
    qsi      (time) float32 ...
Attributes:
    CM1 version:    cm1r20.1
    Conventions:    CF-1.7
    missing_value:  -999999.9
```

However, notice that jump in the time coordinate...which leads to

2) In the failing example, the time coordinate is non-monotonic. If we .sortby('time'), then ds.sel(time=slice("01:00:00","01:15:00")) works as expected:

```
<xarray.Dataset>
Dimensions:  (time: 47, yh: 1, zh: 1)
Coordinates:
    xh       float32 1.0
  * yh       (yh) float32 0.0
  * zh       (zh) float32 0.0
  * time     (time) timedelta64[ns] 01:00:00 01:00:30 ... 01:15:00 01:15:30
Data variables: (12/26)
    x        (time) float32 ...
    y        (time) float32 ...
    z        (time) float32 ...
    u        (time) float32 ...
    v        (time) float32 ...
    w        (time) float32 ...
    ...       ...
    b        (time) float32 ...
    vpg      (time) float32 ...
    zvort    (time) float32 ...
    rho      (time) float32 ...
    qsl      (time) float32 ...
    qsi      (time) float32 ...
Attributes:
    CM1 version:    cm1r20.1
    Conventions:    CF-1.7
    missing_value:  -999999.9
```

(the working example also has a monotonic time coordinate)

And so, I would guess this is an example of string-based timedelta selection failing on non-monotonic coordinates.
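That guess can be sketched with pandas directly (assuming only that a TimedeltaIndex backs the time coordinate, as in the reprs above):

```python
import pandas as pd

# A non-monotonic timedelta index, like the failing example's time coordinate.
s = pd.Series(
    range(3),
    index=pd.to_timedelta(["01:07:30", "01:00:00", "01:15:00"]),
)
assert not s.index.is_monotonic_increasing

# Label-based slicing is only well-defined on a monotonic index, so sort first
# (the equivalent of .sortby('time') on the Dataset).
s_sorted = s.sort_index()
assert s_sorted.index.is_monotonic_increasing

sliced = s_sorted.loc[pd.Timedelta("01:00:00"):pd.Timedelta("01:15:00")]
assert len(sliced) == 3
```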

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  trouble slicing on sub-minute dataset 825079209
789146762 https://github.com/pydata/xarray/pull/4979#issuecomment-789146762 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc4OTE0Njc2Mg== jthielen 3460034 2021-03-02T19:13:38Z 2021-03-02T19:13:38Z CONTRIBUTOR

It's great to be able to follow along with the discussion here! I'm definitely interested in seeing where the duck array index support ends up.

One use-case motivated question: the flexible indexes refactoring has also been pointed to as the resolution to https://github.com/pydata/xarray/issues/2233, where multidimensional coordinates have the same name as one of their dimensions. I wasn't quite able to tell through the narrative here if that has been addressed along the way yet or not ("A. only 1D coordinates with a name matching their dimension name" for implicit index creation does seem to get close though). So, would it be worth directly addressing https://github.com/pydata/xarray/issues/2233 here, or should that wait?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
782188310 https://github.com/pydata/xarray/issues/2233#issuecomment-782188310 https://api.github.com/repos/pydata/xarray/issues/2233 MDEyOklzc3VlQ29tbWVudDc4MjE4ODMxMA== jthielen 3460034 2021-02-19T16:34:23Z 2021-02-19T16:41:05Z CONTRIBUTOR

I've seen this issue come up a few more times recently, so I wanted to ask: in lieu of a "full fix" (a la https://github.com/pydata/xarray/pull/2405, with deep data model changes holding up the PR), would there be support for a partial coordinate-renaming-based fix? I'd imagine something like the following:

To read in netCDF like the following,

```
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Conventions: CDM-Extended-CF
    history: Written by CFPointWriter
    title: Extract Points data from Grid file /data/ldm/pub/native/grid/NCEP/RR/CONUS_13km/RR_CONUS_13km_20210219_1400.grib2.ncx3#LambertConformal_337X451-40p01N-98p14W
    featureType: timeSeriesProfile
    time_coverage_start: 2021-02-19T16:00:00Z
    time_coverage_end: 2021-02-19T16:00:00Z
    geospatial_lat_min: 42.0295
    geospatial_lat_max: 42.0305
    geospatial_lon_min: -93.6405
    geospatial_lon_max: -93.6395
    dimensions(sizes): profile(1), station(1), isobaric(37), station_name_strlen(10), station_description_strlen(34)
    variables(dimensions): float32 isobaric(station, profile, isobaric), float32 u-component_of_wind_isobaric(station, profile, isobaric), float32 v-component_of_wind_isobaric(station, profile, isobaric), float32 Temperature_isobaric(station, profile, isobaric), float32 Relative_humidity_isobaric(station, profile, isobaric), |S1 station_name(station, station_name_strlen), |S1 station_description(station, station_description_strlen), float64 latitude(station), float64 longitude(station), float64 time(station, profile)
    groups:
```

(note the problematic isobaric(station, profile, isobaric)), one could specify a kwarg to xr.open_dataset to rename it,

```python
ds = xr.open_dataset("my_problematic_file.nc", rename_vars={'isobaric': 'isobaric_coord'})
```

thus giving

```
<xarray.Dataset>
Dimensions:                       (isobaric: 37, profile: 1, station: 1)
Dimensions without coordinates: isobaric, profile, station
Data variables:
    u-component_of_wind_isobaric  (station, profile, isobaric) float32 ...
    v-component_of_wind_isobaric  (station, profile, isobaric) float32 ...
    Temperature_isobaric          (station, profile, isobaric) float32 ...
    Relative_humidity_isobaric    (station, profile, isobaric) float32 ...
    station_name                  (station) |S10 ...
    station_description           (station) |S34 ...
    latitude                      (station) float64 ...
    longitude                     (station) float64 ...
    time                          (station, profile) datetime64[ns] ...
    isobaric_coord                (station, profile, isobaric) float32 ...
Attributes:
    Conventions:          CDM-Extended-CF
    history:              Written by CFPointWriter
    title:                Extract Points data from Grid file /data/ldm/pub/na...
    featureType:          timeSeriesProfile
    time_coverage_start:  2021-02-19T16:00:00Z
    time_coverage_end:    2021-02-19T16:00:00Z
    geospatial_lat_min:   42.0295
    geospatial_lat_max:   42.0305
    geospatial_lon_min:   -93.6405
    geospatial_lon_max:   -93.6395
```
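The effect of the proposed rename could be sketched in plain Python, detecting the name collision from variable metadata alone (resolve_name_collisions and its mapping format are hypothetical, for illustration only):

```python
# Sketch: find variables whose name collides with one of their own
# dimensions (e.g. isobaric(station, profile, isobaric)) and propose a
# rename. The "_coord" suffix is illustrative.
def resolve_name_collisions(variables, suffix="_coord"):
    """variables: mapping of variable name -> tuple of dimension names."""
    return {
        name: name + suffix
        for name, dims in variables.items()
        if name in dims
    }

variables = {
    "isobaric": ("station", "profile", "isobaric"),
    "Temperature_isobaric": ("station", "profile", "isobaric"),
    "latitude": ("station",),
}
assert resolve_name_collisions(variables) == {"isobaric": "isobaric_coord"}
```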

{
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem opening unstructured grid ocean forecasts with 4D vertical coordinates 332471780
761843786 https://github.com/pydata/xarray/pull/2844#issuecomment-761843786 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc2MTg0Mzc4Ng== jthielen 3460034 2021-01-17T16:58:34Z 2021-01-17T16:58:34Z CONTRIBUTOR

I unfortunately have not been following along with the recent developments in this PR, so these may have already been previously covered. Sorry if that is the case!

However, earlier on in the development of this PR, there were some substantial concerns about backwards compatibility (e.g., for libraries like MetPy that currently rely on grid_mapping and the like being in attrs) and the improved propagation of encoding through operations (so that moving these to encoding doesn't mean they are unnecessarily lost). What is the current status with regards to these?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
753426440 https://github.com/pydata/xarray/pull/4746#issuecomment-753426440 https://api.github.com/repos/pydata/xarray/issues/4746 MDEyOklzc3VlQ29tbWVudDc1MzQyNjQ0MA== jthielen 3460034 2021-01-02T03:49:28Z 2021-01-02T03:49:28Z CONTRIBUTOR

I think it is best to check against the global pint.Quantity (isinstance(self.data, pint.Quantity)). This should work to check for a Quantity from any unit registry (rather than the other methods which are registry-specific since a class builder is used).
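Why the global check works can be sketched with a stand-in for Pint's per-registry class builder (make_registry_quantity is hypothetical; real registries expose their own ureg.Quantity subclass):

```python
class Quantity:
    """Stand-in for the global pint.Quantity base class."""


def make_registry_quantity(registry_name):
    # Mimics Pint's class builder: each UnitRegistry gets its own Quantity
    # subclass, all deriving from the one global base class.
    return type(f"Quantity_{registry_name}", (Quantity,), {})


QuantityA = make_registry_quantity("ureg_a")
QuantityB = make_registry_quantity("ureg_b")

q = QuantityA()
assert not isinstance(q, QuantityB)  # registry-specific check fails
assert isinstance(q, Quantity)       # global base-class check succeeds
```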

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Faster unstacking 777153550
753423841 https://github.com/pydata/xarray/pull/4746#issuecomment-753423841 https://api.github.com/repos/pydata/xarray/issues/4746 MDEyOklzc3VlQ29tbWVudDc1MzQyMzg0MQ== jthielen 3460034 2021-01-02T03:15:56Z 2021-01-02T03:15:56Z CONTRIBUTOR

@max-sixty That error seems to be arising because data is a numpy.ndarray and reordered is an xarray.Variable with a pint.Quantity inside, hence trying to assign a unit-aware array to an ndarray causes the units to be stripped.

Perhaps relevant to the discussion is this Pint issue https://github.com/hgrecco/pint/issues/882 where it was tentatively decided that Pint's wrapped implementation of np.full_like would base its units off of fill_value alone, so, as it is now, I think _unstack_once would need to create a Quantity-wrapped nan in the units of self.data for Pint to create a Quantity filled with nans (if I'm interpreting the implementation here correctly and that is what is needed). This seems like something where units-on-the-dtype would be useful, but alas, things aren't there yet!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Faster unstacking 777153550
677698531 https://github.com/pydata/xarray/issues/4324#issuecomment-677698531 https://api.github.com/repos/pydata/xarray/issues/4324 MDEyOklzc3VlQ29tbWVudDY3NzY5ODUzMQ== jthielen 3460034 2020-08-20T14:25:39Z 2020-08-20T14:25:39Z CONTRIBUTOR

> I brought this up at the meeting, and if I remember correctly, the recommendation was to take a step back and solve nested duck arrays in general (e.g. by writing a design doc – a NEP?). Correct me if I'm wrong, @shoyer, but I think the hope was that after that it would be easier to design nested reprs.

Sounds good! Would it make sense to discuss that over on https://github.com/dask/dask/issues/5329, the pre-existing issue for this larger problem of nested duck array reprs/meta?

Also, since _repr_inline_ is already implemented, I'd still like to see a basic version of https://github.com/xarray-contrib/pint-xarray/pull/22 merged in, so at least we have good results in the "easy" case of xarray > Pint > NumPy.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  constructing nested inline reprs 675342733
676633628 https://github.com/pydata/xarray/issues/3894#issuecomment-676633628 https://api.github.com/repos/pydata/xarray/issues/3894 MDEyOklzc3VlQ29tbWVudDY3NjYzMzYyOA== jthielen 3460034 2020-08-19T20:04:10Z 2020-08-19T20:24:38Z CONTRIBUTOR

Would this proposal mean that subsetting variables with __getitem__ would no longer work? If so, I'd make the humble request as a downstream user/library contributor for it to have a very generous deprecation period, since it is core functionality I rely on a lot (and include in many tutorials).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add public API for Dataset._copy_listed 588112617
674303204 https://github.com/pydata/xarray/pull/4340#issuecomment-674303204 https://api.github.com/repos/pydata/xarray/issues/4340 MDEyOklzc3VlQ29tbWVudDY3NDMwMzIwNA== jthielen 3460034 2020-08-14T22:56:01Z 2020-08-14T22:56:01Z CONTRIBUTOR

This looks like the same issue with matplotlib 3.3.1: https://github.com/matplotlib/matplotlib/issues/18254? Ran into this in MetPy as well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pin matplotlib in ci/requirements/doc.yml 679433399
672077055 https://github.com/pydata/xarray/issues/4324#issuecomment-672077055 https://api.github.com/repos/pydata/xarray/issues/4324 MDEyOklzc3VlQ29tbWVudDY3MjA3NzA1NQ== jthielen 3460034 2020-08-11T16:36:17Z 2020-08-11T16:36:17Z CONTRIBUTOR

@benbovy These are some good suggestions about possible final products to aim towards in the implementation. Unfortunately, there are a couple problems to keep in mind:

  • The main impetus for custom inline text reprs was including units in the dataset overview, and so, that functionality needs to be kept available. Having a very compact description representing the duck array type would be insufficient.
  • Inspecting the full order of general nested duck arrays is non-trivial. Right now, every duck array that acts as a wrapper/container has its own way of referencing what it contains and combining its own reprs with what it contains. The simplest way for this to work (and how it works now) is to trust that each array layer is handling the next one below it properly without any information of the other layers beyond that. https://github.com/dask/dask/issues/5329 proposed establishing some shared way of organizing/managing all this nested duck array metadata, which almost certainly would be needed for anything more complex/robust.
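"Trusting each layer" amounts to ordinary repr delegation, as in this toy sketch (Wrapper is a hypothetical stand-in for any wrapping duck array):

```python
# Toy sketch of layer-by-layer repr delegation: each wrapper combines its
# own summary with whatever repr its immediate inner array produces, with
# no knowledge of the layers deeper down.
class Wrapper:
    def __init__(self, label, inner):
        self.label = label
        self.inner = inner

    def __repr__(self):
        return f"{self.label}({self.inner!r})"


nested = Wrapper("dask", Wrapper("pint", [1, 2, 3]))
assert repr(nested) == "dask(pint([1, 2, 3]))"
```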
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  constructing nested inline reprs 675342733
671094846 https://github.com/pydata/xarray/pull/2844#issuecomment-671094846 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MTA5NDg0Ng== jthielen 3460034 2020-08-09T20:03:09Z 2020-08-09T20:03:09Z CONTRIBUTOR

For what it's worth, here's a bit of an elaboration on my view from https://github.com/pydata/xarray/issues/4215#issuecomment-656789017 and https://github.com/pydata/xarray/issues/4121#issuecomment-639758583:

ancillary_variables really fit a role of "non-coordinate linked variables," which doesn't have a good place in xarray's current data model. Both the current behavior (any links are completely lost if not manually handled) and linking as coordinates (conceptually confusing and causing spurious proliferation) are problematic. Short of a "linked" or "subsidiary" variable concept being added to xarray's data model (could be worth discussing on a new issue?), cf-xarray seems like the best place for managing ancillary_variables, either through

  • A method that extracts ancillary_variables from a Dataset given a variable name (https://github.com/xarray-contrib/cf-xarray/issues/5#issue-628569748)
  • Parsing the ancillary_variables attribute and replacing the string with a Dataset or list of DataArrays or Variables containing the actual ancillary variables (https://github.com/pydata/xarray/issues/4121#issue-630573329 and https://github.com/pydata/xarray/pull/2844#issuecomment-670991386)
  • A method or addition to __getitem__ on the accessor returning a Dataset with the linked variables included (https://github.com/pydata/xarray/pull/2844#issuecomment-670996109)

For those interested, the issue for ancillary_variables in cf-xarray is https://github.com/xarray-contrib/cf-xarray/issues/5.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
669287050 https://github.com/pydata/xarray/pull/4248#issuecomment-669287050 https://api.github.com/repos/pydata/xarray/issues/4248 MDEyOklzc3VlQ29tbWVudDY2OTI4NzA1MA== jthielen 3460034 2020-08-05T16:13:06Z 2020-08-05T16:13:25Z CONTRIBUTOR

I think _repr_*_ is the standard from IPython/Jupyter (https://ipython.readthedocs.io/en/stable/config/integrating.html), so _repr_short_ or _repr_inline_ is just following along with that.
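As a sketch of how such a hook gets consumed (UnitArray and inline_summary are hypothetical; the max_width parameter follows the xarray discussion):

```python
class UnitArray:
    """Hypothetical duck array advertising a one-line summary."""

    def __init__(self, values, units):
        self.values = values
        self.units = units

    def _repr_inline_(self, max_width):
        # Called by the wrapping library for the one-line summary shown in
        # a Dataset overview; must fit within max_width characters.
        summary = f"[{self.units}] {self.values}"
        if len(summary) > max_width:
            summary = summary[: max_width - 3] + "..."
        return summary


def inline_summary(obj, max_width=40):
    # How a consumer might use the hook, falling back to a truncated repr.
    if hasattr(obj, "_repr_inline_"):
        return obj._repr_inline_(max_width)
    return repr(obj)[:max_width]


assert inline_summary(UnitArray([1, 2, 3], "metres")) == "[metres] [1, 2, 3]"
```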

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  allow customizing the inline repr of a duck array 663825726
663121190 https://github.com/pydata/xarray/issues/4208#issuecomment-663121190 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzEyMTE5MA== jthielen 3460034 2020-07-23T17:01:44Z 2020-07-23T17:03:20Z CONTRIBUTOR

> In Xarray we implemented the Dask collection spec. https://docs.dask.org/en/latest/custom-collections.html#the-dask-collection-interface
>
> We might want to do that with Pint as well, if they're going to contain Dask things. That way Dask operations like dask.persist, dask.visualize, and dask.compute will work normally.

That's exactly what's been done in Pint (see https://github.com/hgrecco/pint/pull/1129)! @dcherian's points go beyond just that and address what Pint hasn't covered yet through the standard collection interface.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
662602111 https://github.com/pydata/xarray/issues/3245#issuecomment-662602111 https://api.github.com/repos/pydata/xarray/issues/3245 MDEyOklzc3VlQ29tbWVudDY2MjYwMjExMQ== jthielen 3460034 2020-07-22T18:03:51Z 2020-07-22T18:03:51Z CONTRIBUTOR

> One question about as_numpy: should it convert pint arrays into NumPy by stripping units? Or should it convert the arrays underlying pint and keep the units? I guess the first would probably make more sense for DataArray.as_numpy(). The second behavior could be achieved with DataArray.pint.as_numpy().

This brings up a point I was wondering about as well: how should these as_*() methods relate to multiply-nested arrays? (Not sure if best to discuss that here or new issue.)

{
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  sparse and other duck array issues 484240082
660157829 https://github.com/pydata/xarray/pull/4221#issuecomment-660157829 https://api.github.com/repos/pydata/xarray/issues/4221 MDEyOklzc3VlQ29tbWVudDY2MDE1NzgyOQ== jthielen 3460034 2020-07-17T15:04:50Z 2020-07-17T15:04:50Z CONTRIBUTOR

@rpmanser For what it's worth, I'd think doing tests like test_dask.py but with a post-https://github.com/hgrecco/pint/pull/1129 Pint Quantity or suitably mocked wrapper class between xarray and Dask would be best. But, I think this is more of a maintainer-level question.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Change isinstance checks to duck Dask Array checks #4208 656163384
660156875 https://github.com/pydata/xarray/issues/4234#issuecomment-660156875 https://api.github.com/repos/pydata/xarray/issues/4234 MDEyOklzc3VlQ29tbWVudDY2MDE1Njg3NQ== jthielen 3460034 2020-07-17T15:03:01Z 2020-07-17T15:03:01Z CONTRIBUTOR

The accessors could also be where wrapped xarray methods could go (so in your previous example, it could be time_slice.cupy.plot() to handle the NumPy conversion internally), but based on your explanations I'd agree that most of this makes more sense to be compatibility code that goes in xarray itself.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add ability to change underlying array type 659129613
660148938 https://github.com/pydata/xarray/issues/4234#issuecomment-660148938 https://api.github.com/repos/pydata/xarray/issues/4234 MDEyOklzc3VlQ29tbWVudDY2MDE0ODkzOA== jthielen 3460034 2020-07-17T14:48:59Z 2020-07-17T14:48:59Z CONTRIBUTOR

@jacobtomlinson Makes complete sense! Just wanted to make sure the option was considered as a possibility.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add ability to change underlying array type 659129613
660142143 https://github.com/pydata/xarray/issues/4234#issuecomment-660142143 https://api.github.com/repos/pydata/xarray/issues/4234 MDEyOklzc3VlQ29tbWVudDY2MDE0MjE0Mw== jthielen 3460034 2020-07-17T14:36:40Z 2020-07-17T14:36:40Z CONTRIBUTOR

@jacobtomlinson Indeed! A similar to_cupy could also be made for existing DataArrays on a DataArray accessor, and any other CuPy-related attributes/methods could be attached to the same accessors as well.

Though, to have those accessors be registered with just importing xarray and cupy, they would of course need to live in either xarray or cupy. Not sure if that or an xarray-adjacent package (cupy-xarray?) is better.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add ability to change underlying array type 659129613
660129151 https://github.com/pydata/xarray/issues/4234#issuecomment-660129151 https://api.github.com/repos/pydata/xarray/issues/4234 MDEyOklzc3VlQ29tbWVudDY2MDEyOTE1MQ== jthielen 3460034 2020-07-17T14:12:23Z 2020-07-17T14:13:17Z CONTRIBUTOR

@jacobtomlinson Have you considered implementing an accessor for this and any other CuPy-specific functionality? This is the approach we are taking with Pint integration (pint-xarray), and I wonder if it might also be a good approach for CuPy.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add ability to change underlying array type 659129613
658140708 https://github.com/pydata/xarray/pull/4221#issuecomment-658140708 https://api.github.com/repos/pydata/xarray/issues/4221 MDEyOklzc3VlQ29tbWVudDY1ODE0MDcwOA== jthielen 3460034 2020-07-14T12:03:05Z 2020-07-14T12:03:19Z CONTRIBUTOR

> Revisiting it I now realize that is_array_like is actually not a good name for that function: officially, "array_like" means that it can be cast using np.asarray. Maybe we should rename that to is_duck_array? But then we'd also need to check for __array_ufunc__ and __array_function__ because we also need those to properly wrap duck arrays.

As long as that doesn't break any of the current uses, I think that would be the best way forward. This would require xarray to be on NumPy 1.16+ (in order to ensure hasattr(np.ndarray, "__array_function__")), but that's only 9 days away by NEP 29, so hopefully that isn't too much of an issue. Then, most of the checks for "can this be wrapped by xarray" should be able to become something like

```python
is_duck_array(x) and not isinstance(x, (DataArray, Variable))
```

with is_duck_dask_array only being needed for checking Dask Array-like support.

Although, while we're at it, do we also need more careful handling of upcast types? I'm not sure if there even are any out there right now (not sure if a HoloViews Dataset counts here), but that doesn't necessarily mean there never will be any.
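A sketch of what that is_duck_array check might look like (the attribute set is taken from the discussion above; xarray's eventual implementation may differ):

```python
import numpy as np


def is_duck_array(value):
    # Duck array: has array metadata plus both NumPy dispatch protocols.
    return (
        hasattr(value, "ndim")
        and hasattr(value, "shape")
        and hasattr(value, "dtype")
        and hasattr(value, "__array_ufunc__")
        and hasattr(value, "__array_function__")
    )


assert is_duck_array(np.array([1, 2, 3]))
assert not is_duck_array([1, 2, 3])  # merely array_like, not a duck array
assert not is_duck_array("abc")
```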

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Change isinstance checks to duck Dask Array checks #4208 656163384
657895835 https://github.com/pydata/xarray/pull/4221#issuecomment-657895835 https://api.github.com/repos/pydata/xarray/issues/4221 MDEyOklzc3VlQ29tbWVudDY1Nzg5NTgzNQ== jthielen 3460034 2020-07-14T00:22:09Z 2020-07-14T00:22:09Z CONTRIBUTOR

Also, a broader discussion that I've seen hinted at in the past, but which is brought to the forefront by this PR: how should xarray be checking for general duck array types it can wrap?

Right now, it looks like there is a mix of

```python
is_array_like(data)
```

and

```python
hasattr(data, "__array_function__") or isinstance(data, dask_array_type)
```

and it would be nice to bring consistency. However, both these checks seem like they'd let too many types through. For example, is_array_like as currently implemented would still let through too much, such as MemoryCachedArray and xarray Variables/DataArrays, and __array_function__ doesn't truly indicate a NumPy duck array on its own (and it really only works to only capture downcast types right now since xarray does not yet implement __array_function__ itself, #3917).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Change isinstance checks to duck Dask Array checks #4208 656163384
657151954 https://github.com/pydata/xarray/issues/4212#issuecomment-657151954 https://api.github.com/repos/pydata/xarray/issues/4212 MDEyOklzc3VlQ29tbWVudDY1NzE1MTk1NA== jthielen 3460034 2020-07-12T00:14:36Z 2020-07-12T00:35:25Z CONTRIBUTOR

@jacobtomlinson Any idea how this would play with the work that's been going on for units here? I'm specifically wondering if xarray( pint( cupy )) would/could work.

As far as I'd see it, the pieces to get this working are

  • xarray( pint ) (practically ready, just working out indexes and corner cases: #3594)
  • pint( cupy ) (https://github.com/hgrecco/pint/issues/964)
  • xarray( cupy ) (this PR)

and then finally testing xarray( pint( cupy )) works automatically from there. https://github.com/hgrecco/pint/issues/964 was deferred due to CI/testing concerns, so it will be great to see what @jacobtomlinson can come up with here for xarray, since hopefully at some point it would be transferable over to pint as well.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add cupy support 654135405
657152588 https://github.com/pydata/xarray/issues/4208#issuecomment-657152588 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NzE1MjU4OA== jthielen 3460034 2020-07-12T00:19:53Z 2020-07-12T00:19:53Z CONTRIBUTOR

> Does/should any of this also consider #4212 (CuPy)?

Only indirectly, since this deals with duck Dask arrays (things like Pint that go between xarray and Dask) rather than Dask chunks, which CuPy would be. But, once this, #4212, https://github.com/hgrecco/pint/issues/964, and https://github.com/dask/dask/pull/6393 are all in place, then we can test if xarray( pint( dask( cupy ))) works automatically from it all or not.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
656250148 https://github.com/pydata/xarray/issues/4208#issuecomment-656250148 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NjI1MDE0OA== jthielen 3460034 2020-07-09T17:17:34Z 2020-07-10T17:47:16Z CONTRIBUTOR

Based on @mrocklin's comment in https://github.com/dask/dask/issues/6385, the plan will be to check for duck Dask Arrays with dask.base.is_dask_collection along with xarray's previously used duck array check. This works properly with Pint Quantities as implemented in https://github.com/hgrecco/pint/pull/1129 (returning True if the Pint Quantity contains a Dask Array, and False if it does not).
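In rough pure-Python terms, the combined check could look like the sketch below. The class names are made up, and `dask.base.is_dask_collection` is approximated by probing `__dask_graph__` — this is an illustration of the idea, not xarray's or dask's actual implementation.

```python
def is_duck_array(value):
    # xarray-style duck array check: NEP 18 protocol plus basic array attributes
    return (
        hasattr(value, "__array_function__")
        and hasattr(value, "ndim")
        and hasattr(value, "dtype")
    )

def is_duck_dask_array(value):
    # rough stand-in for dask.base.is_dask_collection: a dask collection
    # is an object whose __dask_graph__() returns a non-None task graph
    graph_fn = getattr(value, "__dask_graph__", None)
    return is_duck_array(value) and graph_fn is not None and graph_fn() is not None

class EagerQuantity:
    """Toy pint-like wrapper around an in-memory array (made-up class)."""
    ndim, dtype = 1, "float64"
    def __array_function__(self, func, types, args, kwargs):
        raise NotImplementedError

class LazyQuantity(EagerQuantity):
    """Toy pint-like wrapper around a dask array (made-up class)."""
    def __dask_graph__(self):
        return {"chunk-0": (sum, [1.0, 2.0])}  # toy task graph

is_duck_dask_array(LazyQuantity())   # True: duck array *and* dask collection
is_duck_dask_array(EagerQuantity())  # False: duck array, but no task graph
```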

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
656789017 https://github.com/pydata/xarray/issues/4215#issuecomment-656789017 https://api.github.com/repos/pydata/xarray/issues/4215 MDEyOklzc3VlQ29tbWVudDY1Njc4OTAxNw== jthielen 3460034 2020-07-10T17:17:59Z 2020-07-10T17:41:31Z CONTRIBUTOR

I agree with #3689 that it makes the most sense to have decode_coords set those variables referenced in bounds as coordinates. By the same reasoning, I would think the other special variable-linked attrs of CF Section 7 should be treated similarly:

  • cell_measures
  • climatology
  • geometry
  • node_coordinates
  • node_count
  • part_node_count
  • interior_ring

grid_mapping and ancillary_variables were also brought up. grid_mapping definitely makes sense to be interpreted as a coordinate variable, since it is inherently linked to the CRS of the data. I would say no however to ancillary_variables, since those are not really about coordinates and instead about linked data variables (like uncertainties).

My one concern with #2844 is clarifying the role of encoding vs. attrs. I don't have any good conclusions about it, but I'd want to be very cautious about not having these "links" defined by the CF conventions disappear unexpectedly because they were decoded by decode_coords, moved to encoding, and then erased due to some xarray operation clearing encoding on the returned data. I'd hope to keep them around in some fashion so that they are still usable by libraries like cf-xarray and MetPy, among others.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  setting variables named in CF attributes as coordinate variables 654889988
656358719 https://github.com/pydata/xarray/issues/4208#issuecomment-656358719 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NjM1ODcxOQ== jthielen 3460034 2020-07-09T21:24:42Z 2020-07-09T21:26:37Z CONTRIBUTOR

@rpmanser That sounds like a good plan to me at least, but it would be great if any of the xarray maintainers would be able to chime in. Also, thanks again for being willing to work on this while I try working on https://github.com/dask/dask/issues/4583. The hidden 4th step is of course testing--primarily that this doesn't break existing functionality, but also that it works for duck Dask Arrays other than Dask Arrays themselves (not sure if Pint Quantities in upcoming v0.15 or a mocked class would be better).

Also, thank you @dcherian for pointing out those checks, you found them faster than I did!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
656119415 https://github.com/pydata/xarray/issues/4208#issuecomment-656119415 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NjExOTQxNQ== jthielen 3460034 2020-07-09T13:13:43Z 2020-07-09T13:13:43Z CONTRIBUTOR

I think there are already enough headaches with __iter__ being always defined and confusing libraries such as pandas (hgrecco/pint#1128). I don't see why pint should be explicitly aware of dask (except in unit tests)? It should only deal with generic NEP18-compatible libraries (numpy, dask, sparse, cupy, etc.).

Since Pint wraps Dask, in order to leverage Dask Array functionality on Pint Quantities, we need to have the Dask collection interface available. In a sense, Pint needs special handling for Dask like xarray Variables do since they both can be upcast types of Dask Array. Implicitly passing through attributes (how Pint handles special methods/attributes of downcast types in general) from the wrapped Dask Array is not sufficient, however, because the finalizers have to rewrap with Quantity (see https://github.com/hgrecco/pint/pull/1129/files#diff-d9924213798d0fc092b8cff13928d747R1947-R1950), hence the explicit awareness of Dask being needed in Pint.
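The rewrapping can be illustrated with a minimal sketch (toy stand-ins, not pint's or dask's real code; the actual custom-collection interface also involves `__dask_keys__`, schedulers, and more):

```python
class Quantity:
    """Toy stand-in for pint.Quantity: some data plus a unit string."""
    def __init__(self, magnitude, units):
        self.magnitude, self.units = magnitude, units

    # dask custom-collection interface (heavily simplified)
    def __dask_graph__(self):
        # in reality this would delegate to the wrapped dask array's graph
        return {"result": (sum, [1.0, 2.0, 3.0])}

    def __dask_postcompute__(self):
        # dask calls finalize(results, *extra_args) after running the graph;
        # rewrapping with Quantity here is what keeps the units attached
        def finalize(results, units):
            return Quantity(results, units)
        return finalize, (self.units,)

def toy_compute(collection):
    # crude simulation of dask.compute: run the graph, then finalize
    (func, args), = list(collection.__dask_graph__().values())
    finalize, extra = collection.__dask_postcompute__()
    return finalize(func(args), *extra)

q = toy_compute(Quantity(None, "meter"))
# q is a Quantity again: magnitude 6.0, units "meter"
```

Without the finalizer, the computed result would come back as a bare array and the units would be silently lost, which is why simple attribute pass-through is not enough.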

We should ask the dask team to formalize what defines a "dask-array-like", like they already did with dask collections, and implement their definition in xarray.

Done! See https://github.com/dask/dask/issues/6385.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
655817246 https://github.com/pydata/xarray/issues/4208#issuecomment-655817246 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NTgxNzI0Ng== jthielen 3460034 2020-07-08T23:57:36Z 2020-07-08T23:59:28Z CONTRIBUTOR

Maybe something like this would work?

```python
def is_duck_dask_array(x):
    return getattr(x, 'chunks', None) is not None
```

xarray.DataArray would pass this test (chunks is either None for non-dask arrays or a tuple for dask arrays), so this would be consistent with what we already do.

That would be a straightforward solution to both problems! A Pint Quantity containing a Dask Array passes along the chunks attribute from the Dask Array, and a Pint Quantity containing something else will raise an AttributeError. Unless there are other objections, I'll see what it will take to swap out the existing Dask checks for this in the xarray internals and hopefully get around to a PR (after I get some MetPy stuff done first).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
651851046 https://github.com/pydata/xarray/issues/4183#issuecomment-651851046 https://api.github.com/repos/pydata/xarray/issues/4183 MDEyOklzc3VlQ29tbWVudDY1MTg1MTA0Ng== jthielen 3460034 2020-06-30T15:01:27Z 2020-06-30T15:01:27Z CONTRIBUTOR

We would absolutely accept a pull request for this, but I'm not 100% sure that nanoseconds are valid per CF conventions. CF conventions references udunits.dat, which as far as I can tell includes no units smaller than 1 second: https://www.unidata.ucar.edu/software/udunits/udunits-1/udunits.txt

Nanoseconds (and seconds with any other SI prefix from yocto to yotta) are indeed valid in the CF conventions. For reference, there is this table in v1.8 of the conventions, as well as the SI prefix and SI base unit databases of UDUNITS.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unable to decode a date in nanoseconds 646038170
639758583 https://github.com/pydata/xarray/issues/4121#issuecomment-639758583 https://api.github.com/repos/pydata/xarray/issues/4121 MDEyOklzc3VlQ29tbWVudDYzOTc1ODU4Mw== jthielen 3460034 2020-06-05T19:52:21Z 2020-06-05T19:52:21Z CONTRIBUTOR

I think this could be a good fit for cf-xarray (see https://github.com/xarray-contrib/cf-xarray/issues/5). Although, with that in mind, perhaps a broader discussion on what CF features best fit in decode_cf vs. cf-xarray would likely be useful at some point?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  decode_cf doesn't work for ancillary_variables in attributes 630573329
635000563 https://github.com/pydata/xarray/pull/4102#issuecomment-635000563 https://api.github.com/repos/pydata/xarray/issues/4102 MDEyOklzc3VlQ29tbWVudDYzNTAwMDU2Mw== jthielen 3460034 2020-05-27T23:39:36Z 2020-05-27T23:39:55Z CONTRIBUTOR

For these kind of cached files, would it be worth using something like Pooch for xarray?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cache rasterio example files 626083981
620797019 https://github.com/pydata/xarray/issues/4013#issuecomment-620797019 https://api.github.com/repos/pydata/xarray/issues/4013 MDEyOklzc3VlQ29tbWVudDYyMDc5NzAxOQ== jthielen 3460034 2020-04-28T19:03:33Z 2020-04-28T19:03:33Z CONTRIBUTOR

Does the drop argument on the .where() method satisfy your needs? For example,

```python
ds.where((ds.y >= 30) & (ds.y <= 50), drop=True)
```

reduces the shape of the foo variable to (4, 3), as desired.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Subset from conditional coordinates 608536405
610516526 https://github.com/pydata/xarray/issues/3946#issuecomment-610516526 https://api.github.com/repos/pydata/xarray/issues/3946 MDEyOklzc3VlQ29tbWVudDYxMDUxNjUyNg== jthielen 3460034 2020-04-07T17:22:49Z 2020-04-07T17:22:49Z CONTRIBUTOR

If allowing for some degree of tolerance, something like this would also be quite useful in geographic coordinate transformations when going from 2D lon/lat auxiliary coordinates to 1D x/y dimension coordinates. Here's an example of what we're currently doing in MetPy for this: https://github.com/Unidata/MetPy/blob/3aa0118ffbda48be2a426dee956183b7cef81f0c/src/metpy/xarray.py#L907-L947

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  removing uneccessary dimension 595813283
610159680 https://github.com/pydata/xarray/pull/3643#issuecomment-610159680 https://api.github.com/repos/pydata/xarray/issues/3643 MDEyOklzc3VlQ29tbWVudDYxMDE1OTY4MA== jthielen 3460034 2020-04-07T03:58:05Z 2020-04-07T03:58:05Z CONTRIBUTOR

Also, after glancing through all this, it seems like xarray is dealing inconsistently with the type casting hierarchy:

  • Construction/wrapping allowing __array_function__-defining duck arrays among other explictly listed additions (pandas.Index, dask arrays, etc., see https://github.com/pydata/xarray/blob/9b5140e0711247c373987b56726282140b406d7f/xarray/core/variable.py#L166)
  • Binary ops only denying based on xarray's internal hierarchy (Dataset, DataArray, Variable), otherwise deferring to underlying data
  • __array_ufunc__ allowing a list of supported types (https://github.com/pydata/xarray/blob/9b5140e0711247c373987b56726282140b406d7f/xarray/core/arithmetic.py#L24-L30 along with SupportsArithmetic)
  • (__array_function__ yet to be defined)

Should this be its own issue?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pint support for DataArray 539988974
610159608 https://github.com/pydata/xarray/pull/3643#issuecomment-610159608 https://api.github.com/repos/pydata/xarray/issues/3643 MDEyOklzc3VlQ29tbWVudDYxMDE1OTYwOA== jthielen 3460034 2020-04-07T03:57:45Z 2020-04-07T03:57:45Z CONTRIBUTOR

@keewis Is part of the specific problem that Pint has a guarded import of xarray when defining its upcast types? Would it help if it checked for the fully qualified class name instead?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pint support for DataArray 539988974
583562118 https://github.com/pydata/xarray/pull/3706#issuecomment-583562118 https://api.github.com/repos/pydata/xarray/issues/3706 MDEyOklzc3VlQ29tbWVudDU4MzU2MjExOA== jthielen 3460034 2020-02-07T19:17:08Z 2020-02-07T19:17:08Z CONTRIBUTOR

@keewis For what it is worth, as far as identical goes, I think it makes the most sense to treat unit matching like dtype matching as @shoyer mentioned. Although I had interpreted @max-sixty's comment https://github.com/pydata/xarray/pull/3238#issuecomment-533181017 to mean that dtypes are compared, it appears from @shoyer's comment https://github.com/pydata/xarray/pull/3706#issuecomment-583259053 that this is not the case. If strict unit checking is required, I think that may be better served by an additional assert unit == "meter" type statement.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pint support for variables 551680561
576782065 https://github.com/pydata/xarray/pull/3713#issuecomment-576782065 https://api.github.com/repos/pydata/xarray/issues/3713 MDEyOklzc3VlQ29tbWVudDU3Njc4MjA2NQ== jthielen 3460034 2020-01-21T17:08:39Z 2020-01-21T17:08:39Z CONTRIBUTOR

looks like pint doesn't have nanquantile?

@dcherian Hope you don't mind me jumping in, but that is correct at the moment (see https://github.com/hgrecco/pint/issues/981). It would be easy to add to Pint, though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bump min deps for 0.15 552994673
573102940 https://github.com/pydata/xarray/issues/3509#issuecomment-573102940 https://api.github.com/repos/pydata/xarray/issues/3509 MDEyOklzc3VlQ29tbWVudDU3MzEwMjk0MA== jthielen 3460034 2020-01-10T16:21:08Z 2020-01-10T16:22:20Z CONTRIBUTOR

@SimonHeybrock So sorry I neglected to reply back in November! https://github.com/hgrecco/pint/issues/982 and https://github.com/hgrecco/pint/issues/918 pinged my recollection of this issue. In short, Pint actually doesn't currently support uncertainties with arrays, only scalars, so my earlier wrapping comment was mistaken. So, moving forward with __array_function__ in Scipp in order to integrate it as a "duck array type" within xarray might be a good way to get physical units and uncertainties working together in xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NEP 18, physical units, uncertainties, and the scipp library? 520815068
569166918 https://github.com/pydata/xarray/pull/3643#issuecomment-569166918 https://api.github.com/repos/pydata/xarray/issues/3643 MDEyOklzc3VlQ29tbWVudDU2OTE2NjkxOA== jthielen 3460034 2019-12-27T02:09:45Z 2019-12-27T02:09:45Z CONTRIBUTOR

@jthielen, it looks like this needs numpy.pad and numpy.resize in pint. I tried to implement it myself, but while numpy.resize is easy (add an entry to the implement_consistent_units_by_argument loop), that is not the case for numpy.pad. Regardless of whether my proposed fixes are actually the best way, numpy.pad is used in other places of the code base so we'd probably need both, anyway. What do you think?

No problem at all to add these! See https://github.com/hgrecco/pint/pull/956.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pint support for DataArray 539988974
564633712 https://github.com/pydata/xarray/issues/3256#issuecomment-564633712 https://api.github.com/repos/pydata/xarray/issues/3256 MDEyOklzc3VlQ29tbWVudDU2NDYzMzcxMg== jthielen 3460034 2019-12-11T16:50:33Z 2019-12-11T16:50:33Z CONTRIBUTOR

Would it make sense to add something to the dt accessor, such as `as_datetime`? @dopplershift's example would still be somewhat verbose, but it seems more sensible:

```python
times[0].dt.as_datetime().item()
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  .item() on a DataArray with dtype='datetime64[ns]' returns int 484699415
562905492 https://github.com/pydata/xarray/pull/3600#issuecomment-562905492 https://api.github.com/repos/pydata/xarray/issues/3600 MDEyOklzc3VlQ29tbWVudDU2MjkwNTQ5Mg== jthielen 3460034 2019-12-08T02:15:04Z 2019-12-08T02:15:04Z CONTRIBUTOR

cumprod fails using xarray's structures (works fine with Quantity), so the xfail is still necessary. I also didn't isolate the dims from any tests other than align and I think we still need at least one more iteration to get full isolation (I'd like to leave that to a new PR, though -- maybe when we attempt to silence the UnitStrippedWarnings?).

Sounds like a good plan.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix pint integration tests 533580931
562348563 https://github.com/pydata/xarray/pull/3600#issuecomment-562348563 https://api.github.com/repos/pydata/xarray/issues/3600 MDEyOklzc3VlQ29tbWVudDU2MjM0ODU2Mw== jthielen 3460034 2019-12-05T22:32:44Z 2019-12-05T22:32:44Z CONTRIBUTOR

In general, this looks good!

After running through this, a couple thoughts did come to mind:

  • Do you have tests of align/combine_*/concat that have Quantity data but no Quantity coordinates? I think these could help isolate the coordinate/indexing issue from any other operation-related issue.
  • Would you be able to alter the cumprod tests to test a dimensionless Quantity? np.cumprod is only supported (and only really makes sense) for dimensionless Quantities
  • Could you change the xfail reason for matrix multiplication to pint not supporting einsum? __matmul__ has been implemented in https://github.com/hgrecco/pint/pull/905, but xarray's __matmul__ appears to route through np.einsum.
  • Would it make sense to remove the np.rank test as discussed here: https://github.com/hgrecco/pint/pull/905#issuecomment-562109451?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix pint integration tests 533580931
556478921 https://github.com/pydata/xarray/issues/3556#issuecomment-556478921 https://api.github.com/repos/pydata/xarray/issues/3556 MDEyOklzc3VlQ29tbWVudDU1NjQ3ODkyMQ== jthielen 3460034 2019-11-20T22:21:48Z 2019-11-20T22:21:48Z CONTRIBUTOR

@r-beer That's strange, could it be a browser cache issue? I took the screenshot from the stable version of the docs (http://xarray.pydata.org/en/stable/contributing.html#requirements).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  doc\environment.yml not found 526243198
556475826 https://github.com/pydata/xarray/pull/3554#issuecomment-556475826 https://api.github.com/repos/pydata/xarray/issues/3554 MDEyOklzc3VlQ29tbWVudDU1NjQ3NTgyNg== jthielen 3460034 2019-11-20T22:18:54Z 2019-11-20T22:18:54Z CONTRIBUTOR

But maybe that's too aggressive. Any thoughts anyone?

I'd like to have documentation available for anything in the public API, so that it can be a useful reference for both previously existing and new code. I think for deprecated functionality it makes the most sense to keep the docs, but warn prominently that it should not be used in new code (e.g. https://unidata.github.io/MetPy/dev/api/generated/metpy.calc.html#deprecated)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix documentation 525972395
556469727 https://github.com/pydata/xarray/issues/3556#issuecomment-556469727 https://api.github.com/repos/pydata/xarray/issues/3556 MDEyOklzc3VlQ29tbWVudDU1NjQ2OTcyNw== jthielen 3460034 2019-11-20T22:13:15Z 2019-11-20T22:13:15Z CONTRIBUTOR

I think this was already updated in https://github.com/pydata/xarray/pull/3410? Here's what I see in the contributing guidelines:

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  doc\environment.yml not found 526243198
552594045 https://github.com/pydata/xarray/issues/3509#issuecomment-552594045 https://api.github.com/repos/pydata/xarray/issues/3509 MDEyOklzc3VlQ29tbWVudDU1MjU5NDA0NQ== jthielen 3460034 2019-11-11T20:06:49Z 2019-11-11T20:06:49Z CONTRIBUTOR

With regards to physical units (and to a lesser extent propagation of uncertainties), this would have overlap with pint. Efforts have been ongoing towards integration with xarray through NEP-18 (xref https://github.com/pydata/xarray/issues/525, https://github.com/hgrecco/pint/pull/764, https://github.com/hgrecco/pint/issues/845, https://github.com/hgrecco/pint/issues/849, as well as https://github.com/pydata/xarray/pull/3238 and following test implementation PRs), but are still not quite there yet...hopefully very soon though!

Would you be able to describe any advantages/disadvantages you would see with xarray wrapping scipp, versus something like xarray > pint > uncertainties > numpy?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NEP 18, physical units, uncertainties, and the scipp library? 520815068
537564221 https://github.com/pydata/xarray/issues/3361#issuecomment-537564221 https://api.github.com/repos/pydata/xarray/issues/3361 MDEyOklzc3VlQ29tbWVudDUzNzU2NDIyMQ== jthielen 3460034 2019-10-02T16:05:57Z 2019-10-02T16:05:57Z CONTRIBUTOR

@gmaze Just as an example, here is what we recently added for MetPy: https://unidata.github.io/MetPy/latest/api/generated/metpy.xarray.html. Previously, we just had a narrative-form tutorial.

If there are other recommendations, it would be great to incorporate those into MetPy as well!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Recommendations for domain-specific accessor documentation 500949040
536350457 https://github.com/pydata/xarray/pull/3352#issuecomment-536350457 https://api.github.com/repos/pydata/xarray/issues/3352 MDEyOklzc3VlQ29tbWVudDUzNjM1MDQ1Nw== jthielen 3460034 2019-09-29T23:05:46Z 2019-09-29T23:05:46Z CONTRIBUTOR

Since the current wording of the "Coordinate" definition seems to imply only one dimension, should the multidimensional coordinate issue that remains be cleared up here or in a follow-up PR? If it's a follow-up, I'd be glad to submit one for it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add glossary to documentation 499807676
532383428 https://github.com/pydata/xarray/pull/3238#issuecomment-532383428 https://api.github.com/repos/pydata/xarray/issues/3238 MDEyOklzc3VlQ29tbWVudDUzMjM4MzQyOA== jthielen 3460034 2019-09-17T20:14:33Z 2019-09-17T20:14:33Z CONTRIBUTOR

@keewis Thank you for catching this! I've pushed an update to https://github.com/andrewgsavage/pint/pull/6 with your suggested changes. Raising an error on incompatible/missing units is definitely something that will be need to be added, but it may take some re-thinking the current implementations and how convert_to_consistent_units is used. My instinct is to auto-wrap non-Quantities (of known types on the type casting hierarchy) as dimensionless (as in https://github.com/hgrecco/pint/pull/764#issuecomment-523272038), and let pint's usual unit comparison checks handle the rest. But, knowing exactly what to do here and when may be something that would have to wait until some other discussions with pint are cleared up a little more (like https://github.com/hgrecco/pint/pull/764 vs. https://github.com/hgrecco/pint/pull/875 and https://github.com/hgrecco/pint/issues/845 / https://github.com/hgrecco/pint/issues/878)?

I'm thinking that discussion regarding bugs with pint's upcoming __array_function__ may fit best on https://github.com/andrewgsavage/pint/pull/6 right now, since https://github.com/hgrecco/pint/pull/764 seems to have halted in favor of https://github.com/hgrecco/pint/pull/875? (That is, until a new PR is made that follows up on https://github.com/hgrecco/pint/pull/875 to replicate what was done in https://github.com/hgrecco/pint/pull/764 and https://github.com/andrewgsavage/pint/pull/6.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tests for arrays with units 484015016
531606320 https://github.com/pydata/xarray/pull/3238#issuecomment-531606320 https://api.github.com/repos/pydata/xarray/issues/3238 MDEyOklzc3VlQ29tbWVudDUzMTYwNjMyMA== jthielen 3460034 2019-09-15T22:51:41Z 2019-09-15T22:51:41Z CONTRIBUTOR

@keewis Thank you for pointing that out, I forgot to mention that right now mixed types are not handled by https://github.com/hgrecco/pint/pull/764 (see https://github.com/hgrecco/pint/pull/764#issuecomment-523272038).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tests for arrays with units 484015016
531603936 https://github.com/pydata/xarray/pull/3238#issuecomment-531603936 https://api.github.com/repos/pydata/xarray/issues/3238 MDEyOklzc3VlQ29tbWVudDUzMTYwMzkzNg== jthielen 3460034 2019-09-15T22:15:16Z 2019-09-15T22:15:53Z CONTRIBUTOR

@keewis My inclination is to think of the units as part of the data, and that, for example, zeros_like returning an array of bare zeros instead of zeros-with-units is reasonably intuitive. But since this is likely not everyone's perspective, I think raising an issue on pint's end would be good.

I'm not sure about the indexing behavior. From the sounds of your prior comment, it works because it functions as an "object-type" index? If so, it may cause a decent hit on performance. I'd definitely want to hear others' thoughts on it too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tests for arrays with units 484015016
531603357 https://github.com/pydata/xarray/issues/525#issuecomment-531603357 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUzMTYwMzM1Nw== jthielen 3460034 2019-09-15T22:04:39Z 2019-09-15T22:04:39Z CONTRIBUTOR

Based on the points raised by @crusaderky in https://github.com/hgrecco/pint/issues/878#issue-492678605 about how much special-case handling xarray has for dask arrays, I was thinking recently about what it might take for the xarray > pint > dask.array wrapping discussed here and elsewhere to work as fluidly as xarray > dask.array currently does.

Would it help for this integration to have pint Quantities implement the dask custom collections interface for when they wrap a dask array? I would think that this would allow a pint Quantity to behave in a "dask-array-like" way rather than just an "array-like" way. Then, instead of xarray checking for isinstance(dask_array_type), it could check for "duck dask arrays" (e.g., those with both __array_function__ and __dask_graph__)? There are almost certainly some subtle implementation details that would need to be worked out, but I'm guessing that this could take care of the bulk of the integration.

Also, if I'm incorrect with this line of thought, or there is a better way forward for implementing this wrapping pattern, please do let me know!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
531600938 https://github.com/pydata/xarray/pull/3238#issuecomment-531600938 https://api.github.com/repos/pydata/xarray/issues/3238 MDEyOklzc3VlQ29tbWVudDUzMTYwMDkzOA== jthielen 3460034 2019-09-15T21:27:27Z 2019-09-15T21:27:27Z CONTRIBUTOR

@keewis In https://github.com/andrewgsavage/pint/pull/6, I implemented zeros_like, ones_like and empty_like to return base ndarrays rather than quantities, and full_like to have the unit of fill_value. This seemed like the most sensible behavior to me, since it avoids the ambiguities you mention in the full_like case if they were based on the unit of other, and for many of my own use cases, I've often wanted a zeros/ones array with a different unit than the quantity whose shape I am basing it off of.
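A minimal sketch of those proposed semantics (a toy Quantity with list magnitudes, not pint's actual implementation):

```python
class Quantity:
    """Toy stand-in for a pint Quantity: a list magnitude plus a unit string."""
    def __init__(self, magnitude, units):
        self.magnitude, self.units = magnitude, units

def zeros_like(other):
    # proposed behavior: drop the units and return a bare array of zeros
    data = other.magnitude if isinstance(other, Quantity) else other
    return [0] * len(data)

def full_like(other, fill_value):
    # proposed behavior: the unit (if any) comes from fill_value, not from other
    data = other.magnitude if isinstance(other, Quantity) else other
    if isinstance(fill_value, Quantity):
        return Quantity([fill_value.magnitude] * len(data), fill_value.units)
    return [fill_value] * len(data)

q = Quantity([1, 2, 3], "meter")
zeros_like(q)                        # [0, 0, 0] -- a bare array, no units
full_like(q, Quantity(5, "second"))  # Quantity([5, 5, 5], "second")
```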

Does this behavior seem reasonable to you? Also, would this be something that should be cleared up with an issue on pint's end?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tests for arrays with units 484015016
530242012 https://github.com/pydata/xarray/issues/3299#issuecomment-530242012 https://api.github.com/repos/pydata/xarray/issues/3299 MDEyOklzc3VlQ29tbWVudDUzMDI0MjAxMg== jthielen 3460034 2019-09-11T06:32:44Z 2019-09-11T20:47:29Z CONTRIBUTOR

Based on your input and expected result, I would guess that you are looking to regrid your data from the x/y curvilinear grid of your GRIB weather data to a new rectilinear (lon/lat) grid?

If so, I would suggest using xESMF (example here), since xarray itself does not appear to support this type of interpolation yet. As an example, given the dimension names you currently have and expect in the output, your code could look something like the following:

```python
import xesmf as xe

# ...load data...

ds_xy_grid = ds_xy_grid.rename(latitude='lat', longitude='lon')
ds_out = xr.Dataset({'lat': (['lat'], np.linspace(44.77, 56.14, 421)),
                     'lon': (['lon'], np.linspace(2.976, 19.84, 461))})
regridder = xe.Regridder(ds_xy_grid, ds_out, 'bilinear')
ds_lonlat_grid = regridder(ds_xy_grid)
ds_lonlat_grid = ds_lonlat_grid.rename(lat='latitude', lon='longitude')
```

(updated as per comment below)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  swap dimensions with multi-dimensional coordinates 491719758
527245079 https://github.com/pydata/xarray/pull/3238#issuecomment-527245079 https://api.github.com/repos/pydata/xarray/issues/3238 MDEyOklzc3VlQ29tbWVudDUyNzI0NTA3OQ== jthielen 3460034 2019-09-02T21:24:52Z 2019-09-02T21:31:05Z CONTRIBUTOR

After digging a bit more into np.prod based on your comment, the situation is more complicated than I thought it was. I forgot that array.prod() has an implementation because pint has a ufunc implementation for it (so, actually np.prod(array) gives a result by way of __array_ufunc__ before https://github.com/hgrecco/pint/pull/764, but it no longer functions that way when __array_function__ is used). I hesitate to delegate to that implementation though, since it gives incorrect units when the axis and/or where arguments are used (see https://github.com/hgrecco/pint/issues/867).
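The unit bookkeeping in question can be sketched as follows (a simplified, hypothetical helper; where is a flat boolean list here, and the where-with-axis combination is deliberately left out since the factor count can then differ per output element, which is exactly the hard part):

```python
from math import prod

def prod_unit_exponent(shape, axis=None, where=None):
    """How many unitful factors each np.prod output element multiplies.

    The result unit would be ``unit ** exponent``.
    """
    if where is not None:
        if axis is not None:
            # per-output-element counts: the non-trivial case
            raise NotImplementedError
        return sum(where)       # only True entries contribute a factor
    if axis is None:
        return prod(shape)      # full reduction: every element contributes
    return shape[axis]          # reduce one axis: that axis's length

prod_unit_exponent((3, 4))           # 12 -> unit ** 12
prod_unit_exponent((3, 4), axis=0)   # 3  -> unit ** 3
```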

array.any() and array.all() work now because they fall back to .any() and .all() on the magnitude; they aren't explicitly handled (or tested) by pint. Falling back to the magnitude here seems mostly fine, except possibly for offset units. Once a decision is made about the expected behavior in that case, I can add them to the __array_function__ list and add tests in pint.

Based on your tests and examples, I would agree that https://github.com/pydata/xarray/issues/3241 isn't really fixed. I also agree that adding the method tests is a good idea.

Thank you for clarifying about the sel/isel/loc tests. For some reason, I'm unfortunately getting results more consistent with the stripped unit behavior, rather than what your test results are showing. For example:

```python
import xarray as xr
import numpy as np
import pint

unit_registry = pint.UnitRegistry()

array = np.linspace(5, 10, 20).astype(int) * unit_registry.m
x = np.arange(len(array)) * unit_registry.s
data_array = xr.DataArray(data=array, coords={"x": x}, dims=["x"])

print(data_array.sel(x=[15, 16] * unit_registry.volts))
```

outputs

```
<xarray.DataArray (x: 2)>
<Quantity([8 9], 'meter')>
Coordinates:
  * x        (x) int64 15 16
```

whereas it should be giving a `KeyError` by your tests, if I'm interpreting them correctly. What happens in your environment when you run the above snippet?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tests for arrays with units 484015016
527230031 https://github.com/pydata/xarray/pull/3238#issuecomment-527230031 https://api.github.com/repos/pydata/xarray/issues/3238 MDEyOklzc3VlQ29tbWVudDUyNzIzMDAzMQ== jthielen 3460034 2019-09-02T19:32:16Z 2019-09-02T19:32:16Z CONTRIBUTOR

Thank you for the update! Here are responses to each issue brought up:

np.prod (along with np.product and np.nanprod) was not implemented yet since it seems non-trivial to determine the number of unit multiplications that occur given both axis and where arguments in a performant way. I can take a closer look at it though, and hopefully will be able to have an implementation soon. (If anyone has any ideas on this, let me know!)

I passed up np.all and np.any since I associated those with boolean arrays, rather than arrays with units. However, since it sounds like they are needed here, I can add those to the list. (Looking at it though, behavior with non-multiplicative units is unclear right now...see https://github.com/hgrecco/pint/issues/866)

np.allclose should be easy to implement, and I agree it would be nice to not have to have that workaround.

np.maximum and np.minimum appear to be routed through __array_ufunc__ instead of __array_function__. I wasn't aware of that issue until you pointed it out, but looking into the implementation, "maximum" and "minimum" (and any other ufunc not explicitly listed by pint in BaseQuantity.__set_units, BaseQuantity.__copy_units, or BaseQuantity.__prod_units) will behave in this way, since checking input compatibility but not wrapping output is the fallback behavior. It seems like a bug to me that this doesn't raise a UnitStrippedWarning, and at the very least it is inconsistent with the fail-to-NotImplemented behavior of the __array_function__ implementation. I'll bring up this point at https://github.com/hgrecco/pint/pull/764.

np.argmin/np.argmax/np.sum: there seems to be a lot going on with these, at least when I ran the tests locally. First, when called on data_array, they appear to go deep into xarray/core/duck_array_ops.py and xarray/core/nanops.py rather than ever calling np.nanargmin/np.nanargmax/np.nansum on data_array.data. Second, even if that is the proper pathway, it ends up failing because np.result_type is called on mixed ndarray and QuantitySequence arguments. This is still unresolved on pint's end (see https://github.com/hgrecco/pint/pull/764#issuecomment-523272038 for the specific issue, https://github.com/hgrecco/pint/issues/845 for the more general discussion). All this also only happens on float dtype...it works just fine with int dtype, which is puzzling to me.

np.median: This one has been driving me crazy. In my environment, when running in pytest, it errors out because somewhere np.median called on data_array tries to cast the result (a pint QuantityScalar) to a float/int, which pint does not allow when the quantity is not dimensionless. But I have no idea why it ends up there. However, when replicating the test outside of pytest, it doesn't error; instead the result is missing the magnitude (with a UnitStrippedWarning raised when going through the __array_struct__ attribute on the quantity). data_array.median(), however, works just fine. I'm not really sure what to do here.

For indexing, when you say "working," would you be able to clarify what your expected behavior is for indexing? Based on https://github.com/pydata/xarray/issues/525#issuecomment-514880353 and the preceding discussion, right now indices will have units stripped, so for me at least, I would expect any attempt at unit-aware indexing to either not work or raise a UnitStrippedWarning.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tests for arrays with units 484015016
524570522 https://github.com/pydata/xarray/issues/525#issuecomment-524570522 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU3MDUyMg== jthielen 3460034 2019-08-24T18:12:55Z 2019-08-24T18:39:49Z CONTRIBUTOR

@shoyer I agree, the accessor interface makes a lot of sense for this: it's more conservative on the xarray side, while also giving the most flexibility for the pint + xarray integration.

Based on your feedback and what I'd hope to see out of the pint + xarray integration, I'm thinking a pint-adjacent package like pint-xarray may be the best route forward. ~~I'll create an issue on pint to inquire about that possibility.~~ See https://github.com/hgrecco/pint/issues/849.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524569782 https://github.com/pydata/xarray/issues/525#issuecomment-524569782 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2OTc4Mg== jthielen 3460034 2019-08-24T18:00:37Z 2019-08-24T18:01:11Z CONTRIBUTOR

Oh, okay, having the fallback like that was how I thought about implementing it. (I'm sorry that I didn't describe that in my initial comment.)

So would the way forward be to implement DataArray.units_convert()/DataArray.units_to() and DataArray.units as you described right now, but wait for and/or delegate IO integration? Or, should there also be a fully-backwards-compatible IO integration implemented now (such as an optional kwarg on open_dataset and to_netcdf)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524568112 https://github.com/pydata/xarray/issues/525#issuecomment-524568112 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDU2ODExMg== jthielen 3460034 2019-08-24T17:34:31Z 2019-08-24T17:34:31Z CONTRIBUTOR

@shoyer Thank you for the reply!

That sounds good about the repr custom logic.

With the units attribute, I was presuming based on the past comments that DataArray.units would be a new property; I forgot that DataArray.<attrname> passes along to DataArray.attrs.<attrname>, meaning that implementing something new for DataArray.units would be a breaking change! In trying to avoid such a change, though, I think it would be confusing to have a DataArray-level DataArray.units_convert method without a corresponding DataArray-level way of getting at the units. So, would it be okay to just implement this unit interface (unit access, unit conversion, and IO) through an accessor, and start out with just a pint accessor? If so, where should it be implemented?

Possible ideas I had:

  • As a boilerplate example in the xarray documentation that downstream libraries or end-users can implement?
  • In xarray itself?
  • In pint or a new pint-adjacent package (similar to pint-pandas)?
  • A new xarray-adjacent package for general-purpose unit compatibility?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524518305 https://github.com/pydata/xarray/issues/525#issuecomment-524518305 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUyNDUxODMwNQ== jthielen 3460034 2019-08-24T04:17:54Z 2019-08-24T04:17:54Z CONTRIBUTOR

With the progress being made with https://github.com/pydata/xarray/pull/2956, https://github.com/pydata/xarray/pull/3238, and https://github.com/hgrecco/pint/pull/764, I was thinking that now might be a good time to work out the details of the "minimal units layer" mentioned by @shoyer in https://github.com/pydata/xarray/issues/525#issuecomment-482641808 and https://github.com/pydata/xarray/issues/988#issuecomment-413732471?

I'd be glad to try putting together a PR that could follow up on https://github.com/pydata/xarray/pull/3238 for it, but I would want to ask for some guidance:

(For reference, below is the action list from https://github.com/pydata/xarray/issues/988#issuecomment-413732471)

  • The DataArray.units property could forward to DataArray.data.units.
  • A DataArray.to or DataArray.convert method could call the relevant method on data and re-wrap it in a DataArray.
  • A minimal layer on top of xarray's netCDF IO could handle unit attributes by wrapping/unwrapping arrays with pint.

DataArray.units

Having DataArray.units forward to DataArray.data.units should work for pint, unyt, and quantities, but should a fallback to DataArray.data.unit be added for astropy.units? Also, how should DataArray.units behave if DataArray.data does not have a "units" or "unit" attribute, but DataArray.attrs['units'] exists?
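To make the proposed lookup order concrete, here is a minimal sketch with stand-in classes (illustrative names only, not xarray or pint API):

```python
class PintLike:
    units = "meter"      # pint and unyt expose .units

class AstropyLike:
    unit = "m"           # astropy.units exposes .unit

def lookup_units(data, attrs=None):
    # Sketch of the lookup order discussed above: duck-array attribute
    # first (.units, then .unit), then the attrs dict as a fallback.
    for name in ("units", "unit"):
        if hasattr(data, name):
            return getattr(data, name)
    return (attrs or {}).get("units")

print(lookup_units(PintLike()))                # meter
print(lookup_units(AstropyLike()))             # m
print(lookup_units(object(), {"units": "K"}))  # K
```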

DataArray.to()/DataArray.convert()

DataArray.to() would be consistent with the methods for pint, unyt, and astropy.units (the relevant method for quantities looks to be .rescale()), however, it is very similar to the numerous output-related DataArray.to_*() methods. Is this okay, or would DataArray.convert() or some other method name be better to avoid confusion?

Units and IO

While wrapping and unwrapping arrays with pint itself should be straightforward, I really don't know what the best API for it should be, especially for input.

Some possibilities that came to mind (by no means an exhaustive list):

  • Leave open_dataset as it is now, but provide examples in the documentation for how to reconstruct a new Dataset with unit arrays (perhaps provide a boilerplate function or accessor)
  • Add a kwarg like "wrap_units" to open_dataset() that accepts a quantity constructor (like ureg.Quantity in pint) that is applied within each variable
  • Devise some generalized system for specifying the internal array structure in the opened dataset (to handle other duck array types, not just unit arrays)
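The "wrap_units" idea in the second bullet could be sketched roughly like this (the variable layout and helper name are hypothetical, not xarray's internal representation):

```python
import numpy as np

class ToyQuantity:
    """Illustrative quantity constructor; for pint it would be ureg.Quantity."""
    def __init__(self, magnitude, units):
        self.magnitude = np.asarray(magnitude)
        self.units = units

def wrap_variables(variables, quantity_constructor):
    # Wrap each variable whose attrs carry a "units" string; leave the
    # rest untouched (hypothetical helper sketching the kwarg's effect).
    wrapped = {}
    for name, (data, attrs) in variables.items():
        units = attrs.get("units")
        wrapped[name] = quantity_constructor(data, units) if units else data
    return wrapped

variables = {
    "t": (np.array([273.0, 280.0]), {"units": "kelvin"}),
    "mask": (np.array([0, 1]), {}),  # no units attribute: left unwrapped
}
out = wrap_variables(variables, ToyQuantity)
print(out["t"].units)  # kelvin
```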

With any of these, tests for lazy-loading would be crucial (I don't know yet how pint will handle that).

Output may be easier: I was thinking that unwrapping could be done implicitly by automatically putting str(DataArray.units) as the "units" attribute and replacing the unit array with its magnitude/value?
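That implicit unwrapping could be sketched like so (toy quantity class and hypothetical helper name, not a real xarray hook):

```python
import numpy as np

class ToyQuantity:
    """Toy quantity; pint's real class also has .magnitude and .units."""
    def __init__(self, magnitude, units):
        self.magnitude = np.asarray(magnitude)
        self.units = units

def unwrap_for_io(data, attrs):
    # Before writing: record str(units) in the "units" attribute and
    # replace the unit array with its bare magnitude.
    if hasattr(data, "units") and hasattr(data, "magnitude"):
        attrs = {**attrs, "units": str(data.units)}
        data = data.magnitude
    return data, attrs

data, attrs = unwrap_for_io(ToyQuantity([1.0, 2.0], "kelvin"), {})
print(attrs["units"])  # kelvin
```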

Extra questions based on sparse implementation

__repr__

Will a set of repr functions for each unit array type need to be added like they were for sparse in https://github.com/pydata/xarray/pull/3211? Or should there be some more general system implemented because of all of the possible combinations that would arise with other duck array types?

to_dense()/.to_numpy_data()/.to_numpy()

What is the expected behavior with unit arrays with regards to this soon-to-be-implemented conversion method?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
524099935 https://github.com/pydata/xarray/pull/3238#issuecomment-524099935 https://api.github.com/repos/pydata/xarray/issues/3238 MDEyOklzc3VlQ29tbWVudDUyNDA5OTkzNQ== jthielen 3460034 2019-08-22T22:20:49Z 2019-08-22T22:20:49Z CONTRIBUTOR

I noticed you have @pytest.mark.filterwarnings("error::pint.errors.UnitStrippedWarning"), which I believe is what is raising that warning to the error level. What happens when you remove that?

As to why the warning is happening in the first place, I think that is because the __array_function__ implementation in pint right now only exists in a PR. So, when xarray looks for an __array_function__ attribute on the pint Quantity right now, it falls back to this:

https://github.com/hgrecco/pint/blob/2afdc4bf5c5727ed9cef6fdaccb00b88813c4a24/pint/quantity.py#L1438-L1449

Hence, the UnitStrippedWarning. However, hasattr(data, "__array_function__") should still work, albeit for the wrong reason since pint doesn't yet have a true __array_function__ method.
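The feature check itself is just attribute detection; a minimal sketch (noting, as described above, that a __getattr__ fallback can make it succeed for the wrong reason):

```python
import numpy as np

def implements_array_function(obj):
    # NEP 18 feature detection: __array_function__ implementers (and
    # ndarray itself, since NumPy 1.17) expose this attribute. A class
    # with a catch-all __getattr__ can pass this check spuriously.
    return hasattr(obj, "__array_function__")

print(implements_array_function(np.arange(3)))  # True
print(implements_array_function([1, 2, 3]))     # False: plain list
```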

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tests for arrays with units 484015016
524088212 https://github.com/pydata/xarray/pull/3238#issuecomment-524088212 https://api.github.com/repos/pydata/xarray/issues/3238 MDEyOklzc3VlQ29tbWVudDUyNDA4ODIxMg== jthielen 3460034 2019-08-22T21:39:38Z 2019-08-22T21:39:38Z CONTRIBUTOR

@keewis: In case it helps, I've added a bunch of additional __array_function__ implementations in pint in https://github.com/andrewgsavage/pint/pull/6, which will hopefully get merged soon into the main https://github.com/hgrecco/pint/pull/764 PR. If through these tests with xarray you find something missing on pint's end, let me know and I'd be glad to help.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tests for arrays with units 484015016
523229282 https://github.com/pydata/xarray/pull/2956#issuecomment-523229282 https://api.github.com/repos/pydata/xarray/issues/2956 MDEyOklzc3VlQ29tbWVudDUyMzIyOTI4Mg== jthielen 3460034 2019-08-20T23:07:27Z 2019-08-20T23:07:27Z CONTRIBUTOR

> I'd love to add tests that verify that xarray can properly wrap arrays with units as soon as possible. We could even merge tests that are all marked "xfail" and that currently only pass when using the experimental version of pint from hgrecco/pint#764.

This is actually something that I've been working towards recently, but I've run into some delays. I'm still working on the pint side of things, since there is a fair number of numpy functions not yet implemented and tested in hgrecco/pint#764.

@keewis Would you want to take the lead on these xarray + pint with __array_function__ tests, since you've been working on related tests here, or would you want me to see what I can come up with based on what I've been doing so far?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Picking up #1118: Do not convert subclasses of `ndarray` unless required 443157666
514877824 https://github.com/pydata/xarray/issues/525#issuecomment-514877824 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDg3NzgyNA== jthielen 3460034 2019-07-25T03:11:20Z 2019-07-25T03:11:20Z CONTRIBUTOR

Thank you for the insight!

So if I'm understanding things correctly as they stand now, dimension coordinates store their values internally as a pandas.Index, which would mean, to implement this directly, this becomes an upstream issue in pandas to allow a ndarray-like unit array inside a pandas.Index? Based on what I've seen on the pandas side, this looks far from straightforward.

With that in mind, would "dimension coordinates with units" (or more generally "dimension coordinates with __array_function__ implementers") be another use case that best falls under flexible indices (#1603)?

(In the mean time, I would guess that the best workaround is using an accessor interface to handle unit-related operations on coordinates, since the attrs are preserved.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585
514452182 https://github.com/pydata/xarray/issues/525#issuecomment-514452182 https://api.github.com/repos/pydata/xarray/issues/525 MDEyOklzc3VlQ29tbWVudDUxNDQ1MjE4Mg== jthielen 3460034 2019-07-24T02:19:08Z 2019-07-24T02:19:08Z CONTRIBUTOR

In light of the recent activity with __array_function__ in #3117, I took a quick look to see if it worked with Pint as modified in https://github.com/hgrecco/pint/pull/764. The basics of sticking a Pint Quantity in a DataArray seem to work well, and perhaps the greatest issues are on Pint's end...right now https://github.com/hgrecco/pint/pull/764 is limited in the functions it handles through __array_function__, and there are some quirks with operator precedence.

However, the other main problem was that coordinates did not work with Quantities. Looking again at https://github.com/pydata/xarray/issues/1938#issuecomment-510953379 and #2956, this is not surprising. I'm curious, though, what it would take to let indexing work with Pint (or other unit arrays)? For most of my use cases (meteorological analysis as in MetPy), having units on coordinates is just as important as having units on the data itself. I'd be interested in helping implement it, but I would greatly appreciate some initial direction, since I'm new to that part of the xarray codebase.

Also, cc @keewis, since I saw in #2956 you have a unit-support branch that looks like it attempts to extend NumpyIndexingAdapter to work with unit arrays, but still has the coordinates-with-units tests marked as xfail.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support for units 100295585

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);