home / github

Menu
  • GraphQL API
  • Search all tables

issues

Table actions
  • GraphQL API for issues

3 rows where "created_at" is on date 2023-07-19 and user = 2448579 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: comments, created_at (date), updated_at (date), closed_at (date)

type 2

  • issue 2
  • pull 1

state 2

  • closed 2
  • open 1

repo 1

  • xarray 3
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1812301185 I_kwDOAMm_X85sBYWB 8005 Design for IntervalIndex dcherian 2448579 open 0     5 2023-07-19T16:30:50Z 2023-09-09T06:30:20Z   MEMBER      

Is your feature request related to a problem?

We should add a wrapper for pandas.IntervalIndex this would solve a long standing problem around propagating "bounds" variables (CF conventions, https://github.com/pydata/xarray/issues/1475)

The CF design

CF "encoding" for intervals is to use bounds variables. There is an attribute "bounds" on the dimension coordinate, that refers to a second variable (at least 2D). Example: x has an attribute bounds that refers to x_bounds.

```python import numpy as np

left = np.arange(0.5, 3.6, 1) right = np.arange(1.5, 4.6, 1) bounds = np.stack([left, right])

ds = xr.Dataset( {"data": ("x", [1, 2, 3, 4])}, coords={"x": ("x", [1, 2, 3, 4], {"bounds": "x_bounds"}), "x_bounds": (("bnds", "x"), bounds)}, ) ds ```

A fundamental problem with our current data model is that we lose x_bounds when we extract ds.data because there is a dimension bnds that is not shared with ds.data. Very important metadata is now lost!

We would also like to use the "bounds" to enable interval based indexing. ds.sel(x=1.1) should give you the value from the appropriate interval.

Pandas IntervalIndex

All the indexing is easy to implement by wrapping pandas.IntervalIndex, but there is one limitation. pd.IntervalIndex saves two pieces of information for each interval (left bound, right bound). CF saves three : left bound, right bound (see x_bounds) and a "central" value (see x). This should be OK to work around in our wrapper.

Fundamental Question

To me, a core question is whether x_bounds needs to be preserved after creating an IntervalIndex. 1. If so, we need a better rule around coordinate variable propagation. In this case, the IntervalIndex would be associated with x and x_bounds. So the rule could be > "propagate all variables necessary to propagate an index associated with any of the dimensions on the extracted variable."

So when extracting `ds.data` we propagate all variables necessary to propagate indexes associated with `ds.data.dims` that is `x` which would say "propagate `x`, `x_bounds`, and the IntervalIndex.
  1. Alternatively, we could choose to drop x_bounds entirely. I interpret this approach as "decoding" the bounds variable to an interval index object. When saving to disk, we would encode the interval index in two variables. (See below)

Describe the solution you'd like

I've prototyped (2) [approach 1 in this notebook) following @benbovy's suggestion

```python from xarray import Variable from xarray.indexes import PandasIndex class XarrayIntervalIndex(PandasIndex): def __init__(self, index, dim, coord_dtype): assert isinstance(index, pd.IntervalIndex) # for PandasIndex self.index = index self.dim = dim self.coord_dtype = coord_dtype @classmethod def from_variables(cls, variables, options): assert len(variables) == 1 (dim,) = tuple(variables) bounds = options["bounds"] assert isinstance(bounds, (xr.DataArray, xr.Variable)) (axis,) = bounds.get_axis_num(set(bounds.dims) - {dim}) left, right = np.split(bounds.data, 2, axis=axis) index = pd.IntervalIndex.from_arrays(left.squeeze(), right.squeeze()) coord_dtype = bounds.dtype return cls(index, dim, coord_dtype) def create_variables(self, variables): from xarray.core.indexing import PandasIndexingAdapter newvars = {self.dim: xr.Variable(self.dim, PandasIndexingAdapter(self.index))} return newvars def __repr__(self): string = f"Xarray{self.index!r}" return string def to_pandas_index(self): return self.index @property def mid(self): return PandasIndex(self.index.right, self.dim, self.coord_dtype) @property def left(self): return PandasIndex(self.index.right, self.dim, self.coord_dtype) @property def right(self): return PandasIndex(self.index.right, self.dim, self.coord_dtype) ```

python ds1 = ( ds.drop_indexes("x") .set_xindex("x", XarrayIntervalIndex, bounds=ds.x_bounds) .drop_vars("x_bounds") ) ds1

python ds1.sel(x=1.1)

Describe alternatives you've considered

I've tried some approaches in this notebook

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8005/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1812504689 I_kwDOAMm_X85sCKBx 8006 Fix documentation about datetime_unit of xarray.DataArray.differentiate dcherian 2448579 closed 0     0 2023-07-19T18:31:10Z 2023-09-01T09:37:15Z 2023-09-01T09:37:15Z MEMBER      

Should say that Y and M cannot be supported with datetime64

Discussed in https://github.com/pydata/xarray/discussions/8000

<sup>Originally posted by **jesieleo** July 19, 2023</sup> I have a piece of data that looks like this ``` <xarray.Dataset> Dimensions: (time: 612, LEV: 15, latitude: 20, longitude: 357) Coordinates: * time (time) datetime64[ns] 1960-01-15 1960-02-15 ... 2010-12-15 * LEV (LEV) float64 5.01 15.07 25.28 35.76 ... 149.0 171.4 197.8 229.5 * latitude (latitude) float64 -4.75 -4.25 -3.75 -3.25 ... 3.75 4.25 4.75 * longitude (longitude) float64 114.2 114.8 115.2 115.8 ... 291.2 291.8 292.2 Data variables: u (time, LEV, latitude, longitude) float32 ... Attributes: (12/30) cdm_data_type: Grid Conventions: COARDS, CF-1.6, ACDD-1.3 creator_email: chepurin@umd.edu creator_name: APDRC creator_type: institution creator_url: https://www.atmos.umd.edu/~ocean/ ... ... standard_name_vocabulary: CF Standard Name Table v29 summary: Simple Ocean Data Assimilation (SODA) soda po... time_coverage_end: 2010-12-15T00:00:00Z time_coverage_start: 1983-01-15T00:00:00Z title: SODA soda pop2.2.4 [TIME][LEV][LAT][LON] Westernmost_Easting: 118.25 ``` when i try to use xarray.DataArray.differentiate `data.u.differentiate('time',datetime_unit='M')` will appear ``` Traceback (most recent call last): File "<stdin>", line 1, in <module> File "D:\Anaconda3\lib\site-packages\xarray\core\dataarray.py", line 3609, in differentiate ds = self._to_temp_dataset().differentiate(coord, edge_order, datetime_unit) File "D:\Anaconda3\lib\site-packages\xarray\core\dataset.py", line 6372, in differentiate coord_var = coord_var._to_numeric(datetime_unit=datetime_unit) File "D:\Anaconda3\lib\site-packages\xarray\core\variable.py", line 2428, in _to_numeric numeric_array = duck_array_ops.datetime_to_numeric( File "D:\Anaconda3\lib\site-packages\xarray\core\duck_array_ops.py", line 466, in datetime_to_numeric array = array / np.timedelta64(1, datetime_unit) TypeError: Cannot get a common metadata divisor for Numpy datatime metadata [ns] and [M] because they have incompatible nonlinear base time units. ``` Would you please told me is this a BUG?
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8006/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1812646094 PR_kwDOAMm_X85V7g7q 8007 Update copyright year in README dcherian 2448579 closed 0     0 2023-07-19T20:00:50Z 2023-07-20T21:13:27Z 2023-07-20T21:13:26Z MEMBER   0 pydata/xarray/pulls/8007  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8007/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 56.802ms · About: xarray-datasette