github: issues: 3 rows where comments = 5, state = "open" and user = 2448579 sorted by updated

3 rows where comments = 5, state = "open" and user = 2448579 sorted by updated_at descending


id	node_id	number	title	user	state	comments	created_at	updated_at	author_association	body	reactions	repo	type
1954445639	I_kwDOAMm_X850fnlH	8350	optimize align for scalars at least	dcherian 2448579	open	5	2023-10-20T14:48:25Z	2023-10-20T19:17:39Z	MEMBER	What happened? Here's a simple rescaling calculation: ```python import numpy as np import xarray as xr ds = xr.Dataset( {"a": (("x", "y"), np.ones((300, 400))), "b": (("x", "y"), np.ones((300, 400)))} ) mean = ds.mean() # scalar std = ds.std() # scalar rescaled = (ds - mean) / std ``` The profile for the last line shows 30% (!!!) time spent in `align` (really `reindex_like`) except there's nothing to reindex when only scalars are involved! This is a small example inspired by a ML pipeline where this normalization is happening very many times in a tight loop. cc @benbovy What did you expect to happen? A fast path for when no reindexing needs to happen.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8350/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
1217566173	I_kwDOAMm_X85IkpXd	6528	cumsum drops index coordinates	dcherian 2448579	open	5	2022-04-27T16:04:08Z	2023-09-22T07:55:56Z	MEMBER	What happened? cumsum drops index coordinates. Seen in #6525, #3417 What did you expect to happen? Preserve index coordinates Minimal Complete Verifiable Example ```Python import xarray as xr ds = xr.Dataset( {"foo": (("x",), [7, 3, 1, 1, 1, 1, 1])}, coords={"x": [0, 1, 2, 3, 4, 5, 6]}, ) ds.cumsum("x") ``` `<xarray.Dataset> Dimensions: (x: 7) Dimensions without coordinates: x Data variables: foo (x) int64 7 10 11 12 13 14 15` Relevant log output No response Anything else we need to know? No response Environment xarray main	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6528/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
1812301185	I_kwDOAMm_X85sBYWB	8005	Design for IntervalIndex	dcherian 2448579	open	5	2023-07-19T16:30:50Z	2023-09-09T06:30:20Z	MEMBER	Is your feature request related to a problem? We should add a wrapper for `pandas.IntervalIndex`; this would solve a long-standing problem around propagating "bounds" variables (CF conventions, https://github.com/pydata/xarray/issues/1475). The CF design CF "encoding" for intervals is to use bounds variables. There is an attribute `"bounds"` on the dimension coordinate that refers to a second variable (at least 2D). Example: `x` has an attribute `bounds` that refers to `x_bounds`. ```python import numpy as np import xarray as xr left = np.arange(0.5, 3.6, 1) right = np.arange(1.5, 4.6, 1) bounds = np.stack([left, right]) ds = xr.Dataset( {"data": ("x", [1, 2, 3, 4])}, coords={"x": ("x", [1, 2, 3, 4], {"bounds": "x_bounds"}), "x_bounds": (("bnds", "x"), bounds)}, ) ds ``` A fundamental problem with our current data model is that we lose `x_bounds` when we extract `ds.data` because there is a dimension `bnds` that is not shared with `ds.data`. Very important metadata is now lost! We would also like to use the "bounds" to enable interval-based indexing. `ds.sel(x=1.1)` should give you the value from the appropriate interval. Pandas IntervalIndex All the indexing is easy to implement by wrapping pandas.IntervalIndex, but there is one limitation. `pd.IntervalIndex` saves two pieces of information for each interval (left bound, right bound). CF saves three: left bound, right bound (see `x_bounds`) and a "central" value (see `x`). This should be OK to work around in our wrapper. Fundamental Question To me, a core question is whether `x_bounds` needs to be preserved after creating an `IntervalIndex`. 1. If so, we need a better rule around coordinate variable propagation. In this case, the IntervalIndex would be associated with `x` and `x_bounds`. 
So the rule could be > "propagate all variables necessary to propagate an index associated with any of the dimensions on the extracted variable." So when extracting `ds.data` we propagate all variables necessary to propagate indexes associated with `ds.data.dims`, that is `x`, which would say "propagate `x`, `x_bounds`, and the IntervalIndex". 2. Alternatively, we could choose to drop `x_bounds` entirely. I interpret this approach as "decoding" the bounds variable to an interval index object. When saving to disk, we would encode the interval index in two variables. (See below) Describe the solution you'd like I've prototyped (2) (approach 1 in this notebook) following @benbovy's suggestion ```python import numpy as np import pandas as pd import xarray as xr from xarray import Variable from xarray.indexes import PandasIndex class XarrayIntervalIndex(PandasIndex): def __init__(self, index, dim, coord_dtype): assert isinstance(index, pd.IntervalIndex) # for PandasIndex self.index = index self.dim = dim self.coord_dtype = coord_dtype @classmethod def from_variables(cls, variables, options): assert len(variables) == 1 (dim,) = tuple(variables) bounds = options["bounds"] assert isinstance(bounds, (xr.DataArray, xr.Variable)) (axis,) = bounds.get_axis_num(set(bounds.dims) - {dim}) left, right = np.split(bounds.data, 2, axis=axis) index = pd.IntervalIndex.from_arrays(left.squeeze(), right.squeeze()) coord_dtype = bounds.dtype return cls(index, dim, coord_dtype) def create_variables(self, variables): from xarray.core.indexing import PandasIndexingAdapter newvars = {self.dim: xr.Variable(self.dim, PandasIndexingAdapter(self.index))} return newvars def __repr__(self): string = f"Xarray{self.index!r}" return string def to_pandas_index(self): return self.index @property def mid(self): return PandasIndex(self.index.mid, self.dim, self.coord_dtype) @property def left(self): return PandasIndex(self.index.left, self.dim, self.coord_dtype) @property def right(self): return PandasIndex(self.index.right, self.dim, self.coord_dtype) ``` `python 
ds1 = ( ds.drop_indexes("x") .set_xindex("x", XarrayIntervalIndex, bounds=ds.x_bounds) .drop_vars("x_bounds") ) ds1` `python ds1.sel(x=1.1)` Describe alternatives you've considered I've tried some approaches in this notebook	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8005/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
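The interval lookup discussed in issue 8005 above can be illustrated with pandas alone. This is a minimal sketch, not the xarray wrapper itself: it reuses the `left` and `right` arrays from the issue body and assumes pandas' default right-closed intervals. The `data` array and the names `pos` and `value` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Bounds taken from the example in the issue body.
left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)

# pandas stores only the two edges of each interval
# (right-closed by default, so the first interval is (0.5, 1.5]).
index = pd.IntervalIndex.from_arrays(left, right)

# Illustrative data aligned with the intervals (same values as `data` in the issue).
data = np.array([1, 2, 3, 4])

# Locate the interval containing 1.1 and pick the matching value.
pos = index.get_loc(1.1)
value = data[pos]

# The "central" value that CF stores separately can be derived from the edges.
centers = index.mid
```

Here `centers` reproduces the `x` coordinate from the issue, which suggests that the third piece of information CF keeps could be recomputed when the central value is the midpoint. That is an assumption and does not hold for arbitrary CF files.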


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
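
The filter in the page heading (comments = 5, state = "open", user = 2448579, sorted by updated_at descending) maps directly onto a SQL query against this table. The following is a minimal sketch using Python's built-in `sqlite3` module. It assumes an in-memory database with a trimmed-down version of the `issues` table (only the columns needed here, without the foreign-key references), populated with the three rows shown above.

```python
import sqlite3

# In-memory database with a reduced version of the [issues] table (assumption:
# only the columns used by the filter and the output are kept).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [comments] INTEGER,
       [updated_at] TEXT
    )
    """
)

# The three rows from the table above, inserted out of order on purpose.
rows = [
    (1217566173, 6528, "cumsum drops index coordinates", 2448579, "open", 5, "2023-09-22T07:55:56Z"),
    (1812301185, 8005, "Design for IntervalIndex", 2448579, "open", 5, "2023-09-09T06:30:20Z"),
    (1954445639, 8350, "optimize align for scalars at least", 2448579, "open", 5, "2023-10-20T19:17:39Z"),
]
conn.executemany("INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Same filter and ordering as the page heading; ISO 8601 timestamps sort
# correctly as plain text.
query = """
    SELECT number, title, updated_at
    FROM issues
    WHERE comments = :comments AND state = :state AND [user] = :user
    ORDER BY updated_at DESC
"""
result = conn.execute(query, {"comments": 5, "state": "open", "user": 2448579}).fetchall()
numbers = [row[0] for row in result]
```

Named parameters keep the filter values separate from the SQL text, which makes it easy to reuse the query for other users or comment counts.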
