github: issue_comments: 12 rows where issue = 955936490 sorted by updated

12 rows where issue = 955936490 sorted by updated_at descending

Search:

descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1239162673	https://github.com/pydata/xarray/issues/5647#issuecomment-1239162673	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X85J3B8x	benbovy 4160723	2022-09-07T09:47:13Z	2022-09-07T09:47:13Z	MEMBER	I think we can close this issue, which has mostly been addressed in #5692. I opened #7002 for the potential problem of index alignment vs. coordinate ordering.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
949618104	https://github.com/pydata/xarray/issues/5647#issuecomment-949618104	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X844mgW4	benbovy 4160723	2021-10-22T13:11:01Z	2021-10-22T13:11:01Z	MEMBER	@shoyer I'm now reviewing the merging logic. I works mostly fine, with some minor concerns: `MergeElement` (variable, index) tuples are not grouped per matching (unique) index, so index equality will be checked more than once for multi-coordinate indexes. Not a big deal I think, unless for large multi-indexes with many levels. A proper fix for this would probably require some heavier refactoring, though. Is it ok to leave this as is for now? I'm not sure how we should handle the case where `foo` is the name of both an un-indexed dimension and a coordinate with other dimension(s). Let's take this example: ```python from xarray.tests.test_dataset import create_test_multiindex data = create_test_multiindex() data <xarray.Dataset> Dimensions: (x: 4) Coordinates: * x (x) MultiIndex - level_1 (x) object 'a' 'a' 'b' 'b' - level_2 (x) int64 1 2 1 2 Data variables: empty python other = xr.Dataset({"z": ("level_1", [0, 1])}) merged = data.merge(other) ValueError: conflicting level / dimension names. level_1 already exists as a level name. ``` Do we raise a `ValueError` here because otherwise we don't know what to return from `merged["level_1"]` (i.e., the multi-index virtual coordinate or a default dimension coordinate)? Can we allow this case now that multi-index level coordinates are not virtual anymore? Note: the following example does not raise any error in xarray: ```python data = xr.Dataset(coords={"x": [0, 1, 2, 3], "level_1": ("x", ["a", "a", "b", "b"])}) other = xr.Dataset({"z": ("level_1", [0, 1])}) merged = data.merge(other) merged <xarray.Dataset> Dimensions: (x: 4, level_1: 2) Coordinates: * x (x) int64 0 1 2 3 level_1 (x) <U1 'a' 'a' 'b' 'b' Data variables: z (level_1) int64 0 1 ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
946883858	https://github.com/pydata/xarray/issues/5647#issuecomment-946883858	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X844cE0S	benbovy 4160723	2021-10-19T16:16:24Z	2021-10-19T16:16:24Z	MEMBER	Other possible solutions (from discussion with @shoyer): Take care of coordinate order (and maybe other things) inside `Index.join` and `Index.equals`, e.g., for `PandasMultiIndex` maybe reorder the levels beforehand. pros: more flexible cons: not great to implicitly reorder levels if it's a costly operation? Find matching indexes using a two-passes approach: (1) group all indexes by dimension name and (2) check compatibility between the indexes listed in each group.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
946546464	https://github.com/pydata/xarray/issues/5647#issuecomment-946546464	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X844aycg	benbovy 4160723	2021-10-19T09:47:04Z	2021-10-19T09:47:04Z	MEMBER	I've used the following key type to find matching indexes: `python CoordNamesAndDims = FrozenSet[Tuple[Hashable, Tuple[Hashable, ...]]] MatchingIndexKey = Tuple[CoordNamesAndDims, Type[Index]]` where the order of coordinates doesn't matter. For `PandasMultiIndex` this is wrong, we should use a `Tuple` instead of a `FrozenSet` (assuming that the coordinate names are properly ordered in `obj.xindexes`), but I'm wondering if using a `Tuple` wouldn't be too restrictive in some other cases. Are there potential custom indexes where the order of coordinates doesn't matter? Maybe a good example is a meta-index for staggered grids where the cell center coordinate and the cell edges coordinates might be given in any order. Possible solutions to address this: Xarray indexes may reorder the coordinate variables, possibly via `Index.create_variables()`, to ensure consistent order Xarray indexes must implement the `Index.matching_key` abstract property in order to support re-indexing and alignment. Option 2 is more flexible but option 1 might be enough. Option 1 may also be great for clearer indexes and coordinates sections in Xarray objects `repr`.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
945933099	https://github.com/pydata/xarray/issues/5647#issuecomment-945933099	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X844Ycsr	benbovy 4160723	2021-10-18T16:09:11Z	2021-10-18T16:09:11Z	MEMBER	If we like, we could probably even add static type information on reindex_like, to require the same index type Yes, I did that for `Index.join()` and `Index.reindex_like()` in #5692.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
945931002	https://github.com/pydata/xarray/issues/5647#issuecomment-945931002	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X844YcL6	shoyer 1217238	2021-10-18T16:07:06Z	2021-10-18T16:07:06Z	MEMBER	For alignment, I think we could directly call `idx.reindex_like(other)` for each index in each object to align where `other` corresponds to the aligned index that matches `idx`. Sounds good to me! If we like, we could probably even add static type information on `reindex_like`, to require the same index type: ```python T = TypeVar("T", bound="Index") ... def reindex_like(self: T, other: T) -> DimIntIndexers: ... ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
945928870	https://github.com/pydata/xarray/issues/5647#issuecomment-945928870	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X844Ybqm	shoyer 1217238	2021-10-18T16:04:52Z	2021-10-18T16:04:52Z	MEMBER	2. `dim` has a multi-index, `labels` is a `pd.MultiIndex` and level names exactly match: no real issue either in this case, but shouldn't we discourage this implicit behavior in the mid/long term and instead encourage using `reindex_like` for such more advanced use case? I agree, this doesn't make sense in the long term because the "name" of the MultiIndex is no longer necessary: it's just the same index that happens to index each of the levels. Let's preserve it (for now) for backwards compat, but in the long term the ideal is certainly either (a) using `reindex_like` or (b) using `reindex` with separate kwargs for each level 3. `dim` has a multi-index and `labels` is (cast to) a single pandas index (or the other way around): this is currently possible in Xarray but it seems silly? After re-indexing, all data along `dim` is filled with `fill_value`... Would it be fine to instead raise an error now? Would it really break any user case? If part of Xarray's API currently doesn't raise an error but instead returns all NaNs, and this case can be detected based on static type information (e.g., shapes, dtypes, dimension names, variable name, index types), then I agree that the best user experience is almost certainly to raise an error instead. NaNs in numerical computing are essentially a mechanism for "runtime error handling" that can arise from values in array computation. If something can be identified based on type information, that is a better user experience. 4. `dim` has a multi-index, `labels` is a `pd.MultiIndex` and multi-index level names don't match: same problem than for case 3. Cases 3 and 4 are a big obstacle for me right now. I really don't know how we can still support those special cases without deeply re-thinking the problem. If they could be considered as a bug, then the new implementation would already raise an nice error message :-). I think we can consider these edge cases bugs and fix them :)	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
945592260	https://github.com/pydata/xarray/issues/5647#issuecomment-945592260	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X844XJfE	benbovy 4160723	2021-10-18T09:43:18Z	2021-10-18T10:47:26Z	MEMBER	Ok, I'm now hitting another obstacle while working on reindex. So one general approach for both alignment and re-indexing that is "pretty straightforward" to implement with the new Xarray index data model is: (1) find matching indexes based on their corresponding coordinate/dimension names and index type, and (2) call `idx.join(other)` and/or `idx.reindex_like(other)` where `other` is another index object of the same type than `idx`. This is what I've done so far in #5692. Relaxing any of the constraints in (1) would be much more complicated to implement. We would need to do some sort of mapping from dimension labels to all involved (multi-/meta-)indexes, then check for conflicts in dimension indexers returned from multiple indexes, possibly handle/remove multi-index coordinates (or convert back to non-indexed coordinates), etc. One problem with `Dataset.reindex` and `DataArray.reindex` is that we can pass any `{dim: labels}` as indexers. How should we adapt the API for flexible indexes? How to ensure backwards compatibility? The current possible cases are: Both `dim` and `labels` correspond (or may be cast) to single pandas indexes: there's no issue in this case with the constraints stated in (1) above. `dim` has a multi-index, `labels` is a `pd.MultiIndex` and level names exactly match: no real issue either in this case, but shouldn't we discourage this implicit behavior in the mid/long term and instead encourage using `reindex_like` for such more advanced use case? `dim` has a multi-index and `labels` is (cast to) a single pandas index (or the other way around): this is currently possible in Xarray but it seems silly? After re-indexing, all data along `dim` is filled with `fill_value`... Would it be fine to instead raise an error now? Would it really break any user case? `dim` has a multi-index, `labels` is a `pd.MultiIndex` and multi-index level names don't match: same problem than for case 3. Cases 3 and 4 are a big obstacle for me right now. I really don't know how we can still support those special cases without deeply re-thinking the problem. If they could be considered as a bug, then the new implementation would already raise an nice error message :-).	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
944273383	https://github.com/pydata/xarray/issues/5647#issuecomment-944273383	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X844SHfn	benbovy 4160723	2021-10-15T12:53:45Z	2021-10-15T12:57:38Z	MEMBER	I've now re-implemented most of the alignment logic in #5692, using a dedicated class so that all intermediate steps are implemented in a clearer and more maintainable way. One remaining task is to get the re-indexing logic right. Rather than relying on an `Index.dim_indexers` property, I feel that it'd be better to have two methods: ```python DimIntIndexers = Dict[Hashable, Any] class Index: def reindex(self, dim_labels: Mapping[Hashable, Any]) -> DimIntIndexers: ... `def reindex_like(self, other: "Index") -> DimIntIndexers: ...` ``` For alignment, I think we could directly call `idx.reindex_like(other)` for each index in each object to align where `other` corresponds to the aligned index that matches `idx`.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
937865982	https://github.com/pydata/xarray/issues/5647#issuecomment-937865982	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X8435rL-	benbovy 4160723	2021-10-07T14:48:02Z	2021-10-07T14:48:02Z	MEMBER	@shoyer I'm looking more deeply into this. I think it will be impossible to avoid a heavy refactoring in `core/alignment.py` if we want to support "meta-indexes" and handle non-dimension indexed coordinates, so I'd like to check with you about how best we can tackle this problem. I'm thinking about the following approach: Instead of looking at the dimensions of each object to align, look at their indexes. matching indexes must all correspond to the same set of coordinate names and dimension names (should we also check the index type?) if coordinate names vs. indexes match only partially, then raise an error For each of the matched indexes, check for index equality and perform join if that's not the case (or just pick the matching index that has been explicitly passed to `align` via its `indexes` argument) For convenience, we could expose an `Index.join(self, other, how)` method instead of having two separate `Index.union()` and `Index.intersection()` methods. `Index.join()` might also return index coordinate variables that we can assign to the aligned objects if the indexes do not provide an implementation for `.equals`, check for coordinate equality if they do not provide an implementation for `.join`, raise an error or ignore it? Probably better to ignore it, i.e., do not create a joined index in the aligned objects and align the coordinates just as regular variables (this might be taken care by another index)... Add an `Index.dim_indexers(self) -> Dict[Hashable, Any]` property that returns label indexers for each dimension involved in the index, and which will be used to conform the objects to the joined indexes. this is probably where we could make sure to not cast str coords to object for pandas indexes Merge `dim_indexers` returned by all joined indexes and raise if there's any conflict (i.e., two distinct indexes should not return indexers along a common dimension). Check the size of unlabelled dimensions (also against the sizes found in the merged `dim_indexers` for matching dimensions) Reindex the objects Does that sounds right to you? I'd really appreciate any guidance on this before going further as I'm worried about missing something important or another more straightforward approach.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
889483293	https://github.com/pydata/xarray/issues/5647#issuecomment-889483293	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X841BHAd	benbovy 4160723	2021-07-29T21:49:02Z	2021-07-29T21:49:02Z	MEMBER	We should raise an error in this cases (and/or suggest setting a MultiIndex). It should also be OK if not every index implements alignment, in which case they should raise an error if coordinates do not match exactly. Agreed. I don't think we should try to support alignment with multi-dimensional (non-orthogonal) inside align(). Instead we can just require the coordinates corresponding to such indexes to match exactly (or within a tolerance). I could see some examples (e.g., union or intersection of staggered grids) where it could still be useful to have a (meta-)index that implements alignment, though. Actually, while working on #5636 I was thinking more about how to perform alignment based on `xarray.Index` objects given that the concept of an `xarray.Index` should be clearly distinct from the concept of a `Variable`. We could maybe add an `xarray.Index.sizes` property similarly to `Variable.sizes`. Alternatively, we could allow `xarray.Index.union()` and `xarray.Index.intersection()` returning both the joined index and the new `Variable` objects created from the latter index. I'd lean towards the 2nd option, which is consistent with the class method constructors added in #5636. Not sure the 1st option is a good idea and it doesn't solve all the issues mentioned in my comment above.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490
889288761	https://github.com/pydata/xarray/issues/5647#issuecomment-889288761	https://api.github.com/repos/pydata/xarray/issues/5647	IC_kwDOAMm_X841AXg5	shoyer 1217238	2021-07-29T16:26:01Z	2021-07-29T16:26:01Z	MEMBER	Conceptually, `align` should ensure that all indexes in the provided arguments match. This is potentially ambiguous for cases with multiple (non-MultiIndex) indexes, if the result of aligning the separate indexes does not match, e.g., if we have: - Index for `x` along dimension `x`: `[1, 2, 3]` vs `[1, 2]` - Index for `y` along dimension `x`: `[1, 2, 3]` vs `[2, 3]` We should raise an error in this cases (and/or suggest setting a MultiIndex). It should also be OK if not every index implements alignment, in which case they should raise an error if coordinates do not match exactly. With regards to your concern: This currently works well since a pd.Index can be directly treated as a 1-d array but this won’t be always the case anymore with custom indexes. I don't think we should try to support alignment with multi-dimensional (non-orthogonal) inside `align()`. Instead we can just require the coordinates correspondign to such indexes to match exactly (or within a tolerance). I can see supporting alignment between two DataArrays that are indexed by a KDTreeIndex over multiple coordintaes along a single dimension in the same way tthat we support alignment with MultiIndex indexes currently. But if your indexes correpsond to multi-dimensional arrays (rather than just multiple coordinates), joining indexes together is a much messier operation, one that may not be possible without thinking carefully about interpolation/regridding. In many cases it may not be possible to retain the multi-dimensional nature of the indexes in the result (e.g., the union of two partially overlapping grids). Since the desired behavior is not clear, it is better to force the user to make a choice, either by stacking the dimensions in multi-dimensional indexes into 1D (like a MultiIndex) or by calling a specialized method for interpolation/regridding.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Flexible indexes: review the implementation of alignment and merge 955936490

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);