html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6293#issuecomment-1259228475,https://api.github.com/repos/pydata/xarray/issues/6293,1259228475,IC_kwDOAMm_X85LDk07,4160723,2022-09-27T09:22:04Z,2023-08-24T11:42:53Z,MEMBER,"Following thoughts and discussions in various issues (e.g., #6836), I'd like to suggest another section to the ones in the top comment:
## Deprecate `pandas.MultiIndex` special cases in Xarray
- remove the multi-index “dimension” coordinate (tuple elements)
- do not automatically promote `pandas.MultiIndex` objects to dimension + level coordinates, e.g., like in `xr.Dataset(coords={""x"": pd_midx})`, but instead treat them as a single duck-array.
- do not accept `pandas.MultiIndex` as `dim` argument in `xarray.concat()` (#7148)
- remove `obj.to_index()` for all xarray objects?
- (EDIT) remove `Dataset.reset_index()` and `DataArray.reset_index()`
They are a source of many problems and complexities in Xarray internals (many regressions reported since the index refactor were related to those special cases) and I'm not sure that the value they add is really worth the trouble. Also, in the long term the special treatment of `PandasMultiIndex` vs. other Xarray multi-indexes may add some confusion.
Some of those features are widely used (e.g., the creation of Dataset / DataArray from pandas multi-indexes is used in many places in unit tests), so we would need convenient alternatives and a smooth transition.
","{""total_count"": 5, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1148021907
https://github.com/pydata/xarray/issues/6836#issuecomment-1504975778,https://api.github.com/repos/pydata/xarray/issues/6836,1504975778,IC_kwDOAMm_X85ZtBui,4160723,2023-04-12T09:42:39Z,2023-04-12T09:42:39Z,MEMBER,A special-case sounds reasonable to me as well as a temporary fix before looking into if/how we can refactor groupby so that it works with multiple kinds of built-in and/or custom indexes.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1318992926
https://github.com/pydata/xarray/pull/7653#issuecomment-1480906129,https://api.github.com/repos/pydata/xarray/issues/7653,1480906129,IC_kwDOAMm_X85YRNWR,4160723,2023-03-23T10:01:35Z,2023-03-23T10:01:35Z,MEMBER,"For the html repr an option that is easy to implement would be to add `max-height` and `overflow-y: scroll` CSS properties here: https://github.com/pydata/xarray/blob/1e361ccb9123fe25acfd9e3364c911c1eec7d9db/xarray/static/css/style.css#L256-L261
I don't think the default browser scrollbar will look very pretty inside the repr, but it might be OK if we don't set `max-height` to too small a value.
A ""click to expand"" UI would certainly look prettier, but I doubt it would be easy to implement that in pure-CSS. ""Expand on hover"" is easier but that would be quite annoying UX I think.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1633513067
https://github.com/pydata/xarray/issues/7563#issuecomment-1463633814,https://api.github.com/repos/pydata/xarray/issues/7563,1463633814,IC_kwDOAMm_X85XPUeW,4160723,2023-03-10T10:59:07Z,2023-03-10T10:59:07Z,MEMBER,"Thanks for the report @lkugler !
Directly assigning a multi-index like `mda['position'] = midx` is now ambiguous because all levels of the multi-index are now exposed as actual coordinates. We should provide a temporary fix or at least issue a warning. A proper way to assign a pandas multi-index is implemented in #7368. In the meantime, the workaround below should work for your example (it might stop working in the future, though):
```python
mda.coords.update(xr.Dataset(coords={""position"": midx}))
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1600983717
https://github.com/pydata/xarray/pull/7530#issuecomment-1440178393,https://api.github.com/repos/pydata/xarray/issues/7530,1440178393,IC_kwDOAMm_X85V12DZ,4160723,2023-02-22T14:51:32Z,2023-02-22T14:51:32Z,MEMBER,"I've imported the generated PDF into Inkscape, fixed the font and converted it to paths, added a small margin and exported it as SVG. I attach the file here, @dcherian feel free to add it to this PR.

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1584791395
https://github.com/pydata/xarray/issues/7539#issuecomment-1438377578,https://api.github.com/repos/pydata/xarray/issues/7539,1438377578,IC_kwDOAMm_X85Vu-Zq,4160723,2023-02-21T12:13:18Z,2023-02-21T12:13:18Z,MEMBER,"In general I also find that `xr.concat` is a powerful feature (incl. auto-alignment and merge options) at the expense that it may sometimes (often?) be hard to reason about. Would it make sense to have a simpler version? To avoid making `xr.concat` signature even more complicated, maybe another top-level function like `xr.concat_noalign`? Or any suggestion in #7045 to deactivate auto-alignment Xarray-wise. Or indeed at least make it clearer in the docs that something like `drop_indexes` or `reset_coords` should be used first in order to skip auto-alignment for some variables.
> I don't really know what I would prefer to happen with the coordinates. I guess to have created a time coordinate of size {new: 2, time: 4, cols: 2}, but then I don't know what that implies for the underlying index. @benbovy do you have any thoughts?
I guess the easiest option for a concat version with no auto-alignment would be to drop the index when such a case happens. (Note: one problem in your example is that the Xarray data model still does not allow having a multi-dimensional ""time"" variable with ""time"" as one of its dimensions, but this could now be relaxed.)
I've also been wondering whether some kind of `NDPandasIndex` would make any sense, i.e., an n-d coordinate variable with an internal 1-d (flattened) pandas index and some logic to convert between those n-d vs. 1-d spaces. This is the kind of approach used in xoak for using a kd-tree with coordinates of arbitrary dimensions, where labels in the form of nd-arrays for each coordinate are mapped into the `[n_points, n_coords]` shape (and inversely for getting the integer indices back as nd-arrays). This works well for point-wise indexing, but I doubt it would be very useful beyond that (e.g., slicing, etc.).
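
To make the n-d vs. 1-d conversion concrete, here is a minimal pure-Python sketch. It is not how xoak or Xarray implement it: the `FlatLookup` class and its method names are hypothetical, a plain dict stands in for the internal 1-d pandas index, and the labels are assumed to be unique.

```python
class FlatLookup:
    '''Hypothetical n-d label lookup backed by a flattened 1-d mapping.'''

    def __init__(self, values, shape):
        # values: flat (row-major) list of labels; shape: the n-d shape.
        self.shape = tuple(shape)
        # A dict stands in for the internal 1-d pandas index (assumption).
        self._flat = {label: pos for pos, label in enumerate(values)}

    def _unravel(self, pos):
        # Convert a flat row-major position back into an n-d integer index.
        idx = []
        for size in reversed(self.shape):
            pos, rem = divmod(pos, size)
            idx.append(rem)
        return tuple(reversed(idx))

    def sel(self, label):
        # Point-wise selection: label -> flat position -> n-d index.
        return self._unravel(self._flat[label])


# A 2 x 3 coordinate holding six unique labels.
lookup = FlatLookup([10, 11, 12, 20, 21, 22], shape=(2, 3))
print(lookup.sel(21))  # n-d integer index of label 21
```

Point-wise lookups fit this scheme naturally, whereas a slice of labels has no obvious n-d counterpart, which is the limitation mentioned above.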
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1588461863
https://github.com/pydata/xarray/issues/7076#issuecomment-1431496828,https://api.github.com/repos/pydata/xarray/issues/7076,1431496828,IC_kwDOAMm_X85VUuh8,4160723,2023-02-15T14:54:27Z,2023-02-15T14:54:27Z,MEMBER,"@ACHMartin the issue is when you do `newds['z'] = stacked.z`. In recent versions of Xarray, each multi-index level has its own (real) coordinate. For consistency and clarity, we soon won't support assigning a multi-index to a single coordinate of a Dataset / DataArray like that.
I think that in other places we still do support it with a deprecation notice, but apparently in your example this is not the case. `unstack` doesn't work because the multi-index(es) and the coordinates of `newds` are not consistent.
I don't know exactly what your real problem is, but from now on you should avoid implicitly assigning a multi-index with `xr_obj[""my_coord""] = ...` or `xr_obj.assign(my_coord=...)`. Instead you should re-create the multi-index, e.g., in your minimal example `newds = newds.set_index(z=[""across"", ""along""])`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1384465119
https://github.com/pydata/xarray/issues/7463#issuecomment-1427538729,https://api.github.com/repos/pydata/xarray/issues/7463,1427538729,IC_kwDOAMm_X85VFoMp,4160723,2023-02-13T08:31:49Z,2023-02-13T09:26:10Z,MEMBER,"There are two issues:
- whether we should continue allowing IndexVariable data to be updated in place via the `.data` property. IMO we should really deprecate it, especially now that it is possible to have custom, possibly expensive index structures built from one or more coordinates.
- whether `deep=True` should deep copy the Xarray index objects. I don't have a strong opinion on this. There is a similar discussion on the pandas side: https://github.com/pandas-dev/pandas/issues/19862. I wonder if we reverted the change here because some high-level operations in Xarray were *by default* deep copying the indexes? I don't think we would want such behavior unless the user explicitly sets `deep=True` somewhere?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1550792876
https://github.com/pydata/xarray/issues/7463#issuecomment-1426311006,https://api.github.com/repos/pydata/xarray/issues/7463,1426311006,IC_kwDOAMm_X85VA8de,4160723,2023-02-10T20:31:10Z,2023-02-10T20:38:48Z,MEMBER,"Yes I think we should, but I might have missed the rationale behind allowing it if this is intentional.
EDIT: perhaps better to issue a warning first to avoid some breaking change. We could also try to fix it (make a deep copy) at the same time as deprecating it, but that might be tricky without again introducing performance regressions.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1550792876
https://github.com/pydata/xarray/issues/7463#issuecomment-1426299770,https://api.github.com/repos/pydata/xarray/issues/7463,1426299770,IC_kwDOAMm_X85VA5t6,4160723,2023-02-10T20:25:12Z,2023-02-10T20:25:12Z,MEMBER,"I think that the reverting change in IndexVariable came after refactoring copy in Xarray introduced some performance regression (https://github.com/pydata/xarray/pull/7209#issuecomment-1305593478).
I didn't see #1463 (https://github.com/pydata/xarray/issues/1463#issuecomment-340454702), though. It feels weird to me that we can mutate an IndexVariable via its `data` property, considering that the underlying index is immutable. IIUC `xarr2.x.data[0] = 45` replaces the full index with a new one? I'm not sure if it is a good idea to allow this. For a pandas index that's probably OK (it is reasonably cheap to rebuild a new index) but for a custom index that is expensive to build (e.g., kd-tree) I don't think this behavior is desirable.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1550792876
https://github.com/pydata/xarray/issues/2028#issuecomment-1422518769,https://api.github.com/repos/pydata/xarray/issues/2028,1422518769,IC_kwDOAMm_X85Uyenx,4160723,2023-02-08T12:29:27Z,2023-02-08T12:41:00Z,MEMBER,"@gewitterblitz there is a kdtree-based index example in #7041 that works with multi-dimensional coordinates. You could also have a look at https://xoak.readthedocs.io/en/latest/ (it doesn't use Xarray indexes - soon hopefully - so the current API is via Xarray accessors).
EDIT: seeing your previous https://github.com/pydata/xarray/issues/2028#issuecomment-921926536, not sure how you could use slices for label selection using those indexes as I don't think the wrapped scipy / sklearn kdtree objects support range queries. Other spatial indexes may support it (e.g., there's an example in https://github.com/martinfleis/xvec of selecting points using a `shapely.box`, although currently it only supports 1-d geometry coordinates).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,309691307
https://github.com/pydata/xarray/issues/2028#issuecomment-1421222703,https://api.github.com/repos/pydata/xarray/issues/2028,1421222703,IC_kwDOAMm_X85UtiMv,4160723,2023-02-07T18:01:39Z,2023-02-07T18:01:39Z,MEMBER,"@aberges-grd If your non-index coordinate supports it (I guess it does?), you could assign a default index to the coordinate with `set_xindex` and then use slices for selection like any other (dimension) coordinate backed by a pandas index.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,309691307
https://github.com/pydata/xarray/issues/7405#issuecomment-1384164579,https://api.github.com/repos/pydata/xarray/issues/7405,1384164579,IC_kwDOAMm_X85SgKzj,4160723,2023-01-16T14:42:23Z,2023-01-16T14:42:23Z,MEMBER,Yes thanks for the report. Looks like `Dataset._coord_names` got out of sync somehow.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1512708767
https://github.com/pydata/xarray/pull/7368#issuecomment-1382070832,https://api.github.com/repos/pydata/xarray/issues/7368,1382070832,IC_kwDOAMm_X85SYLow,4160723,2023-01-13T16:13:16Z,2023-01-13T16:13:16Z,MEMBER,"Thanks for the review @shoyer. I addressed your comments.
Everything seems OK except a rather annoying mypy error that I'm struggling with:
The `DataAlignable` type variable should now encompass both `DataWithCoords` and `Coordinates`, since in this PR we add alignment support for the latter. I somewhat naively tried the options below without success:
- `DataAlignable = TypeVar(""DataAlignable"", bound=DataWithCoords | Coordinates)` -> doesn't work since we cannot mix DataWithCoords and Coordinates when aligning each object (input type = output type)
- `DataAlignable = TypeVar(""DataAlignable"", bound=DataWithCoords, Coordinates)` -> doesn't work with subclasses
- `DataAlignable = TypeVar(""DataAlignable"", Dataset, DataArray, Coordinates)` -> doesn't work with generic types `T_Dataset`, etc.?
- I even tried using a Protocol
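
For readers less familiar with the difference between the first and third options, here is a small sketch using hypothetical stand-in classes (not the real Xarray ones). The comments describe how type checkers treat each form in general. At runtime both functions simply return their argument, so the difference only shows up under mypy.

```python
from typing import TypeVar, Union


# Hypothetical stand-ins for the real Xarray classes.
class DataWithCoords:
    pass


class Dataset(DataWithCoords):
    pass


class Coordinates:
    pass


# Bound to a union: subclasses are preserved, but the variable may also be
# solved as the union itself, so mixed inputs type-check.
T_Bound = TypeVar('T_Bound', bound=Union[DataWithCoords, Coordinates])

# Constrained: the variable is solved as exactly one of the listed types,
# so a Dataset argument is typed as DataWithCoords on return.
T_Constrained = TypeVar('T_Constrained', DataWithCoords, Coordinates)


def align_bound(obj: T_Bound) -> T_Bound:
    return obj


def align_constrained(obj: T_Constrained) -> T_Constrained:
    return obj


ds = align_bound(Dataset())
print(type(ds).__name__)
```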
@headtr1ck @Illviljan any idea? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7418#issuecomment-1372908509,https://api.github.com/repos/pydata/xarray/issues/7418,1372908509,IC_kwDOAMm_X85R1Ovd,4160723,2023-01-05T23:08:15Z,2023-01-05T23:08:15Z,MEMBER,"Again, there are likely more good reasons for merging the Datatree code with Xarray than against it, but IMHO such a decision should be made very carefully. You certainly know better than me what positive vs. negative impacts it would have here! I'm just speaking generally from my experience of having struggled while doing some heavy refactoring in Xarray recently :)","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1519552711
https://github.com/pydata/xarray/pull/7418#issuecomment-1372888139,https://api.github.com/repos/pydata/xarray/issues/7418,1372888139,IC_kwDOAMm_X85R1JxL,4160723,2023-01-05T22:46:05Z,2023-01-05T22:46:05Z,MEMBER,"I don't have strong opinions for or against including datatree in Xarray. It indeed makes sense if it is using many Xarray internals and if there are many existing or potential applications for it. Additional load (CI) is fine if datatree doesn't bring any extra dependency and won't do so in the near future (which seems to be the case).
> Datatree should become a first-class Xarray object
> Since Datatree sits above DataArray and Dataset, it should not interfere with any of our existing API.
Would it mean that if someone wants to later add *any feature ""x"" or ""y""* into Xarray, they just need to implement the feature for Dataset (and possibly DataArray) and it will be guaranteed to work with Datatree? (I guess so but I'm not familiar enough with Datatree to know it for sure).
Otherwise, if there is any extra implementation effort required to make feature ""x"" or ""y"" work with Datatree, then I'm concerned about the additional burden or obstacle for future contributors and maintainers. Or we could say that this is OK to leave datatree support and wait for someone to take care of it later, but I don't think it is ideal to have such non-synchronized state within Xarray itself.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1519552711
https://github.com/pydata/xarray/pull/7368#issuecomment-1359003371,https://api.github.com/repos/pydata/xarray/issues/7368,1359003371,IC_kwDOAMm_X85RAL7r,4160723,2022-12-20T08:34:06Z,2022-12-20T08:34:06Z,MEMBER,"I'm wondering if instead of `Coordinates.from_pandas_multiindex()` we might want to provide a more generic constructor available as an extension point? For example:
`Coordinates.from_index(index_obj: Any, *, factory=None, **kwargs)`
`factory` could be guessed from the type of `index_obj`. Xarray would support by default the `pandas.MultiIndex` and `pandas.Index` types. Like for IO backends, we could provide a `CoordinatesFactoryEntrypoint` so that it could support other index types.
One downside is that specific (mandatory?) options like `dim` for a pandas (multi-)index are not directly visible.
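
A rough pure-Python sketch of the type-based factory lookup, to illustrate the idea only. The registry, `register_factory` and `guess_factory` are hypothetical names, nothing here is existing Xarray API, and a real version would presumably go through entry points rather than a module-level dict.

```python
# Hypothetical registry mapping index types to factory callables.
_FACTORIES = {}


def register_factory(index_type, factory):
    _FACTORIES[index_type] = factory


def guess_factory(index_obj):
    # Walk the MRO so that subclasses fall back to a parent's factory.
    for klass in type(index_obj).__mro__:
        if klass in _FACTORIES:
            return _FACTORIES[klass]
    raise TypeError(f'no coordinates factory for {type(index_obj).__name__}')


def from_index(index_obj, *, factory=None, **kwargs):
    # An explicit factory wins; otherwise guess it from the type.
    if factory is None:
        factory = guess_factory(index_obj)
    return factory(index_obj, **kwargs)


# Toy usage: a list-backed index producing a dict of coordinates.
register_factory(list, lambda idx, dim: {dim: list(idx)})
print(from_index([1, 2, 3], dim='x'))
```

Note how `dim` only appears in the factory itself and is passed through `**kwargs`, which is the visibility downside mentioned above.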
Would it be useful or is it overkill?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7382#issuecomment-1357719218,https://api.github.com/repos/pydata/xarray/issues/7382,1357719218,IC_kwDOAMm_X85Q7Say,4160723,2022-12-19T14:03:56Z,2022-12-19T14:03:56Z,MEMBER,"I don't know if the optimizations added here will benefit a large set of use cases (it took 6 months before seeing an issue report), but it is worth it for at least a few of them. This is ready I think (added some benchmarks).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1498386428
https://github.com/pydata/xarray/pull/7382#issuecomment-1353034657,https://api.github.com/repos/pydata/xarray/issues/7382,1353034657,IC_kwDOAMm_X85Qpauh,4160723,2022-12-15T13:05:55Z,2022-12-15T13:05:55Z,MEMBER,"Quick benchmark taking the example in #7376 (it seems even much faster than in version 2022.3.0!)
```python
# version 2022.3.0
%timeit ds.assign(foo=~ds[""d3""])
# 22.5 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
# main branch
%timeit ds.assign(foo=~ds[""d3""])
# 193 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# this PR
%timeit ds.assign(foo=~ds[""d3""])
# 1.01 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1498386428
https://github.com/pydata/xarray/issues/7376#issuecomment-1352989233,https://api.github.com/repos/pydata/xarray/issues/7376,1352989233,IC_kwDOAMm_X85QpPox,4160723,2022-12-15T12:27:37Z,2022-12-15T12:27:37Z,MEMBER,"> Thanks @benbovy! Are you also aware of the issue with plain assign being slower on MultiIndex (comment above: https://github.com/pydata/xarray/issues/7376#issuecomment-1350446546)? Do you know what could be the issue there by any chance?
I see that in `ds.assign(foo=~ds[""d3""])`, the coordinates of `~ds[""d3""]` are dropped (#2087), which triggers re-indexing of the multi-index when aligning `ds` with `~ds[""d3""]`. This is quite an expensive operation.
It is not clear to me what would be a clean fix (see, e.g., #2180), but we could probably optimize the alignment logic so that when all unindexed dimension sizes match with indexed dimension sizes (like your example) no re-indexing is performed.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1495605827
https://github.com/pydata/xarray/pull/7368#issuecomment-1352874809,https://api.github.com/repos/pydata/xarray/issues/7368,1352874809,IC_kwDOAMm_X85Qozs5,4160723,2022-12-15T10:42:59Z,2022-12-15T10:42:59Z,MEMBER,OK this is now ready for review (cc @shoyer).,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7368#issuecomment-1352818155,https://api.github.com/repos/pydata/xarray/issues/7368,1352818155,IC_kwDOAMm_X85Qol3r,4160723,2022-12-15T09:59:03Z,2022-12-15T09:59:03Z,MEMBER,"> Maybe there's some way to optimize that? I don't know if we can completely avoid it with the solution implemented in this PR, though. Promoting Coordinates is pretty clean and future proof IMO (assuming that we'll further refactor Coordinates to actually store variables and indexes, i.e., not as a proxy anymore). Is the (minor? temporary?) regression in performance acceptable and can we just leave it like that for now?
Fixed in [193dad3](https://github.com/pydata/xarray/pull/7368/commits/193dad3393565b6c007c0eb0a2d47b5ade874571) (with some reasonable special case added in `merge_core`).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7368#issuecomment-1352310432,https://api.github.com/repos/pydata/xarray/issues/7368,1352310432,IC_kwDOAMm_X85Qmp6g,4160723,2022-12-14T22:33:23Z,2022-12-15T01:08:41Z,MEMBER,"I did some profiling to find the cause of the decrease in performance reported in the benchmarks (dataset creation). In summary, this is explained by a `Coordinates` object (built from the `coords` mapping) that is now included in objects to align when merging data vars and coordinates. Previously all non DataArray objects in the `coords` mapping were excluded from alignment (in `deep_align`). The introduced overhead comes from a call to `Coordinates._reindex_callback()`, which (I think?) should do no more than shallow copies and/or xarray wrapping stuff. In the benchmark report this is only marked as significant when creating small datasets (1.5-2x slower), and it becomes insignificant for datasets with more data variables.
Maybe there's some way to optimize that? I don't know if we can completely avoid it with the solution implemented in this PR, though. Promoting `Coordinates` is pretty clean and future proof IMO (assuming that we'll further refactor `Coordinates` to actually store variables and indexes, i.e., not as a proxy anymore). Is the (minor? temporary?) regression in performance acceptable and can we just leave it like that for now?
More details about the new workflow implemented in this PR when creating a new Dataset:
- if Dataset's `coords` argument is a ""simple"" mapping, it is first internally converted into a `Coordinates` object, with the creation of default indexes for dimension coordinates
- if one or more DataArray objects are given in `coords`, their coordinates (variables + indexes) are extracted and merged with the other input coordinates
- see the implementation in `xarray.core.coordinates.create_coords_with_default_indexes`
- otherwise, just reuse the `Coordinates` object passed as `coords`
- coordinates are then merged with data variables
- the `Coordinates` object is aligned with every other ""alignable"" object found in `data_vars`
- coordinate indexes (if any) are passed explicitly to `align` so they are used in priority
- explicitly using a `Coordinates` object skips the creation of default indexes during merging (in `collect_variables_and_indexes()`)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/issues/7376#issuecomment-1352318926,https://api.github.com/repos/pydata/xarray/issues/7376,1352318926,IC_kwDOAMm_X85Qmr_O,4160723,2022-12-14T22:43:11Z,2022-12-14T22:47:37Z,MEMBER,"> Are you aware of any workarounds for this issue with the current code (assuming I would like to preserve MultiIndex).
Unfortunately I don't know about any workaround that would preserve the MultiIndex. Depending on how you use the multi-index, you could instead set two single indexes for ""i1"" and ""i2"" respectively (it is supported now, use `set_xindex()`). I think that groupby will work well in that case. If you really need a multi-index, you could still build it afterwards from the groupby result.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1495605827
https://github.com/pydata/xarray/issues/7376#issuecomment-1350738301,https://api.github.com/repos/pydata/xarray/issues/7376,1350738301,IC_kwDOAMm_X85QgqF9,4160723,2022-12-14T09:40:57Z,2022-12-14T09:40:57Z,MEMBER,"Thanks for the report @ravwojdyla.
Since #5692, each multi-index level has its own coordinate variable, so copying takes a bit more time as we need to create more variables. Not sure what's happening with `_maybe_cast_to_cftimeindex`, though.
The real issue here, however, is the same as in #6836. In your example, `.groupby(""i1"")` creates 400 000 groups whereas it should create only 4 groups.","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,1495605827
https://github.com/pydata/xarray/pull/7368#issuecomment-1349321538,https://api.github.com/repos/pydata/xarray/issues/7368,1349321538,IC_kwDOAMm_X85QbQNC,4160723,2022-12-13T18:03:17Z,2022-12-13T18:03:17Z,MEMBER,I think this is ready for review!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7368#issuecomment-1347327518,https://api.github.com/repos/pydata/xarray/issues/7368,1347327518,IC_kwDOAMm_X85QTpYe,4160723,2022-12-12T21:05:56Z,2022-12-12T21:05:56Z,MEMBER,"In order to skip creating default indexes when passing a `Coordinates` object, I first tried a small refactor but in the end I found that the cleanest way to do it was to support alignment for `Coordinates`. I think it makes sense now that Coordinates is part of Xarray's public API as a ""stand-alone"" container like Dataset and DataArray.
The ""no default index with Coordinates"" behavior should be consistent Xarray-wise, i.e., for DataArray / Dataset constructors and also `assign_coords`, `update`, etc.
Sorry this PR is getting big, but hopefully this is almost ready (still a few tests to fix or to add).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7368#issuecomment-1346344694,https://api.github.com/repos/pydata/xarray/issues/7368,1346344694,IC_kwDOAMm_X85QP5b2,4160723,2022-12-12T11:55:10Z,2022-12-12T11:55:10Z,MEMBER,"> My suggestion would be:
> - coords passed as a dict: create default indexes
> - coords passed as IndexedCoordinates: do not create defaults
So if we already have some coordinate data as a dict but don't want any default index, we would need to do this:
```python
ds = xr.Dataset(coords=xr.Coordinates(my_coord_dict))
```
instead of this:
```python
ds = xr.Dataset(coords=my_coord_dict)
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7368#issuecomment-1346091151,https://api.github.com/repos/pydata/xarray/issues/7368,1346091151,IC_kwDOAMm_X85QO7iP,4160723,2022-12-12T08:36:09Z,2022-12-12T08:36:09Z,MEMBER,"Thanks @shoyer, I've been thinking about similar short/long term plans although so far I haven't figured out how to implement your point 3. I'll give it another try.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7368#issuecomment-1345314909,https://api.github.com/repos/pydata/xarray/issues/7368,1345314909,IC_kwDOAMm_X85QL-Bd,4160723,2022-12-10T16:59:44Z,2022-12-10T16:59:44Z,MEMBER,"> Long term, do you think it would make sense to merge together Indexes, Coordinates and IndexedCoordinates? They are sort of all containers for the same thing.
Yes I think so.
I'm actually trying to merge `IndexedCoordinates` with `Coordinates` but I'm stuck: the latter is abstract and I don't really see how I could refactor it together with `DatasetCoordinates` and `DataArrayCoordinates`. Do you have any idea on how best to proceed?
Ideally, I'd see `Coordinates` be exposed in Xarray's main namespace with at least the two following constructors:
```python
from __future__ import annotations

from typing import Any, Mapping

import pandas as pd
from xarray.indexes import Index


class Coordinates:
    def __init__(
        self,
        coords: Mapping[Any, Any] | None = None,
        indexes: Mapping[Any, Index] | None = None,
    ):
        # Similar to Dataset.__init__ but without the need
        # to merge coords and data vars...
        # Probably ok to allow more flexibility / less safety here?
        ...

    @classmethod
    def from_pandas_multiindex(cls, index: pd.MultiIndex, dim: str):
        ...
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7368#issuecomment-1344046801,https://api.github.com/repos/pydata/xarray/issues/7368,1344046801,IC_kwDOAMm_X85QHIbR,4160723,2022-12-09T09:13:24Z,2022-12-09T09:16:35Z,MEMBER,"I added `IndexedCoordinates.merge_coords` so that it is easier to combine different coordinates to pass to a new Dataset / DataArray, e.g.,
```python
midx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""one"", ""two""))
coords = xr.IndexedCoordinates.from_pandas_multiindex(midx, ""x"")
coords = coords.merge_coords({""y"": [0, 1, 2]})
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'b' 'b'
# * two (x) int64 1 2 1 2
# * y (y) int64 0 1 2
ds = xr.Dataset(coords=coords)
# <xarray.Dataset>
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'b' 'b'
# * two (x) int64 1 2 1 2
# * y (y) int64 0 1 2
# Data variables:
# *empty*
```
`IndexedCoordinates.merge_coords` is very much like `Coordinates.merge` except that it returns a new Coordinates object instead of a Dataset.
Or should we just use `merge`? It would require that:
- `Coordinates.merge` accepts `Mapping[Any, Any]` for its `other` argument. Only changing the type hint is enough here since the implementation already accepts any input passed to Dataset.
- When a Dataset is passed as `coords` argument to a new Dataset and DataArray, both variables and indexes should be extracted. It is already the case for Dataset but I think it only works for PandasIndex and PandasMultiIndex (default indexes & backwards compatibility).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7368#issuecomment-1344004727,https://api.github.com/repos/pydata/xarray/issues/7368,1344004727,IC_kwDOAMm_X85QG-J3,4160723,2022-12-09T08:32:28Z,2022-12-09T09:14:17Z,MEMBER,"`IndexedCoordinates` and `Indexes` have a lot of overlap. At some point we might consider merging the two classes, like @shoyer suggests in https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938. The main difference is that one is a mapping of coordinates and the other is a mapping of indexes. `IndexedCoordinates` is mostly reusing `Indexes` and `Dataset` under the hood, it is only a facade.
As an alternative to an `IndexedCoordinates` subclass, I was wondering if we could reuse the `Coordinates` base class? There are some benefits to providing a subclass:
- besides specific constructors like `.from_pandas_multiindex()` it has a generic `__init__` for advanced use cases. Not sure it is a good idea to add this constructor to the base class?
- unlike Coordinates, IndexedCoordinates is immutable.
What if the `Indexes` class was a facade based on `IndexedCoordinates` instead of the other way around? It would probably make more sense but it would also be a bigger refactor. I've chosen the easy way :). ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1485037066
https://github.com/pydata/xarray/pull/7347#issuecomment-1335509983,https://api.github.com/repos/pydata/xarray/issues/7347,1335509983,IC_kwDOAMm_X85PmkPf,4160723,2022-12-02T16:33:59Z,2022-12-02T16:33:59Z,MEMBER,Great! (I was worried that it would mess up #7345).,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1472483025
https://github.com/pydata/xarray/pull/7347#issuecomment-1334986216,https://api.github.com/repos/pydata/xarray/issues/7347,1334986216,IC_kwDOAMm_X85PkkXo,4160723,2022-12-02T09:35:42Z,2022-12-02T09:35:42Z,MEMBER,@dcherian we can merge this after #7345 to make things easier for the release? ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1472483025
https://github.com/pydata/xarray/issues/7045#issuecomment-1326262197,https://api.github.com/repos/pydata/xarray/issues/7045,1326262197,IC_kwDOAMm_X85PDSe1,4160723,2022-11-24T10:35:02Z,2022-11-24T10:35:02Z,MEMBER,"I find the analogy with relational databases quite meaningful!
Rectangular grids have likely been the primary use case in Xarray for a long time, but I wonder to what extent this is still the case nowadays. Probably a good question to ask for the next user survey?
Interestingly, the [2021 user survey results](https://github.com/xarray-contrib/user-survey/blob/main/2021.ipynb) (*) show that ""interoperability with pandas"" is not a critical feature while ""label-based indexing, interpolation, groupby, reindexing, etc."" is most important, although the description of the latter is rather broad. It would be interesting to compute the correlation between these two variables. The results also show that ""more flexible indexing (selection, alignment)"" is very useful or critical for 2/3 of the participants.
Not sure how to interpret those results within the context of this discussion, though.
(*) The [2022 user survey results](https://github.com/xarray-contrib/user-survey/blob/c03361f6ac8c270a89cc97c4df20de26c923badb/2021-vs-2022.ipynb) don't show significant differences in general
> suppose one could in principle have an array with coordinates such that none of the coordinates aligned with any particular axis, but it seems improbable.
Not that improbable for unstructured meshes, curvilinear grids, staggered grids, etc. Xarray is often chosen to handle them too (e.g., [uxarray](https://github.com/UXARRAY/uxarray), [xgcm](https://github.com/xgcm/xgcm)).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1376109308
https://github.com/pydata/xarray/issues/7297#issuecomment-1324753837,https://api.github.com/repos/pydata/xarray/issues/7297,1324753837,IC_kwDOAMm_X85O9iOt,4160723,2022-11-23T09:17:33Z,2022-11-23T09:17:33Z,MEMBER,"> But does this still work properly with broadcasting? For example, let's say there is another data variable b (midx) and an operation is done like ds_stacked['c'] = ds_stacked.a + ds_stacked.b. Then it should be that c (midx) and a (x) should be ""repeated"" to midx.x
I think it would keep things much simpler if we consider ""x"" and ""midx"" as two separate dimensions in the stacked Dataset, i.e., ds_stacked['c'] would result in a 2-d array (x, midx). There's no such thing as a ""midx.x"" dimension in Xarray.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1454832041
https://github.com/pydata/xarray/issues/7297#issuecomment-1323849354,https://api.github.com/repos/pydata/xarray/issues/7297,1323849354,IC_kwDOAMm_X85O6FaK,4160723,2022-11-22T15:24:53Z,2022-11-22T15:36:46Z,MEMBER,"The last example in your comment is probably the most meaningful one:
```
# <xarray.Dataset>
# Dimensions: (x: 2, midx: 4)
# Coordinates:
# * midx (midx) object MultiIndex
# * x (midx) int32 1 1 2 2
# * y (midx) int32 3 4 3 4
# Data variables:
# a (x) int32 6 7
```
To avoid name conflicts, we could just discard the original dimension coordinates x and y. Like here above, ""x"" becomes a dimension without coordinate. In that example, when unstacking we would retrieve the ""x"" dimension coordinate like in the original dataset.
(note: I think it is now possible to have a dimension ""x"" and a coordinate ""x"" with different dimensions, but I haven't checked).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1454832041
https://github.com/pydata/xarray/issues/7297#issuecomment-1323478134,https://api.github.com/repos/pydata/xarray/issues/7297,1323478134,IC_kwDOAMm_X85O4qx2,4160723,2022-11-22T10:50:01Z,2022-11-22T10:50:01Z,MEMBER,"Interesting! I don't think that when adding stack / unstack we were thinking that variables with only a subset of the stacked dimensions would be a common use case.
I guess it would be possible to add some option to stack only the variables that have all the dimensions to be stacked, and leave the other variables unchanged? However, one problem with keeping the original dimension coordinates is that we would have name conflicts between the single index coordinates and the multi-index coordinates.
In your expected example, the ""x"" coordinate is part of the multi-index but it doesn't have the same dimension ""midx""? I find it rather confusing.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1454832041
https://github.com/pydata/xarray/issues/7278#issuecomment-1316230358,https://api.github.com/repos/pydata/xarray/issues/7278,1316230358,IC_kwDOAMm_X85OdBTW,4160723,2022-11-16T02:57:48Z,2022-11-16T02:57:48Z,MEMBER,"👍
Use it at your own risk 😉 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1444752393
https://github.com/pydata/xarray/issues/7250#issuecomment-1313866757,https://api.github.com/repos/pydata/xarray/issues/7250,1313866757,IC_kwDOAMm_X85OUAQF,4160723,2022-11-14T14:45:39Z,2022-11-14T14:45:39Z,MEMBER,"That's a bug in this method: https://github.com/pydata/xarray/blob/6f9e33e94944f247a5c5c5962a865ff98a654b30/xarray/core/indexing.py#L1528-L1532
Xarray array wrappers for pandas indexes keep track of the original dtype and should restore it when converted into numpy arrays. Something like this should work for the same method:
```python
def __array__(self, dtype: DTypeLike = None) -> np.ndarray:
    if dtype is None:
        dtype = self.dtype
    if self.level is not None:
        return np.asarray(
            self.array.get_level_values(self.level).values, dtype=dtype
        )
    else:
        return super().__array__(dtype)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1433998942
https://github.com/pydata/xarray/issues/6836#issuecomment-1313748084,https://api.github.com/repos/pydata/xarray/issues/6836,1313748084,IC_kwDOAMm_X85OTjR0,4160723,2022-11-14T13:55:02Z,2022-11-14T13:55:02Z,MEMBER,"> we can fix that in safe_cast_to_index()
...we *cannot* fix that in `safe_cast_to_index()` (or we can add a parameter to specify the desired result).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1318992926
https://github.com/pydata/xarray/issues/7282#issuecomment-1313741685,https://api.github.com/repos/pydata/xarray/issues/7282,1313741685,IC_kwDOAMm_X85OTht1,4160723,2022-11-14T13:51:21Z,2022-11-14T13:51:21Z,MEMBER,Thanks @jjpr-mit and @mschrimpf for the report. See https://github.com/pydata/xarray/issues/6836#issuecomment-1313739883.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1445905299
https://github.com/pydata/xarray/issues/6836#issuecomment-1313739883,https://api.github.com/repos/pydata/xarray/issues/6836,1313739883,IC_kwDOAMm_X85OThRr,4160723,2022-11-14T13:49:47Z,2022-11-14T13:49:47Z,MEMBER,From #7282 it looks like we need to convert the multi-index level to a single index when casting the group to an index. And from #7105 we can fix that in `safe_cast_to_index()` (sometimes the full multi-index is expected) so we probably need a special case in `groupby`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1318992926
https://github.com/pydata/xarray/issues/7278#issuecomment-1311942192,https://api.github.com/repos/pydata/xarray/issues/7278,1311942192,IC_kwDOAMm_X85OMqYw,4160723,2022-11-11T16:52:54Z,2022-11-11T16:52:54Z,MEMBER,"You may look at the logic implemented in the `map_index_queries()` function in `xarray.core.indexing`. This function is still not public API, but it calls `.sel()` for each index object, which should be more stable (although experimental).
Eventually we'll probably make `merge_sel_results()` public too. It might be useful for third-party indexes.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1444752393
https://github.com/pydata/xarray/issues/6308#issuecomment-1305780610,https://api.github.com/repos/pydata/xarray/issues/6308,1305780610,IC_kwDOAMm_X85N1KGC,4160723,2022-11-07T15:28:35Z,2022-11-07T15:28:35Z,MEMBER,"The kind of data wrapped in an Xarray Dataset (e.g., a Numpy array, a Dask array or any other array #5648) is already something useful that `xr.doctor` or `xr.describe` may tell!
From my experience of introducing Xarray to new users, they are often completely unaware of what is under the hood until something or someone makes them aware, likely after they experience some weird behavior or performance issue that is hard to figure out by themselves. Xarray objects are flexible container wrappers connected to a wide range of other Python libraries, such that it is hard to give a short introduction that covers all the important aspects (lazy / non-lazy, chunked / non-chunked, etc.). For example, it may be possible that someone who has never heard of Dask or Zarr follows an Xarray tutorial that starts by opening a chunked dataset from a zarr store. In this case the rich repr of the Xarray Dataset doesn't even help.
Rather than a performance report or a profiling tool, the proposal here (still very elusive) is to provide a helper function that returns some information and explanation in plain English (why not with some hyperlinks, pretty printing, etc.) that would help users make sense of an Xarray object and its wrapped data/metadata. Some kind of interactive documentation very specific to the actual Xarray object. Some kind of smart tool that would partially ""replace"" custom (though very basic) user support.
","{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 2, ""rocket"": 0, ""eyes"": 0}",,1151751524
https://github.com/pydata/xarray/pull/7209#issuecomment-1305593478,https://api.github.com/repos/pydata/xarray/issues/7209,1305593478,IC_kwDOAMm_X85N0caG,4160723,2022-11-07T13:09:05Z,2022-11-07T13:09:05Z,MEMBER,"The change in `Variable.to_index_variable` seems sensible (not sure when one wants a deep copy of an `IndexVariable` or an Xarray / Pandas index).
`to_index_variable` may be called in some core functions of Xarray internals (e.g., in `as_variable()`) so it might be tricky to benchmark its effect Xarray-wise. Perhaps it would be good to track it down in the original issue #7181?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1421441672
https://github.com/pydata/xarray/pull/7214#issuecomment-1297046405,https://api.github.com/repos/pydata/xarray/issues/7214,1297046405,IC_kwDOAMm_X85NT1uF,4160723,2022-10-31T12:54:50Z,2022-10-31T12:54:50Z,MEMBER,"Thanks for the suggestion @shoyer, in general I like it very much! ""Coordinates possibly backed by one or more indexes"" feels much more natural than ""indexes and their corresponding coordinates"". Even though indexes have been promoted to 1st class citizens in the data model, their right place should still be in the background compared to coordinates. So having a `Coordinates` object that encapsulates the indexes makes a lot of sense to me.
My main concern is about the timing, as such a broader refactor might postpone some work in progress on the public API and the documentation. Ideally this shouldn't discourage users from experimenting with custom indexes and building an ecosystem around them as soon as possible.
There might be a fast path towards your suggestion, at least regarding the public facing API (your points 1 and 4):
- Keep ""private"" the constructor of `Indexes` and keep it immutable.
- Add a new `IndexedCoordinates(Coordinates)` class. Unlike `DatasetCoordinates` and `DataArrayCoordinates`, it would have a public constructor and/or alternative class methods (e.g., `.from_pandas_multi_index()` suggested by @dcherian)
- In general, passing any `Coordinates` object to `coords` would assign both the coordinates and the indexes.
This would leave us the possibility of achieving a broader (mostly internal) refactor of `Indexes` and `Coordinates` objects later without the risk of introducing too many breaking changes.
Alternatively, we could just wait for that refactor to finish before implementing explicit assignment of coordinates and indexes. We already have `.set_xindex()` and `.drop_indexes()` that are relevant and we could wait before deprecating `xr.Dataset(coords={""x"": pandas_midx})`. Not sure when such a big refactor will happen, though; the wait could be long.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7214#issuecomment-1294783661,https://api.github.com/repos/pydata/xarray/issues/7214,1294783661,IC_kwDOAMm_X85NLNSt,4160723,2022-10-28T09:49:02Z,2022-10-28T09:49:02Z,MEMBER,"> not necessarily do consistency checks (beyond verifying that the coordinate variables exist).
I'd just want to add that, from my experience with debugging multi-index issues, it is hard even for advanced users to see what's going wrong when coordinates and indexes are not consistent.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7214#issuecomment-1294771427,https://api.github.com/repos/pydata/xarray/issues/7214,1294771427,IC_kwDOAMm_X85NLKTj,4160723,2022-10-28T09:38:22Z,2022-10-28T09:38:22Z,MEMBER,"> Maybe a more generic Indexes class method that could be reused by 3rd-party indexes too? E.g., via some kind of hook or entrypoint...
An `Indexes` accessor? Or this is going too far? 🙂 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7214#issuecomment-1293946521,https://api.github.com/repos/pydata/xarray/issues/7214,1293946521,IC_kwDOAMm_X85NIA6Z,4160723,2022-10-27T19:04:19Z,2022-10-27T19:52:21Z,MEMBER,"> Explicitly providing indexes is an advanced user feature.
Agreed. However, `xr.Dataset(coords={""x"": pandas_midx})` is something that presumably a lot of users rely on (it is used extensively in Xarray's tests) and that we should really deprecate IMO. If we don't provide a convenient alternative, I expect many of those users will complain.
> it's easier to explicitly manipulate indexes in the form of a dict
While generally I also prefer handling plain `dict` objects over custom dict-like objects, here I don't see much reason to manipulate Xarray index objects independently of their coordinate variables. `Indexes` allows keeping them tied together, and it is already returned by `.xindexes`.
EDIT -- For more context: initially an `Indexes` object was almost equivalent to a `Frozen(obj._indexes)`. In #5692 I tried hard and struggled to keep dealing with separate dicts of indexes and indexed variables, but in the end it made things much easier to encapsulate the variables in `Indexes`, which is also used internally in different places. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7214#issuecomment-1293902008,https://api.github.com/repos/pydata/xarray/issues/7214,1293902008,IC_kwDOAMm_X85NH2C4,4160723,2022-10-27T18:21:02Z,2022-10-27T18:21:02Z,MEMBER,"> How about Indexes.from_pandas_multi_index() classmethod?
Yes that would make sense. However, it would be adding another `pandas.MultiIndex` special case while we'd like to remove them in Xarray. Maybe a more generic `Indexes` class method that could be reused by 3rd-party indexes too? E.g., via some kind of hook or entrypoint... The tricky thing is that arguments would probably differ much from one index type to another.
> 1. does indexes get merged with existing ._indexes?
Indexes are not merged together but the new / replaced coordinate variables must be compatible with the other variables of the dataset. `Dataset.assign_indexes(indexes)` is actually implemented like this:
```python
def assign_indexes(self, indexes: Indexes[Index]):
    ds_indexes = Dataset(indexes=indexes)
    return (
        self
        # prepare drop-in index / coordinate replacement
        .drop_vars(indexes, errors=""ignore"")
        # ensure the new indexes / coordinates are compatible with the Dataset
        .merge(
            ds_indexes,
            compat=""minimal"",  # probably not the right option?
            join=""override"",  # fastest option? (no real effect because of `drop_vars`)
            combine_attrs=""no_conflicts"",
        )
    )
```
> 2. Can we extract enough information from Index to have xr.merge(Indexes) -> Indexes work?
That is actually a good idea for https://github.com/pydata/xarray/pull/7214#issuecomment-1292089179! Not sure I would reuse `xr.merge()` for this as it would make the API messy, but why not an `xr.merge_indexes()` top-level function or an `Indexes.merge()` method?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7221#issuecomment-1293860075,https://api.github.com/repos/pydata/xarray/issues/7221,1293860075,IC_kwDOAMm_X85NHrzr,4160723,2022-10-27T17:40:52Z,2022-10-27T17:40:52Z,MEMBER,"Thanks @hmaarrfk!
> I haven't fully understood why we had that code though?
Me neither. I don't remember ever seeing this assertion error raised while refactoring things. Any idea @shoyer? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7222#issuecomment-1293624950,https://api.github.com/repos/pydata/xarray/issues/7222,1293624950,IC_kwDOAMm_X85NGyZ2,4160723,2022-10-27T14:37:10Z,2022-10-27T14:37:10Z,MEMBER,"Thanks @hmaarrfk!
> I think the rapid return, helps by about 40% is still pretty good.
Yes definitely. I think we just forgot to add it.
> However, I will argue that Aligner should really not be a class.
The main reasons for using a class are better code readability and easier refactoring later. The alignment logic is really complex with lots of intermediate objects that are created and/or used at various stages. Probably using functions with some custom containers would have achieved the same goal, to be fair. This part of Xarray internals still deserves to be improved, but that would be a lot of work especially for such a critical piece of code in Xarray.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423321834
https://github.com/pydata/xarray/pull/7214#issuecomment-1293531607,https://api.github.com/repos/pydata/xarray/issues/7214,1293531607,IC_kwDOAMm_X85NGbnX,4160723,2022-10-27T13:31:24Z,2022-10-27T13:42:44Z,MEMBER,"I also added an `.assign_indexes()` method that may be quite convenient. Like for the constructors, it only accepts an `Indexes` instance.
```python
ds = xr.Dataset(coords={""x"": [4, 5, 6, 7]})
ds2 = xr.Dataset(coords={""x"": [1, 2, 3, 4]})
ds.assign_indexes(ds2.xindexes)
# <xarray.Dataset>
# Dimensions: (x: 4)
# Coordinates:
# * x (x) int64 1 2 3 4
# Data variables:
# *empty*
midx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""one"", ""two""))
indexes = wrap_pandas_multiindex(midx, ""x"")
ds.assign_indexes(indexes)
# <xarray.Dataset>
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'b' 'b'
# * two (x) int64 1 2 1 2
# Data variables:
# *empty*
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7214#issuecomment-1293545325,https://api.github.com/repos/pydata/xarray/issues/7214,1293545325,IC_kwDOAMm_X85NGe9t,4160723,2022-10-27T13:41:50Z,2022-10-27T13:41:50Z,MEMBER,"@pydata/xarray I'd be very happy if you could share your thoughts about the examples shown in the last three comments. If you think the API looks good like that, then I will work on adding some tests and on the documentation.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7214#issuecomment-1292089179,https://api.github.com/repos/pydata/xarray/issues/7214,1292089179,IC_kwDOAMm_X85NA7db,4160723,2022-10-26T13:54:22Z,2022-10-26T13:54:22Z,MEMBER,"Passing multiple indexes:
```python
midx1 = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""one"", ""two""))
midx2 = pd.MultiIndex.from_product([[""c"", ""d""], [3, 4]], names=(""three"", ""four""))
indexes1 = wrap_pandas_multiindex(midx1, ""x"")
indexes2 = wrap_pandas_multiindex(midx2, ""y"")
indexes = Indexes(
    indexes=dict(**indexes1, **indexes2),
    variables=dict(**indexes1.variables, **indexes2.variables),
)
ds = xr.Dataset(indexes=indexes)
# <xarray.Dataset>
# Dimensions: (x: 4, y: 4)
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'b' 'b'
# * two (x) int64 1 2 1 2
# * y (y) object MultiIndex
# * three (y) object 'c' 'c' 'd' 'd'
# * four (y) int64 3 4 3 4
# Data variables:
# *empty*
```
That's not looking super nice, but probably we can add some convenience function or `Indexes` method.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7214#issuecomment-1291911349,https://api.github.com/repos/pydata/xarray/issues/7214,1291911349,IC_kwDOAMm_X85NAQC1,4160723,2022-10-26T11:47:57Z,2022-10-26T12:14:23Z,MEMBER,"I implemented option 3. We can still change or revert it later if it's not the best one.
A few examples:
```python
import pandas as pd
import xarray as xr
from xarray.indexes import wrap_pandas_multiindex
midx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""one"", ""two""))
```
It is now possible to pass a pandas multi-index to a Dataset like this:
```python
# this returns an `Indexes` object (indexes + coordinates)
indexes = wrap_pandas_multiindex(midx, ""x"")
ds = xr.Dataset(indexes=indexes)
# <xarray.Dataset>
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'b' 'b'
# * two (x) int64 1 2 1 2
# Data variables:
# *empty*
```
IMO the above should be preferred over passing it as a coordinate (should we deprecate it now?):
```python
ds_deprecated = xr.Dataset(coords={""x"": midx})
ds_deprecated.identical(ds)
# True
# eventually this would behave like this:
ds_midx_as_array = xr.Dataset(coords={""x"": midx})
# <xarray.Dataset>
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object ('a', 1) ('a', 2) ('b', 1) ('b', 2)
# Data variables:
# *empty*
```
We can pass indexes around from one Xarray object to another, e.g.,
```python
da = xr.DataArray([1, 2, 3, 4], dims=""x"", indexes=ds.xindexes)
# <xarray.DataArray (x: 4)>
# array([1, 2, 3, 4])
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'b' 'b'
# * two (x) int64 1 2 1 2
```
Skip creating pandas indexes for dimension coordinates:
```python
ds_noindex = xr.Dataset(coords={""x"": [0, 1, 2]}, indexes={})
# <xarray.Dataset>
# Dimensions: (x: 3)
# Coordinates:
# x (x) int64 0 1 2
# Data variables:
# *empty*
ds_noindex.xindexes
# Indexes:
# *empty*
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7214#issuecomment-1291638319,https://api.github.com/repos/pydata/xarray/issues/7214,1291638319,IC_kwDOAMm_X85M_NYv,4160723,2022-10-26T07:52:35Z,2022-10-26T07:52:35Z,MEMBER,"> For passing multiple indexes at once we could probably expand the Indexes API, e.g., with an .update() method.
Maybe with something else than `.update()` (let's keep `Indexes` an immutable collection?)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/pull/7214#issuecomment-1291059643,https://api.github.com/repos/pydata/xarray/issues/7214,1291059643,IC_kwDOAMm_X85M9AG7,4160723,2022-10-25T19:50:57Z,2022-10-25T19:50:57Z,MEMBER,"Hmm I'm wondering what would be best between the options below regarding the types for the `indexes` argument:
1. `Indexes[Index] | Sequence[Indexes[Index]] | None`
2. `Indexes[Index] | None`
3. `Mapping[Any, Index] | None`
4. Any other suggestion?
Option 1 is nice for passing multiple indexes, e.g.,
```python
pd_midx1 = pd.MultiIndex.from_arrays(..., names=(""one"", ""two""))
pd_midx2 = pd.MultiIndex.from_arrays(..., names=(""three"", ""four""))
indexes1 = PandasMultiIndex.from_pandas_index(pd_midx1, ""x"")
indexes2 = PandasMultiIndex.from_pandas_index(pd_midx2, ""y"")
ds = xr.Dataset(indexes=[indexes1, indexes2])
```
With option 1 it feels odd passing an empty list in order to avoid creating default indexes: `ds = xr.Dataset(indexes=[])`. Not really better in this regard with option 2: `ds = xr.Dataset(indexes=Indexes())`. Option 3 is better IMO: `ds = xr.Dataset(indexes={})`.
Option 3 actually works in all cases since `Indexes[Index]` is a sub-type of `Mapping[Any, Index]`. However, it is not clear from this generic type that any non-empty mapping must be an instance of `Indexes` (because the latter also contains the coordinate variables).
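To illustrate the sub-typing point, here is a minimal sketch using only the standard library. `ToyIndexes` and `describe_indexes_arg` are hypothetical stand-ins, not Xarray API; the sketch only assumes that `Indexes` behaves like a read-only mapping that also carries coordinate variables. It shows that one `Mapping`-typed argument accepts `None`, an empty dict and an `Indexes`-like object, while the extra requirement on non-empty mappings has to be checked at runtime:
```python
import collections.abc
from typing import Any, Mapping, Optional
class ToyIndexes(collections.abc.Mapping):
    # hypothetical stand-in for Indexes: a read-only mapping of indexes
    # that also carries the matching coordinate variables
    def __init__(self, indexes, variables):
        self._indexes = dict(indexes)
        self.variables = dict(variables)
    def __getitem__(self, key):
        return self._indexes[key]
    def __iter__(self):
        return iter(self._indexes)
    def __len__(self):
        return len(self._indexes)
def describe_indexes_arg(indexes: Optional[Mapping[Any, Any]] = None) -> str:
    # hypothetical handling of the `indexes` argument under option 3
    if indexes is None:
        return 'create default indexes'
    if len(indexes) == 0:
        return 'skip default indexes'
    if not isinstance(indexes, ToyIndexes):
        # the generic Mapping annotation cannot express this requirement
        raise TypeError('a non-empty `indexes` must carry coordinate variables')
    return 'use ' + ', '.join(sorted(map(str, indexes)))
```
With this sketch, `describe_indexes_arg({})` takes the skip-default-indexes branch, whereas a plain non-empty dict such as `{'x': some_index}` passes a type checker but is rejected at runtime.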
I'm leaning towards option 3. For passing multiple indexes at once we could probably expand the `Indexes` API, e.g., with an `.update()` method.
What do people think?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422543378
https://github.com/pydata/xarray/issues/6392#issuecomment-1290454937,https://api.github.com/repos/pydata/xarray/issues/6392,1290454937,IC_kwDOAMm_X85M6seZ,4160723,2022-10-25T12:19:52Z,2022-10-25T12:19:52Z,MEMBER,"I'm thinking of only accepting one or more instances of [Indexes](https://github.com/pydata/xarray/blob/e678a1d7884a3c24dba22d41b2eef5d7fe5258e7/xarray/core/indexes.py#L1030) as `indexes` argument in the Dataset and DataArray constructors. The only exception is when `fastpath=True` a mapping can be given directly.
- It is much easier to handle: just check that keys returned by `Indexes.variables` do not conflict with the coordinate names in the `coords` argument
- It is slightly safer: it requires the user to explicitly create an `Indexes` object, thus with less chance to accidentally provide coordinate variables and index objects that do not relate to each other (we could probably add some safeguards in the `Indexes` class itself)
- It is more convenient: an Xarray `Index` may provide a factory method that returns an instance of `Indexes` that we just need to pass as `indexes`
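The conflict check mentioned in the first point could be as small as the following sketch. It uses plain dicts in place of the real objects, and `check_no_conflict` is a hypothetical helper name (it does not exist in Xarray):
```python
def check_no_conflict(coords, index_variables):
    # names given both in `coords` and in the variables carried by the indexes
    conflicts = sorted(map(str, set(coords) & set(index_variables)))
    if conflicts:
        raise ValueError(
            'coordinates given both in `coords` and `indexes`: ' + ', '.join(conflicts)
        )
    # no overlap: both sets of coordinate variables can be combined safely
    return {**coords, **index_variables}
# plain coordinate `y` plus the variables of a multi-index on dimension `x`
merged = check_no_conflict(
    {'y': [0, 1, 2]},
    {'x': 'multi-index', 'one': ['a', 'a', 'b', 'b'], 'two': [1, 2, 1, 2]},
)
print(sorted(merged))  # ['one', 'two', 'x', 'y']
```
Passing `{'x': [0, 1, 2, 3]}` as `coords` together with the same index variables would raise a `ValueError` instead of silently overriding one of them.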
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1175329407
https://github.com/pydata/xarray/pull/7185#issuecomment-1285038821,https://api.github.com/repos/pydata/xarray/issues/7185,1285038821,IC_kwDOAMm_X85MmCLl,4160723,2022-10-20T06:59:04Z,2022-10-20T06:59:04Z,MEMBER,🚀 ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1413425793
https://github.com/pydata/xarray/pull/7185#issuecomment-1283994902,https://api.github.com/repos/pydata/xarray/issues/7185,1283994902,IC_kwDOAMm_X85MiDUW,4160723,2022-10-19T13:13:39Z,2022-10-19T13:13:39Z,MEMBER,"LGTM, that's awesome! It will be super handy for quick debugging and experimenting with custom indexes.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1413425793
https://github.com/pydata/xarray/pull/7183#issuecomment-1283897249,https://api.github.com/repos/pydata/xarray/issues/7183,1283897249,IC_kwDOAMm_X85Mhreh,4160723,2022-10-19T11:59:08Z,2022-10-19T11:59:08Z,MEMBER,"Looks all good to me!
Do you want to add a what's new entry here or add it in #7185 with a link to this PR?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1412926287
https://github.com/pydata/xarray/pull/7185#issuecomment-1283103957,https://api.github.com/repos/pydata/xarray/issues/7185,1283103957,IC_kwDOAMm_X85MepzV,4160723,2022-10-18T22:57:16Z,2022-10-18T22:57:16Z,MEMBER,"Thanks @keewis for opening this PR.
I added some commits (hope you don't mind) to fix the CSS. I also grouped the items in the indexes section by unique index with index coordinates separated by line return, so it looks like the coordinate section while the multi-coordinate indexes are clearly visible.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1413425793
https://github.com/pydata/xarray/pull/7182#issuecomment-1283038653,https://api.github.com/repos/pydata/xarray/issues/7182,1283038653,IC_kwDOAMm_X85MeZ29,4160723,2022-10-18T21:40:49Z,2022-10-18T21:40:49Z,MEMBER,"> I wonder if it is possible to create a generic MultiIndex?
Hmm that could be possible but I think there are just too many possible edge cases for something generic like that.
In your specific example
```python
ds.set_xindex(
    [""a"", ""b""],
    MultiIndex([(""a"", PandasIndex), (""b"", PandasIndex), ([""a"", ""b""], BallTreeIndex)]),
)
```
we could probably use the BallTreeIndex for point-wise indexing (i.e., with `ds.sel(a=xr.DataArray(...), b=xr.DataArray(...))`) and use the two PandasIndex instances for other kinds of selection (e.g., with slices, scalars, etc.) so there's no conflict, but I doubt this would be what we want in other cases.
I guess your suggestion is a way around the constraint in the Xarray data model that a coordinate cannot have multiple indexes? I'm afraid there's no easy solution that is generic enough. Maybe some cache to avoid rebuilding the indexes? I.e., `.set_xindex()` doesn't drop the pre-existing index(es) but rather disables them so that it is possible to re-enable them later with another `.set_xindex()` call (`.xindexes` only returns the ""active"" indexes but there may be other ""inactive"" indexes attached to a dataset).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1412901282
https://github.com/pydata/xarray/pull/7183#issuecomment-1282295471,https://api.github.com/repos/pydata/xarray/issues/7183,1282295471,IC_kwDOAMm_X85Mbkav,4160723,2022-10-18T12:19:56Z,2022-10-18T12:19:56Z,MEMBER,Yeah I think we could let the whole line after the 1st column (coordinate names) be customized by the index.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1412926287
https://github.com/pydata/xarray/pull/7183#issuecomment-1282151989,https://api.github.com/repos/pydata/xarray/issues/7183,1282151989,IC_kwDOAMm_X85MbBY1,4160723,2022-10-18T10:11:46Z,2022-10-18T10:11:46Z,MEMBER,"Great @keewis!
One question: should we let `repr_inline` display the class name or should we reserve a column for this and use `repr_inline` for other things? I.e., like variables have a dtype column and another column for values preview or other inline info. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1412926287
https://github.com/pydata/xarray/issues/7162#issuecomment-1282016895,https://api.github.com/repos/pydata/xarray/issues/7162,1282016895,IC_kwDOAMm_X85MagZ_,4160723,2022-10-18T08:35:29Z,2022-10-18T08:49:47Z,MEMBER,"> Indexes.copy_indexes might also require some update that includes the memo argument. But not sure if that will solve the issue here.
That's a possible cause. Alignment may fail early because `.xindexes` returns different mappings of coordinates vs. index objects. It's worth checking if after copying the dataset, `copy.xindexes` returns the same CRSIndex object for its ""x"", ""y"" and ""spatial_ref"" coordinates.
EDIT: checking `copy.xindexes.group_by_index()` is more convenient.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1409811164
https://github.com/pydata/xarray/issues/7162#issuecomment-1282024919,https://api.github.com/repos/pydata/xarray/issues/7162,1282024919,IC_kwDOAMm_X85MaiXX,4160723,2022-10-18T08:41:08Z,2022-10-18T08:41:08Z,MEMBER,"The refactored alignment logic could be improved (cf. #7002). The error raised in the method below is not very helpful.
https://github.com/pydata/xarray/blob/ab726c536464fbf4d8878041f950d2b0ae09b862/xarray/core/alignment.py#L294-L333","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1409811164
https://github.com/pydata/xarray/issues/6807#issuecomment-1277301954,https://api.github.com/repos/pydata/xarray/issues/6807,1277301954,IC_kwDOAMm_X85MIhTC,4160723,2022-10-13T09:22:27Z,2022-10-13T09:22:27Z,MEMBER,"Not really a generic and parallel execution back-end, but [Open-EO](https://openeo.org/) looks like an interesting use case too (it is a framework for managing remote execution of processing tasks on multiple big Earth observation cloud back-ends via a common API). I've suggested the idea of reusing the Xarray API here: https://github.com/Open-EO/openeo-python-client/issues/334.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1308715638
https://github.com/pydata/xarray/pull/7150#issuecomment-1276685925,https://api.github.com/repos/pydata/xarray/issues/7150,1276685925,IC_kwDOAMm_X85MGK5l,4160723,2022-10-12T20:17:09Z,2022-10-12T20:17:09Z,MEMBER,"Thank you @lukasbindreiter! Merging.
I notice that this is your first contribution to Xarray, welcome!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1403144601
https://github.com/pydata/xarray/pull/6795#issuecomment-1276433539,https://api.github.com/repos/pydata/xarray/issues/6795,1276433539,IC_kwDOAMm_X85MFNSD,4160723,2022-10-12T16:19:34Z,2022-10-12T16:19:34Z,MEMBER,"Looks good to me @keewis. Thanks for your work on the indexes repr!
Yes I think we can skip displaying default indexes for now... The question is which indexes are considered as default, i.e., all `PandasIndex` and `PandasMultiIndex` instances (like in this PR) or just the single pandas indexes automatically created for the dimension coordinates? We can decide this later, though, it's not a problem adding more indexes in the text repr later (we'll probably need it when dropping the multi-index dimension coordinate with tuple elements). For the html repr it's easier: we could display all indexes and collapse the section by default.
> but I thought ""dimension coordinates"" (and in particular their indexes) are still used for alignment?
Yes that's a good point. Let's keep ""dimensions without coordinates"".","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1306887842
https://github.com/pydata/xarray/issues/7139#issuecomment-1272966573,https://api.github.com/repos/pydata/xarray/issues/7139,1272966573,IC_kwDOAMm_X85L3-2t,4160723,2022-10-10T08:35:22Z,2022-10-10T08:35:22Z,MEMBER,"Looks like the backend logic needs some updates to make it compatible with the new xarray data model with explicit indexes (i.e., possible indexed coordinates with name != dimension like for multi-index levels now), e.g., here:
https://github.com/pydata/xarray/blob/8eea8bb67bad0b5ac367c082125dd2b2519d4f52/xarray/backends/api.py#L234-L241
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1400949778
https://github.com/pydata/xarray/issues/7148#issuecomment-1272944063,https://api.github.com/repos/pydata/xarray/issues/7148,1272944063,IC_kwDOAMm_X85L35W_,4160723,2022-10-10T08:16:37Z,2022-10-10T08:16:37Z,MEMBER,"Looks like passing a `pandas.MultiIndex` object as `dim` argument to `concat` was forgotten during the explicit indexes refactor. While this can be fixed (could be tricky), we should deprecate it: it is convenient but probably too neat now that multi-index levels have their own, ""real"" coordinates (see https://github.com/pydata/xarray/issues/6293#issuecomment-1259228475). It is preferable to explicitly chain `concat` with `assign_coords` (and `set_index`) like the last line in your example.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1402168223
https://github.com/pydata/xarray/issues/7139#issuecomment-1271555410,https://api.github.com/repos/pydata/xarray/issues/7139,1271555410,IC_kwDOAMm_X85LymVS,4160723,2022-10-07T12:55:17Z,2022-10-07T12:55:17Z,MEMBER,"Hi @lukasbindreiter, could you add the whole error traceback please?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1400949778
https://github.com/pydata/xarray/pull/7105#issuecomment-1271519573,https://api.github.com/repos/pydata/xarray/issues/7105,1271519573,IC_kwDOAMm_X85LydlV,4160723,2022-10-07T12:20:49Z,2022-10-07T12:20:49Z,MEMBER,"Tests should be ok now, although this is not a super clean workaround. IndexVariable still needs some more refactoring anyway.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1390999159
https://github.com/pydata/xarray/issues/7121#issuecomment-1267580535,https://api.github.com/repos/pydata/xarray/issues/7121,1267580535,IC_kwDOAMm_X85Ljb53,4160723,2022-10-04T21:08:20Z,2022-10-04T21:08:20Z,MEMBER,"Hi @veenstrajelmer,
In principle with the recent explicit indexes refactor there is no need anymore to have this restriction. Although we still need to relax this constraint (see #6293 point 2), hopefully this shouldn't be hard work now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1395962467
https://github.com/pydata/xarray/issues/7108#issuecomment-1266073388,https://api.github.com/repos/pydata/xarray/issues/7108,1266073388,IC_kwDOAMm_X85Ldr8s,4160723,2022-10-03T21:28:43Z,2022-10-03T21:28:43Z,MEMBER,"> I suppose re-projecting it on a 0-360 would be the only way around this specific issue.
A custom Xarray index would help, e.g., `PeriodicBoundaryIndex` (#7031) or a `GeographicIndex` leveraging libraries like S2Geometry or H3. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1391699976
https://github.com/pydata/xarray/pull/7105#issuecomment-1266068474,https://api.github.com/repos/pydata/xarray/issues/7105,1266068474,IC_kwDOAMm_X85Ldqv6,4160723,2022-10-03T21:22:42Z,2022-10-03T21:22:42Z,MEMBER,"Yes I agree it would be nice if we can roll back this breaking change. However, it really conflicts with `.xindexes` that returns the same index instance for each of its corresponding coordinate. This roll back seems to mostly break things where we need to be smart while handling multi-index coordinates passed to DataArray / Dataset constructors. This might be tricky to solve. It would probably be easier to do it after #6392.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1390999159
https://github.com/pydata/xarray/issues/2028#issuecomment-1265252754,https://api.github.com/repos/pydata/xarray/issues/2028,1265252754,IC_kwDOAMm_X85LajmS,4160723,2022-10-03T10:38:57Z,2022-10-03T16:45:35Z,MEMBER,"With the last release v2022.09.0, this is now possible via `.set_xindex()`:
```python
a = a.set_xindex(""currency"")
a.sel(currency=""EUR"")
#
# array([20, 30])
# Coordinates:
# * country (country)
```
> Maybe we should check `pandas.MultiIndex.is_unique` in `Dataset.unstack()`
Better to check this in `PandasMultiIndex.unstack()` actually.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1390228572
https://github.com/pydata/xarray/issues/7104#issuecomment-1261996160,https://api.github.com/repos/pydata/xarray/issues/7104,1261996160,IC_kwDOAMm_X85LOIiA,4160723,2022-09-29T09:11:05Z,2022-09-29T09:11:05Z,MEMBER,"Thanks for the report @znichollscr.
Maybe we should check `pandas.MultiIndex.is_unique` in `Dataset.unstack()` like in `Dataset.from_dataframe()`?
```python
df = ds.drop_vars(""lat"").to_dataframe()
xr.Dataset.from_dataframe(df)
# ValueError: cannot convert a DataFrame with a non-unique MultiIndex into xarray
```
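A minimal sketch of such a check using plain pandas (the `check_unique` helper name is hypothetical, and where exactly Xarray would call it is left open):
```python
import pandas as pd


def check_unique(midx: pd.MultiIndex) -> None:
    # Hypothetical helper: refuse to unstack a multi-index with duplicate entries.
    if not midx.is_unique:
        duplicates = midx[midx.duplicated()].unique().tolist()
        raise ValueError(f'cannot unstack a non-unique MultiIndex, duplicates: {duplicates}')


unique_midx = pd.MultiIndex.from_product([['a', 'b'], [1, 2]], names=['foo', 'bar'])
check_unique(unique_midx)  # passes silently

duplicated_midx = pd.MultiIndex.from_tuples([('a', 1), ('a', 1), ('b', 2)], names=['foo', 'bar'])
try:
    check_unique(duplicated_midx)
    error_message = None
except ValueError as err:
    error_message = str(err)
```
Failing early like this would give the same kind of explicit error as `Dataset.from_dataframe()` instead of a less obvious failure further down.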
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1390228572
https://github.com/pydata/xarray/issues/7069#issuecomment-1261356747,https://api.github.com/repos/pydata/xarray/issues/7069,1261356747,IC_kwDOAMm_X85LLsbL,4160723,2022-09-28T19:12:50Z,2022-09-28T19:12:50Z,MEMBER,"I think we can go ahead with the release. The remaining regressions seem to affect only a limited number of use cases; fixing them could wait for the following release if we are not waiting too long between the two.
I'd also wait for an announcement about indexes. It was already announced at the previous release, and it'd probably be better to communicate about it (maybe via a blog post?) after improving the docs and experimenting a bit more with custom indexes...","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1382753751
https://github.com/pydata/xarray/issues/7097#issuecomment-1261049239,https://api.github.com/repos/pydata/xarray/issues/7097,1261049239,IC_kwDOAMm_X85LKhWX,4160723,2022-09-28T15:03:36Z,2022-09-28T15:03:36Z,MEMBER,"Hi @znichollscr, thanks for the report. Indeed it looks like `_coord_names` are not updated properly.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1389148779
https://github.com/pydata/xarray/issues/7099#issuecomment-1261015002,https://api.github.com/repos/pydata/xarray/issues/7099,1261015002,IC_kwDOAMm_X85LKY_a,4160723,2022-09-28T14:39:10Z,2022-09-28T14:39:10Z,MEMBER,"Or use `Indexer` objects to group labels + options? This is slightly different than what you suggest:
```python
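from __future__ import annotations
from typing import Any, Iterable, Mapping
class Dataset:
    def sel(
        self,
        indexers: Mapping[Any, Any] | Indexer | Iterable[Indexer],
        **indexers_kwargs: Any,
    ):
        ...
class Indexer:
    # groups coordinate labels with the selection options that apply to them
    def __init__(self, labels=None, options=None, **label_kwargs):
        ...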
```
Let's assume a Dataset with `lat` / `lon` coordinates both sharing the same geographic index + another `time` dimension coordinate, then we could write:
```python
indexers = [
    Indexer(lon=[2, 15], lat=[45, 48], options={""foo"": ""bar""}),
    Indexer(time=""2022-01-01""),
]
ds.sel(indexers)
```
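A rough, hypothetical sketch of how such an `Indexer` could group labels with options (this is not an existing Xarray API, and `merge_indexers` is a made-up helper for illustration only):
```python
from typing import Any, Iterable, Mapping, Optional


class Indexer:
    '''Hypothetical container grouping coordinate labels with selection options.'''

    def __init__(
        self,
        labels: Optional[Mapping[Any, Any]] = None,
        options: Optional[Mapping[str, Any]] = None,
        **label_kwargs: Any,
    ):
        # Labels may be given as a mapping, as keyword arguments, or both.
        self.labels = {**(labels or {}), **label_kwargs}
        self.options = dict(options or {})


def merge_indexers(indexers: Iterable[Indexer]) -> dict:
    '''Map each coordinate name to its (label, options) pair.'''
    merged = {}
    for indexer in indexers:
        for name, label in indexer.labels.items():
            if name in merged:
                raise ValueError(f'coordinate {name!r} appears in more than one Indexer')
            merged[name] = (label, indexer.options)
    return merged


indexers = [
    Indexer(lon=[2, 15], lat=[45, 48], options={'foo': 'bar'}),
    Indexer(time='2022-01-01'),
]
merged = merge_indexers(indexers)
```
Here `lon` and `lat` share a single options dictionary, so options are attached to a group of labels rather than repeated per coordinate.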
This could also be used to avoid code duplication when using common selection options for different indexes.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1389295853
https://github.com/pydata/xarray/issues/7099#issuecomment-1260892017,https://api.github.com/repos/pydata/xarray/issues/7099,1260892017,IC_kwDOAMm_X85LJ69x,4160723,2022-09-28T13:11:01Z,2022-09-28T13:11:01Z,MEMBER,"Or we could simply decide that `.sel()` should not accept arbitrary options and handle special cases, e.g., via accessors.
It would actually make sense to have something like `.my_accessor.sel_k_neighbors()`. Not so great to have a separate method just for an optimization option, though.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1389295853
https://github.com/pydata/xarray/issues/6392#issuecomment-1260618693,https://api.github.com/repos/pydata/xarray/issues/6392,1260618693,IC_kwDOAMm_X85LI4PF,4160723,2022-09-28T09:13:00Z,2022-09-28T12:52:01Z,MEMBER,"> How would we handle creating xarray objects from pandas objects where they have a multiindex?
For `pandas.Series` / `pandas.DataFrame` objects, `DataArray.from_series()` / `Dataset.from_dataframe()` already expand multi-index levels as dimensions.
For a `pandas.MultiIndex`, we could do something like the following, but it is a bit tedious:
```python
import pandas as pd
import xarray as xr
from xarray.indexes import PandasMultiIndex
pd_idx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""foo"", ""bar""))
idx = PandasMultiIndex(pd_idx, ""x"")
indexes = {""x"": idx, ""foo"": idx, ""bar"": idx}
coords = idx.create_variables()
ds = xr.Dataset(coords=coords, indexes=indexes)
```
For more convenience, we could add a class method to `PandasMultiIndex`, e.g.,
```python
# this calls PandasMultiIndex.__init__() and PandasMultiIndex.create_variables() internally
indexes, coords = PandasMultiIndex.from_pandas_index(pd_idx, ""x"")
ds = xr.Dataset(coords=coords, indexes=indexes)
```
Instead of `indexes, coords` raw dictionaries, we could return an instance of the [Indexes](https://github.com/pydata/xarray/blob/e678a1d7884a3c24dba22d41b2eef5d7fe5258e7/xarray/core/indexes.py#L1030) class (also returned by `Dataset.xindexes`), which encapsulates the coordinate variables:
```python
xmidx = PandasMultiIndex.from_pandas_index(pd_idx, ""x"")
ds = xr.Dataset(coords=xmidx.variables, indexes=xmidx)
```
For even more convenience, I think it might be reasonable to support special handling of `Indexes` instances given in Dataset / DataArray constructors and in `.update()`, i.e.,
```python
# both cases below will implicitly add the coordinates found in `xmidx`
# (if there's no conflict with other coordinates)
ds = xr.Dataset(indexes=xmidx)
ds2 = xr.Dataset()
ds2.update(xmidx)
```
The same approach could be used for `pandas.IntervalIndex` (as discussed in #4579).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1175329407
https://github.com/pydata/xarray/issues/7099#issuecomment-1260859023,https://api.github.com/repos/pydata/xarray/issues/7099,1260859023,IC_kwDOAMm_X85LJy6P,4160723,2022-09-28T12:50:25Z,2022-09-28T12:50:25Z,MEMBER,"Another difficulty regarding multi-coordinate indexes: ideally options should be set per index, not per coordinate.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1389295853
https://github.com/pydata/xarray/issues/4090#issuecomment-1260806288,https://api.github.com/repos/pydata/xarray/issues/4090,1260806288,IC_kwDOAMm_X85LJmCQ,4160723,2022-09-28T12:06:03Z,2022-09-28T12:06:03Z,MEMBER,"@JimmyGao0204 this is not supported by Xarray itself but the [xoak](https://xoak.readthedocs.io) package has been developed for that purpose.
I'm going to close this issue as Xarray now provides everything needed for selecting data using 2D lat/lon coordinates (i.e., advanced indexing, flexible indexes), and it is likely that this specific case will be further maintained in a 3rd party library like `xoak`. Feel free to comment / re-open if you think this should be built into Xarray.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623804131
https://github.com/pydata/xarray/issues/475#issuecomment-1260794423,https://api.github.com/repos/pydata/xarray/issues/475,1260794423,IC_kwDOAMm_X85LJjI3,4160723,2022-09-28T11:55:04Z,2022-09-28T11:55:04Z,MEMBER,"There hasn't been much activity here for quite some time.
Meanwhile, there has been the development of the [xoak](https://xoak.readthedocs.io/en/latest/) package that supports point-wise indexing of Xarray objects with various indexes (either generic like `scipy.spatial.cKDTree` or more specific like [pys2index](https://github.com/benbovy/pys2index)'s `S2PointIndex` for lat/lon point data). `xoak` leverages Xarray's advanced indexing capabilities and supports selection using both coordinates and indexers with an arbitrary number of dimensions.
With the forthcoming Xarray release, it will be possible to create and assign custom indexes to DataArray / Dataset objects. The plan for `xoak` is then to just provide some custom indexes so that we can perform point-wise selection directly with `Dataset.sel()` instead of `Dataset.xoak.sel()`.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,95114700
https://github.com/pydata/xarray/issues/6573#issuecomment-1260551056,https://api.github.com/repos/pydata/xarray/issues/6573,1260551056,IC_kwDOAMm_X85LInuQ,4160723,2022-09-28T08:17:09Z,2022-09-28T08:17:09Z,MEMBER,"I also like the idea of alignment with some tolerance. There is an open PR #4489, which needs to be reworked in the context of the explicit index refactor.
As an alternative to a new kwarg, we could add an index build option, e.g., `ds.set_xindex(""x"", index_cls=PandasIndex, align_tolerance=1e-6)`, but then it is not obvious how to handle different tolerance values given for the indexes to compare. Maybe this could depend on the given `join` method? E.g., pick the smallest tolerance for join=inner, the largest for join=outer, the tolerance of the left index for join=left, etc.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1226272301
https://github.com/pydata/xarray/issues/5874#issuecomment-1260497579,https://api.github.com/repos/pydata/xarray/issues/5874,1260497579,IC_kwDOAMm_X85LIaqr,4160723,2022-09-28T07:26:55Z,2022-09-28T07:26:55Z,MEMBER,Closed in #6971.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1029088776
https://github.com/pydata/xarray/issues/4579#issuecomment-1259615513,https://api.github.com/repos/pydata/xarray/issues/4579,1259615513,IC_kwDOAMm_X85LFDUZ,4160723,2022-09-27T14:45:19Z,2022-09-27T14:46:41Z,MEMBER,"Perhaps Xarray has been too clever so far regarding how it handles pandas objects passed directly as coordinate data? `pandas.MultiIndex` objects are handled in a specific way too, which is often hard to deal with.
Expanding on @max-sixty's suggestion, we could:
- treat all coordinate data as duck arrays, i.e., in the example above handle `da1` just like `da2` (no more special cases for pandas objects)
- provide an `xarray.indexes.PandasIntervalIndex` wrapper, which would inherit from `xarray.indexes.PandasIndex` with a few additional options and features, e.g., like the ones @dcherian suggests in https://github.com/pydata/xarray/discussions/6783#discussioncomment-3149033
- build an interval index from an existing coordinate using, e.g., `da.set_xindex(""x"", PandasIntervalIndex, closed=""right"")`
- figure out how to assign both a coordinate and an index from an existing `pandas.IntervalIndex` object in a convenient but more explicit way
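For reference, this is how the `closed` option behaves on a plain `pandas.IntervalIndex`; the `PandasIntervalIndex` wrapper above is only a proposal, so the snippet sticks to pandas:
```python
import pandas as pd

# Bins (0, 1] and (1, 2]: each interval includes its right edge only.
right_closed = pd.IntervalIndex.from_breaks([0, 1, 2], closed='right')
# Bins [0, 1) and [1, 2): each interval includes its left edge only.
left_closed = pd.IntervalIndex.from_breaks([0, 1, 2], closed='left')

# The label 1 falls in a different bin depending on which side is closed.
pos_right = right_closed.get_loc(1)
pos_left = left_closed.get_loc(1)
```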
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,741806260
https://github.com/pydata/xarray/issues/5646#issuecomment-1259441952,https://api.github.com/repos/pydata/xarray/issues/5646,1259441952,IC_kwDOAMm_X85LEY8g,4160723,2022-09-27T12:34:20Z,2022-09-27T12:34:20Z,MEMBER,"This is fixed in v2022.6.0
```python
xr.testing.assert_allclose(b, c)
# AssertionError: Left and right DataArray objects are not close
#
# Coordinates only on the left object:
# * x (z) int64 0
# * y (z) int64 0
# Coordinates only on the right object:
# * not-y (z) int64 0
# * not-x (z) int64 0
print(b == c, ""\n"")
# ValueError: cannot re-index or align objects with conflicting indexes found for the following coordinates: 'z' (2 conflicting indexes)
# Conflicting indexes may occur when
# - they relate to different sets of coordinate and/or dimension names
# - they don't have the same type
# - they may be used to reindex data along common dimensions
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,955617411
https://github.com/pydata/xarray/issues/2280#issuecomment-1259415933,https://api.github.com/repos/pydata/xarray/issues/2280,1259415933,IC_kwDOAMm_X85LESl9,4160723,2022-09-27T12:12:05Z,2022-09-27T12:12:05Z,MEMBER,This is fixed in v2022.6.0. Xarray's `PandasMultiIndex` wrapper keeps track of the level coordinate dtypes.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,340316108
https://github.com/pydata/xarray/issues/907#issuecomment-1259415318,https://api.github.com/repos/pydata/xarray/issues/907,1259415318,IC_kwDOAMm_X85LEScW,4160723,2022-09-27T12:11:35Z,2022-09-27T12:11:35Z,MEMBER,This is fixed in v2022.6.0. Xarray's `PandasMultiIndex` wrapper keeps track of the level coordinate dtypes. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166441031
https://github.com/pydata/xarray/pull/6971#issuecomment-1259349072,https://api.github.com/repos/pydata/xarray/issues/6971,1259349072,IC_kwDOAMm_X85LECRQ,4160723,2022-09-27T11:14:07Z,2022-09-27T11:14:07Z,MEMBER,"In the last commit I added the `xarray.indexes` namespace from which we can import `Index`, `PandasIndex` and `PandasMultiIndex`.
Thanks everyone for the feedback and review!
I think this is ready to merge, if we agree to address the `coord_names` typing issue in another PR?","{""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1357296406