id,node_id,number,state,locked,title,user,body,created_at,updated_at,closed_at,merged_at,merge_commit_sha,assignee,milestone,draft,head,base,author_association,auto_merge,repo,url,merged_by
64042989,MDExOlB1bGxSZXF1ZXN0NjQwNDI5ODk=,802,closed,0,Multi-index indexing,4160723,"Follows #767.

This is incomplete (it still needs some tests and documentation updates), but it is working for both `Dataset` and `DataArray` objects. I also don't know if it is fully compatible with lazy indexing (Dask). 

Using the example from #767:

```
In [4]: da.sel(band_wavenumber={'band': 'foo'})
Out[4]:
<xarray.DataArray (wavenumber: 2)>
array([ 0.00017,  0.00014])
Coordinates:
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03
```

As shown in this example, similarly to pandas, it automatically renames the dimension and assigns a new coordinate when the selection doesn't return a `pd.MultiIndex` (here it returns a `pd.Float64Index`).

In some cases this behavior may be unwanted (??), so I added a `drop_level` keyword argument (if `False` it keeps the multi-index and doesn't change the dimension/coordinate names):

```
In [5]: da.sel(band_wavenumber={'band': 'foo'}, drop_level=False)
Out[5]:
<xarray.DataArray (band_wavenumber: 2)>
array([ 0.00017,  0.00014])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('foo', 4050.2) ('foo', 4050.3)
```

Note that it also works with `DataArray.loc`, but (for now) in that case it always returns the multi-index:

```
In [6]: da.loc[{'band_wavenumber': {'band': 'foo'}}]
Out[6]:
<xarray.DataArray (band_wavenumber: 2)>
array([ 0.00017,  0.00014])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('foo', 4050.2) ('foo', 4050.3)
```

This is however inconsistent with `Dataset.sel` and `Dataset.loc` that both apply `drop_level=True` by default, due to their different implementation. Two solutions: (1) make `DataArray.loc` apply drop_level by default, or (2) use `drop_level=False` by default everywhere.
",2016-03-24T14:39:38Z,2016-07-19T10:48:56Z,2016-07-19T01:15:42Z,2016-07-19T01:15:41Z,7a9e84b5708d3e8ec270a7415f9b5e54d30f13f7,,,0,712497c3997e72a36cafc8fb9eaafbecc76af5dc,80abe5dede7bf8a2949139f8ba083a6d74d4e3db,MEMBER,,13221727,https://github.com/pydata/xarray/pull/802,
73465410,MDExOlB1bGxSZXF1ZXN0NzM0NjU0MTA=,879,closed,0,Multi-index repr,4160723,"Another item of #719.

An example:

``` python
>>> index = pd.MultiIndex.from_product((list('ab'), range(10)))
>>> index.names= ('a_long_level_name', 'level_1')
>>> data = xr.DataArray(range(20), [('x', index)])
>>> data
<xarray.DataArray (x: 20)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])
Coordinates:
  * x                    (x) object MultiIndex
    - a_long_level_name  object 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'b' ...
    - level_1            int64 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
```

To be consistent with the displayed coordinates and/or data variables, it displays the actual used level values. Using the `pandas.MultiIndex.get_level_values` method would be expensive for big indexes, so I re-implemented it in xarray so that we can truncate the computation to the first _x_ values, which is very cheap.
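
A rough sketch of that idea, in plain Python rather than the actual xarray code: a multi-index stores each level as a small array of unique values plus one integer code per row, so the first few level values can be obtained by decoding only the first few codes. The function name `first_level_values` and the list-based inputs are assumptions made for illustration, and missing values (negative codes in pandas) are not handled.

```python
def first_level_values(level_values, codes, n):
    # level_values: unique values of one level (like an entry of MultiIndex.levels)
    # codes: one integer position into level_values per row
    # Only the first n codes are decoded, so the cost does not grow with index size.
    return [level_values[code] for code in codes[:n]]


# Same layout as MultiIndex.from_product((list('ab'), range(10)))
levels = [['a', 'b'], list(range(10))]
codes = [[0] * 10 + [1] * 10, list(range(10)) * 2]

# First three values of each level, enough for a truncated repr line.
preview = [first_level_values(lv, cd, 3) for lv, cd in zip(levels, codes)]
```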

It still needs testing.

Maybe it would be nice to align the level values.
",2016-06-11T10:58:13Z,2016-09-02T09:34:49Z,2016-08-31T21:40:59Z,,,,,0,4e7793a8d4fb0d5062ad8aab5578aaf3fec43577,450ac8fb16bec935a18ff3155673dff82208d3fe,MEMBER,,13221727,https://github.com/pydata/xarray/pull/879,
73554612,MDExOlB1bGxSZXF1ZXN0NzM1NTQ2MTI=,881,closed,0,Fix variable copy with multi-index ,4160723,"Fixes #769.
",2016-06-13T10:38:46Z,2016-08-01T14:17:17Z,2016-06-16T21:01:07Z,2016-06-16T21:01:07Z,065ea6a3695a58ad6256f79b7712b67a8da6377c,,,0,9ea8832959a54fed81e7194c18cc024ba0fe9bd1,450ac8fb16bec935a18ff3155673dff82208d3fe,MEMBER,,13221727,https://github.com/pydata/xarray/pull/881,
77953522,MDExOlB1bGxSZXF1ZXN0Nzc5NTM1MjI=,903,closed,0,fixed multi-index copy test,4160723,,2016-07-19T10:37:36Z,2016-08-01T14:16:15Z,2016-07-19T14:47:58Z,2016-07-19T14:47:58Z,e8566940a97cd5a11fdbe796cb5f8b0f00864624,,,0,c863df76651fbc0bae1a02819c7db28eef4f4ae5,7a9e84b5708d3e8ec270a7415f9b5e54d30f13f7,MEMBER,,13221727,https://github.com/pydata/xarray/pull/903,
80229493,MDExOlB1bGxSZXF1ZXN0ODAyMjk0OTM=,947,closed,0,Multi-index levels as coordinates,4160723,"Implements 2, 4 and 5 in #719.

Demo:

```
In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: index = pd.MultiIndex.from_product((list('ab'), range(2)),
   ...:                                    names= ('level_1', 'level_2'))

In [5]: da = xr.DataArray(np.random.rand(4, 4), coords={'x': index},
   ...:                   dims=('x', 'y'), name='test')

In [6]: da
Out[6]: 
<xarray.DataArray 'test' (x: 4, y: 4)>
array([[ 0.15036153,  0.68974802,  0.40082234,  0.94451318],
       [ 0.26732938,  0.49598123,  0.8679231 ,  0.6149102 ],
       [ 0.3313594 ,  0.93857424,  0.73023367,  0.44069622],
       [ 0.81304837,  0.81244159,  0.37274953,  0.86405196]])
Coordinates:
  * level_1  (x) object 'a' 'a' 'b' 'b'
  * level_2  (x) int64 0 1 0 1
  * y        (y) int64 0 1 2 3

In [7]: da['level_1']
Out[7]: 
<xarray.DataArray 'level_1' (x: 4)>
array(['a', 'a', 'b', 'b'], dtype=object)
Coordinates:
  * level_1  (x) object 'a' 'a' 'b' 'b'
  * level_2  (x) int64 0 1 0 1

In [8]: da.sel(x='a', level_2=1)
Out[8]: 
<xarray.DataArray 'test' (y: 4)>
array([ 0.26732938,  0.49598123,  0.8679231 ,  0.6149102 ])
Coordinates:
    x        object ('a', 1)
  * y        (y) int64 0 1 2 3

In [9]: da.sel(level_2=1)
Out[9]: 
<xarray.DataArray 'test' (level_1: 2, y: 4)>
array([[ 0.26732938,  0.49598123,  0.8679231 ,  0.6149102 ],
       [ 0.81304837,  0.81244159,  0.37274953,  0.86405196]])
Coordinates:
  * level_1  (level_1) object 'a' 'b'
  * y        (y) int64 0 1 2 3
```

Some notes about the implementation:
- I slightly modified `Coordinate` so that it allows setting different values for the names of the coordinate and its dimension. There is no breaking change.
- I also added a `Coordinate.get_level_coords` method to get independent, single-index coordinates objects from a MultiIndex coordinate.

Remaining issues:
- `Coordinate.get_level_coords` calls `pandas.MultiIndex.get_level_values` for each level and is itself called each time when indexing and for repr. This can be very costly!! It would be nice to return some kind of lazy index object instead of computing the actual level values.
- repr replaces a MultiIndex coordinate by its level coordinates. That can be confusing in some cases (see below). Maybe we can set a different marker than `*` for level coordinates.

```
In [6]: [name for name in da.coords]
Out[6]: ['x', 'y']

In [7]: da.coords.keys()
Out[7]: 
KeysView(Coordinates:
  * level_1  (x) object 'a' 'a' 'b' 'b'
  * level_2  (x) int64 0 1 0 1
  * y        (y) int64 0 1 2 3)
```
- `DataArray.level_1` doesn't return another `DataArray` object:

```
In [10]: da.level_1
Out[10]: 
<xarray.Coordinate 'level_1' (x: 4)>
array(['a', 'a', 'b', 'b'], dtype=object)
```
- Maybe we need to test the uniqueness of level names at `DataArray` or `Dataset` creation.

Of course still needs proper tests and docs... 
",2016-08-05T11:34:49Z,2016-09-14T15:25:28Z,2016-09-14T03:34:51Z,2016-09-14T03:34:51Z,41654ef5e9da8cd15f3b68f8384f8c45c7fc16e9,,,0,a447767e8d611d945dc864910a427ef7e3f4db11,3ecfa66613aaefdea8beb15edbd392b9f9d815c6,MEMBER,,13221727,https://github.com/pydata/xarray/pull/947,
87715303,MDExOlB1bGxSZXF1ZXN0ODc3MTUzMDM=,1028,closed,0,"Add `set_index`, `reset_index` and `reorder_levels` methods",4160723,"Another item in #719.

I added tests and updated the docs, so this is ready for review.
",2016-10-03T13:22:24Z,2023-08-30T09:28:26Z,2016-12-27T17:03:00Z,2016-12-27T17:03:00Z,7ad254409f97dfe932855445602faaf7324f3d5e,,,0,c58cb470baf53d1c67971540e1d7c02dbafd212a,34fd2b6cb94dfb824c5371c37b6eb5e70a88260f,MEMBER,,13221727,https://github.com/pydata/xarray/pull/1028,
121942631,MDExOlB1bGxSZXF1ZXN0MTIxOTQyNjMx,1422,closed,0,xarray.core.variable.as_variable part of the public API,4160723," - [x] Closes #1303 
 - [x] Tests added / passed
 - [x] Passes ``git diff upstream/master | flake8 --diff`` (if we ignore messages for .rst files and ""imported but not used"" messages for `xarray.__init__.py`)
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

Make `xarray.core.variable.as_variable` part of the public API and accessible as a top-level function: `xarray.as_variable`.

I changed the docstrings to follow the numpydoc format more closely.

I also removed the `copy=False` keyword argument, as it was apparently unused. ",2017-05-23T08:44:08Z,2017-06-10T18:33:34Z,2017-06-02T17:55:12Z,2017-06-02T17:55:12Z,b8771934a2ef24fd3ce5a93fc2accb3f6fa12e4e,,,0,37343de03666f6cac03ce68a7fed60b866338ee7,6b18d77b5581be4d91cb12da95a530f92ab867b5,MEMBER,,13221727,https://github.com/pydata/xarray/pull/1422,
135298867,MDExOlB1bGxSZXF1ZXN0MTM1Mjk4ODY3,1507,closed,0,Detailed report for testing.assert_equal and testing.assert_identical,4160723," - ~~Closes #xxxx~~
 - [x] Tests added / passed
 - [x] Passes ``git diff upstream/master | flake8 --diff``
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

~~In addition to `Dataset` repr, the error message also shows the output of `Dataset.info()` for both datasets.~~

~~This may not be the most elegant solution, but it is helpful when datasets only differ by their attributes attached to coordinates or data variables (not shown in repr). I'm open to any suggestion.~~

The report shows the differences for dimensions, data values (``Variable`` and ``DataArray``), coordinates, data variables and attributes (the latter only for ``testing.assert_identical``).

There are currently not many tests for `xarray.testing` functions, but I'm willing to add more if needed.

Not sure if it's worth a what's new entry (EDIT: added one).",2017-08-11T09:38:23Z,2019-10-25T15:07:39Z,2019-01-18T09:16:31Z,2019-01-18T09:16:31Z,1d0a2bc4970d9e7337fe307f4519bd936f7d7d89,,,0,443e59365e5440979421644e50491f7dd323ab95,f13536c965d02bb2845da31e909899a90754b375,MEMBER,,13221727,https://github.com/pydata/xarray/pull/1507,
153118247,MDExOlB1bGxSZXF1ZXN0MTUzMTE4MjQ3,1723,closed,0,Fix unexpected behavior of .set_index() since pandas 0.21.0,4160723," - [x] Closes #1722 
 - [x] Tests added / passed
 - [x] Passes ``git diff upstream/master **/*py | flake8 --diff``
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
",2017-11-16T18:37:20Z,2019-10-25T15:07:18Z,2017-11-17T00:54:51Z,2017-11-17T00:54:51Z,1a012080e0910f3295d0fc26806ae18885f56751,,,0,eda038be4f7e4298806ed1e3f92c8fc7bf287a21,8267fdb1093bba3934a172cf71128470698279cd,MEMBER,,13221727,https://github.com/pydata/xarray/pull/1723,
162426756,MDExOlB1bGxSZXF1ZXN0MTYyNDI2NzU2,1820,closed,0,WIP: html repr,4160723," - [x] Closes #1627 
 - [ ] Tests added
 - [ ] Tests passed
 - [ ] Passes ``git diff upstream/master **/*py | flake8 --diff``
 - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

This is work in progress, although the basic functionality is there. You can see a preview here:
http://nbviewer.jupyter.org/gist/benbovy/3009f342fb283bd0288125a1f7883ef2

TODO:

- [ ] Add support for Multi-indexes
- [ ] Probably good to have some opt-in or fallback system for cases where we (or users) know that the rendering will not work
- [ ] Add some tests

Nice to have (keep this for later):

- Clean-up CSS code and HTML template (track CSS [subgrid support](https://caniuse.com/#feat=css-subgrid) in browsers, this may simplify things a lot here).
- Dynamically adapt cell widths (given the length of the names of variables and dimensions). Currently all cells have a fixed width. This is tricky, though, as we don't use a monospace font here.
- Integration with jupyterlab/notebook themes (CSS classes) and maybe allow custom CSS.
- Integration of Dask arrays HTML repr (+ integration of repr for other array backends).
- Maybe find a way (if possible) to include CSS only once in the notebook (currently it is included each time an xarray object is displayed in an output cell, which is not very nice).
- Review the rules for collapsing the `Coordinates`, `Data variables` and `Attributes` sections (maybe expose them as global options).
- Maybe also define some rules to collapse automatically the data section (DataArray and Variable) when the data repr is too long.
- Maybe add rich representation for `Dataset.coords` and `Dataset.data_vars` as well?


<details>
<summary>Other thoughts (old)</summary>

A big challenge here is to provide both robust and flexible styling (CSS):

- I have tested the current styling in jupyterlab (0.30.6, light theme), notebook (5.2.2) and nbviewer: despite some slight differences it looks quite good!
- However, the current CSS code is a bit fragile (I had to add a lot of `!important`). Probably this could be a bit cleaned and optimized (unfortunately my CSS skills are limited).  
- Also, with the jupyterlab's dark theme it looks ugly. We probably need to use jupyterlab CSS variables so that our CSS scheme is compatible with the theme machinery, but at the same time we need to support other front-ends. So we probably need to maintain different stylings (i.e., multiple CSS files, one of them picked-up depending on the front-end), though I don't know if it's easy to automatically detect the front-end (choosing a default style is difficult too).
-  The notebook rendering on Github seems to disable style tags (no style is applied to the output, see https://gist.github.com/benbovy/3009f342fb283bd0288125a1f7883ef2). Output is not readable at all in this case, so it might be useful to allow turning off rich output as an option.
</details>



",2018-01-11T16:33:07Z,2019-10-25T15:06:58Z,2019-10-24T16:48:46Z,,e360d3fc81209d7586de95bc044feb3d4a508657,,,0,17de08ba4cc2eb7e3326c1451c1257c911a17958,bb87a9441d22b390e069d0fde58f297a054fd98a,MEMBER,,13221727,https://github.com/pydata/xarray/pull/1820,
171631545,MDExOlB1bGxSZXF1ZXN0MTcxNjMxNTQ1,1946,closed,0,DOC: add main sections to toc,4160723,"Not a big change, but adds a little more clarity IMO.

I'm open to any suggestion for better section names and/or organization. Also I left ""What's new"" at the top, but not sure if ""Getting started"" is the right section.",2018-02-27T11:13:17Z,2018-02-27T21:16:18Z,2018-02-27T19:04:24Z,2018-02-27T19:04:24Z,4ee244078ea90084624c1b6d006f50285f8f2d21,,,0,0fe80d06242b7a7392c9c96598dd9c557ca667ad,243093cf814ffaae2a9ce08215632500fbebcf52,MEMBER,,13221727,https://github.com/pydata/xarray/pull/1946,
207277486,MDExOlB1bGxSZXF1ZXN0MjA3Mjc3NDg2,2357,closed,0,DOC: move xarray related projects to top-level TOC section,4160723,"Make xarray-related projects more discoverable, as was suggested on the xarray mailing list.
",2018-08-09T10:57:47Z,2018-08-11T13:41:24Z,2018-08-10T20:13:08Z,2018-08-10T20:13:08Z,846e28f8862b150352512f8e3d05bcb9db57a1a3,,,0,5bd1b794860b8c8e276d4918bfd40c6bad6e1411,04458670782c0b6fdba7e7021055155b2a6f284a,MEMBER,,13221727,https://github.com/pydata/xarray/pull/2357,
332552507,MDExOlB1bGxSZXF1ZXN0MzMyNTUyNTA3,3448,closed,0,Add license for the icons used in the html repr,4160723,,2019-10-25T14:57:20Z,2019-10-25T15:48:52Z,2019-10-25T15:40:46Z,2019-10-25T15:40:46Z,63cc85759ac25605c8398d904d055df5dc538b94,,,0,372f61d954f4b90222c636757665e747502c38d6,bb0a5a2b1c71f7c2622543406ccc82ddbb290ece,MEMBER,,13221727,https://github.com/pydata/xarray/pull/3448,
416544318,MDExOlB1bGxSZXF1ZXN0NDE2NTQ0MzE4,4053,closed,0,Fix html repr in untrusted notebooks (plain text fallback),4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Closes #4041
 - [x] Tests added
 - [x] Passes `isort -rc . && black . && mypy . && flake8`
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

This is not very elegant (actually plain text repr is already included in the notebook as `text/plain` mime type but it is ignored when `text/html` mime type is present), but it seems to work. I haven't found a better workaround.

I don't really know if this can be properly tested (I only added a basic test).

Steps to test this fix:

- To ""untrust"" a notebook: open an existing notebook with a simple editor, manually edit one output cell with an xarray object repr, and save the ipynb file.
- Open this notebook with the Notebook app, you should see the plain text repr.
",2020-05-12T07:38:22Z,2022-03-29T07:10:07Z,2020-05-20T17:06:40Z,2020-05-20T17:06:40Z,cb90d5542bd6868d5548ae8efb5815c249c2c329,,,0,39299e9f8e71b34ba4587800658204f5b66d9576,3e5dd6ef32b9c69806af69a3a5168edcf3b2e21f,MEMBER,,13221727,https://github.com/pydata/xarray/pull/4053,
582224148,MDExOlB1bGxSZXF1ZXN0NTgyMjI0MTQ4,4979,closed,0,Flexible indexes refactoring notes,4160723,"As a preliminary step before I take on the refactoring and implementation of flexible indexes in Xarray for the next few months, I reviewed the status of https://github.com/pydata/xarray/projects/1 and started compiling partially implemented or planned changes, thoughts, etc. into a single document that may serve as a basis for further discussion and implementation work.

It's still very much work in progress (I will update it regularly in the forthcoming days) and it is very open to discussion (we can use this PR for that)!

I'm not sure if Xarray's root folder is a good place for this document, though. We could move this into a new repository in `xarray-contrib` (that could also host other enhancement proposals) if that's necessary.

I'm looking forward to getting started on this and to getting your thoughts/feedback! 

",2021-03-01T16:57:32Z,2022-03-29T07:09:31Z,2021-03-17T16:47:29Z,2021-03-17T16:47:29Z,d9ba56c22f22ae48ecc53629c2d49f1ae02fcbcb,,,0,6efcdfe893594fcf493e17f693df1d4816b686ba,48378c4b11c5c2672ff91396d4284743165b4fbe,MEMBER,,13221727,https://github.com/pydata/xarray/pull/4979,
608110624,MDExOlB1bGxSZXF1ZXN0NjA4MTEwNjI0,5102,closed,0,Flexible indexes: add Index base class and xindexes properties,4160723,"This PR clears up the path for flexible indexes:

- it adds a new ~~`IndexAdapter`~~ `Index` base class that is meant to be inherited by all xarray-compatible indexes (built-in or 3rd-party)
- `PandasIndexAdapter` now inherits from ~~`IndexAdapter`~~ `Index`
- the `xarray_obj.xindexes` properties return `Index` (`PandasIndexAdapter`) instances. `xarray_obj.indexes` properties still return `pandas.Index` instances.

~~The latter is a breaking change, although I'm not sure if the `indexes` property has been made public yet.~~

This is still work in progress, there are many broken tests that are not fixed yet. (EDIT: all tests should be fixed now).

There's a lot of dirty fixes to avoid circular dependencies and in the many places where we still need direct access to the `pandas.Index` objects, but I'd expect that these will be cleaned-up further in the refactoring.",2021-04-02T16:18:07Z,2022-03-29T07:10:07Z,2021-05-11T08:21:26Z,2021-05-11T08:21:26Z,6e14df62f0b01d8ca5b04bd0ed2b5ee45444265d,,,0,ce59dece723ca49eaae69779dee5da2aa30d0286,234b40a37e484a795e6b12916315c80d70570b27,MEMBER,,13221727,https://github.com/pydata/xarray/pull/5102,
645933827,MDExOlB1bGxSZXF1ZXN0NjQ1OTMzODI3,5322,closed,0,Internal refactor of label-based data selection,4160723,"Xarray label-based data selection now relies on a newly added `xarray.Index.query(self, labels: Dict[Hashable, Any]) -> Tuple[Any, Optional[Index]]` method where:

- `labels` is always a dictionary with coordinate name(s) as key(s) and the corresponding selection label(s) as values
- When calling `.sel` with some coordinate(s)/label(s) pairs, those are first grouped by index so that only the relevant pairs are passed to an `Index.query`
- the returned tuple contains the positional indexers and (optionally) a new index object

For a simple `pd.Index`, `labels` always corresponds to a 1-item dictionary like `{'coord_name': label_values}`, which is not very useful in this case, but this format is useful for `pd.MultiIndex` and will likely be for other, custom indexes.
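
The grouping step described above can be sketched in plain Python as follows. This is an illustration only: `group_labels_by_index` and the `coord_to_index` mapping are hypothetical names, not part of Xarray's API, and index objects are stood in for by strings.

```python
def group_labels_by_index(labels, coord_to_index):
    # Collect the coordinate/label pairs that belong to the same index,
    # so that each index is queried once with only its own labels.
    grouped = {}
    for coord_name, label in labels.items():
        index = coord_to_index[coord_name]
        grouped.setdefault(index, {})[coord_name] = label
    return grouped


# 'one' and 'two' are levels of the same multi-index; 'y' has its own index.
coord_to_index = {'one': 'multi_index_x', 'two': 'multi_index_x', 'y': 'index_y'}
grouped = group_labels_by_index({'one': 'a', 'two': 1, 'y': 0.5}, coord_to_index)
```

Here the multi-index receives a two-item `labels` dictionary while the simple index receives a one-item dictionary, which matches the two cases discussed above.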

Moving the label->positional indexer conversion logic into `PandasIndex.query()`, I've tried to separate `pd.Index` vs `pd.MultiIndex` concerns by adding a new `PandasMultiIndex` wrapper class (it will probably be useful for other things as well) and refactor the complex logic that was implemented in `convert_label_indexer`. Hopefully it is a bit clearer now.

Working towards a more flexible/generic system, we still need to figure out how to:

- pass index query extra arguments like `method` and `tolerance` for `pd.Index` but in a more generic way
- handle several positional indexers over multiple dimensions possibly returned by a custom ""meta-index"" (e.g., staggered grid index)
- handle the case of positional indexers returned from querying >1 indexes along the same dimension (e.g., multiple coordinates along `x` with a simple `pd.Index`)
- pandas indexes don't need information like the names or shapes of their corresponding coordinate(s) to perform label-based selection, but this kind of information will probably be needed for other indexes (we actually need it for advanced point-wise selection using tree-based indexes in [xoak](https://github.com/xarray-contrib/xoak)).

This could be done in follow-up PRs.

Side note: I've initially tried to return from `xindexes` items for multi-index levels as well (not only index dimensions), but it's probably wiser to save this for later (when we'll tackle the multi-index virtual coordinate refactoring) as there are many places in Xarray where this is clearly not expected.

Happy to hear your thoughts @pydata/xarray.",2021-05-17T14:52:49Z,2022-03-29T07:10:07Z,2021-06-08T09:35:54Z,2021-06-08T09:35:54Z,9daf9b13648c9a02bddee3640b80fe95ea1fff61,,,0,fda484988c074bfd371ed490641a383c9429c43a,2b38adc1bdd1dd97934fb061d174149c73066f19,MEMBER,,13221727,https://github.com/pydata/xarray/pull/5322,
655109484,MDExOlB1bGxSZXF1ZXN0NjU1MTA5NDg0,5385,closed,0,Cast PandasIndex to pd.(Multi)Index,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #5384
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
",2021-05-27T15:15:41Z,2022-03-29T07:09:31Z,2021-05-28T08:28:11Z,2021-05-28T08:28:11Z,2b38adc1bdd1dd97934fb061d174149c73066f19,,,0,b81931cf852432b7a7857aec4b38566d7e3e0b6e,a6a1e48b57499f91db7e7c15593aadc7930020e8,MEMBER,,13221727,https://github.com/pydata/xarray/pull/5385,
697307477,MDExOlB1bGxSZXF1ZXN0Njk3MzA3NDc3,5636,closed,0,Refactor index vs. coordinate variable(s),4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #5553
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

This implements option 3 (sort of) described in https://github.com/pydata/xarray/issues/5553#issue-933551030:

- the goal is to avoid wrapping an `xarray.Index` into an `xarray.Variable` and keep those two concepts distinct from each other.
- the `xarray.Index.from_variables` class constructor accepts a dictionary of `xarray.Variable` objects as argument and may (or should?) also return corresponding `xarray.IndexVariable` objects to ensure immutability.
- for `PandasIndex`,  the new returned `xarray.IndexVariable` wraps the underlying `pd.Index` via a `PandasIndexingAdapter` (this reverts some changes made in #5102).
- for `PandasMultiIndex`, this PR adds `PandasMultiIndexingAdapter` so that we can wrap the pandas multi-index in separate coordinate variables objects: one for the dimension + one for each level. The level coordinates data internally hold a reference to the dimension coordinate data to avoid indexing the same underlying `pd.MultiIndex` for each of those coordinates (`PandasMultiIndexingAdapter.__getitem__` is memoized for that purpose).

This is very much work in progress, I need to update (or revert) all related parts of Xarray's internals, update tests, etc. At this stage any comment on the approach described above is welcome. ",2021-07-26T19:54:25Z,2023-08-30T09:21:55Z,2021-08-09T07:56:56Z,2021-08-09T07:56:56Z,4bb9d9c6df77137f05e85c7cc6508fe7a93dc0e4,,,0,e5f2502c07bd7ad449f9f6acfd0e6ac3ede92fb9,8b95da8e21a9d31de9f79cb0506720595f49e1dd,MEMBER,,13221727,https://github.com/pydata/xarray/pull/5636,
709187466,MDExOlB1bGxSZXF1ZXN0NzA5MTg3NDY2,5692,closed,0,Explicit indexes,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes many issues:
  - [x] closes #1366 
  - [x] closes #1408 
  - [x] closes #2489 
  - [x] closes #3432 
  - [x] closes #4542 
  - [x] closes #4955
  - [x] closes #5202
  - [x] closes #5645 
  - [x] closes #5691 
  - [x] closes #5697
  - [x] closes #5700 
  - [x] closes #5727
  - [x] closes #5953
  - [x] closes #6183
  - [x] closes #6313
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- New functions/methods are listed in `api.rst` (new `Index` and `Indexes` API not public yet)

Follow-up on #5636 (work in progress), supersedes #2195.

This is likely to get big, sorry in advance! It'll be safer to make a release before merging this PR.

Current progress:

- [x] create (default) indexes using the `Index` classes
  - [x] refactor default indexes created when 1st accessing `.xindexes` or `.indexes`
- [x] support for non-default indexes (no public API yet)
- [x] remove multi-index virtual coordinates (replace it by regular coordinates)
- [x] refactor internal (text / html) formatting functions
- [x] internal refactor of location-based selection (`.isel()`)
- [x] internal refactor of label-based selection (`.sel()`)
- [x] internal refactor of `.rename()`
  - Some changes in behavior (see comments below)
    - see #4108
    - see #4107
    - see #4417
- [x] internal refactor of `set_index` / `reset_index`
- [x] internal refactor of `stack` / `unstack`
    - Some changes in behavior (see comments below) 
- [x] internal refactor of `Dataset.to_stacked_array`
- [x] internal refactor of `swap_dims`
- [x] internal refactor of `expand_dims`
- [x] internal refactor of alignment
- [x] internal refactor of `reindex` and `reindex_like`
- [x] internal refactor of `interp` and `interp_like`
- [x] internal refactor of merge
- [x] internal refactor of concat
- [x] internal refactor of computation
- [x] internal refactor of copy
- [x] internal refactor of `update`, `assign`, `__setitem__`, `del`, `drop_vars`, etc.
    - updates must not corrupt multi-coordinate indexes 
- [x] internal refactor of `set_coords` and `reset_coords`
- internal refactor of `drop_sel` and `drop_isel` (maybe later)
- [x] internal refactor of `pad`
- [x] internal refactor of `shift`
- [x] internal refactor of `roll`

TODO:

- [x] Uniformize Index API with Xarray's API
    - [x] rename `Index.query()` -> `Index.sel()`?
    - [x] rename `PandasMultiIndex.from_product()` -> `PandasMultiIndex.stack()`? Add `Index.stack()` and `Index.unstack()`.
    - [x] remove `Index.union()` and `Index.intersection()`
- [x] Use `Index.create_variables()` internally
    - [x] remove `PandasIndex.from_pandas_index()` and `PandasMultiIndex.from_pandas_index()` (use constructor + `.create_variables()` instead)
- [x] Review where `.xindexes` is used and use private API instead (`._indexes`) if possible for speed
    - [x] requires that `_indexes` always returns a mapping
- [x] Use `from __future__ import annotations` in `indexes.py`
- [x] Re-activate default indexes invariant check (with opt-out for some tests)


In next PRs:

- custom `Index.__repr__` and `Index._repr_inline_`
- add an `Indexes` section in `DataArray` / `Dataset` reprs
- update public API (`set_index`, `reset_index`, `drop_indexes`, `Dataset` and `DataArray` constructors, etc.)
- allow multi-dimensional variables with `name` in `var.dims`",2021-08-11T15:57:41Z,2023-08-30T09:26:37Z,2022-03-17T17:11:44Z,2022-03-17T17:11:40Z,3ead17ea9e99283e2511b65b9d864d1c7b10b3c4,,,0,77fdaf0e3a268d1d1fbdb6c7aef9abfd07bf0d32,29a87cc110f1a1ff7b21c308ba7277963b51ada3,MEMBER,,13221727,https://github.com/pydata/xarray/pull/5692,
884210772,PR_kwDOAMm_X840s_xU,6385,closed,0,Fix concat with scalar coordinate,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6384
- [x] Tests added

",2022-03-20T16:46:48Z,2022-03-29T07:09:30Z,2022-03-21T04:49:23Z,2022-03-21T04:49:22Z,83f238a05a82fc85dcd7346f758ba3bea0416181,,,0,a91e6ee2728bb5b2768184d4e0cf1c261113f93e,073512ed3f997c0589af97eaf3d4b20796b18cf8,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6385,
884214603,PR_kwDOAMm_X840tAtL,6386,closed,0,Fix Dataset groupby returning a DataArray,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6379
- [x] Tests added
",2022-03-20T17:06:13Z,2022-03-29T07:09:30Z,2022-03-20T18:55:27Z,2022-03-20T18:55:26Z,fed852073eee883c0ed1e13e28e508ff0cf9d5c1,,,0,f4e8d48c4040f9165622baf48322771c376af39c,073512ed3f997c0589af97eaf3d4b20796b18cf8,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6386,
884218819,PR_kwDOAMm_X840tBvD,6387,closed,0,Fix concat with variable or dataarray as dim (propagate attrs),4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6380
- [x] Tests added
",2022-03-20T17:27:41Z,2022-03-29T07:09:29Z,2022-03-20T18:53:46Z,2022-03-20T18:53:46Z,03b6ba1e779b0d1829ca7b2e8f5da4d9c39ece6f,,,0,cd2ab9e1d605d6469178b24a39a14634f97b5c22,073512ed3f997c0589af97eaf3d4b20796b18cf8,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6387,
884252480,PR_kwDOAMm_X840tJ9A,6388,closed,0,isel: convert IndexVariable to Variable if index is dropped,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6381
- [x] Tests added
",2022-03-20T20:29:58Z,2022-03-29T07:10:08Z,2022-03-21T04:47:48Z,2022-03-21T04:47:47Z,067b2e86e6311e9c37e0def0c83cdb9a1a367a74,,,0,626f27966a52a5162f026ac042ccd18ec1592a22,fed852073eee883c0ed1e13e28e508ff0cf9d5c1,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6388,
884259571,PR_kwDOAMm_X840tLrz,6389,closed,0,Re-index: fix missing variable metadata,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6382
- [x] Tests added
",2022-03-20T21:11:38Z,2022-03-29T07:09:31Z,2022-03-21T07:53:05Z,2022-03-21T07:53:04Z,c604ee1fe852d51560100df6af79b4c28660f6b5,,,0,86b920ac931c9a78b067e08a84e3c587ec905047,fed852073eee883c0ed1e13e28e508ff0cf9d5c1,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6389,
884923775,PR_kwDOAMm_X840vt1_,6394,closed,0,Fix DataArray groupby returning a Dataset,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6393
- [x] Tests added
",2022-03-21T14:43:21Z,2022-03-29T07:09:30Z,2022-03-21T15:26:20Z,2022-03-21T15:26:20Z,321c5608a3be3cd4b6a4de3b658d1e2d164c0409,,,0,6123ae884795d08db6f4de736e5d52ef90648991,c604ee1fe852d51560100df6af79b4c28660f6b5,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6394,
886017261,PR_kwDOAMm_X840z4zt,6400,closed,0,Speed-up multi-index html repr + add display_values_threshold option,4160723,"This adds `PandasMultiIndexingAdapter._repr_html_` that can greatly speed-up the html repr of Xarray objects with
multi-indexes.

This optimized `_repr_html_` implementation is now used for formatting the array detailed view of all multi-index coordinates in the html repr, instead of converting the full index and each of its levels to numpy arrays before formatting them.

```python
import xarray as xr

ds = xr.tutorial.load_dataset(""air_temperature"")
da = ds[""air""].stack(z=[...])

da.shape 

# (3869000,)

%timeit -n 1 -r 1 da._repr_html_()

# 9.96 ms !
```

<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #5529
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
",2022-03-22T12:57:37Z,2022-03-29T07:10:22Z,2022-03-29T07:05:32Z,2022-03-29T07:05:32Z,d8fc34660f409d4c6a7ce9fe126d126e4f76c7fd,,,0,b8f732c61a86be5d1e8efbf3a906f9a5f69c31fd,728b648d5c7c3e22fe3704ba163012840408bf66,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6400,
891741295,PR_kwDOAMm_X841JuRv,6418,closed,0,Fix concat with scalar coordinate (dtype),4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6416
- [x] Tests added
",2022-03-28T12:22:50Z,2022-03-29T07:06:46Z,2022-03-28T16:05:01Z,2022-03-28T16:05:01Z,009b15461bf1ad4567e57742e44db4efa4e44cc7,,,0,5711dc21ff0711559214bde147cf3a20f6880f8e,728b648d5c7c3e22fe3704ba163012840408bf66,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6418,
900624195,PR_kwDOAMm_X841rm9D,6443,closed,0,Fix concat with scalar coordinate (wrong index type),4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6434
- [x] Tests added
",2022-04-05T19:16:30Z,2022-12-08T09:36:50Z,2022-04-06T01:19:48Z,2022-04-06T01:19:47Z,facafac359c39c3e940391a3829869b4a3df5d70,,,0,185b79199d25ff83dfdea944fa200342afc5e144,2eef20b74c69792bad11e5bfda2958dc8365513c,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6443,
998719144,PR_kwDOAMm_X847hz6o,6800,closed,0,"(scipy 2022 branch) Add an ""options"" argument to Index.from_variables()",4160723,"It allows passing options to the constructor of a custom `Index` subclass, in case there are any relevant build options to expose to users. This could for example be the distance metric chosen for an index based on `sklearn.neighbors.BallTree`, or the CRS definition for a geospatial index.

The `**options` arguments of `Dataset.set_xindex()` are passed through.

An alternative way would be to pass options via coordinate metadata, like the `spatial_ref` coordinate in rioxarray. Perhaps both alternatives may co-exist?

This PR also adds type annotations to `set_xindex()`.
",2022-07-17T20:01:00Z,2022-12-08T09:38:50Z,2022-09-02T13:54:46Z,,f4b214279bd34fe6c5bdebfb7f8f76e63e53d40c,,,0,46e19d493a18fc81f44129ff65441925080297b3,a5f068e0f6cb4d5ba8de5e10844ae2bfc4a56655,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6800,
1013692836,PR_kwDOAMm_X848a7mk,6857,closed,0,Fix aligned index variable metadata side effect,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6852
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
",2022-08-01T10:57:16Z,2022-12-08T09:36:49Z,2022-08-31T07:16:14Z,2022-08-31T07:16:14Z,4880012ddee9e43e3e18e95551876e9c182feafb,,,0,c39cdaa63f0d55b34cca1d04a24b1621801cc8e6,434f9e8929942afc2380eab52a07e77d30cc7885,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6857,
1042357878,PR_kwDOAMm_X84-IR52,6971,closed,0,Add set_xindex and drop_indexes methods,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6849
- [x] Supersedes #6800
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`

This PR adds Dataset and DataArray `.set_xindex` and `.drop_indexes` methods (the latter is also discussed in #4366). I've cherry picked the relevant commits in the `scipy22` branch and added a few more commits. This PR also allows passing build options to any `Index`.

Some comments and open questions:

- Should we make the `index_cls` argument of `set_xindex` optional?
  - I.e., `set_xindex(coord_names, index_cls=None, **options)` where a pandas index is created by default (or a pandas multi-index if several coordinate names are given), provided that the coordinate(s) are valid 1-d candidates.
  - This would be redundant with the existing `set_index` method, but it would be convenient if we later deprecate it.

- Should we deprecate `set_index` and `reset_index`? I think we should, but probably not at this point yet.

- There's a special case for multi-indexes where `set_xindex([""foo"", ""bar""], PandasMultiIndex)` adds a dimension coordinate in addition to the ""foo"" and ""bar"" level coordinates so that it is consistent with the rest of Xarray. I find it a bit annoying, though. Probably another motivation for deprecating this dimension coordinate.

- In this PR I also imported the `Index` base class in Xarray's root namespace. 
  - It is needed for custom indexes and it's just a little more convenient than importing it from `xarray.core.indexes`.
  - Should we do the same for the `PandasIndex` and `PandasMultiIndex` subclasses? Maybe, if one wants to create a custom index inheriting from them. `PandasMultiIndex` factory methods could also be useful if we deprecate passing `pd.MultiIndex` objects as DataArray / Dataset coordinates.
",2022-08-31T12:54:35Z,2022-12-08T09:38:13Z,2022-09-28T07:25:15Z,2022-09-28T07:25:15Z,e678a1d7884a3c24dba22d41b2eef5d7fe5258e7,,,0,b598447ba2e9c98bb1186719dc9bc6be95e13042,a042ae69c0444912f94bb4f29c93fa05046893ed,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6971,
1043726871,PR_kwDOAMm_X84-NgIX,6975,closed,0,Add documentation on custom indexes,4160723,"This PR documents the API of the `Index` base class and adds a guide for creating custom indexes (reworked from https://hackmd.io/Zxw_zCa7Rbynx_iJu6Y3LA). Hopefully it will help anyone experimenting with this feature.

@pydata/xarray your feedback would be very much appreciated! I've been into this for quite some time, so there may be things that seem obvious to me but that you can still find very confusing or non-intuitive. It would then deserve some extra or better explanation.

More specifically, I'm open to any suggestion on how to better illustrate this with clear and succinct examples.

There are other parts of the documentation that still need to be updated regarding the indexes refactor (e.g., ""dimension"" coordinates, `xindexes` property, set/drop indexes, etc.). But I suggest doing that in separate PRs and focusing here on creating custom indexes.",2022-09-01T13:20:00Z,2023-08-30T09:10:34Z,2023-07-17T23:23:22Z,2023-07-17T23:23:22Z,7234603781768728b3fd544cdcaca991466d4a44,,,0,07814bc579a0687ddc4deef0a1825c16ba02333e,647376d1d2db3210c142d8204c1c3a7431b85b9a,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6975,
1046566934,PR_kwDOAMm_X84-YVgW,6992,closed,0,Review (re)set_index,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes
  - [x] fixes #6946
  - [x] fixes #6989
  - [x] fixes #6959
  - [x] fixes #6969
  - [x] fixes #7036
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

Restore behavior prior to the explicit indexes refactor (i.e., refactored but without breaking changes).

TODO:

- [x] review `set_index`
- [x] review `reset_index`

For `reset_index`, the only behavior that is not restored here is the coordinate renamed with a `_` suffix when dropping a single index. This was originally to prevent any coordinate with no index matching a dimension name, which is now irrelevant. That is quite a dirty workaround and I don't know who is relying on it (no complaints yet), but I'm open to restoring it if needed (esp. considering that we may later deprecate `reset_index` completely in favor of `drop_indexes` #6971).",2022-09-05T15:07:43Z,2023-08-30T09:05:10Z,2022-09-27T10:35:38Z,2022-09-27T10:35:38Z,a042ae69c0444912f94bb4f29c93fa05046893ed,,,0,ca01949cb889ee38aae33560b02de1f7625fd921,45c0a114e2b7b27b83c9618bc05b36afac82183c,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6992,
1047776643,PR_kwDOAMm_X84-c82D,6999,closed,0,Raise UserWarning when rename creates a new dimension coord,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6607
- [x] Closes #4107 
- [x] Closes #6229
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

Currently implemented ""fix"": raise a `UserWarning` and suggest using `swap_dims` (*)

Alternatively, we could:

- revert the breaking change (i.e., create the index again) and raise a `DeprecationWarning` instead
- raise an error instead of a warning

I don't have strong opinions on this and I'm happy to implement another alternative. The downside of reverting the breaking change now is that, unfortunately, it will introduce a breaking change in the next release, while workarounds are pretty straightforward.

(*) from https://github.com/pydata/xarray/issues/6607#issuecomment-1126587818, doing `ds.set_coords(['lon']).rename(x='lon').set_index(lon='lon')` is working too. With #6971, `.set_xindex('lon')` could work as well.
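
As a rough illustration of the `swap_dims` route suggested by the warning (the toy dataset and the `lon` values are invented here, not taken from the linked issues):

```python
import xarray as xr

# Toy dataset: 'lon' is a data variable along dimension 'x'.
ds = xr.Dataset({'lon': ('x', [10.0, 20.0, 30.0])})

# Promote 'lon' to a coordinate, then make it the dimension coordinate.
# swap_dims builds an index for the new dimension coordinate.
swapped = ds.set_coords('lon').swap_dims(x='lon')

# Label-based selection works because 'lon' is now indexed.
point = swapped.sel(lon=20.0)
```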
",2022-09-06T16:16:17Z,2022-12-08T09:38:13Z,2022-09-27T09:33:40Z,2022-09-27T09:33:40Z,45c0a114e2b7b27b83c9618bc05b36afac82183c,,,0,486f9b876c212cc3f2df7dd1438d1832ce5df03b,1f4be33365573da19a684dd7f2fc97ace5d28710,MEMBER,,13221727,https://github.com/pydata/xarray/pull/6999,
1048613040,PR_kwDOAMm_X84-gJCw,7003,closed,0,Misc. fixes for Indexes with pd.Index objects,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6987
- [x] Tests added
",2022-09-07T11:05:02Z,2022-12-08T09:36:51Z,2022-09-23T07:30:38Z,2022-09-23T07:30:38Z,9d1499e22e2748eeaf088e6a2abc5c34053bf37c,,,0,54271bd4cda67c5f5b8703095798c122b7e96b0c,5bec4662a7dd4330eca6412c477ca3f238323ed2,MEMBER,,13221727,https://github.com/pydata/xarray/pull/7003,
1048884296,PR_kwDOAMm_X84-hLRI,7004,open,0,Rework PandasMultiIndex.sel internals,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6838
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

This PR hopefully improves how labels provided for multi-index level coordinates are handled in `.sel()`.

More specifically, slices are handled in a cleaner way and it is now allowed to provide array-like labels.

`PandasMultiIndex.sel()` relies on the underlying `pandas.MultiIndex` methods like this:

- use ``get_loc`` when every level is provided with a scalar label (no slice, no array)
  - always drops the index and returns scalar coordinates for each multi-index level
- use ``get_loc_level`` when only a subset of levels are provided with scalar labels only
  - may collapse one or more levels of the multi-index (dropped levels result in scalar coordinates)
  - if only one level remains: renames the dimension and the corresponding dimension coordinate
- use ``get_locs`` for all other cases.
  - always keeps the multi-index and its coordinates (even if only one item or one level is selected)

This yields a predictable behavior: as soon as one of the provided labels is a slice or array-like, the multi-index and all its level coordinates are kept in the result.
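
The three pandas methods listed above can be tried directly on a `pandas.MultiIndex`. The sketch below only shows what each call returns on the pandas side; it is not the code of `PandasMultiIndex.sel()`, and the variable names are chosen here for illustration:

```python
import pandas as pd

midx = pd.MultiIndex.from_product([list('abc'), range(4)], names=('one', 'two'))

# Every level given a scalar label: get_loc returns one integer position.
pos = midx.get_loc(('a', 0))

# Only a subset of levels given scalar labels: get_loc_level returns an
# indexer plus the index that remains once the selected level is dropped.
indexer, remaining = midx.get_loc_level('a', level='one')

# Slices or array-like labels: get_locs returns integer positions only,
# so the caller can keep the full multi-index for the selected rows.
locs = midx.get_locs([slice('a', 'b'), [0, 2]])
selected = midx[locs]
```

In the second case `remaining` is a plain single-level index named `two`, which matches the case where only one level remains and the dimension is renamed.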

Some cases illustrated below (I compare this PR with an older release due to the errors reported in #6838):

```python
import xarray as xr
import pandas as pd

midx = pd.MultiIndex.from_product([list(""abc""), range(4)], names=(""one"", ""two""))
ds = xr.Dataset(coords={""x"": midx})    
# <xarray.Dataset>
# Dimensions:  (x: 12)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b' 'c' 'c' 'c' 'c'
#   * two      (x) int64 0 1 2 3 0 1 2 3 0 1 2 3
# Data variables:
#     *empty*
```

```python
ds.sel(one=""a"", two=0)

# this PR
#
# <xarray.Dataset>
# Dimensions:  ()
# Coordinates:
#     x        object ('a', 0)
#     one      <U1 'a'
#     two      int64 0
# Data variables:
#     *empty*
# 

# v2022.3.0
# 
# <xarray.Dataset>
# Dimensions:  ()
# Coordinates:
#     x        object ('a', 0)
# Data variables:
#     *empty*
# 
```

```python
ds.sel(one=""a"")

# this PR:
#
# <xarray.Dataset>
# Dimensions:  (two: 4)
# Coordinates:
#  * two      (two) int64 0 1 2 3
#    one      <U1 'a'
# Data variables:
#    *empty*
#

# v2022.3.0
# 
# <xarray.Dataset>
# Dimensions:  (two: 4)
# Coordinates:
#   * two      (two) int64 0 1 2 3
# Data variables:
#     *empty*
# 
```

```python
ds.sel(one=slice(""a"", ""b""))

# this PR
# 
# <xarray.Dataset>
# Dimensions:  (x: 8)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b'
#   * two      (x) int64 0 1 2 3 0 1 2 3
# Data variables:
#     *empty*
# 

# v2022.3.0
# 
# <xarray.Dataset>
# Dimensions:  (two: 8)
# Coordinates:
#   * two      (two) int64 0 1 2 3 0 1 2 3
# Data variables:
#     *empty*
# 
```

```python
ds.sel(one=""a"", two=slice(1, 1))

# this PR
# 
# <xarray.Dataset>
# Dimensions:  (x: 1)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a'
#   * two      (x) int64 1
# Data variables:
#     *empty*
# 

# v2022.3.0
# 
# <xarray.Dataset>
# Dimensions:  (x: 1)
# Coordinates:
#   * x        (x) MultiIndex
#   - one      (x) object 'a'
#   - two      (x) int64 1
# Data variables:
#     *empty*
# 
```

```python
ds.sel(one=[""b"", ""c""], two=[0, 2])

# this PR
# 
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'b' 'b' 'c' 'c'
#   * two      (x) int64 0 2 0 2
# Data variables:
#     *empty*
# 

# v2022.3.0
# 
# ValueError: Vectorized selection is not available along coordinate 'one' (multi-index level)
# 
```






",2022-09-07T14:57:29Z,2022-09-22T20:38:41Z,,,0a4b1aafbe66a857de627cf180eba8713ca9a85d,,,0,00baaddefae0a189874ca64d9f4be4d2d83cc744,5bec4662a7dd4330eca6412c477ca3f238323ed2,MEMBER,,13221727,https://github.com/pydata/xarray/pull/7004,
1070271669,PR_kwDOAMm_X84_ywy1,7101,closed,0,Fix Dataset.assign_coords overwriting multi-index,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #7097
- [x] Tests added

@dcherian the `DeprecationWarning` was ignored by default for `.assign_coords()` because of https://github.com/pydata/xarray/pull/6798#discussion_r924653224. I changed it to `FutureWarning` so that it is shown for both `.assign()` and `.assign_coords()`.
",2022-09-28T16:21:48Z,2022-12-08T09:36:50Z,2022-09-28T18:02:16Z,2022-09-28T18:02:16Z,513ee34f16cc8f9250a72952e33bf9b4c95d33d1,,,0,ee9b027c0e41de15fc4960dde9e4c551d7d2a9df,e678a1d7884a3c24dba22d41b2eef5d7fe5258e7,MEMBER,,13221727,https://github.com/pydata/xarray/pull/7101,
1071450326,PR_kwDOAMm_X84_3QjW,7105,closed,0,Fix to_index(): return multiindex level as single index,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6836
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
",2022-09-29T14:44:22Z,2022-12-08T09:36:51Z,2022-10-12T14:12:48Z,2022-10-12T14:12:48Z,f93b467db5e35ca94fefa518c32ee9bf93232475,,,0,e9a75b746d68fba12216a1f455252cd9fa4c3ebf,50ea159bfd0872635ebf4281e741f3c87f0bef6b,MEMBER,,13221727,https://github.com/pydata/xarray/pull/7105,
1090510499,PR_kwDOAMm_X85A_96j,7182,open,0,add MultiPandasIndex helper class,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [ ] Closes #xxxx
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

This PR adds an `xarray.indexes.MultiPandasIndex` helper class for building custom meta-indexes that encapsulate multiple `PandasIndex` instances. Unlike `PandasMultiIndex`, the meta-index classes inheriting from this helper class may encapsulate loosely coupled (pandas) indexes, with coordinates of arbitrary dimensions (each coordinate must be 1-dimensional but an Xarray index may be created from coordinates with differing dimensions).

Early prototype in this [notebook](https://notebooksharing.space/view/3d599addf8bd6b06a6acc241453da95e28c61dea4281ecd194fbe8464c9b296f#displayOptions=)

TODO / TO FIX:

- How can custom `__init__` options in subclasses be passed to all the `type(self)(new_indexes)` calls inside the `MultiPandasIndex` ""base"" class? This could be done via `**kwargs` passed through... However, mypy will certainly complain (Liskov Substitution Principle).
- Is `MultiPandasIndex` a good name for this helper class?",2022-10-18T09:42:58Z,2023-08-23T16:30:28Z,,,6633615eca663c879bba4e9a144050c4aaa7555f,,,1,e4d753c3bf3ffdc30864510885c68fdb2e8349a2,ab726c536464fbf4d8878041f950d2b0ae09b862,MEMBER,,13221727,https://github.com/pydata/xarray/pull/7182,
1098978950,PR_kwDOAMm_X85BgRaG,7214,closed,0,Pass indexes directly to the DataArray and Dataset constructors,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6392
- [x] Closes #6633 ? 
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

From https://github.com/pydata/xarray/issues/6392#issuecomment-1290454937:

I'm thinking of only accepting one or more instances of [Indexes](https://github.com/pydata/xarray/blob/e678a1d7884a3c24dba22d41b2eef5d7fe5258e7/xarray/core/indexes.py#L1030) as the `indexes` argument in the Dataset and DataArray constructors. The only exception is when `fastpath=True`, in which case a mapping can be given directly. Also, passing an empty collection of indexes skips the creation of default pandas indexes for dimension coordinates.

- It is much easier to handle: just check that keys returned by `Indexes.variables` do not conflict with the coordinate names in the `coords` argument
- It is slightly safer: it requires the user to explicitly create an `Indexes` object, thus with less chance to accidentally provide coordinate variables and index objects that do not relate to each other (we could probably add some safeguards in the `Indexes` class itself)
- It is more convenient: an Xarray `Index` may provide a factory method that returns an instance of `Indexes` that we just need to pass as indexes, and we could also do something like `ds = xr.Dataset(indexes=other_ds.xindexes)`

",2022-10-25T14:16:44Z,2023-08-30T09:11:56Z,2023-07-18T11:52:11Z,,b3a3fd5a537d8000baf8ece3093a60ea14406ecc,,,1,ddd505e6af5270e143ee814485d5b4665456d77f,6e77f5e8942206b3e0ab08c3621ade1499d8235b,MEMBER,,13221727,https://github.com/pydata/xarray/pull/7214,
1142893563,PR_kwDOAMm_X85EHyv7,7347,closed,0,Fix assign_coords resetting all dimension coords to default index,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #7346
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
",2022-12-02T08:19:01Z,2022-12-08T09:36:49Z,2022-12-02T16:32:40Z,2022-12-02T16:32:40Z,8938d390a969a94275a4d943033a85935acbce2b,,,0,23d9889d11b181c94db2b5e8fe33073a1328be1f,92e7cb5b21a6dee7f7333c66e41233205c543bc1,MEMBER,,13221727,https://github.com/pydata/xarray/pull/7347,
1154470307,PR_kwDOAMm_X85Ez9Gj,7368,closed,0,"Expose ""Coordinates"" as part of Xarray's public API",4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #7214
- [x] Closes #6392
- [x] xref #6633
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`

This is a rework of #7214. It follows the suggestions made in https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938, https://github.com/pydata/xarray/pull/7214#issuecomment-1297046405 and https://github.com/pydata/xarray/pull/7214#issuecomment-1293774799:

- No `indexes` argument is added to `Dataset.__init__`, and the `indexes` argument of `DataArray.__init__` is kept private (i.e., valid only if fastpath=True)
- When a `Coordinates` object is passed to a new Dataset or DataArray via the `coords` argument, both coordinate variables and indexes are copied/extracted and added to the new object
- This PR also adds ~~an `IndexedCoordinates` subclass~~ `Coordinates` public constructors used to create Xarray coordinates and indexes from non-Xarray objects. For example, the `Coordinates.from_pandas_multiindex()` class method creates a new set of index and coordinates from an existing `pd.MultiIndex`.

EDIT: `IndexedCoordinates` has been merged with `Coordinates`

EDIT2: it ended up as a pretty big refactor with the promotion of `Coordinates` as a 2nd-class Xarray container that supports alignment like Dataset and DataArray. It is still quite advanced API, useful for passing coordinate variables and indexes around. Internally, `Coordinates` objects are still ""virtual"" containers (i.e., proxies for coordinate variables and indexes stored in their corresponding DataArray or Dataset objects). For now, a ""stand-alone"" `Coordinates` object created from scratch wraps a Dataset with no data variables.

Some examples of usage:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""one"", ""two""))

coords = xr.Coordinates.from_pandas_multiindex(midx, ""x"")
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2

ds = xr.Dataset(coords=coords)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
# Data variables:
#     *empty*

ds_to_be_deprecated = xr.Dataset(coords={""x"": midx})
ds_to_be_deprecated.identical(ds)
# True

da = xr.DataArray([1, 2, 3, 4], dims=""x"", coords=ds.coords)
# <xarray.DataArray (x: 4)>
# array([1, 2, 3, 4])
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
```

TODO:

- [x] update `assign_coords` too so it has the same behavior if a `Coordinates` object is passed?
- [x] How to avoid building any default index? It seems silly to add or use the `indexes` argument just for that purpose? ~~We could address that later.~~ Solution: wrap the coordinates dict in a `Coordinates` object, e.g., `ds = xr.Dataset(coords=xr.Coordinates(coords_dict))`.

@shoyer, @dcherian, anyone -- what do you think about the approach proposed here? I'd like to check that with you before going further with tests, docs, etc.
 ",2022-12-08T16:59:29Z,2023-08-30T09:11:57Z,2023-07-21T20:40:03Z,2023-07-21T20:40:03Z,4441f9915fa978ad5b276096ab67ba49602a09d2,,,0,4ef5f17db6d2aefd91fb02485ab7a815fe460b47,6b1ff6d13bf360df786500dfa7d62556d23e6df9,MEMBER,,13221727,https://github.com/pydata/xarray/pull/7368,
1166747288,PR_kwDOAMm_X85FiyaY,7382,closed,0,Some alignment optimizations,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Benchmark added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

May fix some performance regressions, e.g., see https://github.com/pydata/xarray/issues/7376#issuecomment-1352989233.

@ravwojdyla with this PR `ds.assign(foo=~ds[""d3""])` in your example should be much faster (on par with version 2022.3.0).",2022-12-15T12:54:56Z,2023-08-30T09:05:24Z,2023-01-05T21:25:55Z,2023-01-05T21:25:55Z,d6d24507793af9bcaed79d7f8d3ac910e176f1ce,,,0,95be2d07403a8e061df19f682db42ad273c62745,b93dae4079daf0fc4c042fd0d699c16624430cdc,MEMBER,,13221727,https://github.com/pydata/xarray/pull/7382,
1465015830,PR_kwDOAMm_X85XUl4W,8051,open,0,Allow setting (or skipping) new indexes in open_dataset,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #6633
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

This PR introduces a new boolean parameter `set_indexes=True` to `xr.open_dataset()`, which may be used to skip the creation of default (pandas) indexes when opening a dataset.

Currently works with the Zarr backend:

```python
import numpy as np
import xarray as xr

# example dataset (real dataset may be much larger)
arr = np.random.random(size=1_000_000)
xr.Dataset({""x"": arr}).to_zarr(""dataset.zarr"")


xr.open_dataset(""dataset.zarr"", set_indexes=False, engine=""zarr"")
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*


xr.open_zarr(""dataset.zarr"", set_indexes=False)
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*
```


I'll add it to the other Xarray backends as well, but I'd like to get your thoughts about the API first.

1. Do we want to add yet another keyword parameter to `xr.open_dataset()`? There are already many...
2. Do we want to add this parameter to the `BackendEntrypoint.open_dataset()` API?
  - I'm afraid we must do it if we want this parameter in `xr.open_dataset()`
  - this would also make it possible to skip the creation of custom indexes (if any) in custom IO backends
  - con: if we require `set_indexes` in the signature in addition to the `drop_variables` parameter, this is a breaking change for all existing 3rd-party backends. Or should we group `set_indexes` with the other xarray decoder kwargs? This would feel a bit odd to me as setting indexes is different from decoding data.
3. Or should we leave this up to the backends?
  - pros: no breaking change, more flexible (3rd party backends may want to offer more control like choosing between custom indexes and default pandas indexes or skipping the creation of indexes by default)
  - cons: less discoverable, consistency is not enforced across 3rd party backends (although for such advanced case this is probably OK), not available by default in every backend.

Currently 1 and 2 are implemented in this PR, although as I write this comment I think that I would prefer 3. I guess this depends on whether we prefer `open_***` vs. `xr.open_dataset(engine=""***"")` and unless I missed something there is still no real consensus about that? (e.g., #7496).

",2023-08-07T10:53:46Z,2024-02-03T19:12:48Z,,,0b37c66130416f202c3b8ee2302ee9ea517bdadd,,,0,eae983bb6b7ee916e5c8956b6af42c2207ad48d1,c9ba2be2690564594a89eb93fb5d5c4ae7a9253c,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8051,
1482940936,PR_kwDOAMm_X85YY-II,8094,closed,0,Refactor update coordinates to better handle multi-coordinate indexes,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #7563
- [x] Closes #8039
- [x] Closes #8056
- [x] Closes #7885
- [x] Closes #7921
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

This refactor should better handle multi-coordinate indexes when updating (or assigning) new coordinates.

It also fixes, better isolates and better warns a bunch of deprecated pandas multi-index special cases (i.e., directly passing `pd.MultiIndex` objects or updating a multi-index dimension coordinate). I very much look forward to seeing support for those cases dropped :).
",2023-08-21T13:57:38Z,2023-08-30T09:06:28Z,2023-08-29T14:23:29Z,2023-08-29T14:23:29Z,1fedfd86604f87538d1953b01d6990c2c89fcbf3,,,0,748ee246821f5c308fc52e29c5d6b1d5f628cacf,42d42bab5811702e56c638b9489665d3c505a0c1,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8094,
1486052929,PR_kwDOAMm_X85Yk15B,8102,closed,0,Add `Coordinates.assign()` method,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`

This is consistent with the Dataset and DataArray `assign` methods (now that `Coordinates` is also exposed as public API).

This allows writing:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_arrays([[""a"", ""a"", ""b"", ""b""], [0, 1, 0, 1]])
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, ""x"")

ds = xr.Dataset(coords=midx_coords.assign(y=[1, 2]))
```

which is quite common (at least in the tests) and a bit nicer than

```python
ds = xr.Dataset(coords=midx_coords.merge({""y"": [1, 2]}).coords)
```",2023-08-23T09:15:51Z,2023-09-01T13:28:16Z,2023-09-01T13:28:16Z,2023-09-01T13:28:16Z,71177d481eb0c3547cb850a4b3e866af6d4fded7,,,0,6f1dfed9dac9bcecb6b9b8bd1abd20d5cb388f68,1043a9e13574e859ec08d19425341b2e359d2802,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8102,
1486710446,PR_kwDOAMm_X85YnWau,8104,closed,0,Fix merge with compat=minimal (coord names),4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #7405
- [x] Closes #7588 
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`


",2023-08-23T16:20:48Z,2023-08-30T09:11:18Z,2023-08-30T07:57:35Z,2023-08-30T07:57:35Z,b136fcb679e9e70fd44b60688d96e75d4e3f8dcb,,,0,613eb1337d38f6b92434feaffb12b4f99e597cf0,1fedfd86604f87538d1953b01d6990c2c89fcbf3,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8104,
1487073982,PR_kwDOAMm_X85YovK-,8107,closed,0,Better default behavior of the Coordinates constructor,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

After working more on `Coordinates` I realize that the default behavior of its constructor could be more consistent with other Xarray objects. This PR changes this default behavior such that:

- Pandas indexes are created for dimension coordinates if `indexes=None` (default). To create dimension coordinates with no index, just pass `indexes={}`.
- If another `Coordinates` object is passed as input, its indexes are also added to the newly created object. Since we don't support alignment / merge here, the following call raises an error: `xr.Coordinates(coords=xr.Coordinates(...), indexes={...})`.
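
A small sketch of the two constructor behaviors listed above, assuming an Xarray release that includes this change. The coordinate name and values are arbitrary:

```python
import numpy as np
import xarray as xr

# Default (indexes=None): a pandas index is created for the
# dimension coordinate 'x'.
indexed = xr.Coordinates({'x': np.arange(3)})

# Passing an empty mapping skips the creation of the default index.
unindexed = xr.Coordinates({'x': np.arange(3)}, indexes={})

# Both objects can be passed as-is to a Dataset; the index (or its
# absence) is carried over.
ds_indexed = xr.Dataset(coords=indexed)
ds_unindexed = xr.Dataset(coords=unindexed)
```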

This PR introduces a breaking change since `Coordinates` are now exposed in v2023.8.0, which has just been released. It is a bit unfortunate but I think it may be OK for a fresh feature, especially if the next release will be soon after this one.",2023-08-23T21:42:51Z,2024-02-04T18:32:42Z,2023-08-31T07:35:47Z,2023-08-31T07:35:47Z,0f9f790c7e887bbfd13f4026fd1d37e4cd599ff1,,,0,bce000cff6be4cf9d42454da4c370685e9dad051,42d42bab5811702e56c638b9489665d3c505a0c1,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8107,
1487590692,PR_kwDOAMm_X85YqtUk,8109,closed,0,Better error message when trying to set an index from a scalar coordinate,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #4091
- [x] Tests added

The message suggests using `.expand_dims()`.",2023-08-24T08:18:13Z,2023-08-30T09:27:27Z,2023-08-30T07:13:15Z,2023-08-30T07:13:15Z,e5a38f6837ae9b9aa28a4bd063620a1cd802e093,,,0,a1d70aa0aca1fb33b611a23697e5af04b34b2c7c,42d42bab5811702e56c638b9489665d3c505a0c1,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8109,
1488345780,PR_kwDOAMm_X85Ytlq0,8111,open,0,Alignment: allow flexible index coordinate order,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #7002
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

This PR relaxes some of the rules used in alignment for finding the indexes to compare or join together. Those indexes must still be of the same type and must relate to the same set of coordinates (and dimensions), but the order of coordinates is now ignored.

It is up to the index to implement the equal / join logic if it needs to care about that order.

Regarding `pandas.MultiIndex`, it seems that the level names are ignored when comparing indexes:

```python
midx = pd.MultiIndex.from_product([[""a"", ""b""], [0, 1]], names=(""one"", ""two""))
midx2 = pd.MultiIndex.from_product([[""a"", ""b""], [0, 1]], names=(""two"", ""one""))

midx.equals(midx2)  # True
```

However, in Xarray the names of the multi-index levels (and their order) matter since each level has its own xarray coordinate. In this PR, `PandasMultiIndex.equals()` and `PandasMultiIndex.join()` thus check that the level names match. ",2023-08-24T16:18:49Z,2023-09-28T15:58:38Z,,,79103728908c37d32bc902cd7bcc583363ce9bd9,,,0,0645c4b813908104c27ace51fce16ac053c6e1e8,42d42bab5811702e56c638b9489665d3c505a0c1,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8111,
1492188700,PR_kwDOAMm_X85Y8P4c,8118,open,0,Add Coordinates `set_xindex()` and `drop_indexes()` methods,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- Complements #8102
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

I don't think that we need to copy most of the API from Dataset / DataArray to `Coordinates`, but I find it convenient to have some relevant methods there too. For example, building Coordinates from scratch (with custom indexes) before passing the whole coords + indexes bundle around:

```python
import dask.array as da
import numpy as np
import xarray as xr

coords = (
    xr.Coordinates(
        coords={""x"": da.arange(100_000_000), ""y"": np.arange(100)},
        indexes={},
    )
    .set_xindex(""x"", DaskIndex)
    .set_xindex(""y"", xr.indexes.PandasIndex)
)

ds = xr.Dataset(coords=coords)

# <xarray.Dataset>
# Dimensions:  (x: 100000000, y: 100)
# Coordinates:
#   * x        (x) int64 dask.array<chunksize=(16777216,), meta=np.ndarray>
#   * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 ... 90 91 92 93 94 95 96 97 98 99
# Data variables:
#     *empty*
# Indexes:
#     x        DaskIndex
```

 ",2023-08-28T14:28:24Z,2023-09-19T01:53:18Z,,,664b100ba033d892b0894c82c49c18fc71b3f7be,,,0,13ebc667add99d53fe5619de8206ce745e453829,828ea08aa74d390519f43919a0e8851e29091d00,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8118,
1496182200,PR_kwDOAMm_X85ZLe24,8124,open,0,More flexible index variables,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [ ] Closes #xxxx
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

The goal of this PR is to provide a more general solution to indexed coordinate variables, i.e., support arbitrary dimensions and/or duck arrays for those variables while at the same time prevent them from being updated in a way that would invalidate their index.

This would solve problems like the one mentioned here: https://github.com/pydata/xarray/issues/1650#issuecomment-1697237429

@shoyer I've tried to implement what you have suggested in https://github.com/pydata/xarray/pull/4979#discussion_r589798510. It would be nice indeed if eventually we could get rid of `IndexVariable`. It won't be easy to deprecate it until we finish the index refactor (i.e., all methods listed in #6293), though. Also, I didn't find an easy way to refactor that class, as it has been designed too closely around a 1-d variable backed by a `pandas.Index`.

So the approach implemented in this PR is to keep using `IndexVariable` for PandasIndex until we can deprecate / remove it later, and for the other cases use `Variable` with data wrapped in a custom `IndexedCoordinateArray` object.

The latter solution (wrapper) doesn't always work nicely, though. For example, several methods of `Variable` expect that `self._data` directly returns a duck array (e.g., a dask array or a chunked duck array). A wrapped duck array will result in unexpected behavior there. We could probably add some checks / indirection or extend the wrapper API... But I wonder if there wouldn't be a more elegant approach?

More generally, which operations should we allow / forbid / skip for an indexed coordinate variable?

- Set array items in-place? Do not allow.
- Replace data? Do not allow.
- (Re)Chunk?
- Load lazy data?
- ... ?
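
As a rough illustration of the first two items, a wrapper could simply refuse in-place updates while still allowing reads. The sketch below is hypothetical: `ReadOnlyIndexedArray` is a made-up name and is not the `IndexedCoordinateArray` implemented in this PR.

```python
import numpy as np


class ReadOnlyIndexedArray:
    # Hypothetical wrapper (not part of Xarray): the wrapped data can be
    # converted to a numpy array, but item assignment is refused.

    def __init__(self, array):
        self._array = array

    @property
    def shape(self):
        return self._array.shape

    def __array__(self, dtype=None, copy=None):
        # Always hand out a copy so the wrapped data cannot be mutated
        # through the returned array.
        return np.array(self._array, dtype=dtype, copy=True)

    def __setitem__(self, key, value):
        raise TypeError('cannot modify the data of an indexed coordinate in-place')


wrapped = ReadOnlyIndexedArray(np.arange(5))
values = np.asarray(wrapped)  # reading works; wrapped[0] = 10 raises TypeError
```

Whether (re)chunking and loading should go through such a wrapper is exactly the open question of this list.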

(Note: we could add `Index.chunk()` and `Index.load()` methods in order to allow an Xarray index to implement custom logic for the latter two cases, e.g., converting a DaskIndex to a PandasIndex during load; see #8128).
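
A minimal sketch of what such hooks might look like, assuming plain Python classes rather than real Xarray `Index` subclasses. `EagerIndexSketch` and `LazyIndexSketch` are hypothetical names; the idea that an in-memory index returns itself from both methods is an assumption made for the illustration.

```python
import numpy as np


class EagerIndexSketch:
    # Hypothetical stand-in for an in-memory index (e.g., a PandasIndex).

    def __init__(self, values):
        self.values = np.asarray(values)

    def load(self):
        return self  # already in memory, nothing to do

    def chunk(self, chunk_size):
        return self  # assumption: an eager index ignores chunking


class LazyIndexSketch:
    # Hypothetical stand-in for a chunked index (e.g., a DaskIndex).

    def __init__(self, values, chunk_size):
        values = np.asarray(values)
        self.blocks = [
            values[i : i + chunk_size] for i in range(0, len(values), chunk_size)
        ]

    def load(self):
        # Loading converts the lazy index into an eager one.
        return EagerIndexSketch(np.concatenate(self.blocks))

    def chunk(self, chunk_size):
        # Re-chunking rebuilds a lazy index with the new block size.
        return LazyIndexSketch(np.concatenate(self.blocks), chunk_size)


lazy = LazyIndexSketch(np.arange(10), chunk_size=4)
eager = lazy.load()
rechunked = lazy.chunk(5)
```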

cc @andersy005 (some changes made here may conflict with what you are refactoring in #8075).

",2023-08-30T21:45:12Z,2023-08-31T16:02:20Z,,,8b84dc392e5443f9ada245cb6a6f31d8f19327df,,,1,09f3ed0acd119fcefa07652bbc40dff96db2f66c,0f9f790c7e887bbfd13f4026fd1d37e4cd599ff1,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8124,
1497266410,PR_kwDOAMm_X85ZPnjq,8128,open,0,Add Index.load() and Index.chunk() methods,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [ ] Closes #xxxx
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

As mentioned in #8124, this gives custom Xarray indexes more control over what to do when the corresponding Dataset / DataArray `load()` and `chunk()` methods are called.

`PandasIndex.load()` and `PandasIndex.chunk()` always return self (no action required).

For a DaskIndex, we might want to return a PandasIndex (or another non-lazy index) from `load()` and rebuild a DaskIndex object from `chunk()` (rechunk).",2023-08-31T14:16:27Z,2023-08-31T15:49:06Z,,,a1842563887f8375fb3a03824189a75e6f080c96,,,1,4506cb600caba75f163c088171f590b67f59264b,1043a9e13574e859ec08d19425341b2e359d2802,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8128,
1500283634,PR_kwDOAMm_X85ZbILy,8140,open,0,Deprecate passing pd.MultiIndex implicitly,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- Follow-up #8094
- [x] Closes #6481
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

This PR should normally raise a warning *each time* indexed coordinates are created implicitly from a `pd.MultiIndex` object.

I updated the tests to create coordinates explicitly using `Coordinates.from_pandas_multiindex()`.

I also refactored some parts where a `pd.MultiIndex` could still be passed and promoted internally, with the exception of:

- `swap_dims()`: it should raise a warning! Right now the warning message is a bit confusing for this case, but instead of adding a special case we should probably deprecate the whole method, as suggested in a TODO comment? This method was meant to circumvent the limitations of dimension coordinates, which isn't needed anymore (`rename_dims` and/or `set_xindex` are equivalent and less confusing).
- `xr.DataArray(pandas_obj_with_multiindex, dims=...)`: I guess it should raise a warning too?
- `da.stack(z=...).groupby(""z"")`: it shouldn't raise a warning, but this requires a (heavy?) refactoring of groupby. While building the ""grouper"" objects, `grouper.group1d` or `grouper.unique_coord` may still be built by extracting only the multi-index dimension coordinate. I'd greatly appreciate it if anyone familiar with the groupby implementation could help me with this! @dcherian ?",2023-09-03T14:01:18Z,2023-11-15T20:15:00Z,,,ddb96c1f3a6fc2bcddea2432af311c5cbfcfc492,,,0,ef7dae0893f6701a203f8ec3c2e655bff7944b91,e2b6f3468ef829b8a83637965d34a164bf3bca78,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8140,
1500744603,PR_kwDOAMm_X85Zc4ub,8141,closed,0,Fix doctests: pandas 2.1 MultiIndex repr with nan,4160723,,2023-09-04T07:08:55Z,2023-09-05T08:35:37Z,2023-09-05T08:35:36Z,2023-09-05T08:35:36Z,f13da94db8ab4b564938a5e67435ac709698f1c9,,,0,445e6c923d112d584c714df3bf3ba2fbab004d3e,e9c1962f31a7b5fd7a98ee4c2adf2ac147aabbcf,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8141,
1500931269,PR_kwDOAMm_X85ZdmTF,8142,closed,0,Dirty workaround for mypy 1.5 error,4160723,"I wanted to fix the following error with mypy 1.5:

```
xarray/core/dataset.py:505: error: Definition of ""__eq__"" in base class ""DatasetOpsMixin"" is incompatible with definition in base class ""Mapping""  [misc]
```

Which looks similar to https://github.com/python/mypy/issues/9319. It is weird that here it worked with mypy versions < 1.5, though.

I don't know if there is a better fix, but I thought that redefining `__eq__` in `Dataset` would be a slightly less dirty workaround than adding `type: ignore` in the class declaration.
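
For readers unfamiliar with the pattern, here is a hedged sketch of its runtime shape only. `OpsMixin` and `Container` are hypothetical stand-ins for `DatasetOpsMixin` and `Dataset`; the sketch carries no type annotations and does not reproduce the mypy error itself.

```python
from collections.abc import Mapping


class OpsMixin:
    # Hypothetical mixin whose __eq__ returns a non-bool result, loosely
    # mimicking an element-wise comparison.
    def __eq__(self, other):
        return ['elementwise', other]


class Container(OpsMixin, Mapping):
    # Both base classes define __eq__. Redefining it here gives the class a
    # single explicit definition that simply delegates to the mixin.
    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __eq__(self, other):
        return super().__eq__(other)


result = Container({'a': 1}) == 2
```

Here `result` comes from the mixin's `__eq__`, not from `Mapping`'s.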


",2023-09-04T09:21:18Z,2023-09-07T16:04:55Z,2023-09-07T08:21:12Z,2023-09-07T08:21:12Z,e2b6f3468ef829b8a83637965d34a164bf3bca78,,,0,46bd88fbea07d52f06eab5d11ca3f72b547af263,f13da94db8ab4b564938a5e67435ac709698f1c9,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8142,
1501219392,PR_kwDOAMm_X85ZespA,8143,open,0,Deprecate the multi-index dimension coordinate,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

This PR adds a `future_no_mindex_dim_coord=False` option that, if set to True, enables the future behavior of `PandasMultiIndex` (i.e., no added dimension coordinate with tuple values):

```python
import xarray as xr

ds = xr.Dataset(coords={""x"": [""a"", ""b""], ""y"": [1, 2]})

ds.stack(z=[""x"", ""y""])

# <xarray.Dataset>
# Dimensions:  (z: 4)
# Coordinates:
#   * z        (z) object MultiIndex
#   * x        (z) <U1 'a' 'a' 'b' 'b'
#   * y        (z) int64 1 2 1 2
# Data variables:
#     *empty*

with xr.set_options(future_no_mindex_dim_coord=True):
    ds.stack(z=[""x"", ""y""])

# <xarray.Dataset>
# Dimensions:  (z: 4)
# Coordinates:
#   * x        (z) <U1 'a' 'a' 'b' 'b'
#   * y        (z) int64 1 2 1 2
# Dimensions without coordinates: z
# Data variables:
#     *empty*
```

There are a few other things that we'll need to adapt or deprecate:

- Dropping the multi-index dimension coordinate *de facto* allows having several multi-indexes along the same dimension. Normally `stack` should already take this into account, but there may be other places where this is not yet supported or where we should raise an explicit error.
- Deprecate `Dataset.reorder_levels`: the API is not compatible with the absence of a dimension coordinate and with several multi-indexes along the same dimension. I think it is OK to deprecate such an edge case, which alternatively could be handled by extracting the pandas index, updating it and then re-assigning it to the dataset with `assign_coords(xr.Coordinates.from_pandas_multiindex(...))`
- The text-based repr: in the example above, `Dimensions without coordinates: z` doesn't make much sense
- ... ?
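
The `reorder_levels` alternative mentioned above could look roughly like the following sketch. It assumes a recent Xarray version (`Coordinates.from_pandas_multiindex` needs v2023.8.0 or later) and the current default behavior; the variable name `v` and the choice to drop the old coordinates before re-assigning are illustrative assumptions.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {'v': (('x', 'y'), np.zeros((2, 2)))},
    coords={'x': ['a', 'b'], 'y': [1, 2]},
).stack(z=['x', 'y'])

# Extract the pandas multi-index and update it with pandas' own API.
midx = ds.indexes['z'].reorder_levels(['y', 'x'])

# Drop the old index coordinates, then assign the rebuilt ones explicitly.
new_coords = xr.Coordinates.from_pandas_multiindex(midx, 'z')
reordered = ds.drop_vars(['z', 'x', 'y']).assign_coords(new_coords)
```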

I started updating the tests, although this will be much easier once #8140 is merged. This is something that we could also easily split into multiple PRs. It is probably OK if some features are (temporarily) breaking badly when setting `future_no_mindex_dim_coord=True`.

",2023-09-04T12:32:36Z,2023-09-04T12:32:48Z,,,d0709f6d90e3f71d78e562c15b1662a423d8e3e9,,,0,87d5bf72e766b101db32dc65e6a79957368812ee,71177d481eb0c3547cb850a4b3e866af6d4fded7,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8143,
1509661685,PR_kwDOAMm_X85Z-5v1,8170,open,0,Dataset.from_dataframe: optionally keep multi-index unexpanded,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #8166
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

I added both the `unstack` and `dim` arguments but we can change that.

- [ ] update `DataArray.from_series()`",2023-09-11T06:20:17Z,2023-09-11T06:20:17Z,,,d3c6c4785be4a17946c88907176833e8bdabcd67,,,1,1afef691db8879526212a504bb42dbfc6f81878a,2951ce0215f14a8a79ecd0b5fc73a02a34b9b86b,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8170,
1696970326,PR_kwDOAMm_X85lJbZW,8672,closed,0,Fix multiindex level serialization after reset_index,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #8628
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
",2024-01-26T10:40:42Z,2024-02-23T01:22:17Z,2024-01-31T17:42:29Z,2024-01-31T17:42:29Z,f9f4c730254073f0f5a8fce65f4bbaa0eefec5fd,,,0,72f319f5c4259c19aabf223faa1d9a51ba035887,ca4f12133e9643c197facd17b54d5040a1bda002,MEMBER,"{""enabled_by"": {""login"": ""dcherian"", ""id"": 2448579, ""node_id"": ""MDQ6VXNlcjI0NDg1Nzk="", ""avatar_url"": ""https://avatars.githubusercontent.com/u/2448579?v=4"", ""gravatar_id"": """", ""url"": ""https://api.github.com/users/dcherian"", ""html_url"": ""https://github.com/dcherian"", ""followers_url"": ""https://api.github.com/users/dcherian/followers"", ""following_url"": ""https://api.github.com/users/dcherian/following{/other_user}"", ""gists_url"": ""https://api.github.com/users/dcherian/gists{/gist_id}"", ""starred_url"": ""https://api.github.com/users/dcherian/starred{/owner}{/repo}"", ""subscriptions_url"": ""https://api.github.com/users/dcherian/subscriptions"", ""organizations_url"": ""https://api.github.com/users/dcherian/orgs"", ""repos_url"": ""https://api.github.com/users/dcherian/repos"", ""events_url"": ""https://api.github.com/users/dcherian/events{/privacy}"", ""received_events_url"": ""https://api.github.com/users/dcherian/received_events"", ""type"": ""User"", ""site_admin"": false}, ""merge_method"": ""squash"", ""commit_title"": ""Fix multiindex level serialization after reset_index (#8672)"", ""commit_message"": ""* fix serialize multi-index level coord after reset\r\n\r\n* add regression test\r\n\r\n* update what's new\r\n\r\n---------\r\n\r\nCo-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>""}",13221727,https://github.com/pydata/xarray/pull/8672,
1797701340,PR_kwDOAMm_X85rJr7c,8888,open,0,to_base_variable: coerce multiindex data to numpy array,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #8887, and probably supersedes #8809
- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- ~~New functions/methods are listed in `api.rst`~~

@slevang this should also make your test case added in #8809 work. I haven't added it here; instead I added a basic check that should be enough.

I don't really understand why the serialization backends (zarr?) do not seem to work with the `PandasMultiIndexingAdapter.__array__()` implementation, which should normally coerce the multi-index levels into numpy arrays as needed. Anyway, I guess that coercing it early like in this PR doesn't hurt and may avoid the confusion of a non-indexed, isolated coordinate variable that still wraps a pandas.MultiIndex. ",2024-03-29T10:10:42Z,2024-03-29T15:54:19Z,,,0f5c78efff8fdc024de20a178acf3ae7ac62f84e,,,0,dd9c3b4ad88b6694b6e737e86e80ad1dcfa1527c,2120808bbe45f3d4f0b6a01cd43bac4df4039092,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8888,
1808774743,PR_kwDOAMm_X85rz7ZX,8911,open,0,Refactor swap dims,4160723,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [ ] Attempt at fixing #8646
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

I've tried here to re-implement `swap_dims` using `rename_dims`, `drop_indexes` and `set_xindex`. This fixes the example in #8646, but unfortunately it fails to handle the pandas multi-index special case (i.e., a single non-dimension coordinate wrapping a `pd.MultiIndex` that is promoted to a dimension coordinate in `swap_dims` auto-magically results in a `PandasMultiIndex` with both dimension and level coordinates).

 ",2024-04-05T08:45:49Z,2024-04-17T16:46:34Z,,,36231f3beea60c788054877f91689d3469f84cbc,,,1,4102b9f67e5c28b85a154cf7ff0749e1f8f1a258,56182f73c56bc619a18a9ee707ef6c19d54c58a2,MEMBER,,13221727,https://github.com/pydata/xarray/pull/8911,