pull_requests

62 rows where user = 4160723

id node_id number state locked title user body created_at updated_at closed_at merged_at merge_commit_sha assignee milestone draft head base author_association auto_merge repo url merged_by
64042989 MDExOlB1bGxSZXF1ZXN0NjQwNDI5ODk= 802 closed 0 Multi-index indexing benbovy 4160723 Follows #767. This is incomplete (it still needs some tests and documentation updates), but it works for both `Dataset` and `DataArray` objects. I also don't know whether it is fully compatible with lazy indexing (Dask). Using the example from #767: ``` In [4]: da.sel(band_wavenumber={'band': 'foo'}) Out[4]: <xarray.DataArray (wavenumber: 2)> array([ 0.00017, 0.00014]) Coordinates: * wavenumber (wavenumber) float64 4.05e+03 4.05e+03 ``` As shown in this example, similarly to pandas, it automatically renames the dimension and assigns a new coordinate when the selection doesn't return a `pd.MultiIndex` (here it returns a `pd.Float64Index`). Since this behavior may be unwanted in some cases, I added a `drop_level` keyword argument (if `False`, it keeps the multi-index and doesn't change the dimension/coordinate names): ``` In [5]: da.sel(band_wavenumber={'band': 'foo'}, drop_level=False) Out[5]: <xarray.DataArray (band_wavenumber: 2)> array([ 0.00017, 0.00014]) Coordinates: * band_wavenumber (band_wavenumber) object ('foo', 4050.2) ('foo', 4050.3) ``` Note that it also works with `DataArray.loc`, but (for now) in that case it always returns the multi-index: ``` In [6]: da.loc[{'band_wavenumber': {'band': 'foo'}}] Out[6]: <xarray.DataArray (band_wavenumber: 2)> array([ 0.00017, 0.00014]) Coordinates: * band_wavenumber (band_wavenumber) object ('foo', 4050.2) ('foo', 4050.3) ``` This is, however, inconsistent with `Dataset.sel` and `Dataset.loc`, which both apply `drop_level=True` by default due to their different implementation. There are two possible solutions: (1) make `DataArray.loc` apply `drop_level` by default, or (2) use `drop_level=False` by default everywhere. 
2016-03-24T14:39:38Z 2016-07-19T10:48:56Z 2016-07-19T01:15:42Z 2016-07-19T01:15:41Z 7a9e84b5708d3e8ec270a7415f9b5e54d30f13f7     0 712497c3997e72a36cafc8fb9eaafbecc76af5dc 80abe5dede7bf8a2949139f8ba083a6d74d4e3db MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/802  
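The `drop_level` behavior discussed in PR 802 can be shown without xarray or pandas. Below is a minimal pure-Python sketch that models a multi-index as a list of label tuples plus a tuple of level names. `select_levels` is a hypothetical helper, not xarray's implementation.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple


def select_levels(
    index: Sequence[Tuple[Any, ...]],
    names: Sequence[Hashable],
    query: Dict[Hashable, Any],
    drop_level: bool = True,
) -> Tuple[List[int], List[Any], List[Hashable]]:
    """Select entries of a tuple-based multi-index by partial level labels.

    Hypothetical helper: returns (positions, selected labels, level names).
    """
    level_pos = {name: i for i, name in enumerate(names)}
    unknown = [name for name in query if name not in level_pos]
    if unknown:
        raise KeyError(f"unknown level(s): {unknown}")

    # Keep the positions where every queried level matches its requested label.
    positions = [
        i
        for i, labels in enumerate(index)
        if all(labels[level_pos[name]] == value for name, value in query.items())
    ]
    selected = [index[i] for i in positions]

    if not drop_level:
        # Keep the full tuples, i.e. the multi-index and all of its level names.
        return positions, selected, list(names)

    # Drop the levels that the query has fixed.
    kept = [i for i, name in enumerate(names) if name not in query]
    if len(kept) == 1:
        # Only one level is left: unwrap it into plain labels. This is
        # analogous to renaming the dimension after the remaining level.
        k = kept[0]
        return positions, [labels[k] for labels in selected], [names[k]]
    remaining = [tuple(labels[i] for i in kept) for labels in selected]
    return positions, remaining, [names[i] for i in kept]
```

Take `index = [("foo", 4050.2), ("foo", 4050.3), ("bar", 4100.1)]` and `names = ("band", "wavenumber")`. Querying `{"band": "foo"}` returns the plain wavenumber labels, while `drop_level=False` keeps the full tuples.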
73465410 MDExOlB1bGxSZXF1ZXN0NzM0NjU0MTA= 879 closed 0 Multi-index repr benbovy 4160723 Another item of #719. An example: ``` python >>> index = pd.MultiIndex.from_product((list('ab'), range(10))) >>> index.names = ('a_long_level_name', 'level_1') >>> data = xr.DataArray(range(20), [('x', index)]) >>> data <xarray.DataArray (x: 20)> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]) Coordinates: * x (x) object MultiIndex - a_long_level_name object 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'b' ... - level_1 int64 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ``` To be consistent with the displayed coordinates and/or data variables, it displays the level values that are actually used. Using the `pandas.MultiIndex.get_level_values` method would be expensive for large indexes, so I re-implemented it in xarray so that the computation can be truncated to the first _x_ values, which is very cheap. It still needs testing. It might also be nice to align the level values. 2016-06-11T10:58:13Z 2016-09-02T09:34:49Z 2016-08-31T21:40:59Z         0 4e7793a8d4fb0d5062ad8aab5578aaf3fec43577 450ac8fb16bec935a18ff3155673dff82208d3fe MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/879  
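The cheap, truncated computation mentioned in PR 879 is possible because of how pandas stores a `MultiIndex`: a list of unique `levels` plus one array of integer `codes` per level, with `-1` marking a missing value. The sketch below, in pure Python and under that representation, decodes only the first `n` values of a level. `level_values_head` is a hypothetical name, not the function added in the PR.

```python
from typing import Any, List, Sequence


def level_values_head(
    levels: Sequence[Sequence[Any]],
    codes: Sequence[Sequence[int]],
    level: int,
    n: int,
) -> List[Any]:
    """Decode the first ``n`` values of one level of a multi-index.

    Only ``n`` codes are looked up, so the cost does not depend on the
    total length of the index.
    """
    uniques = levels[level]
    # A code of -1 marks a missing value. Check for it explicitly, because
    # uniques[-1] would otherwise silently return the last unique label.
    return [None if code == -1 else uniques[code] for code in codes[level][:n]]
```

For `pd.MultiIndex.from_product((list('ab'), range(10)))`, the levels are `['a', 'b']` and `0..9`. Decoding the first 11 values of level 0 therefore yields ten `'a'` followed by one `'b'`, the same values as the truncated repr in the PR description.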
73554612 MDExOlB1bGxSZXF1ZXN0NzM1NTQ2MTI= 881 closed 0 Fix variable copy with multi-index benbovy 4160723 Fixes #769. 2016-06-13T10:38:46Z 2016-08-01T14:17:17Z 2016-06-16T21:01:07Z 2016-06-16T21:01:07Z 065ea6a3695a58ad6256f79b7712b67a8da6377c     0 9ea8832959a54fed81e7194c18cc024ba0fe9bd1 450ac8fb16bec935a18ff3155673dff82208d3fe MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/881  
77953522 MDExOlB1bGxSZXF1ZXN0Nzc5NTM1MjI= 903 closed 0 fixed multi-index copy test benbovy 4160723   2016-07-19T10:37:36Z 2016-08-01T14:16:15Z 2016-07-19T14:47:58Z 2016-07-19T14:47:58Z e8566940a97cd5a11fdbe796cb5f8b0f00864624     0 c863df76651fbc0bae1a02819c7db28eef4f4ae5 7a9e84b5708d3e8ec270a7415f9b5e54d30f13f7 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/903  
80229493 MDExOlB1bGxSZXF1ZXN0ODAyMjk0OTM= 947 closed 0 Multi-index levels as coordinates benbovy 4160723 Implements 2, 4 and 5 in #719. Demo: ``` In [1]: import numpy as np In [2]: import pandas as pd In [3]: import xarray as xr In [4]: index = pd.MultiIndex.from_product((list('ab'), range(2)), ...: names=('level_1', 'level_2')) In [5]: da = xr.DataArray(np.random.rand(4, 4), coords={'x': index}, ...: dims=('x', 'y'), name='test') In [6]: da Out[6]: <xarray.DataArray 'test' (x: 4, y: 4)> array([[ 0.15036153, 0.68974802, 0.40082234, 0.94451318], [ 0.26732938, 0.49598123, 0.8679231 , 0.6149102 ], [ 0.3313594 , 0.93857424, 0.73023367, 0.44069622], [ 0.81304837, 0.81244159, 0.37274953, 0.86405196]]) Coordinates: * level_1 (x) object 'a' 'a' 'b' 'b' * level_2 (x) int64 0 1 0 1 * y (y) int64 0 1 2 3 In [7]: da['level_1'] Out[7]: <xarray.DataArray 'level_1' (x: 4)> array(['a', 'a', 'b', 'b'], dtype=object) Coordinates: * level_1 (x) object 'a' 'a' 'b' 'b' * level_2 (x) int64 0 1 0 1 In [8]: da.sel(x='a', level_2=1) Out[8]: <xarray.DataArray 'test' (y: 4)> array([ 0.26732938, 0.49598123, 0.8679231 , 0.6149102 ]) Coordinates: x object ('a', 1) * y (y) int64 0 1 2 3 In [9]: da.sel(level_2=1) Out[9]: <xarray.DataArray 'test' (level_1: 2, y: 4)> array([[ 0.26732938, 0.49598123, 0.8679231 , 0.6149102 ], [ 0.81304837, 0.81244159, 0.37274953, 0.86405196]]) Coordinates: * level_1 (level_1) object 'a' 'b' * y (y) int64 0 1 2 3 ``` Some notes about the implementation: - I slightly modified `Coordinate` so that a coordinate and its dimension can have different names. There is no breaking change. - I also added a `Coordinate.get_level_coords` method to get independent, single-index coordinate objects from a MultiIndex coordinate. Remaining issues: - `Coordinate.get_level_coords` calls `pandas.MultiIndex.get_level_values` for each level, and it is itself called every time the object is indexed or its repr is computed. This can be very costly! 
It would be … 2016-08-05T11:34:49Z 2016-09-14T15:25:28Z 2016-09-14T03:34:51Z 2016-09-14T03:34:51Z 41654ef5e9da8cd15f3b68f8384f8c45c7fc16e9     0 a447767e8d611d945dc864910a427ef7e3f4db11 3ecfa66613aaefdea8beb15edbd392b9f9d815c6 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/947  
87715303 MDExOlB1bGxSZXF1ZXN0ODc3MTUzMDM= 1028 closed 0 Add `set_index`, `reset_index` and `reorder_levels` methods benbovy 4160723 Another item in #719. I added tests and updated the docs, so this is ready for review. 2016-10-03T13:22:24Z 2023-08-30T09:28:26Z 2016-12-27T17:03:00Z 2016-12-27T17:03:00Z 7ad254409f97dfe932855445602faaf7324f3d5e     0 c58cb470baf53d1c67971540e1d7c02dbafd212a 34fd2b6cb94dfb824c5371c37b6eb5e70a88260f MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/1028  
121942631 MDExOlB1bGxSZXF1ZXN0MTIxOTQyNjMx 1422 closed 0 xarray.core.variable.as_variable part of the public API benbovy 4160723 - [x] Closes #1303 - [x] Tests added / passed - [x] Passes ``git diff upstream/master | flake8 --diff`` (if we ignore messages for .rst files and "imported but not used" messages for `xarray.__init__.py`) - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API Make `xarray.core.variable.as_variable` part of the public API and accessible as a top-level function: `xarray.as_variable`. I changed the docstrings to follow the numpydoc format more closely. I also removed the `copy=False` keyword argument, as it was apparently unused. 2017-05-23T08:44:08Z 2017-06-10T18:33:34Z 2017-06-02T17:55:12Z 2017-06-02T17:55:12Z b8771934a2ef24fd3ce5a93fc2accb3f6fa12e4e     0 37343de03666f6cac03ce68a7fed60b866338ee7 6b18d77b5581be4d91cb12da95a530f92ab867b5 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/1422  
135298867 MDExOlB1bGxSZXF1ZXN0MTM1Mjk4ODY3 1507 closed 0 Detailed report for testing.assert_equal and testing.assert_identical benbovy 4160723 - ~~Closes #xxxx~~ - [x] Tests added / passed - [x] Passes ``git diff upstream/master | flake8 --diff`` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API ~~In addition to `Dataset` repr, the error message also shows the output of `Dataset.info()` for both datasets.~~ ~~This may not be the most elegant solution, but it is helpful when datasets only differ by their attributes attached to coordinates or data variables (not shown in repr). I'm open to any suggestion.~~ The report shows the differences for dimensions, data values (``Variable`` and ``DataArray``), coordinates, data variables and attributes (the latter only for ``testing.assert_identical``). There are currently not many tests for `xarray.testing` functions, but I'm willing to add more if needed. I'm not sure whether this is worth a what's new entry (EDIT: added one). 2017-08-11T09:38:23Z 2019-10-25T15:07:39Z 2019-01-18T09:16:31Z 2019-01-18T09:16:31Z 1d0a2bc4970d9e7337fe307f4519bd936f7d7d89     0 443e59365e5440979421644e50491f7dd323ab95 f13536c965d02bb2845da31e909899a90754b375 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/1507  
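The following is a minimal sketch of the kind of report described in PR 1507. It models each dataset as a plain dictionary of sections (`dims`, `coords`, `data_vars`, `attrs`). This representation and the helper names `diff_report` and `assert_equal_sketch` are hypothetical, and real xarray objects are not compared this way. As in the PR, attributes are only checked in the stricter, "identical"-style mode.

```python
from typing import Any, Dict, List

DatasetLike = Dict[str, Dict[str, Any]]  # hypothetical stand-in for a dataset


def diff_report(a: DatasetLike, b: DatasetLike, check_attrs: bool = False) -> List[str]:
    """List the differences between two dataset-like dictionaries."""
    sections = ["dims", "coords", "data_vars"]
    if check_attrs:
        # Attributes only matter for an "identical"-style comparison.
        sections.append("attrs")
    lines: List[str] = []
    for section in sections:
        left, right = a.get(section, {}), b.get(section, {})
        for key in sorted(set(left) | set(right), key=str):
            if key not in right:
                lines.append(f"{section}: {key!r} only in left")
            elif key not in left:
                lines.append(f"{section}: {key!r} only in right")
            elif left[key] != right[key]:
                lines.append(
                    f"{section}: {key!r} differs: L {left[key]!r} vs R {right[key]!r}"
                )
    return lines


def assert_equal_sketch(a: DatasetLike, b: DatasetLike, check_attrs: bool = False) -> None:
    """Raise an AssertionError whose message embeds the detailed report."""
    report = diff_report(a, b, check_attrs)
    if report:
        raise AssertionError("\n".join(["Left and right datasets differ:"] + report))
```

Producing one line per differing entry, rather than dumping both reprs, makes the failure message point directly at what changed.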
153118247 MDExOlB1bGxSZXF1ZXN0MTUzMTE4MjQ3 1723 closed 0 Fix unexpected behavior of .set_index() since pandas 0.21.0 benbovy 4160723 - [x] Closes #1722 - [x] Tests added / passed - [x] Passes ``git diff upstream/master **/*py | flake8 --diff`` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API 2017-11-16T18:37:20Z 2019-10-25T15:07:18Z 2017-11-17T00:54:51Z 2017-11-17T00:54:51Z 1a012080e0910f3295d0fc26806ae18885f56751     0 eda038be4f7e4298806ed1e3f92c8fc7bf287a21 8267fdb1093bba3934a172cf71128470698279cd MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/1723  
162426756 MDExOlB1bGxSZXF1ZXN0MTYyNDI2NzU2 1820 closed 0 WIP: html repr benbovy 4160723 - [x] Closes #1627 - [ ] Tests added - [ ] Tests passed - [ ] Passes ``git diff upstream/master **/*py | flake8 --diff`` - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API This is work in progress, although the basic functionality is there. You can see a preview here: http://nbviewer.jupyter.org/gist/benbovy/3009f342fb283bd0288125a1f7883ef2 TODO: - [ ] Add support for Multi-indexes - [ ] Probably good to have some opt-in or fallback system for cases where we (or users) know that the rendering will not work - [ ] Add some tests Nice to have (keep this for later): - Clean up CSS code and HTML template (track CSS [subgrid support](https://caniuse.com/#feat=css-subgrid) in browsers; this may simplify things a lot here). - Dynamically adapt cell widths (given the length of the names of variables and dimensions). Currently all cells have a fixed width. This is tricky, though, as we don't use a monospace font here. - Integration with jupyterlab/notebook themes (CSS classes) and maybe allow custom CSS. - Integration of Dask arrays HTML repr (+ integration of repr for other array backends). - Maybe find a way (if possible) to include CSS only once in the notebook (currently it is included each time an xarray object is displayed in an output cell, which is not very nice). - Review the rules for collapsing the `Coordinates`, `Data variables` and `Attributes` sections (maybe expose them as global options). - Maybe also define some rules to automatically collapse the data section (DataArray and Variable) when the data repr is too long. - Maybe add rich representation for `Dataset.coords` and `Dataset.data_vars` as well? 
<details> <summary>Other thoughts (old)</summary> A big challenge here is to provide both robust and flexible styling (CSS): - I have tested the current styling in jupyterlab (0.30.6, light theme), notebook (5.2.2) and nbviewer: despite some slight differences it looks quite good! - However, the current CSS code is a bit… 2018-01-11T16:33:07Z 2019-10-25T15:06:58Z 2019-10-24T16:48:46Z   e360d3fc81209d7586de95bc044feb3d4a508657     0 17de08ba4cc2eb7e3326c1451c1257c911a17958 bb87a9441d22b390e069d0fde58f297a054fd98a MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/1820  
171631545 MDExOlB1bGxSZXF1ZXN0MTcxNjMxNTQ1 1946 closed 0 DOC: add main sections to toc benbovy 4160723 Not a big change, but it adds a little more clarity IMO. I'm open to any suggestion for better section names and/or organization. I also left "What's new" at the top, but I'm not sure whether "Getting started" is the right section for it. 2018-02-27T11:13:17Z 2018-02-27T21:16:18Z 2018-02-27T19:04:24Z 2018-02-27T19:04:24Z 4ee244078ea90084624c1b6d006f50285f8f2d21     0 0fe80d06242b7a7392c9c96598dd9c557ca667ad 243093cf814ffaae2a9ce08215632500fbebcf52 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/1946  
207277486 MDExOlB1bGxSZXF1ZXN0MjA3Mjc3NDg2 2357 closed 0 DOC: move xarray related projects to top-level TOC section benbovy 4160723 Make xarray-related projects more discoverable, as suggested on the xarray mailing list. 2018-08-09T10:57:47Z 2018-08-11T13:41:24Z 2018-08-10T20:13:08Z 2018-08-10T20:13:08Z 846e28f8862b150352512f8e3d05bcb9db57a1a3     0 5bd1b794860b8c8e276d4918bfd40c6bad6e1411 04458670782c0b6fdba7e7021055155b2a6f284a MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/2357  
332552507 MDExOlB1bGxSZXF1ZXN0MzMyNTUyNTA3 3448 closed 0 Add license for the icons used in the html repr benbovy 4160723   2019-10-25T14:57:20Z 2019-10-25T15:48:52Z 2019-10-25T15:40:46Z 2019-10-25T15:40:46Z 63cc85759ac25605c8398d904d055df5dc538b94     0 372f61d954f4b90222c636757665e747502c38d6 bb0a5a2b1c71f7c2622543406ccc82ddbb290ece MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/3448  
416544318 MDExOlB1bGxSZXF1ZXN0NDE2NTQ0MzE4 4053 closed 0 Fix html repr in untrusted notebooks (plain text fallback) benbovy 4160723 - [x] Closes #4041 - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API This is not very elegant (the plain text repr is actually already included in the notebook as the `text/plain` MIME type, but it is ignored when the `text/html` MIME type is present), but it seems to work. I haven't found a better workaround. I don't really know whether this can be properly tested (I only added a basic test). Steps to test this fix: - To "untrust" a notebook: open an existing notebook with a plain text editor, manually edit one output cell containing an xarray object repr, and save the ipynb file. - Open this notebook with the Notebook app; you should see the plain text repr. 2020-05-12T07:38:22Z 2022-03-29T07:10:07Z 2020-05-20T17:06:40Z 2020-05-20T17:06:40Z cb90d5542bd6868d5548ae8efb5815c249c2c329     0 39299e9f8e71b34ba4587800658204f5b66d9576 3e5dd6ef32b9c69806af69a3a5168edcf3b2e21f MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/4053  
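The plain-text fallback in PR 4053 can be sketched as follows. The sketch assumes that an untrusted notebook strips the injected `<style>` block but keeps inline `style` attributes. Under that assumption:

- the escaped text repr is always emitted;
- the rich markup is hidden with an inline style;
- the stylesheet reverses both, but only when it is actually applied.

The function and CSS class names are hypothetical, and this is not xarray's exact markup.

```python
import html


def html_with_text_fallback(text_repr: str, rich_html: str) -> str:
    """Wrap a rich HTML repr so that a plain-text repr shows if CSS is stripped."""
    # When this stylesheet is applied (trusted notebook), the <pre> fallback is
    # hidden and the rich block is forced visible. If the <style> element is
    # removed, the inline display:none keeps the rich block hidden instead.
    css = (
        "<style>"
        ".repr-text-fallback { display: none; }"
        ".repr-rich { display: block !important; }"
        "</style>"
    )
    return (
        "<div>"
        f"{css}"
        f"<pre class='repr-text-fallback'>{html.escape(text_repr)}</pre>"
        f"<div class='repr-rich' style='display:none'>{rich_html}</div>"
        "</div>"
    )
```

Escaping the text repr matters because reprs such as `<xarray.DataArray ...>` would otherwise be parsed as HTML tags.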
582224148 MDExOlB1bGxSZXF1ZXN0NTgyMjI0MTQ4 4979 closed 0 Flexible indexes refactoring notes benbovy 4160723 As a preliminary step before I take on the refactoring and implementation of flexible indexes in Xarray for the next few months, I reviewed the status of https://github.com/pydata/xarray/projects/1 and started compiling partially implemented or planned changes, thoughts, etc. into a single document that may serve as a basis for further discussion and implementation work. It's still very much work in progress (I will update it regularly in the forthcoming days) and it is very open to discussion (we can use this PR for that)! I'm not sure if Xarray's root folder is a good place for this document, though. We could move this into a new repository in `xarray-contrib` (that could also host other enhancement proposals) if that's necessary. I'm looking forward to getting started on this and to getting your thoughts/feedback! 2021-03-01T16:57:32Z 2022-03-29T07:09:31Z 2021-03-17T16:47:29Z 2021-03-17T16:47:29Z d9ba56c22f22ae48ecc53629c2d49f1ae02fcbcb     0 6efcdfe893594fcf493e17f693df1d4816b686ba 48378c4b11c5c2672ff91396d4284743165b4fbe MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/4979  
608110624 MDExOlB1bGxSZXF1ZXN0NjA4MTEwNjI0 5102 closed 0 Flexible indexes: add Index base class and xindexes properties benbovy 4160723 This PR clears the path for flexible indexes: - it adds a new ~~`IndexAdapter`~~ `Index` base class that is meant to be inherited by all xarray-compatible indexes (built-in or 3rd-party) - `PandasIndexAdapter` now inherits from ~~`IndexAdapter`~~ `Index` - the `xarray_obj.xindexes` properties return `Index` (`PandasIndexAdapter`) instances. `xarray_obj.indexes` properties still return `pandas.Index` instances. ~~The latter is a breaking change, although I'm not sure if the `indexes` property has been made public yet.~~ This is still work in progress; there are many broken tests that are not fixed yet (EDIT: all tests should be fixed now). There are a lot of dirty fixes to avoid circular dependencies and in the many places where we still need direct access to the `pandas.Index` objects, but I expect these will be cleaned up further during the refactoring. 2021-04-02T16:18:07Z 2022-03-29T07:10:07Z 2021-05-11T08:21:26Z 2021-05-11T08:21:26Z 6e14df62f0b01d8ca5b04bd0ed2b5ee45444265d     0 ce59dece723ca49eaae69779dee5da2aa30d0286 234b40a37e484a795e6b12916315c80d70570b27 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/5102  
645933827 MDExOlB1bGxSZXF1ZXN0NjQ1OTMzODI3 5322 closed 0 Internal refactor of label-based data selection benbovy 4160723 Xarray label-based data selection now relies on a newly added `xarray.Index.query(self, labels: Dict[Hashable, Any]) -> Tuple[Any, Optional[Index]]` method where: - `labels` is always a dictionary with coordinate name(s) as key(s) and the corresponding selection label(s) as values - when calling `.sel` with some coordinate(s)/label(s) pairs, those are first grouped by index so that only the relevant pairs are passed to an `Index.query` - the returned tuple contains the positional indexers and (optionally) a new index object For a simple `pd.Index`, `labels` always corresponds to a 1-item dictionary like `{'coord_name': label_values}`, which is not very useful in this case, but this format is useful for `pd.MultiIndex` and will likely be useful for other, custom indexes. Moving the label->positional indexer conversion logic into `PandasIndex.query()`, I've tried to separate `pd.Index` vs `pd.MultiIndex` concerns by adding a new `PandasMultiIndex` wrapper class (it will probably be useful for other things as well) and to refactor the complex logic that was implemented in `convert_label_indexer`. Hopefully it is a bit clearer now. 
Working towards a more flexible/generic system, we still need to figure out how to: - pass extra index query arguments like `method` and `tolerance` (for `pd.Index`) in a more generic way - handle several positional indexers over multiple dimensions, possibly returned by a custom "meta-index" (e.g., a staggered grid index) - handle the case of positional indexers returned from querying more than one index along the same dimension (e.g., multiple coordinates along `x`, each with a simple `pd.Index`) - pandas indexes don't need information like the names or shapes of their corresponding coordinate(s) to perform label-based selection, but this kind of information will probably be needed for other indexes (we actually need it for advanced point-wise selection using tree-based indexes in [xoak](https://github.com/xarray-contrib/xoak)). This could be done in follow-up PRs. Side note: I'… 2021-05-17T14:52:49Z 2022-03-29T07:10:07Z 2021-06-08T09:35:54Z 2021-06-08T09:35:54Z 9daf9b13648c9a02bddee3640b80fe95ea1fff61     0 fda484988c074bfd371ed490641a383c9429c43a 2b38adc1bdd1dd97934fb061d174149c73066f19 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/5322  
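The selection flow described in PR 5322 can be sketched in pure Python:

- coordinate/label pairs are grouped by the index they belong to;
- each index is queried once with its own group;
- the positional indexers are collected per dimension.

`ToyIndex` and `sel` are hypothetical. Returning positions as a dictionary keyed by dimension is an assumption made for readability, and none of this reproduces xarray's actual classes.

```python
from typing import Any, Dict, Hashable, List, Optional, Tuple


class ToyIndex:
    """Hypothetical index built from one or more coordinates along one dimension."""

    def __init__(self, dim: str, coords: Dict[Hashable, List[Any]]):
        self.dim = dim
        self.coords = coords

    def query(
        self, labels: Dict[Hashable, Any]
    ) -> Tuple[Dict[str, List[int]], Optional["ToyIndex"]]:
        """Convert labels to positions; this sketch never returns a new index."""
        size = len(next(iter(self.coords.values())))
        positions = [
            i
            for i in range(size)
            if all(self.coords[name][i] == value for name, value in labels.items())
        ]
        return {self.dim: positions}, None


def sel(
    indexes: Dict[Hashable, ToyIndex], labels: Dict[Hashable, Any]
) -> Dict[str, List[int]]:
    """Group coordinate/label pairs by index, then query each index once."""
    grouped: Dict[int, Tuple[ToyIndex, Dict[Hashable, Any]]] = {}
    for name, value in labels.items():
        index = indexes[name]
        # Several coordinates (e.g. multi-index levels) may share one index.
        grouped.setdefault(id(index), (index, {}))[1][name] = value

    dim_indexers: Dict[str, List[int]] = {}
    for index, index_labels in grouped.values():
        indexer, _new_index = index.query(index_labels)
        for dim, positions in indexer.items():
            if dim in dim_indexers:
                # One of the open questions listed in the PR: several indexes
                # queried along the same dimension. This sketch refuses it.
                raise ValueError(f"more than one index queried along {dim!r}")
            dim_indexers[dim] = positions
    return dim_indexers
```

Because both levels of the toy multi-index map to the same `ToyIndex` object, a query such as `{"band": "a", "wn": 2}` reaches that index as a single 2-item dictionary. A simple index only ever receives a 1-item dictionary.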
655109484 MDExOlB1bGxSZXF1ZXN0NjU1MTA5NDg0 5385 closed 0 Cast PandasIndex to pd.(Multi)Index benbovy 4160723 - [x] Closes #5384 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` 2021-05-27T15:15:41Z 2022-03-29T07:09:31Z 2021-05-28T08:28:11Z 2021-05-28T08:28:11Z 2b38adc1bdd1dd97934fb061d174149c73066f19     0 b81931cf852432b7a7857aec4b38566d7e3e0b6e a6a1e48b57499f91db7e7c15593aadc7930020e8 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/5385  
697307477 MDExOlB1bGxSZXF1ZXN0Njk3MzA3NDc3 5636 closed 0 Refactor index vs. coordinate variable(s) benbovy 4160723 - [x] Closes #5553 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` This implements (more or less) option 3 described in https://github.com/pydata/xarray/issues/5553#issue-933551030: - the goal is to avoid wrapping an `xarray.Index` into an `xarray.Variable` and keep those two concepts distinct from each other. - the `xarray.Index.from_variables` class constructor accepts a dictionary of `xarray.Variable` objects as argument and may (or should?) also return corresponding `xarray.IndexVariable` objects to ensure immutability. - for `PandasIndex`, the newly returned `xarray.IndexVariable` wraps the underlying `pd.Index` via a `PandasIndexingAdapter` (this reverts some changes made in #5102). - for `PandasMultiIndex`, this PR adds `PandasMultiIndexingAdapter` so that we can wrap the pandas multi-index in separate coordinate variable objects: one for the dimension + one for each level. The level coordinates' data internally hold a reference to the dimension coordinate data to avoid indexing the same underlying `pd.MultiIndex` for each of those coordinates (`PandasMultiIndexingAdapter.__getitem__` is memoized for that purpose). This is very much work in progress; I need to update (or revert) all related parts of Xarray's internals, update tests, etc. At this stage any comment on the approach described above is welcome. 2021-07-26T19:54:25Z 2023-08-30T09:21:55Z 2021-08-09T07:56:56Z 2021-08-09T07:56:56Z 4bb9d9c6df77137f05e85c7cc6508fe7a93dc0e4     0 e5f2502c07bd7ad449f9f6acfd0e6ac3ede92fb9 8b95da8e21a9d31de9f79cb0506720595f49e1dd MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/5636  
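The memoization idea in PR 5636 can be sketched in pure Python: all level coordinates hold a reference to the same underlying multi-index data, so that data is indexed only once per key. `SharedMultiIndexData` and `LevelView` are hypothetical stand-ins for the adapter classes named in the PR. The cache only handles slice keys, an assumption that keeps the cache keys simple and hashable.

```python
from typing import Any, Dict, List, Sequence, Tuple


class SharedMultiIndexData:
    """Hypothetical shared holder of multi-index tuples with a memoized take."""

    def __init__(self, tuples: Sequence[Tuple[Any, ...]]):
        self.tuples = tuples
        self._cache: Dict[Tuple[Any, Any, Any], Sequence[Tuple[Any, ...]]] = {}
        self.n_index_ops = 0  # counts how often the data is actually indexed

    def take(self, key: slice) -> Sequence[Tuple[Any, ...]]:
        # slice objects are not hashable on all Python versions, so the cache
        # is keyed on their (start, stop, step) triple instead.
        cache_key = (key.start, key.stop, key.step)
        if cache_key not in self._cache:
            self.n_index_ops += 1
            self._cache[cache_key] = self.tuples[key]
        return self._cache[cache_key]


class LevelView:
    """One level coordinate: a reference to the shared data plus a level number."""

    def __init__(self, shared: SharedMultiIndexData, level: int):
        self.shared = shared
        self.level = level

    def __getitem__(self, key: slice) -> List[Any]:
        return [labels[self.level] for labels in self.shared.take(key)]
```

Indexing two level views with the same slice triggers a single indexing operation on the shared data, which is the saving the PR describes.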
709187466 MDExOlB1bGxSZXF1ZXN0NzA5MTg3NDY2 5692 closed 0 Explicit indexes benbovy 4160723 - [x] Closes many issues: - [x] closes #1366 - [x] closes #1408 - [x] closes #2489 - [x] closes #3432 - [x] closes #4542 - [x] closes #4955 - [x] closes #5202 - [x] closes #5645 - [x] closes #5691 - [x] closes #5697 - [x] closes #5700 - [x] closes #5727 - [x] closes #5953 - [x] closes #6183 - [x] closes #6313 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - New functions/methods are listed in `api.rst` (new `Index` and `Indexes` API not public yet) Follow-up on #5636 (work in progress), supersedes #2195. This is likely to grow big, sorry in advance! It'll be safer to make a release before merging this PR. Current progress: - [x] create (default) indexes using the `Index` classes - [x] refactor default indexes created when first accessing `.xindexes` or `.indexes` - [x] support for non-default indexes (no public API yet) - [x] remove multi-index virtual coordinates (replace them with regular coordinates) - [x] refactor internal (text / html) formatting functions - [x] internal refactor of location-based selection (`.isel()`) - [x] internal refactor of label-based selection (`.sel()`) - [x] internal refactor of `.rename()` - Some changes in behavior (see comments below) - see #4108 - see #4107 - see #4417 - [x] internal refactor of `set_index` / `reset_index` - [x] internal refactor of `stack` / `unstack` - Some changes in behavior (see comments below) - [x] internal refactor of `Dataset.to_stacked_array` - [x] internal refactor of `swap_dims` - [x] internal refactor of `expand_dims` - [x] internal refactor of alignment - [x] internal refactor of `reindex` and `reindex_like` - [x] internal refactor of `interp` and `interp_like` - [x] internal refactor of merge - [x] internal refactor of concat - [x] internal refactor of compu… 2021-08-11T15:57:41Z 2023-08-30T09:26:37Z 2022-03-17T17:11:44Z 2022-03-17T17:11:40Z 3ead17ea9e99283e2511b65b9d864d1c7b10b3c4     0 77fdaf0e3a268d1d1fbdb6c7aef9abfd07bf0d32 29a87cc110f1a1ff7b21c308ba7277963b51ada3 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/5692  
884210772 PR_kwDOAMm_X840s_xU 6385 closed 0 Fix concat with scalar coordinate benbovy 4160723 - [x] Closes #6384 - [x] Tests added 2022-03-20T16:46:48Z 2022-03-29T07:09:30Z 2022-03-21T04:49:23Z 2022-03-21T04:49:22Z 83f238a05a82fc85dcd7346f758ba3bea0416181     0 a91e6ee2728bb5b2768184d4e0cf1c261113f93e 073512ed3f997c0589af97eaf3d4b20796b18cf8 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6385  
884214603 PR_kwDOAMm_X840tAtL 6386 closed 0 Fix Dataset groupby returning a DataArray benbovy 4160723 - [x] Closes #6379 - [x] Tests added 2022-03-20T17:06:13Z 2022-03-29T07:09:30Z 2022-03-20T18:55:27Z 2022-03-20T18:55:26Z fed852073eee883c0ed1e13e28e508ff0cf9d5c1     0 f4e8d48c4040f9165622baf48322771c376af39c 073512ed3f997c0589af97eaf3d4b20796b18cf8 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6386  
884218819 PR_kwDOAMm_X840tBvD 6387 closed 0 Fix concat with variable or dataarray as dim (propagate attrs) benbovy 4160723 - [x] Closes #6380 - [x] Tests added 2022-03-20T17:27:41Z 2022-03-29T07:09:29Z 2022-03-20T18:53:46Z 2022-03-20T18:53:46Z 03b6ba1e779b0d1829ca7b2e8f5da4d9c39ece6f     0 cd2ab9e1d605d6469178b24a39a14634f97b5c22 073512ed3f997c0589af97eaf3d4b20796b18cf8 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6387  
884252480 PR_kwDOAMm_X840tJ9A 6388 closed 0 isel: convert IndexVariable to Variable if index is dropped benbovy 4160723 - [x] Closes #6381 - [x] Tests added 2022-03-20T20:29:58Z 2022-03-29T07:10:08Z 2022-03-21T04:47:48Z 2022-03-21T04:47:47Z 067b2e86e6311e9c37e0def0c83cdb9a1a367a74     0 626f27966a52a5162f026ac042ccd18ec1592a22 fed852073eee883c0ed1e13e28e508ff0cf9d5c1 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6388  
884259571 PR_kwDOAMm_X840tLrz 6389 closed 0 Re-index: fix missing variable metadata benbovy 4160723 - [x] Closes #6382 - [x] Tests added 2022-03-20T21:11:38Z 2022-03-29T07:09:31Z 2022-03-21T07:53:05Z 2022-03-21T07:53:04Z c604ee1fe852d51560100df6af79b4c28660f6b5     0 86b920ac931c9a78b067e08a84e3c587ec905047 fed852073eee883c0ed1e13e28e508ff0cf9d5c1 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6389  
884923775 PR_kwDOAMm_X840vt1_ 6394 closed 0 Fix DataArray groupby returning a Dataset benbovy 4160723 - [x] Closes #6393 - [x] Tests added 2022-03-21T14:43:21Z 2022-03-29T07:09:30Z 2022-03-21T15:26:20Z 2022-03-21T15:26:20Z 321c5608a3be3cd4b6a4de3b658d1e2d164c0409     0 6123ae884795d08db6f4de736e5d52ef90648991 c604ee1fe852d51560100df6af79b4c28660f6b5 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6394  
886017261 PR_kwDOAMm_X840z4zt 6400 closed 0 Speed-up multi-index html repr + add display_values_threshold option benbovy 4160723 This adds `PandasMultiIndexingAdapter._repr_html_`, which can greatly speed up the HTML repr of Xarray objects with multi-indexes. This optimized `_repr_html_` implementation is now used for formatting the detailed array view of all multi-index coordinates in the HTML repr, instead of converting the full index and each of its levels to numpy arrays before formatting them. ```python import xarray as xr ds = xr.tutorial.load_dataset("air_temperature") da = ds["air"].stack(z=[...]) da.shape # (3869000,) %timeit -n 1 -r 1 da._repr_html_() # 9.96 ms ! ``` - [x] Closes #5529 - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` 2022-03-22T12:57:37Z 2022-03-29T07:10:22Z 2022-03-29T07:05:32Z 2022-03-29T07:05:32Z d8fc34660f409d4c6a7ce9fe126d126e4f76c7fd     0 b8f732c61a86be5d1e8efbf3a906f9a5f69c31fd 728b648d5c7c3e22fe3704ba163012840408bf66 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6400  
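The speed-up in PR 6400 comes from formatting only what is displayed instead of converting every value first. The sketch below shows that idea for any sequence with cheap `len()` and positional access. `format_head_tail` and its `threshold` parameter are hypothetical and do not document the exact semantics of xarray's `display_values_threshold` option.

```python
from typing import Callable, Sequence


def format_head_tail(
    values: Sequence, threshold: int, fmt: Callable[[object], str] = repr
) -> str:
    """Format at most ``threshold`` items, taken from both ends of ``values``."""
    n = len(values)
    if n <= threshold:
        return " ".join(fmt(values[i]) for i in range(n))
    head = threshold // 2
    tail = threshold - head
    # Only ``threshold`` items are ever accessed and formatted; the rest of
    # the sequence is never materialized.
    items = [fmt(values[i]) for i in range(head)]
    items.append("...")
    items.extend(fmt(values[i]) for i in range(n - tail, n))
    return " ".join(items)
```

Passing `range(3_869_000)`, the same length as the stacked array in the PR, returns `"0 1 ... 3868998 3868999"` for a threshold of 4 without building a list of millions of items.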
891741295 PR_kwDOAMm_X841JuRv 6418 closed 0 Fix concat with scalar coordinate (dtype) benbovy 4160723 - [x] Closes #6416 - [x] Tests added 2022-03-28T12:22:50Z 2022-03-29T07:06:46Z 2022-03-28T16:05:01Z 2022-03-28T16:05:01Z 009b15461bf1ad4567e57742e44db4efa4e44cc7     0 5711dc21ff0711559214bde147cf3a20f6880f8e 728b648d5c7c3e22fe3704ba163012840408bf66 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6418  
900624195 PR_kwDOAMm_X841rm9D 6443 closed 0 Fix concat with scalar coordinate (wrong index type) benbovy 4160723 - [x] Closes #6434 - [x] Tests added 2022-04-05T19:16:30Z 2022-12-08T09:36:50Z 2022-04-06T01:19:48Z 2022-04-06T01:19:47Z facafac359c39c3e940391a3829869b4a3df5d70     0 185b79199d25ff83dfdea944fa200342afc5e144 2eef20b74c69792bad11e5bfda2958dc8365513c MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6443  
998719144 PR_kwDOAMm_X847hz6o 6800 closed 0 (scipy 2022 branch) Add an "options" argument to Index.from_variables() benbovy 4160723 It allows passing options to the constructor of a custom `Index` subclass, in case there are any relevant build options to expose to users. For example, this could be the distance metric chosen for an index based on `sklearn.neighbors.BallTree`, or the CRS definition for a geospatial index. The `**options` keyword arguments of `Dataset.set_xindex()` are passed through. An alternative would be to pass options via coordinate metadata, like the `spatial_ref` coordinate in rioxarray. Perhaps both alternatives could coexist? This PR also adds type annotations to `set_xindex()`. 2022-07-17T20:01:00Z 2022-12-08T09:38:50Z 2022-09-02T13:54:46Z   f4b214279bd34fe6c5bdebfb7f8f76e63e53d40c     0 46e19d493a18fc81f44129ff65441925080297b3 a5f068e0f6cb4d5ba8de5e10844ae2bfc4a56655 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6800  
1013692836 PR_kwDOAMm_X848a7mk 6857 closed 0 Fix aligned index variable metadata side effect benbovy 4160723 - [x] Closes #6852 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` 2022-08-01T10:57:16Z 2022-12-08T09:36:49Z 2022-08-31T07:16:14Z 2022-08-31T07:16:14Z 4880012ddee9e43e3e18e95551876e9c182feafb     0 c39cdaa63f0d55b34cca1d04a24b1621801cc8e6 434f9e8929942afc2380eab52a07e77d30cc7885 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6857  
1042357878 PR_kwDOAMm_X84-IR52 6971 closed 0 Add set_xindex and drop_indexes methods benbovy 4160723 - [x] Closes #6849 - [x] Supersedes #6800 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` This PR adds Dataset and DataArray `.set_xindex` and `.drop_indexes` methods (the latter is also discussed in #4366). I've cherry-picked the relevant commits from the `scipy22` branch and added a few more commits. This PR also allows passing build options to any `Index`. Some comments and open questions: - Should we make the `index_cls` argument of `set_xindex` optional? - I.e., `set_xindex(coord_names, index_cls=None, **options)` where a pandas index is created by default (or a pandas multi-index if several coordinate names are given), provided that the coordinate(s) are valid 1-d candidates. - This would be redundant with the existing `set_index` method, but it would be convenient if we later deprecate it. - Should we deprecate `set_index` and `reset_index`? I think we should, but probably not at this point. - There's a special case for multi-indexes where `set_xindex(["foo", "bar"], PandasMultiIndex)` adds a dimension coordinate in addition to the "foo" and "bar" level coordinates so that it is consistent with the rest of Xarray. I find it a bit annoying, though. Probably another motivation for deprecating this dimension coordinate. - In this PR I also imported the `Index` base class into Xarray's root namespace. - It is needed for custom indexes and it's just a little more convenient than importing it from `xarray.core.indexes`. - Should we do the same for the `PandasIndex` and `PandasMultiIndex` subclasses? Maybe, if one wants to create a custom index inheriting from one of them. 
`PandasMultiIndex` factory methods could also be useful if we deprecate passing `pd.MultiIndex` objects as DataArray / Dataset coordinates. 2022-08-31T12:54:35Z 2022-12-08T09:38:13Z 2022-09-28T07:25:15Z 2022-09-28T07:25:15Z e678a1d7884a3c24dba22d41b2eef5d7fe5258e7     0 b598447ba2e9c98bb1186719dc9bc6be95e13042 a042ae69c0444912f94bb4f29c93fa05046893ed MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6971  
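The optional-`index_cls` question for `set_xindex` describes a simple fallback rule: one coordinate gets a pandas index, several get a pandas multi-index, and only valid 1-d coordinates qualify. The sketch below assumes that rule. `resolve_index_cls` is a hypothetical helper, not part of xarray. Index classes are represented by name only, and each coordinate by its number of dimensions.

```python
def resolve_index_cls(coord_ndims, index_cls=None):
    """Pick the index class for a set_xindex-like call (hypothetical sketch).

    coord_ndims maps each coordinate name to its number of dimensions.
    Index classes are represented by their names only.
    """
    if index_cls is not None:
        # An explicitly given index class always takes precedence.
        return index_cls
    if not coord_ndims:
        raise ValueError("at least one coordinate name is required")
    not_1d = [name for name, ndim in coord_ndims.items() if ndim != 1]
    if not_1d:
        raise ValueError(
            f"cannot create a default pandas index from non 1-d coordinate(s): {not_1d}"
        )
    # One coordinate -> pandas index; several coordinates -> pandas multi-index.
    return "PandasIndex" if len(coord_ndims) == 1 else "PandasMultiIndex"


print(resolve_index_cls({"x": 1}))              # PandasIndex
print(resolve_index_cls({"foo": 1, "bar": 1}))  # PandasMultiIndex
```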
1043726871 PR_kwDOAMm_X84-NgIX 6975 closed 0 Add documentation on custom indexes benbovy 4160723 This PR documents the API of the `Index` base class and adds a guide for creating custom indexes (reworked from https://hackmd.io/Zxw_zCa7Rbynx_iJu6Y3LA). Hopefully it will help anyone experimenting with this feature. @pydata/xarray your feedback would be very much appreciated! I've been working on this for quite some time, so there may be things that seem obvious to me but that you find very confusing or non-intuitive. Those would deserve some extra or better explanation. More specifically, I'm open to any suggestion on how to better illustrate this with clear and succinct examples. There are other parts of the documentation that still need to be updated regarding the indexes refactor (e.g., "dimension" coordinates, `xindexes` property, set/drop indexes, etc.). But I suggest doing that in separate PRs and focusing here on creating custom indexes. 2022-09-01T13:20:00Z 2023-08-30T09:10:34Z 2023-07-17T23:23:22Z 2023-07-17T23:23:22Z 7234603781768728b3fd544cdcaca991466d4a44     0 07814bc579a0687ddc4deef0a1825c16ba02333e 647376d1d2db3210c142d8204c1c3a7431b85b9a MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6975  
1046566934 PR_kwDOAMm_X84-YVgW 6992 closed 0 Review (re)set_index benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes - [x] fixes #6946 - [x] fixes #6989 - [x] fixes #6959 - [x] fixes #6969 - [x] fixes #7036 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` Restores the behavior prior to the explicit indexes refactor (i.e., refactored but without breaking changes). TODO: - [x] review `set_index` - [x] review `reset_index` For `reset_index`, the only behavior that is not restored here is renaming the coordinate with a `_` suffix when dropping a single index. This was originally meant to prevent a coordinate without an index from having the same name as a dimension, which is now irrelevant. That was quite a dirty workaround and I don't know who is relying on it (no complaints yet), but I'm open to restoring it if needed (esp. considering that we may later deprecate `reset_index` completely in favor of `drop_indexes` #6971). 2022-09-05T15:07:43Z 2023-08-30T09:05:10Z 2022-09-27T10:35:38Z 2022-09-27T10:35:38Z a042ae69c0444912f94bb4f29c93fa05046893ed     0 ca01949cb889ee38aae33560b02de1f7625fd921 45c0a114e2b7b27b83c9618bc05b36afac82183c MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6992  
1047776643 PR_kwDOAMm_X84-c82D 6999 closed 0 Raise UserWarning when rename creates a new dimension coord benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #6607 - [x] Closes #4107 - [x] Closes #6229 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` Currently implemented "fix": raise a `UserWarning` and suggest using `swap_dims` (*) Alternatively, we could: - revert the breaking change (i.e., create the index again) and raise a `DeprecationWarning` instead - raise an error instead of a warning I don't have strong opinions on this; I'm happy to implement another alternative. The downside of reverting the breaking change now is that it would unfortunately introduce yet another breaking change in the next release, while workarounds are pretty straightforward. (*) from https://github.com/pydata/xarray/issues/6607#issuecomment-1126587818, doing `ds.set_coords(['lon']).rename(x='lon').set_index(lon='lon')` works too. With #6971, `.set_xindex('lon')` could work as well. 2022-09-06T16:16:17Z 2022-12-08T09:38:13Z 2022-09-27T09:33:40Z 2022-09-27T09:33:40Z 45c0a114e2b7b27b83c9618bc05b36afac82183c     0 486f9b876c212cc3f2df7dd1438d1832ce5df03b 1f4be33365573da19a684dd7f2fc97ace5d28710 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/6999  
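The condition that triggers the new `UserWarning` can be sketched in plain Python. A rename creates an un-indexed dimension coordinate when a 1-d coordinate ends up sharing the name of its own dimension. That happens either when the coordinate is renamed after the dimension, or when the dimension is renamed after the coordinate. `warn_if_rename_creates_dim_coord` is a hypothetical helper, and its message is illustrative. Xarray's actual implementation and warning text differ.

```python
import warnings


def warn_if_rename_creates_dim_coord(coord_dims, name_dict):
    """Warn when a rename turns a 1-d coordinate into a dimension coordinate.

    Hypothetical sketch: coord_dims maps coordinate names to their dimension
    names; name_dict maps old names (coordinates or dimensions) to new names.
    Returns the names of the un-indexed dimension coordinates that would result.
    """
    created = []
    for old, new in name_dict.items():
        # Case 1: a 1-d coordinate is renamed after its own dimension.
        renamed_coord = coord_dims.get(old) == (new,)
        # Case 2: a dimension is renamed after a 1-d coordinate along it.
        renamed_dim = coord_dims.get(new) == (old,)
        if renamed_coord or renamed_dim:
            created.append(new)
            warnings.warn(
                f"renaming {old!r} to {new!r} creates a dimension coordinate "
                "without an index; consider swap_dims or set_xindex instead",
                UserWarning,
                stacklevel=2,
            )
    return created
```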
1048613040 PR_kwDOAMm_X84-gJCw 7003 closed 0 Misc. fixes for Indexes with pd.Index objects benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #6987 - [x] Tests added 2022-09-07T11:05:02Z 2022-12-08T09:36:51Z 2022-09-23T07:30:38Z 2022-09-23T07:30:38Z 9d1499e22e2748eeaf088e6a2abc5c34053bf37c     0 54271bd4cda67c5f5b8703095798c122b7e96b0c 5bec4662a7dd4330eca6412c477ca3f238323ed2 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/7003  
1048884296 PR_kwDOAMm_X84-hLRI 7004 open 0 Rework PandasMultiIndex.sel internals benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #6838 - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` This PR hopefully improves how the labels provided for multi-index level coordinates are handled in `.sel()`. More specifically, slices are handled in a cleaner way and array-like labels are now allowed. `PandasMultiIndex.sel()` relies on the underlying `pandas.MultiIndex` methods like this: - use ``get_loc`` when every level is provided with a scalar label (no slice, no array) - always drops the index and returns scalar coordinates for each multi-index level - use ``get_loc_level`` when only a subset of levels is provided, with scalar labels only - may collapse one or more levels of the multi-index (dropped levels result in scalar coordinates) - if only one level remains: renames the dimension and the corresponding dimension coordinate - use ``get_locs`` for all other cases. - always keeps the multi-index and its coordinates (even if only one item or one level is selected) This yields a predictable behavior: as soon as one of the provided labels is a slice or array-like, the multi-index and all its level coordinates are kept in the result. 
Some cases illustrated below (I compare this PR with an older release due to the errors reported in #6838): ```python import xarray as xr import pandas as pd midx = pd.MultiIndex.from_product([list("abc"), range(4)], names=("one", "two")) ds = xr.Dataset(coords={"x": midx}) # <xarray.Dataset> # Dimensions: (x: 12) # Coordinates: # * x (x) object MultiIndex # * one (x) object 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b' 'c' 'c' 'c' 'c' # * two (x) int64 0 1 2 3 0 1 2 3 0 1 2 3 # Data variables: # *empty* ``` ```python ds.sel(one="a", two=0) # this PR # # <xarray.Dataset> # Dimensions: () # Coordinates: # x object ('a', 0) # one <U1 'a' # t… 2022-09-07T14:57:29Z 2022-09-22T20:38:41Z     0a4b1aafbe66a857de627cf180eba8713ca9a85d     0 00baaddefae0a189874ca64d9f4be4d2d83cc744 5bec4662a7dd4330eca6412c477ca3f238323ed2 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/7004  
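The three-way dispatch in `PandasMultiIndex.sel()` can be sketched as a plain function that only decides which `pandas.MultiIndex` method would be called. `choose_lookup_method` and `is_scalar_label` are hypothetical names, and label classification is simplified. Slices, lists, tuples, sets and anything with `ndim > 0` are treated as non-scalar. This ignores, for example, tuple-valued level labels.

```python
def is_scalar_label(label):
    """Simplified check: slices, containers and n-d arrays are not scalar."""
    if isinstance(label, (slice, list, tuple, set)):
        return False
    ndim = getattr(label, "ndim", None)
    if ndim is not None:
        # NumPy scalars and 0-d arrays have ndim == 0.
        return ndim == 0
    return True


def choose_lookup_method(labels, level_names):
    """Return the pandas.MultiIndex method used for .sel(**labels) (sketch)."""
    all_scalar = all(is_scalar_label(v) for v in labels.values())
    if all_scalar and set(labels) == set(level_names):
        return "get_loc"        # index dropped, scalar level coordinates
    if all_scalar:
        return "get_loc_level"  # some levels may be collapsed
    return "get_locs"           # multi-index and all its levels kept


levels = ("one", "two")
print(choose_lookup_method({"one": "a", "two": 0}, levels))  # get_loc
print(choose_lookup_method({"one": ["a", "b"]}, levels))     # get_locs
```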
1070271669 PR_kwDOAMm_X84_ywy1 7101 closed 0 Fix Dataset.assign_coords overwriting multi-index benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #7097 - [x] Tests added @dcherian the `DeprecationWarning` was ignored by default for `.assign_coords()` because of https://github.com/pydata/xarray/pull/6798#discussion_r924653224. I changed it to `FutureWarning` so that it is shown for both `.assign()` and `.assign_coords()`. 2022-09-28T16:21:48Z 2022-12-08T09:36:50Z 2022-09-28T18:02:16Z 2022-09-28T18:02:16Z 513ee34f16cc8f9250a72952e33bf9b4c95d33d1     0 ee9b027c0e41de15fc4960dde9e4c551d7d2a9df e678a1d7884a3c24dba22d41b2eef5d7fe5258e7 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/7101  
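The reason for switching to `FutureWarning` can be reproduced with the standard `warnings` module. Python's default filters ignore `DeprecationWarning` unless it is attributed to code in `__main__` (PEP 565), but they always show `FutureWarning`. The sketch emulates that default "ignore" filter explicitly so that the result does not depend on how the interpreter or test runner configured its filters. `emit_deprecation_warnings` is a hypothetical stand-in for the deprecated code path.

```python
import warnings


def emit_deprecation_warnings():
    # Hypothetical stand-in for a deprecated code path warning twice.
    warnings.warn("old behavior (DeprecationWarning)", DeprecationWarning, stacklevel=2)
    warnings.warn("old behavior (FutureWarning)", FutureWarning, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Emulate Python's default filter that hides DeprecationWarning outside __main__.
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    emit_deprecation_warnings()

print([w.category.__name__ for w in caught])  # ['FutureWarning']
```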
1071450326 PR_kwDOAMm_X84_3QjW 7105 closed 0 Fix to_index(): return multiindex level as single index benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #6836 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` 2022-09-29T14:44:22Z 2022-12-08T09:36:51Z 2022-10-12T14:12:48Z 2022-10-12T14:12:48Z f93b467db5e35ca94fefa518c32ee9bf93232475     0 e9a75b746d68fba12216a1f455252cd9fa4c3ebf 50ea159bfd0872635ebf4281e741f3c87f0bef6b MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/7105  
1090510499 PR_kwDOAMm_X85A_96j 7182 open 0 add MultiPandasIndex helper class benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [ ] Closes #xxxx - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` This PR adds an `xarray.indexes.MultiPandasIndex` helper class for building custom meta-indexes that encapsulate multiple `PandasIndex` instances. Unlike `PandasMultiIndex`, the meta-index classes inheriting from this helper class may encapsulate loosely coupled (pandas) indexes, with coordinates of arbitrary dimensions (each coordinate must be 1-dimensional but an Xarray index may be created from coordinates with differing dimensions). An early prototype is available in this [notebook](https://notebooksharing.space/view/3d599addf8bd6b06a6acc241453da95e28c61dea4281ecd194fbe8464c9b296f#displayOptions=) TODO / TO FIX: - How to allow custom `__init__` options in subclasses to be passed to all the `type(self)(new_indexes)` calls inside the `MultiPandasIndex` "base" class? This could be done via `**kwargs` passed through... However, mypy will certainly complain (Liskov Substitution Principle). - Is `MultiPandasIndex` a good name for this helper class? 2022-10-18T09:42:58Z 2023-08-23T16:30:28Z     6633615eca663c879bba4e9a144050c4aaa7555f     1 e4d753c3bf3ffdc30864510885c68fdb2e8349a2 ab726c536464fbf4d8878041f950d2b0ae09b862 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/7182  
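One way to forward custom `__init__` options through the `type(self)(new_indexes)` calls is to store the options on the instance and replay them whenever a new object is built. The sketch below is hypothetical. `MetaIndexSketch` and `ToleranceIndex` are illustrative names, and plain lists stand in for `PandasIndex` objects. It shows the `**kwargs` pass-through idea only; it does not address the mypy/Liskov concern raised above.

```python
class MetaIndexSketch:
    """Hypothetical stand-in for a meta-index wrapping several sub-indexes."""

    def __init__(self, indexes, **options):
        self.indexes = dict(indexes)
        # Keep the constructor options so derived objects can be rebuilt with them.
        self.options = options

    def _new(self, new_indexes):
        # Forward the stored options to the (possibly subclassed) constructor.
        return type(self)(new_indexes, **self.options)

    def drop(self, names):
        kept = {k: v for k, v in self.indexes.items() if k not in names}
        return self._new(kept)


class ToleranceIndex(MetaIndexSketch):
    """Example subclass adding a custom ``tolerance`` option."""

    def __init__(self, indexes, tolerance=0.0, **options):
        super().__init__(indexes, tolerance=tolerance, **options)
        self.tolerance = tolerance


idx = ToleranceIndex({"lat": [0, 1], "lon": [10, 20]}, tolerance=0.5)
sub = idx.drop(["lon"])
print(type(sub).__name__, sub.tolerance, list(sub.indexes))  # ToleranceIndex 0.5 ['lat']
```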
1098978950 PR_kwDOAMm_X85BgRaG 7214 closed 0 Pass indexes directly to the DataArray and Dataset constructors benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #6392 - [x] Closes #6633 ? - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` From https://github.com/pydata/xarray/issues/6392#issuecomment-1290454937: I'm thinking of only accepting one or more instances of [Indexes](https://github.com/pydata/xarray/blob/e678a1d7884a3c24dba22d41b2eef5d7fe5258e7/xarray/core/indexes.py#L1030) as the `indexes` argument in the Dataset and DataArray constructors. The only exception is `fastpath=True`, in which case a mapping can be given directly. Also, passing an empty collection of indexes skips the creation of default pandas indexes for dimension coordinates. - It is much easier to handle: just check that keys returned by `Indexes.variables` do not conflict with the coordinate names in the `coords` argument - It is slightly safer: it requires the user to explicitly create an `Indexes` object, so there is less chance of accidentally providing coordinate variables and index objects that do not relate to each other (we could probably add some safeguards in the `Indexes` class itself) - It is more convenient: an Xarray `Index` may provide a factory method that returns an instance of `Indexes` that we just need to pass as indexes, and we could also do something like `ds = xr.Dataset(indexes=other_ds.xindexes)` 2022-10-25T14:16:44Z 2023-08-30T09:11:56Z 2023-07-18T11:52:11Z   b3a3fd5a537d8000baf8ece3093a60ea14406ecc     1 ddd505e6af5270e143ee814485d5b4665456d77f 6e77f5e8942206b3e0ab08c3621ade1499d8235b MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/7214  
1142893563 PR_kwDOAMm_X85EHyv7 7347 closed 0 Fix assign_coords resetting all dimension coords to default index benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #7346 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` 2022-12-02T08:19:01Z 2022-12-08T09:36:49Z 2022-12-02T16:32:40Z 2022-12-02T16:32:40Z 8938d390a969a94275a4d943033a85935acbce2b     0 23d9889d11b181c94db2b5e8fe33073a1328be1f 92e7cb5b21a6dee7f7333c66e41233205c543bc1 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/7347  
1154470307 PR_kwDOAMm_X85Ez9Gj 7368 closed 0 Expose "Coordinates" as part of Xarray's public API benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #7214 - [x] Closes #6392 - [x] xref #6633 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` This is a rework of #7214. It follows the suggestions made in https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938, https://github.com/pydata/xarray/pull/7214#issuecomment-1297046405 and https://github.com/pydata/xarray/pull/7214#issuecomment-1293774799: - No `indexes` argument is added to `Dataset.__init__`, and the `indexes` argument of `DataArray.__init__` is kept private (i.e., valid only if fastpath=True) - When a `Coordinates` object is passed to a new Dataset or DataArray via the `coords` argument, both coordinate variables and indexes are copied/extracted and added to the new object - This PR also adds ~~an `IndexedCoordinates` subclass~~ `Coordinates` public constructors used to create Xarray coordinates and indexes from non-Xarray objects. For example, the `Coordinates.from_pandas_multiindex()` class method creates a new index and set of coordinates from an existing `pd.MultiIndex`. EDIT: `IndexedCoordinates` has been merged with `Coordinates` EDIT2: it ended up as a pretty big refactor with the promotion of `Coordinates` as a 2nd-class Xarray container that supports alignment like Dataset and DataArray. It is still a fairly advanced API, useful for passing coordinate variables and indexes around. Internally, `Coordinates` objects are still "virtual" containers (i.e., proxies for coordinate variables and indexes stored in their corresponding DataArray or Dataset objects). For now, a "stand-alone" `Coordinates` object created from scratch wraps a Dataset with no data variables. 
Some examples of usage: ```python import pandas as pd import xarray as xr midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two")) coords = xr.Coordinates.from_pandas_multiinde… 2022-12-08T16:59:29Z 2023-08-30T09:11:57Z 2023-07-21T20:40:03Z 2023-07-21T20:40:03Z 4441f9915fa978ad5b276096ab67ba49602a09d2     0 4ef5f17db6d2aefd91fb02485ab7a815fe460b47 6b1ff6d13bf360df786500dfa7d62556d23e6df9 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/7368  
1166747288 PR_kwDOAMm_X85FiyaY 7382 closed 0 Some alignment optimizations benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Benchmark added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` May fix some performance regressions, e.g., see https://github.com/pydata/xarray/issues/7376#issuecomment-1352989233. @ravwojdyla with this PR `ds.assign(foo=~ds["d3"])` in your example should be much faster (on par with version 2022.3.0). 2022-12-15T12:54:56Z 2023-08-30T09:05:24Z 2023-01-05T21:25:55Z 2023-01-05T21:25:55Z d6d24507793af9bcaed79d7f8d3ac910e176f1ce     0 95be2d07403a8e061df19f682db42ad273c62745 b93dae4079daf0fc4c042fd0d699c16624430cdc MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/7382  
1465015830 PR_kwDOAMm_X85XUl4W 8051 open 0 Allow setting (or skipping) new indexes in open_dataset benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #6633 - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` This PR introduces a new boolean parameter `set_indexes=True` to `xr.open_dataset()`, which may be used to skip the creation of default (pandas) indexes when opening a dataset. It currently works with the Zarr backend: ```python import numpy as np import xarray as xr # example dataset (real dataset may be much larger) arr = np.random.random(size=1_000_000) xr.Dataset({"x": arr}).to_zarr("dataset.zarr") xr.open_dataset("dataset.zarr", set_indexes=False, engine="zarr") # <xarray.Dataset> # Dimensions: (x: 1000000) # Coordinates: # x (x) float64 ... # Data variables: # *empty* xr.open_zarr("dataset.zarr", set_indexes=False) # <xarray.Dataset> # Dimensions: (x: 1000000) # Coordinates: # x (x) float64 ... # Data variables: # *empty* ``` I'll add it to the other Xarray backends as well, but I'd like to get your thoughts about the API first. 1. Do we want to add yet another keyword parameter to `xr.open_dataset()`? There are already many... 2. Do we want to add this parameter to the `BackendEntrypoint.open_dataset()` API? - I'm afraid we must do it if we want this parameter in `xr.open_dataset()` - this would also make it possible to skip the creation of custom indexes (if any) in custom IO backends - con: if we require `set_indexes` in the signature in addition to the `drop_variables` parameter, this is a breaking change for all existing 3rd-party backends. Or should we group `set_indexes` with the other xarray decoder kwargs? This would feel a bit odd to me as setting indexes is different from decoding data. 3. Or should we leave this up to the backends? 
- pros: no breaking change, more flexible (3rd party backends may want to offer more control like choosing between cus… 2023-08-07T10:53:46Z 2024-02-03T19:12:48Z     0b37c66130416f202c3b8ee2302ee9ea517bdadd     0 eae983bb6b7ee916e5c8956b6af42c2207ad48d1 c9ba2be2690564594a89eb93fb5d5c4ae7a9253c MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8051  
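The breaking-change concern in point 2 could be softened by forwarding `set_indexes` only to backends whose `open_dataset` signature declares it. This is one possible approach, sketched here under stated assumptions. It is not what the PR implements. `open_with_backend`, `legacy_backend` and `new_backend` are hypothetical. The design choice of raising when an old backend is asked to skip indexes is also an assumption; a warning would be another option.

```python
import inspect


def open_with_backend(backend_open, filename, drop_variables=None, set_indexes=True):
    """Call a backend open_dataset-like function (hypothetical sketch).

    set_indexes is only forwarded to backends whose signature declares it,
    so existing 3rd-party backends keep working unchanged.
    """
    kwargs = {"drop_variables": drop_variables}
    if "set_indexes" in inspect.signature(backend_open).parameters:
        kwargs["set_indexes"] = set_indexes
    elif not set_indexes:
        raise ValueError("this backend does not support set_indexes=False")
    return backend_open(filename, **kwargs)


def legacy_backend(filename, drop_variables=None):
    # Pre-existing backend signature without set_indexes.
    return {"file": filename, "set_indexes": "n/a"}


def new_backend(filename, drop_variables=None, set_indexes=True):
    # Backend that opted in to the new parameter.
    return {"file": filename, "set_indexes": set_indexes}
```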
1482940936 PR_kwDOAMm_X85YY-II 8094 closed 0 Refactor update coordinates to better handle multi-coordinate indexes benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #7563 - [x] Closes #8039 - [x] Closes #8056 - [x] Closes #7885 - [x] Closes #7921 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` This refactor should better handle multi-coordinate indexes when updating (or assigning) new coordinates. It also fixes, better isolates and emits clearer warnings for a number of deprecated pandas multi-index special cases (i.e., directly passing `pd.MultiIndex` objects or updating a multi-index dimension coordinate). I very much look forward to seeing support for those cases dropped :). 2023-08-21T13:57:38Z 2023-08-30T09:06:28Z 2023-08-29T14:23:29Z 2023-08-29T14:23:29Z 1fedfd86604f87538d1953b01d6990c2c89fcbf3     0 748ee246821f5c308fc52e29c5d6b1d5f628cacf 42d42bab5811702e56c638b9489665d3c505a0c1 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8094  
1486052929 PR_kwDOAMm_X85Yk15B 8102 closed 0 Add `Coordinates.assign()` method benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` This is consistent with the Dataset and DataArray `assign` methods (now that `Coordinates` is also exposed as public API). This allows writing: ```python midx = pd.MultiIndex.from_arrays([["a", "a", "b", "b"], [0, 1, 0, 1]]) midx_coords = xr.Coordinates.from_pandas_multiindex(midx, "x") ds = xr.Dataset(coords=midx_coords.assign(y=[1, 2])) ``` which is quite common (at least in the tests) and a bit nicer than ```python ds = xr.Dataset(coords=midx_coords.merge({"y": [1, 2]}).coords) ``` 2023-08-23T09:15:51Z 2023-09-01T13:28:16Z 2023-09-01T13:28:16Z 2023-09-01T13:28:16Z 71177d481eb0c3547cb850a4b3e866af6d4fded7     0 6f1dfed9dac9bcecb6b9b8bd1abd20d5cb388f68 1043a9e13574e859ec08d19425341b2e359d2802 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8102  
1486710446 PR_kwDOAMm_X85YnWau 8104 closed 0 Fix merge with compat=minimal (coord names) benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #7405 - [x] Closes #7588 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` 2023-08-23T16:20:48Z 2023-08-30T09:11:18Z 2023-08-30T07:57:35Z 2023-08-30T07:57:35Z b136fcb679e9e70fd44b60688d96e75d4e3f8dcb     0 613eb1337d38f6b92434feaffb12b4f99e597cf0 1fedfd86604f87538d1953b01d6990c2c89fcbf3 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8104  
1487073982 PR_kwDOAMm_X85YovK- 8107 closed 0 Better default behavior of the Coordinates constructor benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` After working more on `Coordinates` I realized that the default behavior of its constructor could be more consistent with other Xarray objects. This PR changes this default behavior such that: - Pandas indexes are created for dimension coordinates if `indexes=None` (default). To create dimension coordinates with no index, just pass `indexes={}`. - If another `Coordinates` object is passed as input, its indexes are also added to the newly created object. Since we don't support alignment / merge here, the following call raises an error: `xr.Coordinates(coords=xr.Coordinates(...), indexes={...})`. This PR introduces a breaking change since `Coordinates` were already exposed in v2023.8.0, which has just been released. It is a bit unfortunate but I think it may be OK for a fresh feature, especially if the next release comes soon after this one. 2023-08-23T21:42:51Z 2024-02-04T18:32:42Z 2023-08-31T07:35:47Z 2023-08-31T07:35:47Z 0f9f790c7e887bbfd13f4026fd1d37e4cd599ff1     0 bce000cff6be4cf9d42454da4c370685e9dad051 42d42bab5811702e56c638b9489665d3c505a0c1 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8107  
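The new default rule of the `Coordinates` constructor can be sketched as a function that returns which coordinates end up indexed. With `indexes=None`, each dimension coordinate gets a default pandas index. A dimension coordinate here means a 1-d coordinate named after its dimension. With `indexes={}`, nothing is indexed. `default_indexed_names` is hypothetical, and it also assumes that any explicitly passed mapping disables all default indexes.

```python
def default_indexed_names(coord_dims, indexes=None):
    """Names of coordinates that end up indexed (hypothetical sketch).

    coord_dims maps each coordinate name to its tuple of dimension names.
    Assumption: an explicit ``indexes`` mapping disables all default indexes.
    """
    if indexes is not None:
        # Explicitly passed indexes are used as-is; indexes={} means no index at all.
        return sorted(indexes)
    # Default: one pandas index per dimension coordinate (1-d, named after its dim).
    return sorted(name for name, dims in coord_dims.items() if dims == (name,))


coords = {"x": ("x",), "lat": ("x",), "y": ("y",)}
print(default_indexed_names(coords))              # ['x', 'y']
print(default_indexed_names(coords, indexes={}))  # []
```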
1487590692 PR_kwDOAMm_X85YqtUk 8109 closed 0 Better error message when trying to set an index from a scalar coordinate benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #4091 - [x] Tests added The message suggests using `.expand_dims()`. 2023-08-24T08:18:13Z 2023-08-30T09:27:27Z 2023-08-30T07:13:15Z 2023-08-30T07:13:15Z e5a38f6837ae9b9aa28a4bd063620a1cd802e093     0 a1d70aa0aca1fb33b611a23697e5af04b34b2c7c 42d42bab5811702e56c638b9489665d3c505a0c1 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8109  
1488345780 PR_kwDOAMm_X85Ytlq0 8111 open 0 Alignment: allow flexible index coordinate order benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #7002 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` This PR relaxes some of the rules used in alignment for finding the indexes to compare or join together. Those indexes must still be of the same type and must relate to the same set of coordinates (and dimensions), but the order of coordinates is now ignored. It is up to the index to implement the equal / join logic if it needs to care about that order. Regarding `pandas.MultiIndex`, it seems that the level names are ignored when comparing indexes: ```python import pandas as pd midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two")) midx2 = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("two", "one")) midx.equals(midx2) # True ``` However, in Xarray the names of the multi-index levels (and their order) matter since each level has its own xarray coordinate. In this PR, `PandasMultiIndex.equals()` and `PandasMultiIndex.join()` thus check that the level names match. 2023-08-24T16:18:49Z 2023-09-28T15:58:38Z     79103728908c37d32bc902cd7bcc583363ce9bd9     0 0645c4b813908104c27ace51fce16ac053c6e1e8 42d42bab5811702e56c638b9489665d3c505a0c1 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8111  
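The two rules above can be sketched side by side in plain Python. First, an order-insensitive key used to match indexes during alignment. Second, an equality check that, unlike `pandas.Index.equals`, also requires the level names to match. All names here are hypothetical. Indexes are modeled as dicts holding level names and tuple values rather than real pandas objects.

```python
from itertools import product


def alignment_key(index_type, coord_names, dims):
    """Order-insensitive key used to find indexes to compare or join (sketch)."""
    return (index_type, frozenset(coord_names), frozenset(dims))


def pandas_like_equals(a, b):
    """Compare only the values, ignoring level names (like pandas.Index.equals)."""
    return a["values"] == b["values"]


def xarray_like_equals(a, b):
    """Also require the same level names in the same order (hypothetical sketch)."""
    return a["names"] == b["names"] and a["values"] == b["values"]


values = list(product(["a", "b"], [0, 1]))
midx = {"names": ("one", "two"), "values": values}
midx2 = {"names": ("two", "one"), "values": values}
print(pandas_like_equals(midx, midx2), xarray_like_equals(midx, midx2))  # True False
```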
1492188700 PR_kwDOAMm_X85Y8P4c 8118 open 0 Add Coordinates `set_xindex()` and `drop_indexes()` methods benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - Complements #8102 - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` I don't think that we need to copy most of the Dataset / DataArray API to `Coordinates`, but I find it convenient to have some relevant methods there too. For example, building Coordinates from scratch (with custom indexes) before passing the whole coords + indexes bundle around: ```python import dask.array as da import numpy as np import xarray as xr coords = ( xr.Coordinates( coords={"x": da.arange(100_000_000), "y": np.arange(100)}, indexes={}, ) .set_xindex("x", DaskIndex) .set_xindex("y", xr.indexes.PandasIndex) ) ds = xr.Dataset(coords=coords) # <xarray.Dataset> # Dimensions: (x: 100000000, y: 100) # Coordinates: # * x (x) int64 dask.array<chunksize=(16777216,), meta=np.ndarray> # * y (y) int64 0 1 2 3 4 5 6 7 8 9 10 ... 90 91 92 93 94 95 96 97 98 99 # Data variables: # *empty* # Indexes: # x DaskIndex ``` 2023-08-28T14:28:24Z 2023-09-19T01:53:18Z     664b100ba033d892b0894c82c49c18fc71b3f7be     0 13ebc667add99d53fe5619de8206ce745e453829 828ea08aa74d390519f43919a0e8851e29091d00 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8118  
1496182200 PR_kwDOAMm_X85ZLe24 8124 open 0 More flexible index variables benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [ ] Closes #xxxx - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` The goal of this PR is to provide a more general solution to indexed coordinate variables, i.e., support arbitrary dimensions and/or duck arrays for those variables while at the same time prevent them from being updated in a way that would invalidate their index. This would solve problems like the one mentioned here: https://github.com/pydata/xarray/issues/1650#issuecomment-1697237429 @shoyer I've tried to implement what you have suggested in https://github.com/pydata/xarray/pull/4979#discussion_r589798510. It would be nice indeed if eventually we could get rid of `IndexVariable`. It won't be easy to deprecate it until we finish the index refactor (i.e., all methods listed in #6293), though. Also, I didn't find an easy way to refactor that class as it has been designed too closely around a 1-d variable backed by a `pandas.Index`. So the approach implemented in this PR is to keep using `IndexVariable` for PandasIndex until we can deprecate / remove it later, and for the other cases use `Variable` with data wrapped in a custom `IndexedCoordinateArray` object. The latter solution (wrapper) doesn't always work nicely, though. For example, several methods of `Variable` expect that `self._data` directly returns a duck array (e.g., a dask array or a chunked duck array). A wrapped duck array will result in unexpected behavior there. We could probably add some checks / indirection or extend the wrapper API... But I wonder if there wouldn't be a more elegant approach? More generally, which operations should we allow / forbid / skip for an indexed coordinate variable? - Set array items in-place? Do not allow. - Replace data? 
Do not allow. - (Re)Chunk? - Load lazy data? - ... ? (Note: we could add `Index.chunk()` and `Index.load()` metho… 2023-08-30T21:45:12Z 2023-08-31T16:02:20Z     8b84dc392e5443f9ada245cb6a6f31d8f19327df     1 09f3ed0acd119fcefa07652bbc40dff96db2f66c 0f9f790c7e887bbfd13f4026fd1d37e4cd599ff1 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8124  
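The first two rules listed above ("set array items in-place? Do not allow" and "replace data? Do not allow") can be sketched with a read-only wrapper and a variable whose `data` setter refuses updates once it is indexed. `ReadOnlyIndexedArray` and `CoordVariableSketch` are hypothetical. They are simplified stand-ins inspired by the `IndexedCoordinateArray` idea, not its implementation, and plain lists replace duck arrays.

```python
class ReadOnlyIndexedArray:
    """Hypothetical wrapper guarding the data of an indexed coordinate."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        # Reading (including slicing) is allowed.
        return self._data[key]

    def __setitem__(self, key, value):
        # In-place updates would silently invalidate the index.
        raise TypeError("cannot modify values of an indexed coordinate in place")


class CoordVariableSketch:
    """Minimal variable whose data cannot be replaced once it is indexed."""

    def __init__(self, data, indexed=False):
        self.indexed = indexed
        self._data = ReadOnlyIndexedArray(data) if indexed else list(data)

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, values):
        if self.indexed:
            raise TypeError("cannot replace the data of an indexed coordinate")
        self._data = list(values)
```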
1497266410 PR_kwDOAMm_X85ZPnjq 8128 open 0 Add Index.load() and Index.chunk() methods benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - [ ] Closes #xxxx - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` As mentioned in #8124, this gives custom Xarray indexes more control over what to do when their Dataset / DataArray `load()` and `chunk()` counterpart methods are called. `PandasIndex.load()` and `PandasIndex.chunk()` always return self (no action required). For a DaskIndex, we might want to return a PandasIndex (or another non-lazy index) from `load()` and rebuild a DaskIndex object from `chunk()` (rechunk). 2023-08-31T14:16:27Z 2023-08-31T15:49:06Z     a1842563887f8375fb3a03824189a75e6f080c96     1 4506cb600caba75f163c088171f590b67f59264b 1043a9e13574e859ec08d19425341b2e359d2802 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8128  
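The described behavior can be sketched with plain classes. The base methods are no-ops that return `self`, as stated for `PandasIndex`. A lazy index materializes into an in-memory index on `load()` and rebuilds itself with new chunks on `chunk()`. All class names here are hypothetical stand-ins, not xarray's `Index`, `PandasIndex` or a real dask-backed index.

```python
class IndexSketch:
    """Hypothetical base class: by default load() and chunk() are no-ops."""

    def load(self):
        return self

    def chunk(self, chunks):
        return self


class PandasIndexSketch(IndexSketch):
    """Stand-in for an in-memory index; inherits the no-op load/chunk."""

    def __init__(self, values):
        self.values = list(values)


class LazyIndexSketch(IndexSketch):
    """Stand-in for a lazy (e.g., dask-backed) index."""

    def __init__(self, values, chunks):
        self.values = list(values)
        self.chunks = chunks

    def load(self):
        # Materialize into a non-lazy index.
        return PandasIndexSketch(self.values)

    def chunk(self, chunks):
        # Rebuild the lazy index with the new chunking.
        return LazyIndexSketch(self.values, chunks)
```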
1500283634 PR_kwDOAMm_X85ZbILy 8140 open 0 Deprecate passing pd.MultiIndex implicitly benbovy 4160723 <!-- Feel free to remove check-list items aren't relevant to your change --> - Follow-up #8094 - [x] Closes #6481 - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` This PR should normally raise a warning *each time* indexed coordinates are created implicitly from a `pd.MultiIndex` object. I updated the tests to create coordinates explicitly using `Coordinates.from_pandas_multiindex()`. I also refactored some parts where a `pd.MultiIndex` could still be passed and promoted internally, with the exception of: - `swap_dims()`: it should raise a warning! Right now the warning message is a bit confusing for this case, but instead of adding a special case we should probably deprecate the whole method (as suggested in a TODO comment)? This method was meant to circumvent the limitations of dimension coordinates, which is no longer needed (`rename_dims` and/or `set_xindex` is equivalent and less confusing). - `xr.DataArray(pandas_obj_with_multiindex, dims=...)`: I guess it should raise a warning too? - `da.stack(z=...).groupby("z")`: it shouldn't raise a warning, but this requires a (heavy?) refactoring of groupby. When building the "grouper" objects, `grouper.group1d` or `grouper.unique_coord` may still be built by extracting only the multi-index dimension coordinate. I'd greatly appreciate it if anyone familiar with the groupby implementation could help me with this! @dcherian ? 2023-09-03T14:01:18Z 2023-11-15T20:15:00Z     ddb96c1f3a6fc2bcddea2432af311c5cbfcfc492     0 ef7dae0893f6701a203f8ec3c2e655bff7944b91 e2b6f3468ef829b8a83637965d34a164bf3bca78 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8140  
1500744603 PR_kwDOAMm_X85Zc4ub 8141 closed 0 Fix doctests: pandas 2.1 MultiIndex repr with nan benbovy 4160723   2023-09-04T07:08:55Z 2023-09-05T08:35:37Z 2023-09-05T08:35:36Z 2023-09-05T08:35:36Z f13da94db8ab4b564938a5e67435ac709698f1c9     0 445e6c923d112d584c714df3bf3ba2fbab004d3e e9c1962f31a7b5fd7a98ee4c2adf2ac147aabbcf MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8141  
1500931269 PR_kwDOAMm_X85ZdmTF 8142 closed 0 Dirty workaround for mypy 1.5 error benbovy 4160723 I wanted to fix the following error with mypy 1.5: ``` xarray/core/dataset.py:505: error: Definition of "__eq__" in base class "DatasetOpsMixin" is incompatible with definition in base class "Mapping" [misc] ``` Which looks similar to https://github.com/python/mypy/issues/9319. It is weird that here it worked with mypy versions < 1.5, though. I don't know if there is a better fix, but I thought that redefining `__eq__` in `Dataset` would be a bit less dirty workaround than adding `type: ignore` in the class declaration. 2023-09-04T09:21:18Z 2023-09-07T16:04:55Z 2023-09-07T08:21:12Z 2023-09-07T08:21:12Z e2b6f3468ef829b8a83637965d34a164bf3bca78     0 46bd88fbea07d52f06eab5d11ca3f72b547af263 f13da94db8ab4b564938a5e67435ac709698f1c9 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8142  
1501219392 PR_kwDOAMm_X85ZespA 8143 open 0 Deprecate the multi-index dimension coordinate benbovy 4160723 - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` This PR adds a `future_no_mindex_dim_coord=False` option that, if set to True, enables the future behavior of `PandasMultiIndex` (i.e., no added dimension coordinate with tuple values): ```python import xarray as xr ds = xr.Dataset(coords={"x": ["a", "b"], "y": [1, 2]}) ds.stack(z=["x", "y"]) # <xarray.Dataset> # Dimensions: (z: 4) # Coordinates: # * z (z) object MultiIndex # * x (z) <U1 'a' 'a' 'b' 'b' # * y (z) int64 1 2 1 2 # Data variables: # *empty* with xr.set_options(future_no_mindex_dim_coord=True): ds.stack(z=["x", "y"]) # <xarray.Dataset> # Dimensions: (z: 4) # Coordinates: # * x (z) <U1 'a' 'a' 'b' 'b' # * y (z) int64 1 2 1 2 # Dimensions without coordinates: z # Data variables: # *empty* ``` There are a few other things that we'll need to adapt or deprecate: - Dropping the multi-index dimension coordinate *de facto* allows having several multi-indexes along the same dimension. Normally `stack` should already take this into account, but there may be other places where this is not yet supported or where we should raise an explicit error. - Deprecate `Dataset.reorder_levels`: its API is not compatible with the absence of a dimension coordinate and with several multi-indexes along the same dimension. I think it is OK to deprecate such an edge case, which could alternatively be done by extracting the pandas index, updating it and then re-assigning it to the dataset with `assign_coords(xr.Coordinates.from_pandas_multiindex(...))` - The text-based repr: in the example above, `Dimensions without coordinates: z` doesn't make much sense. - ... ? I started updating the tests, although this will be much easier once #8140 is merged. This is something that we could also easily split into multiple PRs. 
It is probably OK if some features are (t… 2023-09-04T12:32:36Z 2023-09-04T12:32:48Z     d0709f6d90e3f71d78e562c15b1662a423d8e3e9     0 87d5bf72e766b101db32dc65e6a79957368812ee 71177d481eb0c3547cb850a4b3e866af6d4fded7 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8143  
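Here is a rough sketch of the `reorder_levels` alternative mentioned above: extract the pandas index, update it, then re-assign it. It uses only today's public API on the dataset from the PR's example. The `future_no_mindex_dim_coord` option is not used, so `z` keeps its dimension coordinate.

```python
import xarray as xr

ds = xr.Dataset(coords={"x": ["a", "b"], "y": [1, 2]}).stack(z=["x", "y"])

# 1. Extract the underlying pandas.MultiIndex.
midx = ds.indexes["z"]

# 2. Update it with pandas (here: swap the level order).
new_midx = midx.reorder_levels(["y", "x"])

# 3. Re-assign it; the new coordinates replace z, x and y together.
ds2 = ds.assign_coords(xr.Coordinates.from_pandas_multiindex(new_midx, "z"))
```

Because all coordinates of the existing multi-index are replaced at once, no alignment with the old index takes place.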
1509661685 PR_kwDOAMm_X85Z-5v1 8170 open 0 Dataset.from_dataframe: optionally keep multi-index unexpanded benbovy 4160723 - [x] Closes #8166 - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` I added both the `unstack` and `dim` arguments but we can change that. - [ ] update `DataArray.from_series()` 2023-09-11T06:20:17Z 2023-09-11T06:20:17Z     d3c6c4785be4a17946c88907176833e8bdabcd67     1 1afef691db8879526212a504bb42dbfc6f81878a 2951ce0215f14a8a79ecd0b5fc73a02a34b9b86b MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8170  
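For context, the sketch below contrasts the current default with one explicit workaround that uses only existing API. It does not use the `unstack`/`dim` arguments proposed in this PR, whose final form is still open. By default, `Dataset.from_dataframe` expands each multi-index level into its own dimension and fills missing combinations with NaN. Building the dataset from `Coordinates.from_pandas_multiindex` instead keeps a single dimension. The dimension name `obs` is hypothetical.

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=("x", "y"))
df = pd.DataFrame({"v": [10, 20]}, index=midx)

# Default: each level becomes a dimension; the (a, 2) and (b, 1) cells
# do not exist in the dataframe and are filled with NaN.
expanded = xr.Dataset.from_dataframe(df)

# Workaround: keep the multi-index along one (hypothetical) "obs" dimension.
coords = xr.Coordinates.from_pandas_multiindex(df.index, "obs")
kept = xr.Dataset({"v": ("obs", df["v"].to_numpy())}, coords=coords)
```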
1696970326 PR_kwDOAMm_X85lJbZW 8672 closed 0 Fix multiindex level serialization after reset_index benbovy 4160723 - [x] Closes #8628 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` 2024-01-26T10:40:42Z 2024-02-23T01:22:17Z 2024-01-31T17:42:29Z 2024-01-31T17:42:29Z f9f4c730254073f0f5a8fce65f4bbaa0eefec5fd     0 72f319f5c4259c19aabf223faa1d9a51ba035887 ca4f12133e9643c197facd17b54d5040a1bda002 MEMBER
{
    "enabled_by": {
        "login": "dcherian",
        "id": 2448579,
        "node_id": "MDQ6VXNlcjI0NDg1Nzk=",
        "avatar_url": "https://avatars.githubusercontent.com/u/2448579?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/dcherian",
        "html_url": "https://github.com/dcherian",
        "followers_url": "https://api.github.com/users/dcherian/followers",
        "following_url": "https://api.github.com/users/dcherian/following{/other_user}",
        "gists_url": "https://api.github.com/users/dcherian/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/dcherian/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/dcherian/subscriptions",
        "organizations_url": "https://api.github.com/users/dcherian/orgs",
        "repos_url": "https://api.github.com/users/dcherian/repos",
        "events_url": "https://api.github.com/users/dcherian/events{/privacy}",
        "received_events_url": "https://api.github.com/users/dcherian/received_events",
        "type": "User",
        "site_admin": false
    },
    "merge_method": "squash",
    "commit_title": "Fix multiindex level serialization after reset_index (#8672)",
    "commit_message": "* fix serialize multi-index level coord after reset\r\n\r\n* add regression test\r\n\r\n* update what's new\r\n\r\n---------\r\n\r\nCo-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>"
}
xarray 13221727 https://github.com/pydata/xarray/pull/8672  
1797701340 PR_kwDOAMm_X85rJr7c 8888 open 0 to_base_variable: coerce multiindex data to numpy array benbovy 4160723 - [x] Closes #8887, and probably supersedes #8809 - [x] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - ~~New functions/methods are listed in `api.rst`~~ @slevang this should also make your test case added in #8809 work. I haven't added it here; instead I added a basic check that should be enough. I don't really understand why the serialization backends (zarr?) do not seem to work with the `PandasMultiIndexingAdapter.__array__()` implementation, which should normally coerce the multi-index levels into numpy arrays as needed. Anyway, I guess that coercing it early like in this PR doesn't hurt and may avoid the confusion of a non-indexed, isolated coordinate variable that still wraps a `pandas.MultiIndex`. 2024-03-29T10:10:42Z 2024-03-29T15:54:19Z     0f5c78efff8fdc024de20a178acf3ae7ac62f84e     0 dd9c3b4ad88b6694b6e737e86e80ad1dcfa1527c 2120808bbe45f3d4f0b6a01cd43bac4df4039092 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8888  
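This is a rough user-side sketch of the conversion discussed here. It uses public API only and does not reproduce the zarr issue from #8887. A level coordinate of a stacked dimension is backed by the multi-index. Calling `to_base_variable()` on it gives a plain `Variable` whose values match the pandas level values.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"x": ["a", "b"], "y": [1, 2]}).stack(z=["x", "y"])

# "y" is a level coordinate of the multi-index along "z".
level_var = ds["y"].variable
base = level_var.to_base_variable()  # plain Variable (not an IndexVariable)

# Its values match the pandas level values, as a NumPy array.
expected = ds.indexes["z"].get_level_values("y").to_numpy()
np.testing.assert_array_equal(base.values, expected)
```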
1808774743 PR_kwDOAMm_X85rz7ZX 8911 open 0 Refactor swap dims benbovy 4160723 - [ ] Attempt at fixing #8646 - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` Here I tried re-implementing `swap_dims` using `rename_dims`, `drop_indexes` and `set_xindex`. This fixes the example in #8646, but unfortunately it fails to handle the pandas multi-index special case (i.e., a single non-dimension coordinate wrapping a `pd.MultiIndex` that, when promoted to a dimension coordinate in `swap_dims`, automagically results in a `PandasMultiIndex` with both dimension and level coordinates). 2024-04-05T08:45:49Z 2024-04-17T16:46:34Z     36231f3beea60c788054877f91689d3469f84cbc     1 4102b9f67e5c28b85a154cf7ff0749e1f8f1a258 56182f73c56bc619a18a9ee707ef6c19d54c58a2 MEMBER   xarray 13221727 https://github.com/pydata/xarray/pull/8911  

CREATE TABLE [pull_requests] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [state] TEXT,
   [locked] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [body] TEXT,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [merged_at] TEXT,
   [merge_commit_sha] TEXT,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [draft] INTEGER,
   [head] TEXT,
   [base] TEXT,
   [author_association] TEXT,
   [auto_merge] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [url] TEXT,
   [merged_by] INTEGER REFERENCES [users]([id])
);
CREATE INDEX [idx_pull_requests_merged_by]
    ON [pull_requests] ([merged_by]);
CREATE INDEX [idx_pull_requests_repo]
    ON [pull_requests] ([repo]);
CREATE INDEX [idx_pull_requests_milestone]
    ON [pull_requests] ([milestone]);
CREATE INDEX [idx_pull_requests_assignee]
    ON [pull_requests] ([assignee]);
CREATE INDEX [idx_pull_requests_user]
    ON [pull_requests] ([user]);
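Below is a minimal sketch of querying this schema with Python's built-in `sqlite3` module, assuming no local copy of the database is at hand. It recreates a subset of the `pull_requests` columns in memory and loads four rows taken from the table above (#8140 to #8143). It then filters on `user` the same way this page does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the [pull_requests] schema shown above.
conn.execute(
    """
    CREATE TABLE [pull_requests] (
        [id] INTEGER PRIMARY KEY,
        [number] INTEGER,
        [state] TEXT,
        [title] TEXT,
        [user] INTEGER,
        [merged_at] TEXT
    )
    """
)
rows = [
    (1500283634, 8140, "open", "Deprecate passing pd.MultiIndex implicitly", 4160723, None),
    (1500744603, 8141, "closed", "Fix doctests: pandas 2.1 MultiIndex repr with nan", 4160723, "2023-09-05T08:35:36Z"),
    (1500931269, 8142, "closed", "Dirty workaround for mypy 1.5 error", 4160723, "2023-09-07T08:21:12Z"),
    (1501219392, 8143, "open", "Deprecate the multi-index dimension coordinate", 4160723, None),
]
conn.executemany("INSERT INTO [pull_requests] VALUES (?, ?, ?, ?, ?, ?)", rows)

# Same filter as this page (user = 4160723), restricted to open PRs.
open_prs = [
    n
    for (n,) in conn.execute(
        "SELECT [number] FROM [pull_requests] WHERE [user] = ? AND [state] = 'open' ORDER BY [number]",
        (4160723,),
    )
]

# Merged PRs, in merge order.
merged = [
    n
    for (n,) in conn.execute(
        "SELECT [number] FROM [pull_requests] WHERE [merged_at] IS NOT NULL ORDER BY [merged_at]"
    )
]
```

The square-bracket quoting mirrors the schema above. SQLite also accepts the unquoted column names.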
Powered by Datasette · About: xarray-datasette