html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6392#issuecomment-1290454937,https://api.github.com/repos/pydata/xarray/issues/6392,1290454937,IC_kwDOAMm_X85M6seZ,4160723,2022-10-25T12:19:52Z,2022-10-25T12:19:52Z,MEMBER,"I'm thinking of only accepting one or more instances of [Indexes](https://github.com/pydata/xarray/blob/e678a1d7884a3c24dba22d41b2eef5d7fe5258e7/xarray/core/indexes.py#L1030) as the `indexes` argument in the Dataset and DataArray constructors. The only exception is when `fastpath=True`, in which case a mapping can be given directly.

- It is much easier to handle: just check that the keys returned by `Indexes.variables` do not conflict with the coordinate names in the `coords` argument
- It is slightly safer: it requires the user to explicitly create an `Indexes` object, so there is less chance of accidentally providing coordinate variables and index objects that do not relate to each other (we could probably add some safeguards in the `Indexes` class itself)
- It is more convenient: an Xarray `Index` may provide a factory method that returns an instance of `Indexes` that we just need to pass as `indexes`
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1175329407
https://github.com/pydata/xarray/issues/6392#issuecomment-1260618693,https://api.github.com/repos/pydata/xarray/issues/6392,1260618693,IC_kwDOAMm_X85LI4PF,4160723,2022-09-28T09:13:00Z,2022-09-28T12:52:01Z,MEMBER,"> How would we handle creating xarray objects from pandas objects where they have a multiindex?

For `pandas.Series` / `pandas.DataFrame` objects, `DataArray.from_series()` / `Dataset.from_dataframe()` already expand multi-index levels as dimensions.
For a `pandas.MultiIndex`, we could do as below, but it is a bit tedious:

```python
import pandas as pd
import xarray as xr
from xarray.indexes import PandasMultiIndex

pd_idx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""foo"", ""bar""))
idx = PandasMultiIndex(pd_idx, ""x"")

indexes = {""x"": idx, ""foo"": idx, ""bar"": idx}
coords = idx.create_variables()

ds = xr.Dataset(coords=coords, indexes=indexes)
```

For more convenience, we could add a class method to `PandasMultiIndex`, e.g.,

```python
# this calls PandasMultiIndex.__init__() and PandasMultiIndex.create_variables() internally
indexes, coords = PandasMultiIndex.from_pandas_index(pd_idx, ""x"")

ds = xr.Dataset(coords=coords, indexes=indexes)
```

Instead of `indexes, coords` raw dictionaries, we could return an instance of the [Indexes](https://github.com/pydata/xarray/blob/e678a1d7884a3c24dba22d41b2eef5d7fe5258e7/xarray/core/indexes.py#L1030) class (also returned by `Dataset.xindexes`), which encapsulates the coordinate variables:

```python
xmidx = PandasMultiIndex.from_pandas_index(pd_idx, ""x"")

ds = xr.Dataset(coords=xmidx.variables, indexes=xmidx)
```

For even more convenience, I think it might be reasonable to support special handling of `Indexes` instances given in Dataset / DataArray constructors and in `.update()`, i.e.,

```python
# both cases below will implicitly add the coordinates found in `xmidx`
# (if there's no conflict with other coordinates)
ds = xr.Dataset(indexes=xmidx)

ds2 = xr.Dataset()
ds2.update(xmidx)
```

The same approach could be used for `pandas.IntervalIndex` (as discussed in #4579).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1175329407 https://github.com/pydata/xarray/issues/6392#issuecomment-1082497324,https://api.github.com/repos/pydata/xarray/issues/6392,1082497324,IC_kwDOAMm_X85AhZks,5635139,2022-03-30T00:32:48Z,2022-03-30T00:32:48Z,MEMBER,"Thanks for the thoughtful reply @benbovy (This is a level down and you can make a decision later, so fine if you prefer to push the discussion.) How would we handle creating xarray objects from pandas objects where they have a multiindex? To what extent do you think this is this the ""standard case"" and we could default to it? ```python idx = xr.PandasMultiIndex(pd_idx, ""x"") indexes = {""x"": idx, ""foo"": idx, ""bar"": idx} ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1175329407 https://github.com/pydata/xarray/issues/6392#issuecomment-1080738079,https://api.github.com/repos/pydata/xarray/issues/6392,1080738079,IC_kwDOAMm_X85AasEf,4160723,2022-03-28T14:38:13Z,2022-03-28T14:38:13Z,MEMBER,"> What's the rationale for deprecating this? I think my experience with users of xarray is mostly those coming from pandas; for them interop is quite important. Yes I agree that interoperability with pandas is important. Providing pandas (multi-)indexes via `coords` is convenient and worked pretty well so far because (1) indexes and dimension coordinates were not clearly distinct concepts and (2) multi-index levels were not ""real"" coordinates. However, this is not the case anymore. 
Now that indexes are really distinct from coordinates, I'd rather expect the following behavior for the case of a pandas multi-index:

```python
pd_idx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""foo"", ""bar""))

# converting a pandas multi-index to a numpy array returns level values as tuples
np.array(pd_idx)
# array([('a', 1), ('a', 2), ('b', 1), ('b', 2)], dtype=object)

# simply passing the index as a coordinate would treat it as an array-like, i.e., like numpy does
xr.Dataset(coords={""x"": pd_idx})
# <xarray.Dataset>
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object ('a', 1) ('a', 2) ('b', 1) ('b', 2)
# Data variables:
# *empty*
```

In this specific case, I'd favor consistency with how Numpy handles Pandas indexes over more convenient interoperability with Pandas. The array of tuple elements is not very useful, though.

There should be ways to create Xarray objects with Pandas indexes, but I think it's better if we eventually pass them via `indexes` instead of via `coords`, or via both `indexes` and `coords`, even if that's slightly less convenient.

More generally, I don't know how the ecosystem will evolve in the future (how many custom Xarray indexes?). I wonder to what extent Xarray's API should support special cases for Pandas (multi-)indexes compared to other kinds of indexes.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1175329407
https://github.com/pydata/xarray/issues/6392#issuecomment-1080007416,https://api.github.com/repos/pydata/xarray/issues/6392,1080007416,IC_kwDOAMm_X85AX5r4,5635139,2022-03-27T19:54:44Z,2022-03-27T19:54:44Z,MEMBER,"I realize there's a lot here and I've been out of this thread for a bit, so please forgive any naive questions!

> I would suggest deprecating this behavior in favor of a more explicit (although more verbose) way to pass an existing pandas multi-index:

What's the rationale for deprecating this?
I think my experience with users of xarray is mostly those coming from pandas; for them interop is quite important. If there's a canonical way of transforming the index, it would be friendlier to do that automatically.

```python
import pandas as pd
import xarray as xr

pd_idx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""foo"", ""bar""))
idx = pd_idx

ds = xr.Dataset(coords={""x"": idx})
```

i.e.

```
ds = xr.Dataset(coords=coords)
# ValueError: missing index(es) for coordinate(s): 'x', 'foo', 'bar'
# or
# create unindexed coordinates 'foo' and 'bar' and a 'x' coordinate with a single pandas index
```

I would have expected the latter, both for `coords=coords` and for `coords=pd_idx` (again, with the disclaimer that I may be missing crucial parts of the puzzle here).

> Should we silently reorder the coordinates and/or indexes when the levels are not passed in the right order? It seems odd requiring mapping elements be passed in a given order. 👍
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1175329407
https://github.com/pydata/xarray/issues/6392#issuecomment-1079981685,https://api.github.com/repos/pydata/xarray/issues/6392,1079981685,IC_kwDOAMm_X85AXzZ1,14808389,2022-03-27T17:39:59Z,2022-03-27T17:39:59Z,MEMBER,"I wonder if it would help to have a custom type that, unlike `tuple`, is invalid for coordinates / data variables but reduces the redundancy? E.g.

```python
indexes = {xr.combined(""lat"", ""lon""): idx, xr.combined(""z"", ""x"", ""y""): multi_index}
```

This would be immediately normalized to:

```python
indexes = {""lat"": idx, ""lon"": idx, ""z"": multi_index, ""x"": multi_index, ""y"": multi_index}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1175329407