issues: 1175329407
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1175329407 | I_kwDOAMm_X85GDhp_ | 6392 | Pass indexes to the Dataset and DataArray constructors | 4160723 | closed | 0 |  |  | 6 | 2022-03-21T12:41:51Z | 2023-07-21T20:40:05Z | 2023-07-21T20:40:04Z | MEMBER |  |  |  | Is your feature request related to a problem?

This is part of #6293 (explicit indexes next steps).

Describe the solution you'd like

A

pros:
cons:
An example with a pandas multi-index

Currently a pandas multi-index may be passed directly as one (dimension) coordinate; it is then "unpacked" into one dimension coordinate (with tuple values) and one or more level coordinates. I would suggest deprecating this behavior in favor of a more explicit (although more verbose) way to pass an existing pandas multi-index:

```python
import pandas as pd
import xarray as xr

pd_idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))
idx = xr.PandasMultiIndex(pd_idx, "x")

indexes = {"x": idx, "foo": idx, "bar": idx}
coords = idx.create_variables()

ds = xr.Dataset(coords=coords, indexes=indexes)
```

The cases below should raise an error:

```python
ds = xr.Dataset(indexes=indexes)
# ValueError: missing coordinate(s) for index(es): 'x', 'foo', 'bar'

ds = xr.Dataset(
    coords=coords,
    indexes={"x": idx, "foo": idx},
)
# ValueError: missing index(es) for coordinate(s): 'bar'

ds = xr.Dataset(
    coords={"x": coords["x"], "foo": [0, 1, 2, 3], "bar": coords["bar"]},
    indexes=indexes,
)
# ValueError: conflict between coordinate(s) and index(es): 'foo'

ds = xr.Dataset(
    coords=coords,
    indexes={"x": idx, "foo": idx, "bar": xr.PandasIndex([0, 1, 2], "y")},
)
# ValueError: conflict between coordinate(s) and index(es): 'bar'
```

Should we raise an error or simply ignore the index in the case below?

```python
ds = xr.Dataset(coords=coords)
# ValueError: missing index(es) for coordinate(s): 'x', 'foo', 'bar'
# or
# create unindexed coordinates 'foo' and 'bar' and a 'x' coordinate with a single pandas index
```

Should we silently reorder the coordinates and/or indexes when the levels are not passed in the right order? It seems odd to require that mapping elements be passed in a given order.
```python
ds = xr.Dataset(coords=coords, indexes={"bar": idx, "x": idx, "foo": idx})
list(ds.xindexes.keys())
# ["x", "foo", "bar"]
```

How to generalize to any (custom) index?

In the case of a multi-index, it is pretty easy to check whether the coordinates and indexes are consistent because we ensure consistent

However, this may not be easy for other indexes. Some custom Xarray indexes (like a KD-Tree index) likely won't return anything from

How could we solve this?
I think I prefer the second option.

Describe alternatives you've considered

Also allow passing index types (and build options) via
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6392/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
| completed | 13221727 | issue |
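The consistency rules proposed in the issue body (missing coordinates, missing indexes, conflicts) and the silent reordering question can be sketched in plain Python. This is a minimal, hypothetical sketch, not xarray code: `ToyMultiIndex`, `ToyIndex` and `check_and_order_indexes` are illustrative names, coordinates are plain tuples instead of xarray variables, and each index is assumed to expose a `create_variables()` method returning a name-to-values mapping, mirroring the `idx.create_variables()` call in the issue. Ordering by `create_variables()` is one possible answer to the reordering question, not a settled design.

```python
from itertools import product


class ToyMultiIndex:
    """Hypothetical stand-in for a multi-index (NOT xarray's PandasMultiIndex).

    create_variables() returns one dimension coordinate of tuples plus one
    coordinate per level, built from the cartesian product of the levels.
    """

    def __init__(self, levels, names, dim):
        self.levels = [list(level) for level in levels]
        self.names = list(names)
        self.dim = dim

    def create_variables(self):
        tuples = tuple(product(*self.levels))
        variables = {self.dim: tuples}
        for i, name in enumerate(self.names):
            variables[name] = tuple(t[i] for t in tuples)
        return variables


class ToyIndex:
    """Hypothetical stand-in for a single-coordinate index (NOT xarray's PandasIndex)."""

    def __init__(self, values, dim):
        self.values = tuple(values)
        self.dim = dim

    def create_variables(self):
        return {self.dim: self.values}


def _fmt(names):
    return ", ".join(repr(name) for name in names)


def check_and_order_indexes(coords, indexes):
    """Hypothetical helper: validate coords/indexes, return indexes in canonical order."""
    # 1. Every indexed name must have a coordinate.
    missing_coords = [name for name in indexes if name not in coords]
    if missing_coords:
        raise ValueError(f"missing coordinate(s) for index(es): {_fmt(missing_coords)}")

    # 2. Each coordinate must match what its index would create under that name.
    conflicts = []
    for name, index in indexes.items():
        expected = index.create_variables()
        if name not in expected or tuple(expected[name]) != tuple(coords[name]):
            conflicts.append(name)
    if conflicts:
        raise ValueError(f"conflict between coordinate(s) and index(es): {_fmt(conflicts)}")

    # Distinct index objects (compared by identity), in order of first appearance.
    unique_indexes = []
    for index in indexes.values():
        if not any(index is seen for seen in unique_indexes):
            unique_indexes.append(index)

    # 3. Every coordinate an index creates must also be assigned an index.
    missing_indexes = []
    for index in unique_indexes:
        for name in index.create_variables():
            if name in coords and name not in indexes and name not in missing_indexes:
                missing_indexes.append(name)
    if missing_indexes:
        raise ValueError(f"missing index(es) for coordinate(s): {_fmt(missing_indexes)}")

    # 4. Silently reorder: follow each index's create_variables() order.
    ordered = {}
    for index in unique_indexes:
        for name in index.create_variables():
            if name in indexes and indexes[name] is index:
                ordered[name] = index
    return ordered


idx = ToyMultiIndex([["a", "b"], [1, 2]], names=("foo", "bar"), dim="x")
coords = idx.create_variables()
ordered = check_and_order_indexes(coords, {"bar": idx, "x": idx, "foo": idx})
print(list(ordered))  # ['x', 'foo', 'bar']
```

Passing coordinates without any index returns an empty mapping here, so the open question of what `xr.Dataset(coords=coords)` should do is left to the caller. Step 2 also relies on `create_variables()` returning the indexed coordinates; as the issue notes, a custom index such as a KD-Tree index may return nothing from it, in which case this sketch would report a conflict rather than validate anything.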