

issues: 1364798843


| field | value |
|---|---|
| id | 1364798843 |
| node_id | PR_kwDOAMm_X84-hLRI |
| number | 7004 |
| title | Rework PandasMultiIndex.sel internals |
| user | 4160723 |
| state | open |
| locked | 0 |
| comments | 2 |
| created_at | 2022-09-07T14:57:29Z |
| updated_at | 2022-09-22T20:38:41Z |
| author_association | MEMBER |
| draft | 0 |
| pull_request | pydata/xarray/pulls/7004 |

  • [x] Closes #6838
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR hopefully improves how labels provided for multi-index level coordinates in `.sel()` are handled.

More specifically, slices are now handled more cleanly, and array-like labels are now allowed.

`PandasMultiIndex.sel()` relies on the underlying `pandas.MultiIndex` methods as follows:

  • use `get_loc` when every level is given a scalar label (no slice, no array):
      • always drops the index and returns scalar coordinates for each multi-index level
  • use `get_loc_level` when only a subset of levels is given, all with scalar labels:
      • may collapse one or more levels of the multi-index (dropped levels result in scalar coordinates)
      • if only one level remains, renames the dimension and the corresponding dimension coordinate
  • use `get_locs` in all other cases:
      • always keeps the multi-index and its coordinates (even if only one item or one level is selected)

This yields predictable behavior: as soon as one of the provided labels is a slice or array-like, the multi-index and all its level coordinates are kept in the result.
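The dispatch rules above can be summarized as a small decision function. This is a minimal sketch in plain Python. It assumes labels arrive as a `{level_name: label}` mapping. `is_scalar_label`, `choose_method` and `describe_result` are hypothetical names for illustration only; they are not part of xarray or pandas. The scalar test is also a simplification and not xarray's actual check.

```python
def is_scalar_label(label) -> bool:
    """Rough scalar test (an assumption, not xarray's actual check).

    Strings count as scalars; slices and common sequence types do not.
    Array-likes exposing ``ndim`` are scalar only when 0-dimensional.
    """
    if isinstance(label, str):
        return True
    if isinstance(label, (slice, list, tuple, set, range)):
        return False
    return getattr(label, "ndim", 0) == 0


def choose_method(level_names, labels) -> str:
    """Hypothetical helper: name of the pandas.MultiIndex method that the
    rules above would use for a {level_name: label} mapping."""
    if all(is_scalar_label(v) for v in labels.values()):
        if set(labels) == set(level_names):
            return "get_loc"        # every level has a scalar label
        return "get_loc_level"      # only some levels, all scalar
    return "get_locs"               # at least one slice or array-like label


def describe_result(level_names, labels) -> str:
    """Hypothetical helper: summarize what happens to the multi-index."""
    method = choose_method(level_names, labels)
    if method == "get_loc":
        return "index dropped; scalar coordinate for every level"
    if method == "get_loc_level":
        remaining = [name for name in level_names if name not in labels]
        if len(remaining) == 1:
            return f"dimension renamed to {remaining[0]!r}"
        return f"reduced multi-index with levels {remaining}"
    return "multi-index and all level coordinates kept"


print(choose_method(("one", "two"), {"one": "a"}))  # get_loc_level
print(describe_result(("one", "two"), {"one": "a", "two": slice(1, 1)}))
```

The last call shows the rule stated above. A single slice label is enough to keep the multi-index, even when it selects only one item.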

Some cases are illustrated below. I compare this PR with an older release (v2022.3.0) because of the errors reported in #6838:

```python
import xarray as xr
import pandas as pd

midx = pd.MultiIndex.from_product([list("abc"), range(4)], names=("one", "two"))
ds = xr.Dataset(coords={"x": midx})

# <xarray.Dataset>
# Dimensions:  (x: 12)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b' 'c' 'c' 'c' 'c'
#   * two      (x) int64 0 1 2 3 0 1 2 3 0 1 2 3
# Data variables:
#     *empty*
```

```python
ds.sel(one="a", two=0)

# this PR
# <xarray.Dataset>
# Dimensions:  ()
# Coordinates:
#     x        object ('a', 0)
#     one      <U1 'a'
#     two      int64 0
# Data variables:
#     *empty*

# v2022.3.0
# <xarray.Dataset>
# Dimensions:  ()
# Coordinates:
#     x        object ('a', 0)
# Data variables:
#     *empty*
```
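At the pandas level, this all-scalar case corresponds to `pandas.MultiIndex.get_loc` with a full key, per the rules listed earlier. The following is a minimal sketch calling pandas directly on the same `midx`. It is not xarray's internal code.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([list("abc"), range(4)], names=("one", "two"))

# A full key (one scalar per level) on a unique index resolves to a single
# integer position, so there is no multi-index left to keep in the result.
pos = midx.get_loc(("a", 0))
print(pos)
```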

```python
ds.sel(one="a")

# this PR
# <xarray.Dataset>
# Dimensions:  (two: 4)
# Coordinates:
#   * two      (two) int64 0 1 2 3
#     one      <U1 'a'
# Data variables:
#     *empty*

# v2022.3.0
# <xarray.Dataset>
# Dimensions:  (two: 4)
# Coordinates:
#   * two      (two) int64 0 1 2 3
# Data variables:
#     *empty*
```
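This case labels only a subset of levels, all with scalars, so the rules map it to `pandas.MultiIndex.get_loc_level`. The sketch below calls pandas directly, not xarray's code. pandas returns an indexer together with the index with the selected level dropped. Only `two` remains, which matches why the dimension is renamed to `two` above.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([list("abc"), range(4)], names=("one", "two"))

# Scalar label for the "one" level only: pandas returns (indexer, remaining index)
indexer, remaining = midx.get_loc_level("a", level="one")

print(midx[indexer])  # the four ("a", ...) entries
print(remaining)      # a plain Index named "two", the only level left
```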

```python
ds.sel(one=slice("a", "b"))

# this PR
# <xarray.Dataset>
# Dimensions:  (x: 8)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b'
#   * two      (x) int64 0 1 2 3 0 1 2 3
# Data variables:
#     *empty*

# v2022.3.0
# <xarray.Dataset>
# Dimensions:  (two: 8)
# Coordinates:
#   * two      (two) int64 0 1 2 3 0 1 2 3
# Data variables:
#     *empty*
```
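A slice label falls into the "all other cases" branch, which the rules map to `pandas.MultiIndex.get_locs`. This is a pandas-only sketch, not xarray's implementation. `get_locs` returns integer positions, so the selected entries can be taken while keeping a `MultiIndex` with both levels.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([list("abc"), range(4)], names=("one", "two"))

# Label-based slice on the first level (inclusive of both ends)
locs = midx.get_locs([slice("a", "b")])

# Indexing with positions keeps a MultiIndex with both levels
subset = midx[locs]
print(list(subset.names), len(subset))
```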

```python
ds.sel(one="a", two=slice(1, 1))

# this PR
# <xarray.Dataset>
# Dimensions:  (x: 1)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a'
#   * two      (x) int64 1
# Data variables:
#     *empty*

# v2022.3.0
# <xarray.Dataset>
# Dimensions:  (x: 1)
# Coordinates:
#   * x        (x) MultiIndex
#   - one      (x) object 'a'
#   - two      (x) int64 1
# Data variables:
#     *empty*
```
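Even when a slice selects a single item, the presence of a slice routes the selection through `get_locs`, per the rules listed earlier. This is a pandas-only sketch on the same `midx`, not xarray's code. The result is still a one-element `MultiIndex` rather than a scalar.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([list("abc"), range(4)], names=("one", "two"))

# Scalar on "one", single-label slice on "two": mixed keys are accepted by get_locs
locs = midx.get_locs(["a", slice(1, 1)])
subset = midx[locs]

print(subset)  # a MultiIndex holding the single entry ("a", 1)
```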

```python
ds.sel(one=["b", "c"], two=[0, 2])

# this PR
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'b' 'b' 'c' 'c'
#   * two      (x) int64 0 2 0 2
# Data variables:
#     *empty*

# v2022.3.0
# ValueError: Vectorized selection is not available along coordinate 'one' (multi-index level)
```
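Array-like labels also fall into the `get_locs` branch. The sketch below shows the pandas call on its own, not xarray's implementation. Passing a list of labels per level selects the positions where every level matches. That is the outer product of `["b", "c"]` and `[0, 2]`, which matches the four entries in the result above.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([list("abc"), range(4)], names=("one", "two"))

# One list of labels per level; positions must match on every level
locs = midx.get_locs([["b", "c"], [0, 2]])
subset = midx[locs]

print(list(locs))
print(list(subset))
```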

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7004/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
| field | value |
|---|---|
| repo | 13221727 |
| type | pull |

Links from other tables

  • 1 row from issues_id in issues_labels
  • 2 rows from issue in issue_comments