

- html_url: https://github.com/pydata/xarray/pull/5692#issuecomment-919822643
- issue_url: https://api.github.com/repos/pydata/xarray/issues/5692
- id: 919822643
- node_id: IC_kwDOAMm_X84202Ez
- user: 4160723
- created_at: 2021-09-15T08:45:00Z
- updated_at: 2021-09-15T08:45:00Z
- author_association: MEMBER

This PR introduces some minor changes in behavior (no API change). Most of them relate to somewhat tricky workarounds for the limitations of the “index/dimension” coordinate concept, which we no longer need with explicit indexes.

I'll detail them below. A couple of those changes are bug fixes, so we can safely make them now. For the other changes, I'm not sure how best to proceed. In some cases, keeping the current behavior would require additional implementation effort, which may not be worth it if no one relies on this weird behavior. Any thoughts @pydata/xarray?


1. `.rename_*`

1.1 Indexes are now preserved when renaming dimensions or coordinates

```python
import xarray as xr

ds = xr.Dataset({"x": ("x", [0, 1, 2])})
renamed = ds.rename_dims({"x": "x_new"})

# Before
"x" in renamed.indexes  # False

# Now
"x" in renamed.indexes  # True
```

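A toy model may help show why the index can now survive the renaming. The sketch below is plain Python and does not reflect xarray's internals. The hypothetical `ToyDataset` keeps explicit indexes keyed by coordinate name. The hypothetical `legacy_indexes` method emulates the old implicit rule, under which only a coordinate whose name matches its single dimension is indexed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical toy model, NOT xarray's implementation.

    coords maps a coordinate name to its tuple of dimension names;
    indexes is the set of coordinate names that carry an explicit index.
    """

    coords: dict
    indexes: set = field(default_factory=set)

    def rename_dims(self, mapping):
        # Only dimension names change; coordinate names (and therefore the
        # explicit index entries keyed by them) are left untouched.
        new_coords = {
            name: tuple(mapping.get(d, d) for d in dims)
            for name, dims in self.coords.items()
        }
        return ToyDataset(new_coords, set(self.indexes))

    def legacy_indexes(self):
        # Old implicit rule: a coordinate is indexed only if it is a 1-D
        # "dimension coordinate" (its name equals its only dimension).
        return {name for name, dims in self.coords.items() if dims == (name,)}


ds = ToyDataset({"x": ("x",)}, {"x"})
renamed = ds.rename_dims({"x": "x_new"})
print("x" in renamed.legacy_indexes())  # False (old implicit rule)
print("x" in renamed.indexes)  # True (explicit index kept)
```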

2. `set_index`

2.1 Coordinate dtypes are preserved when setting new (multi-)indexes

```python
import xarray as xr

da = xr.DataArray(
    [0, 1, 3, 4],
    dims="x",
    coords={"x": [0, 1, 0, 1], "y": ("x", ["a", "b", "a", "b"])},
)
print(da)
# <xarray.DataArray (x: 4)>
# array([0, 1, 3, 4])
# Coordinates:
#   * x        (x) int64 0 1 0 1
#     y        (x) <U1 'a' 'b' 'a' 'b'
```

```python
indexed = da.set_index(xy=["x", "y"])

# Before
indexed.y.dtype  # dtype('O')

# Now
indexed.y.dtype  # dtype('<U1')
```

```python
indexed = da.set_index(x="y")

# Before
indexed.y.dtype  # dtype('O')

# Now
indexed.y.dtype  # dtype('<U1')
```
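The `dtype('O')` results in the "Before" cases are consistent with the coordinates being rebuilt from pandas index values, because a pandas `Index` does not keep NumPy's fixed-width `<U1` dtype. Here is a small check with plain pandas. It assumes pandas and NumPy are installed, and it is not the xarray code in this PR.

```python
import numpy as np
import pandas as pd

x = np.array([0, 1, 0, 1])
y = np.array(["a", "b", "a", "b"])
print(y.dtype)  # <U1

# Build a pandas MultiIndex from the two coordinate arrays.
midx = pd.MultiIndex.from_arrays([x, y], names=["x", "y"])

# Reading the "y" level back does not give '<U1': it is object with
# pandas 1.x/2.x (newer pandas versions may use a string dtype instead).
level_y = midx.get_level_values("y")
print(level_y.dtype)

# A single pandas Index built from the same array behaves the same way.
single = pd.Index(y, name="y")
print(single.dtype)
```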

2.2 Setting a new single index for a new dimension raises an error

Allowing new dimension names in `set_index` is useful for creating new multi-indexes from scratch while avoiding dimension name conflicts with their levels. However, new dimension names were also allowed for single indexes, which in that case resulted in weird dimension renaming. This now raises an error.

```python
# Before (bug)
da.set_index(y="x")
# <xarray.DataArray (x: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#   * y        (y) int64 0 1 0 1
# Dimensions without coordinates: x

# Now
da.set_index(y="x")
# ValueError: try setting an index for dimension 'y'
# with variable 'x' that has dimensions ('x',)
```
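The new rule can be summarized this way: a single (non-multi) index for a dimension must be built from a 1-D variable defined along that same dimension. Otherwise, setting it would implicitly rename a dimension. Below is a minimal sketch using a hypothetical helper, `check_single_index`, that reuses the error message shown above. It is not xarray's implementation.

```python
def check_single_index(dim, var_name, var_dims):
    """Hypothetical validation helper, not xarray's implementation.

    A single (non-multi) index for `dim` must be built from a 1-D
    variable defined along `dim` itself; otherwise setting it would
    implicitly rename a dimension.
    """
    if tuple(var_dims) != (dim,):
        raise ValueError(
            f"try setting an index for dimension {dim!r} "
            f"with variable {var_name!r} that has dimensions {tuple(var_dims)}"
        )


# OK: variable 'y' lies along dimension 'x' (like da.set_index(x="y")).
check_single_index("x", "y", ("x",))

# Error: variable 'x' does not lie along the new dimension 'y'.
try:
    check_single_index("y", "x", ("x",))
except ValueError as err:
    message = str(err)
    print(message)
```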


3. `reset_index`

3.1 Coordinate (level) names are preserved when (partially) resetting a multi-index

```python
indexed = da.set_index(xy=["x", "y"])
indexed
# <xarray.DataArray (xy: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#   * xy       (xy) object MultiIndex
#   * x        (xy) int64 0 1 0 1
#   * y        (xy) <U1 'a' 'a' 'b' 'b'
```

```python
# Before
indexed.reset_index("x")
# <xarray.DataArray (xy: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#   * xy       (xy) object 'a' 'a' 'b' 'b'
#     x        (xy) int64 0 1 0 1

# Now
indexed.reset_index("x")
# <xarray.DataArray (xy: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#     xy       (xy) object MultiIndex
#     x        (xy) int64 0 1 0 1
#   * y        (xy) <U1 'a' 'a' 'b' 'b'
```

(Note: in the latter result, the `xy` coordinate is still displayed as `MultiIndex`, but it is no longer part of any xarray index in the returned DataArray. We'll need to replace the `MultiIndex` inline repr with the tuple (level values) coordinate, perhaps when we add an Indexes section to the repr of xarray objects. Eventually, we could also get rid of this tuple coordinate and keep only the coordinates of the multi-index levels.)
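The distinction between the tuple coordinate and the level coordinates mirrors `pandas.MultiIndex`. Iterating over a `MultiIndex` yields one tuple per element, while `get_level_values` returns a single named level. `droplevel` also keeps the name of the remaining level, which is analogous to what 3.1 now does for coordinates. Here is a short sketch using plain pandas. It assumes pandas is installed, and it is not the xarray code in this PR.

```python
import pandas as pd

midx = pd.MultiIndex.from_arrays(
    [[0, 1, 0, 1], ["a", "a", "b", "b"]], names=["x", "y"]
)

# The "tuple" view: one (x, y) tuple per element along the dimension.
tuples = list(midx)
print(tuples)

# The per-level view: one named array per level.
y_level = midx.get_level_values("y")
print(y_level.name, list(y_level))

# Dropping one level returns a plain Index that keeps the remaining name.
remaining = midx.droplevel("x")
print(remaining.name)  # y
```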

3.2 Single index coordinates that are reset (but kept) are not renamed

```python
# Before
da.reset_index("x")
# <xarray.DataArray (x: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#     y        (x) <U1 'a' 'a' 'b' 'b'
#     x_       (x) int64 0 1 0 1
# Dimensions without coordinates: x

# Now
da.reset_index("x")
# <xarray.DataArray (x: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#     x        (x) int64 0 1 0 1
#     y        (x) <U1 'a' 'a' 'b' 'b'
```
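Presumably the old `x_` renaming was needed because, under the implicit rule, a coordinate named after its own dimension was always treated as an index. Once indexes are tracked explicitly, resetting can simply drop the index entry and keep the coordinate unchanged. The plain-Python sketch below illustrates that idea. The hypothetical `toy_reset_index` and its dict/set inputs are assumptions, not xarray's internals.

```python
def toy_reset_index(coords, indexes, name):
    """Hypothetical sketch, not xarray's implementation.

    coords: dict mapping coordinate name -> tuple of dimension names
    indexes: set of coordinate names that currently carry an index
    name: the coordinate whose index is reset (the coordinate is kept)
    """
    # Keep every coordinate under its original name: no "x" -> "x_" renaming.
    new_coords = dict(coords)
    # Only the index entry is dropped.
    new_indexes = set(indexes) - {name}
    return new_coords, new_indexes


coords = {"x": ("x",), "y": ("x",)}
indexes = {"x"}
new_coords, new_indexes = toy_reset_index(coords, indexes, "x")
print(sorted(new_coords))  # ['x', 'y']
print(new_indexes)  # set()
```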

3.3 Fix bug when trying to reset a non-dimension (and non-level) coordinate

```python
# Before
da.reset_index("y")
# <xarray.DataArray (x: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#   * x        (x) int64 0 1 0 1
#     y_       (y) object 'a' 'a' 'b' 'b'

# Now
da.reset_index("y")
# ValueError: ('y',) are not coordinates with an index
```
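The new behavior amounts to checking the requested names against the coordinates that actually carry an index before doing anything else. Below is a hypothetical sketch that reproduces the error message above (`check_has_index` is not an xarray function).

```python
def check_has_index(names, indexed_coords):
    """Hypothetical validation helper, not xarray's implementation.

    names: coordinate names passed to reset_index
    indexed_coords: names of the coordinates that currently have an index
    """
    missing = tuple(name for name in names if name not in indexed_coords)
    if missing:
        raise ValueError(f"{missing} are not coordinates with an index")


# In the example above, only "x" has an index; "y" is a plain coordinate.
check_has_index(["x"], {"x"})  # passes silently

try:
    check_has_index(["y"], {"x"})
except ValueError as err:
    message = str(err)
    print(message)  # ('y',) are not coordinates with an index
```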
