issues: 1454832041

id: 1454832041
node_id: I_kwDOAMm_X85Wtvmp
number: 7297
title: stack().unstack() not the same as original for datavars dependent on single coordinate of multi_index
user: 96822049
state: open
locked: 0
comments: 6
created_at: 2022-11-18T10:12:52Z
updated_at: 2023-01-17T18:28:44Z
author_association: NONE

What is your issue?

(See the MVCE below.) The round trip ds.stack().unstack() does not fully restore the original ds when a data variable depends on only a subset of the coordinates that make up the multi-index used for stacking.

  1. Is this intentional? If so, what is the rationale?
  2. I would expect it to be more memory efficient if the original indexes x and y that make up the multi-index (midx=[x,y]) were kept after a stack() operation. Data arrays that depend on only a subset of those indexes would then not need to have their values repeated.

MVCE

```
# xarray==2022.11.0
import xarray as xr

# 'a' depends only on x, not on y
ds = xr.Dataset(coords={'x': [1, 2], 'y': [3, 4]})
ds['a'] = ds.x + 5
print(ds)
# <xarray.Dataset>
# Dimensions:  (x: 2, y: 2)
# Coordinates:
#   * x        (x) int32 1 2
#   * y        (y) int32 3 4
# Data variables:
#     a        (x) int32 6 7

# Stacking broadcasts 'a' over the full multi-index
ds_stacked = ds.stack(midx=['x', 'y'])
print(ds_stacked)
# <xarray.Dataset>
# Dimensions:  (midx: 4)
# Coordinates:
#   * midx     (midx) object MultiIndex
#   * x        (midx) int32 1 1 2 2
#   * y        (midx) int32 3 4 3 4
# Data variables:
#     a        (midx) int32 6 6 7 7

# After unstacking, 'a' depends on both x and y
ds_unstacked = ds_stacked.unstack()
print(ds_unstacked)
# <xarray.Dataset>
# Dimensions:  (x: 2, y: 2)
# Coordinates:
#   * x        (x) int32 1 2
#   * y        (y) int32 3 4
# Data variables:
#     a        (x, y) int32 6 6 7 7
```
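The mismatch can also be checked programmatically. The sketch below repeats the round trip and compares the result with `Dataset.identical`. It then applies `restore_dims`, a hypothetical helper (not part of xarray) that drops the dimensions a variable gained during stacking. The helper assumes the original dataset is still available and that values are constant along the gained dimensions, which holds here because they were produced by broadcasting.

```python
import xarray as xr

ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4]})
ds["a"] = ds.x + 5

ds_unstacked = ds.stack(midx=["x", "y"]).unstack()

# Round-trip check: 'a' now has dims (x, y) instead of (x,)
print(ds_unstacked.identical(ds))
print(ds_unstacked["a"].dims)


def restore_dims(unstacked, original):
    """Hypothetical helper: drop dims that a variable gained during stack()."""
    restored = unstacked.copy()
    for name, var in original.data_vars.items():
        extra = [d for d in unstacked[name].dims if d not in var.dims]
        # Assumption: values are constant along the gained dims,
        # so taking the first slice along each of them loses nothing.
        restored[name] = unstacked[name].isel({d: 0 for d in extra}, drop=True)
    return restored


ds_restored = restore_dims(ds_unstacked, ds)
print(ds_restored["a"].dims)
```

This is only a workaround for cases where the pre-stack dataset is at hand; it does not change what stack() and unstack() themselves do.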

Expected

ds_unstacked should be the same as ds. Instead, the variable a has also become a function of coordinate y, which is not correct. That is, after ds.stack() the variable 'a' should still depend only on the original coordinate 'x', which is just one level of the multi-index.

```
ds_stacked = ds.stack(midx=['x','y'])

<xarray.Dataset>
Dimensions:  (midx: 4)
Coordinates:
  * midx     (midx) object MultiIndex
  * x        (midx) int32 1 1 2 2
  * y        (midx) int32 3 4 3 4
Data variables:
    a        (x) int32 6 6 7 7
```

**Maybe for clarity**

```
<xarray.Dataset>
Dimensions:  (midx: 4)
Coordinates:
  * midx     (midx) object MultiIndex
  * x        (midx) int32 1 1 2 2
  * y        (midx) int32 3 4 3 4
Data variables:
    a        (midx.x) int32 6 6 7 7
```

**Or maybe to save memory** Distinguish between midx.x (the values of x repeated by stacking) and x (the original unique values).

```
<xarray.Dataset>
Dimensions:  (midx: 4)
Coordinates:
  * midx     (midx) object MultiIndex
  * x        (midx) int32 1 1 2 2
  * y        (midx) int32 3 4 3 4
Data variables:
    a        (x) int32 6 7
```
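To make the memory argument concrete, here is a minimal sketch using the same dataset as the MVCE. It compares the number of stored values of 'a' before and after stacking. The variable names `size_before` and `size_after` are introduced here for illustration only.

```python
import xarray as xr

ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4]})
ds["a"] = ds.x + 5

ds_stacked = ds.stack(midx=["x", "y"])

# Before stacking, 'a' holds one value per x.
size_before = ds["a"].size
# After stacking, 'a' holds one value per (x, y) pair.
size_after = ds_stacked["a"].size

print(size_before, size_after)
print(ds_stacked["a"].values)
```

With n values along x and m values along y, the stacked variable stores n * m values where n were enough, so the overhead grows with the size of the dimensions the variable does not depend on.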

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7297/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
