issues: 603309899

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
603309899 MDU6SXNzdWU2MDMzMDk4OTk= 3985 xarray=1.15.1 regression: Groupby drop multi-index 8419157 closed 0     4 2020-04-20T15:05:51Z 2021-02-16T15:59:46Z 2021-02-16T15:59:46Z NONE      

I have written a function, `process_stacked_groupby`, that stacks all but one dimension of a dataset/dataarray and performs groupby-apply-combine on the stacked dimension. However, after upgrading to 0.15.1, the function ceased to work.

MCVE Code Sample

```python
import numpy as np
import xarray as xr

# Dimensions
N = xr.DataArray(np.arange(100), dims='N', name='N')
reps = xr.DataArray(np.arange(5), dims='reps', name='reps')
horizon = xr.DataArray([1, -1], dims='horizon', name='horizon')
horizon.attrs = {'long_name': 'Horizonal', 'units': 'H'}
vertical = xr.DataArray(np.arange(1, 4), dims='vertical', name='vertical')
vertical.attrs = {'long_name': 'Vertical', 'units': 'V'}

# Variables
x = xr.DataArray(np.random.randn(len(N), len(reps), len(horizon), len(vertical)),
                 dims=['N', 'reps', 'horizon', 'vertical'],
                 name='x')
y = x * 0.1
y.name = 'y'

# Merge x, y
data = xr.merge([x, y])

# Assign coords
data = data.assign_coords(reps=reps, vertical=vertical, horizon=horizon)


# Function that stacks all but one dimension and groups by the stacked dimension.
def process_stacked_groupby(ds, dim, func, *args):

    # Function to apply to stacked groupby
    def apply_fn(ds, dim, func, *args):

        # Get groupby dim
        groupby_dim = list(ds.dims)
        groupby_dim.remove(dim)
        groupby_var = ds[groupby_dim]

        # Unstack groupby dim
        ds2 = ds.unstack(groupby_dim).squeeze()

        # Perform function
        ds3 = func(ds2, *args)

        # Add multi-index groupby_var to result
        ds3 = (ds3
               .reset_coords(drop=True)
               .assign_coords(groupby_var)
               .expand_dims(groupby_dim)
               )
        return ds3

    # Get list of dimensions
    groupby_dims = list(ds.dims)

    # Remove dimension not grouped
    groupby_dims.remove(dim)

    # Stack all but one dimension
    stack_dim = '_'.join(groupby_dims)
    ds2 = ds.stack({stack_dim: groupby_dims})

    # Groupby and apply
    ds2 = ds2.groupby(stack_dim, squeeze=False).map(apply_fn, args=(dim, func, *args))

    # Unstack
    ds2 = ds2.unstack(stack_dim)

    # Restore attrs
    for dim in groupby_dims:
        ds2[dim].attrs = ds[dim].attrs

    return ds2


# Function to apply on groupby
def fn(ds):
    return ds


# Run groupby with applied function
data.pipe(process_stacked_groupby, 'N', fn)
```

Expected Output

Prior to xarray=0.15.0, the above code produced the result I wanted.

The function should be able to:

1. Stack the chosen dimensions.
2. Group by the stacked dimension.
3. Apply a function to each group.
   a. The applied function passes the group, with its group coord unstacked, along to another function.
   b. It then adds the multi-index stacked group coord back to the result of that function.
4. Combine the groups.
5. Unstack the stacked dimension.
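For reference, steps 1 and 5 on their own can be sketched as a plain stack/unstack round trip, with no groupby in between. This is only a minimal illustration on a small made-up dataset; the names `small`, `stacked`, `roundtrip`, and the stacked dimension name `reps_vertical` are chosen here for the example and are not part of the code above.

```python
import numpy as np
import xarray as xr

# Small illustrative dataset (hypothetical, much smaller than the MCVE data)
small = xr.Dataset(
    {"x": (("N", "reps", "vertical"), np.arange(24.0).reshape(4, 2, 3))},
    coords={"reps": [0, 1], "vertical": [1, 2, 3]},
)

# Step 1: stack all dimensions except 'N' into one multi-indexed dimension
stacked = small.stack(reps_vertical=["reps", "vertical"])
print(dict(stacked.sizes))

# Step 5: unstack the multi-indexed dimension again
roundtrip = stacked.unstack("reps_vertical")
print(dict(roundtrip.sizes))
```

When nothing happens between the two steps, the stacked dimension keeps its multi-index and `unstack` succeeds, which suggests the index is lost somewhere in the groupby steps in between.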

Problem Description

After upgrading to 0.15.1, the above code stopped working. The error occurs at the `# Unstack` line, `ds2 = ds2.unstack(stack_dim)`, which raises `ValueError: cannot unstack dimensions that do not have a MultiIndex: ['horizon_reps_vertical']`. This is the 5th step, where the combined object turns out not to contain any multi-index. Somewhere in the 4th step, combining the groups loses the multi-index on the stacked dimension.
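One way to narrow down where the index disappears is to check, after each step, whether the stacked dimension is still backed by a `pandas.MultiIndex`. The helper below is a hypothetical diagnostic (`has_multiindex` is not an xarray function), shown on a small made-up dataset; it relies only on the `indexes` mapping of the object.

```python
import numpy as np
import pandas as pd
import xarray as xr


def has_multiindex(obj, dim):
    """Return True if dimension `dim` of `obj` is backed by a pandas MultiIndex."""
    # `obj.indexes` maps coordinate names to pandas indexes;
    # a dimension without an index yields None here.
    return isinstance(obj.indexes.get(dim), pd.MultiIndex)


# Small illustrative dataset (hypothetical)
demo = xr.Dataset(
    {"x": (("N", "reps", "vertical"), np.zeros((4, 2, 3)))},
    coords={"reps": [0, 1], "vertical": [1, 2, 3]},
)
demo_stacked = demo.stack(reps_vertical=["reps", "vertical"])

print(has_multiindex(demo_stacked, "reps_vertical"))  # stacked dimension
print(has_multiindex(demo, "reps"))                   # ordinary dimension
```

Calling such a check on the result of the groupby step, just before `unstack`, would show directly whether the combined object still carries the multi-index.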

Versions

0.15.1

  completed 13221727 issue