issues

2 rows where comments = 4, repo = 13221727 (xarray) and "updated_at" is on date 2021-02-16, sorted by updated_at descending

#3985: xarray=1.15.1 regression: Groupby drop multi-index (id 603309899)
Opened by DancingQuanta · state: closed · 4 comments · created 2020-04-20T15:05:51Z · updated 2021-02-16T15:59:46Z · closed 2021-02-16T15:59:46Z · author_association: NONE · repo: xarray

I have written a function process_stacked_groupby that stacks all but one dimension of a dataset/dataarray and performs groupby-apply-combine over the stacked dimension. However, after upgrading to 0.15.1, the function ceased to work.

MCVE Code Sample

```python
import numpy as np
import xarray as xr

# Dimensions
N = xr.DataArray(np.arange(100), dims='N', name='N')
reps = xr.DataArray(np.arange(5), dims='reps', name='reps')
horizon = xr.DataArray([1, -1], dims='horizon', name='horizon')
horizon.attrs = {'long_name': 'Horizonal', 'units': 'H'}
vertical = xr.DataArray(np.arange(1, 4), dims='vertical', name='vertical')
vertical.attrs = {'long_name': 'Vertical', 'units': 'V'}

# Variables
x = xr.DataArray(np.random.randn(len(N), len(reps), len(horizon), len(vertical)),
                 dims=['N', 'reps', 'horizon', 'vertical'], name='x')
y = x * 0.1
y.name = 'y'

# Merge x, y
data = xr.merge([x, y])

# Assign coords
data = data.assign_coords(reps=reps, vertical=vertical, horizon=horizon)

# Function that stacks all but one dimension and groups by the stacked dimension.
def process_stacked_groupby(ds, dim, func, *args):

    # Function to apply to stacked groupby
    def apply_fn(ds, dim, func, *args):

        # Get groupby dim
        groupby_dim = list(ds.dims)
        groupby_dim.remove(dim)
        groupby_var = ds[groupby_dim]

        # Unstack groupby dim
        ds2 = ds.unstack(groupby_dim).squeeze()

        # Perform function
        ds3 = func(ds2, *args)

        # Add multi-index groupby_var to result
        ds3 = (ds3
               .reset_coords(drop=True)
               .assign_coords(groupby_var)
               .expand_dims(groupby_dim)
               )
        return ds3

    # Get list of dimensions
    groupby_dims = list(ds.dims)

    # Remove dimension not grouped
    groupby_dims.remove(dim)

    # Stack all but one dimension
    stack_dim = '_'.join(groupby_dims)
    ds2 = ds.stack({stack_dim: groupby_dims})

    # Groupby and apply
    ds2 = ds2.groupby(stack_dim, squeeze=False).map(apply_fn, args=(dim, func, *args))

    # Unstack
    ds2 = ds2.unstack(stack_dim)

    # Restore attrs
    for dim in groupby_dims:
        ds2[dim].attrs = ds[dim].attrs

    return ds2

# Function to apply on groupby
def fn(ds):
    return ds

# Run groupby with applied function
data.pipe(process_stacked_groupby, 'N', fn)
```

Expected Output

Prior to xarray 0.15.1, the above code produced the result that I wanted.

The function should be able to:
1. stack chosen dimensions
2. groupby the stacked dimension
3. apply a function on each group
   a. the function actually passes along another function with the unstacked group coord
   b. add the multi-index stacked group coord back to the results of this function
4. combine the groups
5. unstack the stacked dimension

Problem Description

After upgrading to 0.15.1, the above code stopped working. The error occurs at the line ds2 = ds2.unstack(stack_dim) with ValueError: cannot unstack dimensions that do not have a MultiIndex: ['horizon_reps_vertical']. This is the 5th step, where the resulting combined object was found not to contain any multi-index. Somewhere in the 4th step, the combination of groups has lost the multi-index stacked dimension.
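One possible workaround for this failure mode (a sketch, not from the original report): when the combined result has lost its MultiIndex, it can usually be rebuilt from the leftover level coordinates with set_index before calling unstack. The toy dataset, dimension names, and simulated failure below are illustrative assumptions:

```python
import numpy as np
import xarray as xr

# Build a small dataset and stack two dims into one.
ds = xr.Dataset(
    {"x": (("a", "b"), np.arange(6).reshape(2, 3))},
    coords={"a": [0, 1], "b": [10, 20, 30]},
)
stacked = ds.stack(ab=("a", "b"))

# Simulate the reported failure mode: the groupby/map round-trip can return
# an object whose "ab" dimension no longer carries a MultiIndex.
flat = stacked.reset_index("ab")

# Workaround sketch: rebuild the MultiIndex from the leftover "a"/"b"
# coordinates with set_index, after which unstack works again.
restored = flat.set_index(ab=("a", "b")).unstack("ab")
print(sorted(restored.dims))  # ['a', 'b']
```

In process_stacked_groupby, the equivalent would be calling set_index on ds2 with the original groupby_dims just before the final unstack.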

Versions

0.15.1

#4598: Calling pd.to_datetime on cftime variable (id 748229907)
Opened by raybellwaves · state: closed · 4 comments · created 2020-11-22T12:14:27Z · updated 2021-02-16T02:42:35Z · closed 2021-02-16T02:42:35Z · author_association: CONTRIBUTOR · repo: xarray

It would be nice to be able to convert cftime variables to pandas datetime to utilize the functionality there.

I understand this is an upstream issue, as pandas probably isn't aware of cftime. However, I'm curious whether a method could be added to cftime, such as .to_dataframe().

I've found `pd.to_datetime(np.datetime64(date_cf))` is the best way to do this currently.

```python
import xarray as xr
import numpy as np
import pandas as pd

date_str = '2020-01-01'
date_np = np.datetime64(date_str)
date_np
# numpy.datetime64('2020-01-01')

date_pd = pd.to_datetime(date_np)
date_pd
# Timestamp('2020-01-01 00:00:00')

date_cf = xr.cftime_range(start=date_str, periods=1)[0]
pd.to_datetime(date_cf)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 830, in to_datetime
#     result = convert_listlike(np.array([arg]), format)[0]
#   File "/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 459, in _convert_listlike_datetimes
#     result, tz_parsed = objects_to_datetime64ns(
#   File "/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2044, in objects_to_datetime64ns
#     result, tz_parsed = tslib.array_to_datetime(
#   File "pandas/_libs/tslib.pyx", line 352, in pandas._libs.tslib.array_to_datetime
#   File "pandas/_libs/tslib.pyx", line 579, in pandas._libs.tslib.array_to_datetime
#   File "pandas/_libs/tslib.pyx", line 718, in pandas._libs.tslib.array_to_datetime_object
#   File "pandas/_libs/tslib.pyx", line 552, in pandas._libs.tslib.array_to_datetime
# TypeError: <class 'cftime._cftime.DatetimeGregorian'> is not convertible to datetime
```

