
issue_comments


1 row where issue = 425320466 and user = 20053498, sorted by updated_at descending


id: 652898319
html_url: https://github.com/pydata/xarray/issues/2852#issuecomment-652898319
issue_url: https://api.github.com/repos/pydata/xarray/issues/2852
node_id: MDEyOklzc3VlQ29tbWVudDY1Mjg5ODMxOQ==
user: C-H-Simpson (20053498)
created_at: 2020-07-02T09:29:32Z
updated_at: 2020-07-02T09:29:55Z
author_association: NONE

I'm going to share a code snippet that might be useful to people reading this issue. I wanted to group my data by month and year, and take the mean for each group.

I did not want to use resample, as I wanted the resulting dimensions to be ('month', 'year') rather than ('time'). The obvious way of doing this is to use a pd.MultiIndex to create a 'year_month' stacked coordinate, but I found this had poor performance.
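For reference, the grouping being replaced can be sketched roughly as follows (illustrative only: a DataArray `da` with a `time` dimension is assumed, and a combined string key stands in for a true pd.MultiIndex):

```
# Combined (year, month) label for every time step, e.g. '2000-07'.
year_month = da.time.dt.strftime('%Y-%m').rename('year_month')

# Group on the combined label; this is the slow path being avoided.
slow_mean = da.groupby(year_month).mean('time')
```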

My solution was to use xr.apply_ufunc, as suggested above. I think it should be OK with dask-chunked data, provided the data are not chunked along the time dimension (see the usage notes after the code).

Here is the code:

```
import numpy as np
import xarray as xr


def _grouped_mean(
        data: np.ndarray,
        months: np.ndarray,
        years: np.ndarray) -> np.ndarray:
    """Similar to grouping by a year_month MultiIndex, but faster.

    Should be used wrapped by _wrapped_grouped_mean."""
    unique_months = np.sort(np.unique(months))
    unique_years = np.sort(np.unique(years))

    # Replace the trailing time axis with (month, year) axes.
    old_shape = list(data.shape)
    new_shape = old_shape[:-1]
    new_shape.append(unique_months.shape[0])
    new_shape.append(unique_years.shape[0])

    output = np.zeros(new_shape)

    # Average the time steps belonging to each (month, year) pair.
    for i_month, j_year in np.ndindex(output.shape[2:]):
        indices = np.intersect1d(
            (months == unique_months[i_month]).nonzero(),
            (years == unique_years[j_year]).nonzero())
        output[:, :, i_month, j_year] = \
            np.mean(data[:, :, indices], axis=-1)

    return output


def _wrapped_grouped_mean(da: xr.DataArray) -> xr.DataArray:
    """Similar to grouping by a year_month MultiIndex, but faster.

    Wraps a numpy-style function with xr.apply_ufunc.
    """
    Y = xr.apply_ufunc(
        _grouped_mean,
        da,
        da.time.dt.month,
        da.time.dt.year,
        input_core_dims=[['lat', 'lon', 'time'], ['time'], ['time']],
        output_core_dims=[['lat', 'lon', 'month', 'year']],
    )
    Y = Y.assign_coords(
        {'month': np.sort(np.unique(da.time.dt.month)),
         'year': np.sort(np.unique(da.time.dt.year))})
    return Y
```
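A minimal usage sketch of the above (the random data, grid sizes, and date range are illustrative assumptions, not from the original comment):

```
import numpy as np
import pandas as pd
import xarray as xr

# Three years of daily data on a small lat/lon grid.
time = pd.date_range('2000-01-01', '2002-12-31', freq='D')
da = xr.DataArray(
    np.random.rand(4, 5, time.size),
    coords={'time': time},
    dims=('lat', 'lon', 'time'))

monthly = _wrapped_grouped_mean(da)
print(monthly.dims)  # ('lat', 'lon', 'month', 'year')
```

For dask-backed input, the time dimension would need to sit in a single chunk (e.g. `da.chunk({'time': -1})`), and `xr.apply_ufunc` would need `dask='parallelized'` plus output dtype/size metadata; the snippet above does not pass those arguments.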

reactions: 1 total (hooray: 1)
issue: Allow grouping by dask variables (425320466)

