
issue_comments


4 rows where issue = 84127296 (add average function) and user = 10194086 (mathause), sorted by updated_at descending

mathause (MEMBER) commented on 2016-05-11T09:51:29Z · comment 218413377 · https://github.com/pydata/xarray/issues/422#issuecomment-218413377

Do we want

da.weighted(weight, dim='time').mean()

or

da.weighted(weight).mean(dim='time')
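For reference, the form xarray ultimately shipped (as `DataArray.weighted`, around v0.15.1) is the second one: the weights are bound first, and the reduction dimension is chosen at the reduce step. A minimal NumPy sketch of those semantics, with illustrative names only (`Weighted` here is not xarray API):

``` python
import numpy as np

# Sketch: bind weights first, pick the reduction axis at the reduce step.
class Weighted:
    def __init__(self, data, weights):
        self.data = np.asarray(data)
        self.weights = np.asarray(weights)

    def mean(self, axis=None):
        # broadcast weights against the data, then normalize along `axis`
        w = np.broadcast_to(self.weights, self.data.shape)
        return (self.data * w).sum(axis=axis) / w.sum(axis=axis)

da = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([1.0, 3.0])  # weights along the last ("time") axis
print(Weighted(da, w).mean(axis=-1))  # [1.75 3.75]
```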

mathause (MEMBER) commented on 2016-05-11T09:06:49Z (edited 2016-05-11T09:07:24Z) · comment 218403213 · https://github.com/pydata/xarray/issues/422#issuecomment-218403213

Sounds like a clean solution. Then we can defer handling of NaN in the weights to weighted (e.g. via a skipna_weights argument on weighted). Also, returning sum_of_weights can become a method of the class.

We may still end up implementing all required methods separately in weighted. For mean we do:

(data * weights / sum_of_weights).sum(dim=dim)

i.e. we use sum and not mean. We could rewrite this to:

(data * weights / sum_of_weights).mean(dim=dim) * weights.count(dim=dim)

However, I think this cannot be generalized to a reduce function; see e.g. for std: http://stackoverflow.com/questions/30383270/how-do-i-calculate-the-standard-deviation-between-weighted-measurements

Additionally, weighted does not make sense for many operations (I would say), e.g. min, max, count, ...
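The distinction can be sketched in plain NumPy (illustrative only; the frequency-weight form of the variance used below is one common convention among several):

``` python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
w = np.array([1.0, 1.0, 2.0])

# Weighted mean expressed with sum(), as above; matches np.average.
mean = (x * w).sum() / w.sum()
assert np.isclose(mean, np.average(x, weights=w))

# A weighted std is NOT a pointwise reweighting followed by std(): it needs
# the weighted second moment about the weighted mean, so it cannot be
# obtained by handing weights to a generic reduce function.
var = (w * (x - mean) ** 2).sum() / w.sum()
print(mean, var)  # 2.75 1.6875
```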

mathause (MEMBER) commented on 2015-09-16T18:02:39Z · comment 140823232 · https://github.com/pydata/xarray/issues/422#issuecomment-140823232

Thanks - that seems to be the fastest possibility. I wrote the functions for Dataset and DataArray:

``` python
def average_da(self, dim=None, weights=None):
    """weighted average for DataArrays

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of self.

    Returns
    -------
    reduced : DataArray
        New DataArray with average applied to its data and the indicated
        dimension(s) removed.
    """
    if weights is None:
        return self.mean(dim)
    else:
        if not isinstance(weights, xray.DataArray):
            raise ValueError("weights must be a DataArray")

        # if NaNs are present, we need individual weights
        if self.isnull().any():
            total_weights = weights.where(self.notnull()).sum(dim=dim)
        else:
            total_weights = weights.sum(dim)

        return (self * weights).sum(dim) / total_weights


def average_ds(self, dim=None, weights=None):
    """weighted average for Datasets

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset
        New Dataset with average applied to its data and the indicated
        dimension(s) removed.
    """
    if weights is None:
        return self.mean(dim)
    else:
        return self.apply(average_da, dim=dim, weights=weights)
```

They can be combined into one function:

``` python
def average(data, dim=None, weights=None):
    """weighted average for xray objects

    Parameters
    ----------
    data : Dataset or DataArray
        the xray object to average over
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset or DataArray
        New xray object with average applied to its data and the indicated
        dimension(s) removed.
    """
    if isinstance(data, xray.Dataset):
        return average_ds(data, dim, weights)
    elif isinstance(data, xray.DataArray):
        return average_da(data, dim, weights)
    else:
        raise ValueError("data must be an xray Dataset or DataArray")
```

Or a monkey patch:

``` python
xray.DataArray.average = average_da
xray.Dataset.average = average_ds
```
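The monkey-patch pattern itself can be sketched on a toy class, with no xarray dependency (all names below are illustrative, not xarray API):

``` python
import numpy as np

# Toy stand-in for a DataArray, to illustrate attaching a function as a
# method after class definition (the same idea as patching xray.DataArray).
class ToyArray:
    def __init__(self, values):
        self.values = np.asarray(values)

    def mean(self):
        return self.values.mean()

def average_toy(self, weights=None):
    # fall back to the plain mean when no weights are given
    if weights is None:
        return self.mean()
    weights = np.asarray(weights)
    return (self.values * weights).sum() / weights.sum()

ToyArray.average = average_toy  # the monkey patch

a = ToyArray([1.0, 2.0, 3.0])
print(a.average(weights=[3.0, 0.0, 1.0]))  # (3 + 0 + 3) / 4 = 1.5
```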

mathause (MEMBER) commented on 2015-09-16T16:29:22Z (edited 2015-09-16T16:29:32Z) · comment 140794893 · https://github.com/pydata/xarray/issues/422#issuecomment-140794893

This has to be adjusted if there are NaNs in the array: weights.sum(dim) needs to be corrected so it does not count weights at indices where self is NaN.

Is there a better way to get the correct weights than:

total_weights = weights.sum(dim) * self / self

It should probably not be used on a Dataset as every DataArray may have its own NaN structure. Or the equivalent Dataset method should loop through the DataArrays.
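A masking approach avoids the self / self trick; sketched in plain NumPy (illustrative only, mirroring what weights.where(self.notnull()).sum(dim) does in average_da above):

``` python
import numpy as np

data = np.array([1.0, np.nan, 3.0])
weights = np.array([1.0, 1.0, 2.0])

# Zero out weights wherever the data is NaN before summing, instead of
# multiplying the summed weights by self / self.
total_weights = np.where(np.isnan(data), 0.0, weights).sum()
result = np.nansum(data * weights) / total_weights
print(total_weights, result)  # 3.0 and 7/3
```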

Reactions: 👍 1


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette