issue_comments

23 rows where issue = 84127296 sorted by updated_at descending

user 11

  • shoyer 4
  • mathause 4
  • dcherian 3
  • jbusecke 3
  • jhamman 2
  • pgierz 2
  • rabernat 1
  • pwolfram 1
  • spencerkclark 1
  • markelg 1
  • aaronspring 1

author_association 3

  • MEMBER 15
  • CONTRIBUTOR 6
  • NONE 2

issue 1

  • add average function · 23
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
485456780 https://github.com/pydata/xarray/issues/422#issuecomment-485456780 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4NTQ1Njc4MA== dcherian 2448579 2019-04-22T15:52:15Z 2019-04-22T15:52:15Z MEMBER

> With regard to the implementation, I thought of orienting myself along the lines of groupby, rolling or resample. Or are there any concerns for this specific method?

I would do the same i.e. take inspiration from the groupby / rolling / resample modules.
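To illustrate the groupby / rolling / resample pattern mentioned here, below is a minimal, hypothetical sketch in plain Python (no xarray): a small wrapper object that stores the data together with the weights and only computes something when a reduction method is called, similar in spirit to the objects returned by `groupby()` or `rolling()`. The class name `Weighted`, its methods, and the rule that NaN values are skipped together with their weights are assumptions for illustration, not xarray's API; 1-D lists stand in for arrays.

```python
import math


class Weighted:
    """Hypothetical wrapper: holds data and weights, defers the reduction
    until a method such as mean() is called."""

    def __init__(self, data, weights):
        if len(data) != len(weights):
            raise ValueError("data and weights must have the same length")
        self.data = list(data)
        self.weights = list(weights)

    def _valid(self):
        # Keep (value, weight) pairs where the value is not NaN; the weight
        # of a NaN value is dropped as well.
        return [(d, w) for d, w in zip(self.data, self.weights) if not math.isnan(d)]

    def sum_of_weights(self):
        return sum(w for _, w in self._valid())

    def sum(self):
        return sum(d * w for d, w in self._valid())

    def mean(self):
        return self.sum() / self.sum_of_weights()


data = [1.0, 2.0, float("nan"), 4.0]
weights = [1.0, 1.0, 5.0, 2.0]
print(Weighted(data, weights).mean())  # (1 + 2 + 8) / (1 + 1 + 2) = 2.75
```

Keeping data and weights in one object is what makes something like `sum_of_weights` natural to expose as a method, instead of recomputing it inside every reduction.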

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
485444538 https://github.com/pydata/xarray/issues/422#issuecomment-485444538 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4NTQ0NDUzOA== aaronspring 12237157 2019-04-22T15:09:16Z 2019-04-22T15:09:16Z CONTRIBUTOR

Can the stats functions from https://esmlab.readthedocs.io/en/latest/api.html#statistics-functions be used?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
484470656 https://github.com/pydata/xarray/issues/422#issuecomment-484470656 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4NDQ3MDY1Ng== rabernat 1197350 2019-04-18T11:47:08Z 2019-04-18T11:48:03Z MEMBER

@pgierz - Our documentation has a page on contributing which I encourage you to read through. ~Unfortunately, we don't have any "developer documentation" to explain the actual code base itself. That would be good to add at some point.~ Edit: that was wrong. We have a page on xarray internals.

Once you have your local development environment set up and your fork cloned, the next step is to start exploring the source code and figuring out where changes need to be made. At that point, you can post any questions you have here and we will be happy to give you some guidance.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
483715005 https://github.com/pydata/xarray/issues/422#issuecomment-483715005 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4MzcxNTAwNQ== dcherian 2448579 2019-04-16T15:37:37Z 2019-04-16T15:37:37Z MEMBER

@pgierz take a look at the "good first issue" label: https://github.com/pydata/xarray/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
483705762 https://github.com/pydata/xarray/issues/422#issuecomment-483705762 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4MzcwNTc2Mg== pgierz 2444231 2019-04-16T15:16:23Z 2019-04-16T15:16:23Z NONE

Maybe a bad question, but is there a good jumping off point to gain some familiarity with the code base? It’s admittedly my first time looking at xarray from the inside...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
483341164 https://github.com/pydata/xarray/issues/422#issuecomment-483341164 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4MzM0MTE2NA== jbusecke 14314623 2019-04-15T17:18:17Z 2019-04-15T17:18:17Z CONTRIBUTOR

Point taken. I am still not thinking generally enough :-)

> Are we going to require that the argument to weighted is a DataArray that shares at least one dimension with da?

This sounds good to me.

With regard to the implementation, I thought of orienting myself along the lines of groupby, rolling or resample. Or are there any concerns for this specific method?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
482737161 https://github.com/pydata/xarray/issues/422#issuecomment-482737161 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4MjczNzE2MQ== dcherian 2448579 2019-04-12T22:03:27Z 2019-04-12T22:03:27Z MEMBER

> I think we should maybe build in a warning for when the `weights` array does not contain both of the averaging dimensions?

Hmm... the intent here would be that the weights are broadcast against the input array, no? Not sure that a warning is required, e.g. @shoyer's comment above:

> I would suggest not using keyword arguments for `weighted`. Instead, just align based on the labels of the argument like regular xarray operations. So we'd write `da.weighted(days_per_month(da.time)).mean()`

Are we going to require that the argument to `weighted` is a DataArray that shares at least one dimension with `da`?
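The requirement asked about here could be checked with something like the following hypothetical helper, which works on tuples of dimension names (with xarray objects these would come from the `.dims` attribute). The name `check_shared_dims` and the choice to raise an error rather than emit a warning are assumptions for illustration only.

```python
def check_shared_dims(data_dims, weight_dims):
    """Hypothetical validation: raise if the weights share no dimension
    with the data, otherwise return the shared dimension names."""
    shared = [d for d in weight_dims if d in data_dims]
    if not shared:
        raise ValueError(
            f"weights with dims {tuple(weight_dims)} share no dimension "
            f"with data dims {tuple(data_dims)}"
        )
    return shared


print(check_shared_dims(("time", "x", "y"), ("x",)))  # ['x']
try:
    check_shared_dims(("time", "x", "y"), ("lat",))
except ValueError as err:
    print("rejected:", err)
```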

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
482719668 https://github.com/pydata/xarray/issues/422#issuecomment-482719668 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4MjcxOTY2OA== jbusecke 14314623 2019-04-12T20:54:23Z 2019-04-12T20:54:23Z CONTRIBUTOR

I have to say that I am still pretty bad at thinking in a fully object-oriented way, but is this what we want in general? A subclass of xr.DataArray which gets initialized with a weight array and, with some logic for NaNs, then 'knows' about the weight count? Where would I find a good analogue for this sort of organization? In the rolling class?

I like the syntax proposed by @jhamman above, but I am wondering what happens in a slightly modified example:

```python
>>> da.shape
(72, 10, 15)
>>> da.dims
('time', 'x', 'y')
>>> weights = some_func_of_x(x)
>>> da.weighted(weights).mean(dim=('x', 'y'))
```

I think we should maybe build in a warning for when the `weights` array does not contain both of the averaging dimensions?
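A quick plain-Python check of what broadcasting would imply for this example, assuming no NaNs and using made-up numbers: if the weights vary only along `x`, broadcasting them along `y` makes them constant in `y`, so the weighted mean over `('x', 'y')` equals a weighted mean over `x` of the unweighted means over `y`. Both function names are hypothetical; nested lists stand in for a `(x, y)` array.

```python
def weighted_mean_xy(data, wx):
    """Weighted mean over both axes of data[x][y], with weights that depend
    only on x and are repeated (broadcast) along y."""
    num = sum(wx[i] * v for i, row in enumerate(data) for v in row)
    den = sum(wx[i] for i, row in enumerate(data) for _ in row)
    return num / den


def weighted_mean_x_of_mean_y(data, wx):
    """Unweighted mean along y first, then a weighted mean along x."""
    row_means = [sum(row) / len(row) for row in data]
    return sum(w * m for w, m in zip(wx, row_means)) / sum(wx)


data = [[1.0, 3.0], [2.0, 6.0], [5.0, 7.0]]  # x=3, y=2
wx = [1.0, 2.0, 1.0]
print(weighted_mean_xy(data, wx), weighted_mean_x_of_mean_y(data, wx))  # 4.0 4.0
```

The equivalence relies on every `x` row having the same number of valid `y` values; once NaNs appear, the two orders of reduction can differ.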

It was mentioned that the functions on `...weighted()` would mostly have to be rewritten, since the logic for a weighted average and a weighted std differs. What other functions should be included (if any)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
482393543 https://github.com/pydata/xarray/issues/422#issuecomment-482393543 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4MjM5MzU0Mw== spencerkclark 6628425 2019-04-12T00:48:09Z 2019-04-12T10:28:59Z MEMBER

It would be great to have some progress on this issue! @mathause, @pgierz, @markelg, or @jbusecke if there is anything we can do to help you get started let us know.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
481945488 https://github.com/pydata/xarray/issues/422#issuecomment-481945488 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQ4MTk0NTQ4OA== jbusecke 14314623 2019-04-11T02:55:06Z 2019-04-11T02:55:06Z CONTRIBUTOR

Found this issue thanks to @rabernat's blog post. This is a much-requested feature in our working group, and it would be great to build on it in xgcm as well. I would be very keen to help this advance.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
428855722 https://github.com/pydata/xarray/issues/422#issuecomment-428855722 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQyODg1NTcyMg== markelg 6883049 2018-10-11T07:48:36Z 2018-10-11T07:48:36Z CONTRIBUTOR

Hi,

This would be a really nice feature to have. I'd be happy to help too.

Thank you

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
413104436 https://github.com/pydata/xarray/issues/422#issuecomment-413104436 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDQxMzEwNDQzNg== pgierz 2444231 2018-08-15T06:17:12Z 2018-08-15T06:17:12Z NONE

Hi,

my research group recently discussed weighted averaging with xarray, and I was wondering if there had been any progress with implementing this? I'd be happy to get involved if help is needed.

Thanks!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
292646849 https://github.com/pydata/xarray/issues/422#issuecomment-292646849 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDI5MjY0Njg0OQ== pwolfram 4295853 2017-04-07T20:43:48Z 2017-04-07T20:43:48Z CONTRIBUTOR

@mathause can you please comment on the status of this issue? Is there an associated PR somewhere? Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
218520080 https://github.com/pydata/xarray/issues/422#issuecomment-218520080 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDIxODUyMDA4MA== shoyer 1217238 2016-05-11T16:51:10Z 2016-05-11T16:51:10Z MEMBER

Yes, +1 for da.weighted(weight).mean(dim='time'). The mean method on weighted should have the same arguments as the mean method on DataArray -- it's just changed due to the context.

> We may still end up implementing all required methods separately in weighted.

This is a fair point; I haven't looked into the details of these implementations yet. But I expect there are still at least a few bits of logic that we will be able to share.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
218513335 https://github.com/pydata/xarray/issues/422#issuecomment-218513335 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDIxODUxMzMzNQ== jhamman 2443309 2016-05-11T16:26:55Z 2016-05-11T16:26:55Z MEMBER

@mathause -

I would think you want the latter (da.weighted(weight).mean(dim='time')). weighted should handle the broadcasting of weight such that you could do this:

```python
>>> da.shape
(72, 10, 15)
>>> da.dims
('time', 'x', 'y')
>>> weights = some_func_of_time(time)
>>> da.weighted(weights).mean(dim=('time', 'x'))
...
```
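To make the broadcasting in this example concrete, here is a plain-Python sketch with made-up values, where nested lists stand in for a `(time, x, y)` array: the weights depend only on `time`, are broadcast along `x` and `y`, and the mean is taken over `('time', 'x')`, leaving one value per `y`. Skipping NaN values together with their weights follows the NaN handling discussed elsewhere in this thread; the function name is hypothetical.

```python
import math


def weighted_mean_time_x(data, w_time):
    """data[t][x][y]; w_time[t] is broadcast along x and y.
    Returns one weighted mean per y, skipping NaN values and their weights."""
    ny = len(data[0][0])
    result = []
    for k in range(ny):
        num = 0.0
        den = 0.0
        for t, plane in enumerate(data):
            for row in plane:
                v = row[k]
                if not math.isnan(v):
                    num += w_time[t] * v
                    den += w_time[t]
        result.append(num / den if den else float("nan"))
    return result


nan = float("nan")
data = [
    [[1.0, 10.0], [3.0, nan]],   # time 0: rows are x=0 and x=1
    [[5.0, 20.0], [7.0, 40.0]],  # time 1
]
w_time = [1.0, 3.0]
print(weighted_mean_time_x(data, w_time))  # approximately [5.0, 27.142857]
```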

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
218413377 https://github.com/pydata/xarray/issues/422#issuecomment-218413377 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDIxODQxMzM3Nw== mathause 10194086 2016-05-11T09:51:29Z 2016-05-11T09:51:29Z MEMBER

Do we want

da.weighted(weight, dim='time').mean()

or

da.weighted(weight).mean(dim='time')

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
218403213 https://github.com/pydata/xarray/issues/422#issuecomment-218403213 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDIxODQwMzIxMw== mathause 10194086 2016-05-11T09:06:49Z 2016-05-11T09:07:24Z MEMBER

Sounds like a clean solution. Then we can defer handling of NaN in the weights to weighted (e.g. by a skipna_weights argument in weighted). Also returning sum_of_weights can be a method of the class.

We may still end up implementing all required methods separately in weighted. For mean we do:

(data * weights / sum_of_weights).sum(dim=dim)

i.e. we use sum and not mean. We could rewrite this to:

(data * weights / sum_of_weights).mean(dim=dim) * weights.count(dim=dim)

However, I think this cannot be generalized to a reduce function. See e.g. for std: http://stackoverflow.com/questions/30383270/how-do-i-calculate-the-standard-deviation-between-weighted-measurements

Additionally, weighted does not make sense for many operations (I would say) e.g.: min, max, count, ...
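A plain-Python sketch of the two mean formulations above for a 1-D case without NaNs (values made up), plus one common definition of a weighted standard deviation: the square root of the weighted mean of squared deviations from the weighted mean. That std definition is an assumption here (it is the biased, population-style estimator; bias-corrected variants exist). It shows why std does not fit a generic reduce: it needs the weighted mean first.

```python
import math


def weighted_mean_sum(data, weights):
    # (data * weights / sum_of_weights).sum()
    s = sum(weights)
    return sum(d * w / s for d, w in zip(data, weights))


def weighted_mean_via_mean(data, weights):
    # (data * weights / sum_of_weights).mean() * count
    s = sum(weights)
    terms = [d * w / s for d, w in zip(data, weights)]
    return sum(terms) / len(terms) * len(terms)


def weighted_std(data, weights):
    # sqrt(sum(w * (d - mean)**2) / sum(w)), the biased (population) estimator
    m = weighted_mean_sum(data, weights)
    s = sum(weights)
    return math.sqrt(sum(w * (d - m) ** 2 for d, w in zip(data, weights)) / s)


data = [2.0, 4.0, 6.0]
weights = [1.0, 1.0, 2.0]
print(weighted_mean_sum(data, weights))       # 4.5
print(weighted_mean_via_mean(data, weights))  # 4.5
print(weighted_std(data, weights))            # sqrt(2.75)
```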

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
218360875 https://github.com/pydata/xarray/issues/422#issuecomment-218360875 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDIxODM2MDg3NQ== shoyer 1217238 2016-05-11T04:47:46Z 2016-05-11T04:47:46Z MEMBER

I would suggest not using keyword arguments for weighted. Instead, just align based on the labels of the argument like regular xarray operations. So we'd write da.weighted(days_per_month(da.time)).mean()
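A plain-Python sketch of weights that are matched to the data by label rather than by position. Here the labels are `(year, month)` tuples, the weights are days per month computed with the standard library's `calendar.monthrange`, and the data are a dict keyed by the same labels. `days_per_month` is a hypothetical helper named after the one in the comment, and the monthly values are made up; with xarray objects the matching would happen on coordinate labels.

```python
import calendar


def days_per_month(times):
    """Hypothetical helper: number of days for each (year, month) label."""
    return {t: calendar.monthrange(t[0], t[1])[1] for t in times}


def weighted_mean_by_label(data, weights):
    """Pair values and weights by label, using only labels present in both
    (an inner join), so the order of the entries does not matter."""
    common = data.keys() & weights.keys()
    num = sum(data[k] * weights[k] for k in common)
    den = sum(weights[k] for k in common)
    return num / den


# Monthly means for Jan-Mar 2001, deliberately stored out of order.
monthly = {(2001, 3): 10.0, (2001, 1): 4.0, (2001, 2): 7.0}
w = days_per_month(sorted(monthly))
print(w)  # {(2001, 1): 31, (2001, 2): 28, (2001, 3): 31}
print(weighted_mean_by_label(monthly, w))  # (4*31 + 7*28 + 10*31) / 90 = 7.0
```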

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
218358372 https://github.com/pydata/xarray/issues/422#issuecomment-218358372 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDIxODM1ODM3Mg== jhamman 2443309 2016-05-11T04:24:05Z 2016-05-11T04:24:05Z MEMBER

@MaximilianR has suggested a groupby/rolling-like interface to weighted reductions.

```python
da.weighted(weights=ds.dim).mean()
```

or maybe

```python
da.weighted(time=days_per_month(da.time)).mean()
```

I really like this idea, as does @shoyer. I'm going to close my PR in hopes of this becoming reality.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
140823232 https://github.com/pydata/xarray/issues/422#issuecomment-140823232 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDE0MDgyMzIzMg== mathause 10194086 2015-09-16T18:02:39Z 2015-09-16T18:02:39Z MEMBER

Thanks - that seems to be the fastest possibility. I wrote the functions for Dataset and DataArray:

```python
import xarray as xray  # xarray was still called "xray" when this was written (2015)


def average_da(self, dim=None, weights=None):
    """
    weighted average for DataArrays

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of self.

    Returns
    -------
    reduced : DataArray
        New DataArray with average applied to its data and the indicated
        dimension(s) removed.

    """

    if weights is None:
        return self.mean(dim)
    else:
        if not isinstance(weights, xray.DataArray):
            raise ValueError("weights must be a DataArray")

        # if NaNs are present, we need individual weights: only count the
        # weights at positions where self is not NaN
        if self.isnull().any():
            total_weights = weights.where(self.notnull()).sum(dim=dim)
        else:
            total_weights = weights.sum(dim)

        return (self * weights).sum(dim) / total_weights


# -----------------------------------------------------------------------------


def average_ds(self, dim=None, weights=None):
    """
    weighted average for Datasets

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset
        New Dataset with average applied to its data and the indicated
        dimension(s) removed.

    """

    if weights is None:
        return self.mean(dim)
    else:
        # apply average_da to each data variable separately, so that every
        # variable is normalized with its own NaN pattern
        # (later xarray versions call this method Dataset.map)
        return self.apply(average_da, dim=dim, weights=weights)
```

They can be combined into one function:

```python
def average(data, dim=None, weights=None):
    """
    weighted average for xray objects

    Parameters
    ----------
    data : Dataset or DataArray
        the xray object to average over
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset or DataArray
        New xray object with average applied to its data and the indicated
        dimension(s) removed.

    """

    if isinstance(data, xray.Dataset):
        return average_ds(data, dim, weights)
    elif isinstance(data, xray.DataArray):
        return average_da(data, dim, weights)
    else:
        raise ValueError("data must be an xray Dataset or DataArray")
```

Or a monkey patch:

```python
xray.DataArray.average = average_da
xray.Dataset.average = average_ds
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
140797623 https://github.com/pydata/xarray/issues/422#issuecomment-140797623 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDE0MDc5NzYyMw== shoyer 1217238 2015-09-16T16:40:20Z 2015-09-16T16:40:20Z MEMBER

Possibly using where, e.g., weights.where(self.notnull()).sum(dim).
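A plain-Python sketch of what `weights.where(self.notnull()).sum(dim)` computes in a 1-D case with made-up values: the weights at positions where the data are NaN are dropped from the normalization, while a data value of zero is kept. That last point is why the `weights.sum(dim) * self / self` idea raised in this thread is fragile: under NumPy semantics `0 / 0` is NaN too, so an exact zero would be treated as missing. The function names are hypothetical.

```python
import math


def masked_weight_sum(data, weights):
    """Sum of the weights where data is not NaN: a 1-D analogue of
    weights.where(data.notnull()).sum()."""
    return sum(w for d, w in zip(data, weights) if not math.isnan(d))


def nan_aware_average(data, weights):
    num = sum(d * w for d, w in zip(data, weights) if not math.isnan(d))
    return num / masked_weight_sum(data, weights)


nan = float("nan")
data = [0.0, 2.0, nan, 4.0]
weights = [1.0, 1.0, 10.0, 2.0]
print(masked_weight_sum(data, weights))  # 4.0: the weight 10.0 next to the NaN is ignored
print(nan_aware_average(data, weights))  # (0 + 2 + 8) / 4 = 2.5
```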

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
140794893 https://github.com/pydata/xarray/issues/422#issuecomment-140794893 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDE0MDc5NDg5Mw== mathause 10194086 2015-09-16T16:29:22Z 2015-09-16T16:29:32Z MEMBER

This has to be adjusted if there are NaNs in the array: weights.sum(dim) needs to be corrected so that it does not count weights at indices where self is NaN.

Is there a better way to get the correct weights than:

total_weights = weights.sum(dim) * self / self

It should probably not be used on a Dataset as every DataArray may have its own NaN structure. Or the equivalent Dataset method should loop through the DataArrays.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296
108118570 https://github.com/pydata/xarray/issues/422#issuecomment-108118570 https://api.github.com/repos/pydata/xarray/issues/422 MDEyOklzc3VlQ29tbWVudDEwODExODU3MA== shoyer 1217238 2015-06-02T22:41:22Z 2015-06-02T22:41:22Z MEMBER

Modulo error checking, etc., this would look something like:

```python
def average(self, dim=None, weights=None):
    if weights is None:
        return self.mean(dim)
    else:
        return (self * weights).sum(dim) / weights.sum(dim)
```

This is pretty easy to do manually, but I can see the value in having the standard method around, so I'm definitely open to PRs to add this functionality.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add average function 84127296

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
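A sketch of reproducing this page's query ("rows where issue = 84127296 sorted by updated_at descending") with Python's built-in `sqlite3` module, assuming an in-memory database created from the schema above and filled with only three of the rows listed on this page (a subset of columns). The exact SQL that Datasette runs may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The referenced users/issues tables are not created here, so make sure
# foreign-key enforcement is off before inserting.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute(
    """CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    )"""
)

# (id, user, created_at, updated_at, author_association, issue) from this page
rows = [
    (108118570, 1217238, "2015-06-02T22:41:22Z", "2015-06-02T22:41:22Z", "MEMBER", 84127296),
    (485456780, 2448579, "2019-04-22T15:52:15Z", "2019-04-22T15:52:15Z", "MEMBER", 84127296),
    (218520080, 1217238, "2016-05-11T16:51:10Z", "2016-05-11T16:51:10Z", "MEMBER", 84127296),
]
conn.executemany(
    "INSERT INTO issue_comments (id, user, created_at, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps in one fixed format sort correctly as plain text.
result = conn.execute(
    "SELECT id, updated_at FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
    (84127296,),
).fetchall()
print(result)
```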
Powered by Datasette · About: xarray-datasette