issue_comments


4 rows where author_association = "MEMBER", issue = 729980097 and user = 10194086 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
719907040 https://github.com/pydata/xarray/issues/4541#issuecomment-719907040 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxOTkwNzA0MA== mathause 10194086 2020-10-31T09:10:10Z 2020-10-31T09:10:10Z MEMBER

Yes that would be great.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717320425 https://github.com/pydata/xarray/issues/4541#issuecomment-717320425 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzMyMDQyNQ== mathause 10194086 2020-10-27T15:23:55Z 2020-10-27T15:23:55Z MEMBER

The discussion goes back to here: https://github.com/pydata/xarray/pull/2922#issuecomment-545200082 (by @dcherian)

I decided to replace all NaN in the weights with 0.

Can we raise an error instead? It should be easy for the user to do weights.fillna(0) instead of relying on xarray's magical behaviour.

Thinking a bit more about this, I now favour the isnull().any() test and would add a check_weights kwarg. I would even be fine with setting check_weights=False by default and saying the user is responsible for supplying valid weights (but I'd want others to weigh in here).

In addition, a.isnull().any() is quite a bit faster than a.fillna(0) (even if there are no nans present). This is mostly true for numpy arrays, not so much for dask (by my limited tests). On the other hand the isnull().any() test is a small percentage of the total time (https://github.com/pydata/xarray/issues/3883#issuecomment-630387515).


I am also not entirely sure I understand where your issue lies. You eventually have to compute, right? Do you do something between w = data.weighted(weights) and w.mean()?

Ah maybe I understand, your data looks like:

  • data: <xarray.DataArray (time: 1000, models: 1)>
  • weights: <xarray.DataArray (time: 1000, models: 100)>

And now weights gets checked for all 100 models where only one would be relevant. Is this correct? (So another workaround would be to use xr.align before passing weights to weighted.)
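A minimal sketch of the xr.align workaround mentioned above. The arrays, shapes, and the `models` coordinate labels are made up for illustration, and alignment only drops the extra models if that dimension carries coordinate labels on both objects:

```python
import numpy as np
import xarray as xr

# Hypothetical data with the shapes described above.
data = xr.DataArray(
    np.random.randn(1000, 1),
    dims=("time", "models"),
    coords={"models": [0]},
)
weights = xr.DataArray(
    np.random.rand(1000, 100),
    dims=("time", "models"),
    coords={"models": np.arange(100)},
)

# An inner join keeps only the labels present in both objects, so the
# weights shrink to the single model that is actually in `data`.
data_aligned, weights_aligned = xr.align(data, weights, join="inner")

# Any check on the weights now only has to look at one model.
result = data_aligned.weighted(weights_aligned).mean("time")
```

Whether this actually saves time depends on how expensive the alignment itself is compared to the check it avoids.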


My limited speed tests:

```python
import numpy as np
import xarray as xr

# numpy-backed array
a = xr.DataArray(np.random.randn(1000, 1000, 10, 10))

%timeit a.isnull().any()
%timeit a.fillna(0)

# dask-backed array
b = xr.DataArray(np.random.randn(1000, 1000, 10, 10)).chunk(100)

%timeit b.isnull().any()
%timeit b.fillna(0)
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717240738 https://github.com/pydata/xarray/issues/4541#issuecomment-717240738 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzI0MDczOA== mathause 10194086 2020-10-27T13:24:43Z 2020-10-27T13:24:43Z MEMBER

The other possibility would be to do something like:

```python
def __init__(..., skipna=False):

    if skipna:
        weights = weights.fillna(0)
```

We did decide not to do this somewhere in the discussion; I'm not entirely sure anymore why.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717107362 https://github.com/pydata/xarray/issues/4541#issuecomment-717107362 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzEwNzM2Mg== mathause 10194086 2020-10-27T09:27:25Z 2020-10-27T09:27:25Z MEMBER

weights cannot contain NaNs, else the result will just be NaN, even with skipna=True. But then, weights rarely contain NaN. So this test is a bit of a trade-off between time and convenience. A kwarg can certainly make sense (it was also requested before). I would probably not call the kwarg skipna. Maybe check_weights? Or check_nan? (Better names welcome.)
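A rough sketch of what such an opt-out could look like. The helper `validate_weights` and the `check_weights` kwarg are hypothetical names used only for illustration; they are not part of xarray's API:

```python
import numpy as np
import xarray as xr


def validate_weights(weights, check_weights=True):
    """Hypothetical helper: optionally reject weights that contain NaN."""
    # bool(...) forces the reduction to be evaluated; skipping the check
    # avoids that evaluation entirely.
    if check_weights and bool(weights.isnull().any()):
        raise ValueError(
            "`weights` cannot contain missing values. "
            "Missing values can be replaced by `weights.fillna(0)`."
        )
    return weights


good = xr.DataArray([0.5, 1.0, 2.0], dims="x")
bad = xr.DataArray([0.5, np.nan, 2.0], dims="x")

validate_weights(good)                      # passes the check
validate_weights(bad, check_weights=False)  # check skipped, user is responsible

try:
    validate_weights(bad)
    raised = False
except ValueError:
    raised = True
```

With the check disabled, invalid weights pass through silently, which is the trade-off between time and convenience described above.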

I think da.isnull().any() is lazy and it's the if that makes it eager. So an alternative would be to make the statement lazy but I don't know how this would be done.

The relevant test is here: https://github.com/pydata/xarray/blob/adc55ac4d2883e0c6647f3983c3322ca2c690514/xarray/tests/test_weighted.py#L22

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);