home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

16 rows where issue = 729980097 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 4

  • jbusecke 5
  • max-sixty 4
  • mathause 4
  • dcherian 3

author_association 2

  • MEMBER 11
  • CONTRIBUTOR 5

issue 1

  • Option to skip tests in `weighted()` · 16 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
719907040 https://github.com/pydata/xarray/issues/4541#issuecomment-719907040 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxOTkwNzA0MA== mathause 10194086 2020-10-31T09:10:10Z 2020-10-31T09:10:10Z MEMBER

Yes that would be great.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717625573 https://github.com/pydata/xarray/issues/4541#issuecomment-717625573 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzYyNTU3Mw== jbusecke 14314623 2020-10-28T00:45:31Z 2020-10-28T00:45:31Z CONTRIBUTOR

Another option would be to put the check in a .map_blocks call for dask arrays. This would only run and raise at compute time.

Uh that sounds great actually. Same functionality, no triggered computation, and no intervention needed from the user. Should I try to implement this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717343483 https://github.com/pydata/xarray/issues/4541#issuecomment-717343483 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzM0MzQ4Mw== dcherian 2448579 2020-10-27T15:57:50Z 2020-10-27T15:57:50Z MEMBER

Another option would be to put the check in a .map_blocks call for dask arrays. This would only run and raise at compute time.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717342942 https://github.com/pydata/xarray/issues/4541#issuecomment-717342942 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzM0Mjk0Mg== dcherian 2448579 2020-10-27T15:57:03Z 2020-10-27T15:57:03Z MEMBER

The discussion goes back to here: #2922 (comment) (by @dcherian)

Ah, sorry! I was thinking of weights as being numpy arrays, not so much dask arrays.

Do you do something between w = data.weighted(weights) and w.mean()?

Yeah I think this is the issue. .weighted should be lazy.

Thinking a bit more about this I now favour the isnull().any() test and would add a check_weights kwargs.

This would be OK. We could also drop the check and let users deal with it, and also add a warning to the docstring.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717320425 https://github.com/pydata/xarray/issues/4541#issuecomment-717320425 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzMyMDQyNQ== mathause 10194086 2020-10-27T15:23:55Z 2020-10-27T15:23:55Z MEMBER

The discussion goes back to here: https://github.com/pydata/xarray/pull/2922#issuecomment-545200082 (by @dcherian)

I decided to replace all NaN in the weights with 0.

Can we raise an error instead? It should be easy for the user to do weights.fillna(0) instead of relying on xarray's magical behaviour.

Thinking a bit more about this I now favour the isnull().any() test and would add a check_weights kwargs. I would even be fine to set check_weights=False per default and say the user is responsible to supply valid weights (but I'd want others to weigh in here).

In addition, a.isnull().any() is quite a bit faster than a.fillna(0) (even if there are no nans present). This is mostly true for numpy arrays, not so much for dask (by my limited tests). On the other hand the isnull().any() test is a small percentage of the total time (https://github.com/pydata/xarray/issues/3883#issuecomment-630387515).


I am also not entirely sure I understand where your issue lies. You eventually have to compute, right? Do you do something between w = data.weighted(weights) and w.mean()?

Ah maybe I understand, your data looks like:

  • data: <xarray.DataArray (time: 1000, models: 1)>
  • weights: <xarray.DataArray (time: 1000, models: 100)>

And now weights gets checked for all 100 models where only one would be relevant. Is this correct? (So as another workaround would be using xr.align before sending weights to weighted.)


My limited speed tests:

```python import numpy as np import xarray as xr a = xr.DataArray(np.random.randn(1000, 1000, 10, 10)) %timeit a.isnull().any() %timeit a.fillna(0) b = xr.DataArray(np.random.randn(1000, 1000, 10, 10)).chunk(100) %timeit b.isnull().any() %timeit b.fillna(0) ```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717266102 https://github.com/pydata/xarray/issues/4541#issuecomment-717266102 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzI2NjEwMg== jbusecke 14314623 2020-10-27T14:03:34Z 2020-10-27T14:03:34Z CONTRIBUTOR

Thanks @mathause , I was wondering how much of a performance trade off .fillna(0) is on a dask array with no nans, compared to the check.

I favor this, since it allows slicing before the calculation is triggered: I have a current situation where I do a bunch of operations on a large multi-model dataset. The weights are time and member dependent and I am trying to save each member separately. Having the calculation triggered for the full dataset is problematic and fillna(0) avoids that (working with a hacked branch where I simply removed the check for nans)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717240738 https://github.com/pydata/xarray/issues/4541#issuecomment-717240738 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzI0MDczOA== mathause 10194086 2020-10-27T13:24:43Z 2020-10-27T13:24:43Z MEMBER

The other possibility would be to do sth like:

```python def init(..., skipna=False):

if skipna:
    weights = weighs.fillna(0)

``` we did decide to not do this somewhere in the discussion, not entirely sure anymore why.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
717107362 https://github.com/pydata/xarray/issues/4541#issuecomment-717107362 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNzEwNzM2Mg== mathause 10194086 2020-10-27T09:27:25Z 2020-10-27T09:27:25Z MEMBER

weights cannot contain NaNs else the result will just be NaN, even with skipna=True. But then the weights rarely contain NaN. So this test is a bit a trade-off between time and convenience. A kwarg can certainly make sense (was also requested before). I would probably not call the kwarg skipna. Maybe check_weights? or check_nan? (better names welcome)

I think da.isnull().any() is lazy and it's the if that makes it eager. So an alternative would be to make the statement lazy but I don't know how this would be done.

The relevant test is here: https://github.com/pydata/xarray/blob/adc55ac4d2883e0c6647f3983c3322ca2c690514/xarray/tests/test_weighted.py#L22

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
716937842 https://github.com/pydata/xarray/issues/4541#issuecomment-716937842 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNjkzNzg0Mg== max-sixty 5635139 2020-10-27T02:29:59Z 2020-10-27T06:15:51Z MEMBER

If it leads to incorrect results, I agree. If it leads to a lazy error (even if more confusing), or a result array full of NaNs, then I think it's fine. Not super confident on the latter case, tbc.

If we want more control, I would advocate for using a standard kwarg that offers control over the computation — e.g. skip_na often gives more performance in exchange for (edit: the user) ensuring no NaNs — rather than an idiosyncratic kwarg that's derived by the internals of this implementation

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
716974071 https://github.com/pydata/xarray/issues/4541#issuecomment-716974071 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNjk3NDA3MQ== jbusecke 14314623 2020-10-27T04:33:04Z 2020-10-27T04:33:04Z CONTRIBUTOR

Sounds good. I'll see if I can make some time to test and put up a PR this week.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
716930400 https://github.com/pydata/xarray/issues/4541#issuecomment-716930400 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNjkzMDQwMA== jbusecke 14314623 2020-10-27T02:06:35Z 2020-10-27T02:06:35Z CONTRIBUTOR

What would happen in this case if a dask array with nans is passed? Would this somehow silently influence the results or would it not matter (in that case I wonder what the check was for). If this could lead to undetected errors I would still consider a kwargs a safer alternative, especially for new users?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
716928594 https://github.com/pydata/xarray/issues/4541#issuecomment-716928594 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNjkyODU5NA== max-sixty 5635139 2020-10-27T02:00:40Z 2020-10-27T02:00:40Z MEMBER

Sorry if my initial issue was unclear.

Not at all, my mistake

So you favor not having a 'skip' kwarg to just internally skipping the call to .any() if weights is a dask array?

👍

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
716927242 https://github.com/pydata/xarray/issues/4541#issuecomment-716927242 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNjkyNzI0Mg== jbusecke 14314623 2020-10-27T01:56:28Z 2020-10-27T01:56:28Z CONTRIBUTOR

Sorry if my initial issue was unclear. So you favor not having a 'skip' kwarg to just internally skipping the call to .any() if weights is a dask array?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
716913449 https://github.com/pydata/xarray/issues/4541#issuecomment-716913449 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNjkxMzQ0OQ== max-sixty 5635139 2020-10-27T01:13:04Z 2020-10-27T01:13:04Z MEMBER

Sorry, I completely misunderstood! I thought you were asking about skipping tests as in pytest, hence my confusion.

For sure re skipping those checks with dask arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
716908428 https://github.com/pydata/xarray/issues/4541#issuecomment-716908428 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNjkwODQyOA== max-sixty 5635139 2020-10-27T00:57:04Z 2020-10-27T01:10:30Z MEMBER

I don't have that much context on xgcm so others may know better on this.

~Could you help me understand in what context you're running the tests?~

~IIRC we used to have --skip-slow here, but it wasn't used that much and so is no longer there. It's definitely possible to add that sort of flag.~

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097
716910761 https://github.com/pydata/xarray/issues/4541#issuecomment-716910761 https://api.github.com/repos/pydata/xarray/issues/4541 MDEyOklzc3VlQ29tbWVudDcxNjkxMDc2MQ== dcherian 2448579 2020-10-27T01:04:14Z 2020-10-27T01:04:22Z MEMBER

The relevant context is that .any() will trigger computation on a dask array. Maybe we skip the check using is_duck_dask_array?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Option to skip tests in `weighted()` 729980097

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 12.609ms · About: xarray-datasette