issue_comments


3 rows where author_association = "CONTRIBUTOR", issue = 807089005 and user = 367900 sorted by updated_at descending

Comment 778341660 · bcbnz (CONTRIBUTOR) · created 2021-02-12T17:44:55Z · updated 2021-02-12T17:55:48Z · https://github.com/pydata/xarray/issues/4898#issuecomment-778341660

@dcherian it looks like that works. A better test script:

```python
import numpy as np
import xarray as xr
from xarray.tests import raise_if_dask_computes


def worker(da):
    if da.shape == (0, 0):
        return da

    return da.where(da > 1)


np.random.seed(1023)
da = xr.DataArray(
    np.random.normal(size=(20, 500)),
    dims=("x", "y"),
    coords=(np.arange(20), np.arange(500)),
)

da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)

with raise_if_dask_computes():
    result = lazy.sum("x", skipna=True, min_count=5)

result.load()

assert np.isnan(result[0])
assert not np.isnan(result[6])
```

If I then remove the `if null_mask.any()` check and the following block, and replace it with

```python
dtype, fill_value = dtypes.maybe_promote(result.dtype)
result = result.astype(dtype)
result = np.where(null_mask, fill_value, result)
```

it passes. I can start working on a pull request with these tests and changes if that looks acceptable to you.

~~How would you suggest handling the possible type promotion from the current `dtype, fill_value = dtypes.maybe_promote(result.dtype)` line? Currently it only tries promoting if the mask is `True` anywhere. Always promote, or just use the fill value and hope it works out?~~
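For what it's worth, NumPy's own promotion rules already cover the simple case: `np.where` with a NaN fill promotes an integer result to a float dtype on its own. A minimal sketch of that behaviour (illustrative only, not the xarray code path):

```python
import numpy as np

# An integer reduction result, with a mask marking positions below min_count
int_result = np.array([10, 2, 7])
null_mask = np.array([False, True, False])

# np.where promotes the output dtype so it can hold the NaN fill value
out = np.where(null_mask, np.nan, int_result)
print(out.dtype)  # float64
print(out)        # [10. nan  7.]
```

So the open question is mainly about cases where the desired fill value is not representable without an explicit `astype` beforehand.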

Comment 778329536 · bcbnz (CONTRIBUTOR) · created 2021-02-12T17:23:51Z · updated 2021-02-12T17:23:51Z · https://github.com/pydata/xarray/issues/4898#issuecomment-778329536

A quick check with the debugger shows it is the `null_mask.any()` call that is causing the compute.

I think I've found another problem with `_maybe_null_out` when it is reducing over all dimensions. With this altered MCVE:

```python
import numpy as np
import xarray as xr


def worker(da):
    if da.shape == (0, 0):
        return da

    res = xr.full_like(da, np.nan)
    res[0, 0] = 1
    return res


da = xr.DataArray(
    np.random.normal(size=(20, 500)),
    dims=("x", "y"),
    coords=(np.arange(20), np.arange(500)),
)

da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)
result_allaxes = lazy.sum(skipna=True, min_count=5)
result_allaxes.load()
```

I would expect `result_allaxes` to be NaN, since there are four blocks and therefore only four non-NaN values, which is less than `min_count`. Instead it is 4.
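To double-check that count: `da.chunk(dict(x=5))` splits the 20×500 array into four blocks along `x`, and `worker` sets exactly one non-NaN value per block. A quick NumPy sketch of the arithmetic (chunk layout inferred from the MCVE above):

```python
import numpy as np

# Reproduce the values map_blocks would produce: one non-NaN per 5x500 block
values = np.full((20, 500), np.nan)
for row in (0, 5, 10, 15):  # first row of each chunk along x
    values[row, 0] = 1.0

valid = np.count_nonzero(~np.isnan(values))
print(valid)  # 4 valid values, below min_count=5, so the sum should be NaN
```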

The problem seems to be the dtype check:

https://github.com/pydata/xarray/blob/5296ed18272a856d478fbbb3d3253205508d1c2d/xarray/core/nanops.py#L39

The test returns `True` for `float64`, so the block isn't run. Another MCVE:

```python
import numpy as np
from xarray.core import dtypes

print(dtypes.NAT_TYPES)
print(np.dtype("float64") in dtypes.NAT_TYPES)
```

Output:

```console
(numpy.datetime64('NaT'), numpy.timedelta64('NaT'))
True
```

where I think `False` would be expected. Should I open a separate issue for this, or can we track it here too?
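The membership test is the likely culprit: `x in seq` falls back to `==` against each element, so an over-permissive equality comparison makes unrelated objects look like members. A plain-Python sketch of those semantics (the `Sloppy` class is purely illustrative, not xarray code):

```python
class Sloppy:
    """Mimics an element whose __eq__ compares equal to almost anything."""

    def __eq__(self, other):
        return True

    def __hash__(self):
        return 0


nat_like = (Sloppy(), Sloppy())

# `in` uses == under the hood, so membership succeeds even for a plain string
print("float64" in nat_like)  # True
```

Storing dtypes (rather than NaT values) in `NAT_TYPES` would make the membership test behave as expected.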

Comment 778312719 · bcbnz (CONTRIBUTOR) · created 2021-02-12T16:55:11Z · updated 2021-02-12T16:55:11Z · https://github.com/pydata/xarray/issues/4898#issuecomment-778312719

Grepping the code, the only other function that calls `_maybe_null_out` is `prod`, and I can confirm the problem also exists there. I've updated the title; MCVE for `prod`:

```python
import numpy as np
import xarray as xr


def worker(da):
    if da.shape == (0, 0):
        return da

    raise RuntimeError("I was evaluated")


da = xr.DataArray(
    np.random.normal(size=(20, 500)),
    dims=("x", "y"),
    coords=(np.arange(20), np.arange(500)),
)

da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)
result1 = lazy.prod("x", skipna=True)
result2 = lazy.prod("x", skipna=True, min_count=5)
```
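For reference, the lazy-friendly masking that both `sum` and `prod` need from `_maybe_null_out` can be sketched in plain NumPy (a simplified illustration of the idea, not the actual xarray implementation):

```python
import numpy as np


def maybe_null_out(result, mask, min_count, fill_value=np.nan):
    """Fill positions that have fewer than min_count valid input values."""
    valid = (~mask).sum(axis=0)  # count non-null inputs per output position
    return np.where(valid < min_count, fill_value, result)


data = np.array([[2.0, np.nan], [3.0, np.nan]])
mask = np.isnan(data)

out = maybe_null_out(np.nanprod(data, axis=0), mask, min_count=2)
print(out)  # [6. nan]: second column has no valid values, below min_count
```

Because this version never calls `.any()` on the mask, it avoids forcing an eager evaluation of a dask-backed array.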

