issue_comments

8 rows where issue = 882876804 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
849811814 https://github.com/pydata/xarray/pull/5284#issuecomment-849811814 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDg0OTgxMTgxNA== max-sixty 5635139 2021-05-27T17:31:02Z 2021-05-27T17:31:02Z MEMBER

Thank you very much @AndrewWilliams3142!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
849169576 https://github.com/pydata/xarray/pull/5284#issuecomment-849169576 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDg0OTE2OTU3Ng== max-sixty 5635139 2021-05-26T22:46:43Z 2021-05-26T22:46:43Z MEMBER

Please feel free to add a whatsnew @AndrewWilliams3142

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
848612330 https://github.com/pydata/xarray/pull/5284#issuecomment-848612330 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDg0ODYxMjMzMA== AndrewILWilliams 56925856 2021-05-26T09:19:50Z 2021-05-26T09:19:50Z CONTRIBUTOR

Hey both, I've added a test to check that dask doesn't compute when calling either `xr.corr()` or `xr.cov()`, and also that the end result is still a dask array. Let me know if there's anything I've missed though! Thanks for the help :)

@dcherian, regarding the `apply_ufunc` approach, I might leave that for now, but as you said it can always be a future PR.
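A minimal sketch of what such a "dask doesn't compute" check could look like, assuming xarray's internal test helper `raise_if_dask_computes` is available; the array setup and assertions here are illustrative, not the actual test added in this PR:

```python
import dask.array as dsk
import numpy as np
import xarray as xr
from xarray.tests import raise_if_dask_computes

# Hypothetical chunked inputs; any dask-backed DataArrays would do.
da_a = xr.DataArray(np.random.rand(3, 4), dims=("space", "time")).chunk({"time": 1})
da_b = xr.DataArray(np.random.rand(3, 4), dims=("space", "time")).chunk({"time": 1})

with raise_if_dask_computes():
    # Neither call should trigger computation of the backing dask arrays.
    corr = xr.corr(da_a, da_b, dim="time")
    cov = xr.cov(da_a, da_b, dim="time")

# The results should still be lazy, dask-backed arrays.
assert isinstance(corr.data, dsk.Array)
assert isinstance(cov.data, dsk.Array)
```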

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
838231568 https://github.com/pydata/xarray/pull/5284#issuecomment-838231568 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDgzODIzMTU2OA== AndrewILWilliams 56925856 2021-05-11T10:28:08Z 2021-05-12T20:45:00Z CONTRIBUTOR

Thanks for that, @dcherian! I didn't know you could use print debugging on chunked operations like this!

One thing, actually: if I change `da = da.where(missing_vals)` to `da = da.where(~missing_vals)`, then we get the results we'd expect. Do you think this fixes the problem?

```python
def _get_valid_values(da, other):
    da1, da2 = xr.align(da, other, join="outer", copy=False)

    # 2. Ignore the nans
    missing_vals = np.logical_or(da1.isnull(), da2.isnull())

    if missing_vals.any():
        da = da.where(~missing_vals)
        return da
    else:
        return da
```

```python
print(da_a.map_blocks(_get_valid_values, args=[da_b]).compute())
```

```
<xarray.DataArray (space: 3, time: 4)>
array([[1. , 2. , 3. , 4. ],
       [1. , 0.1, 0.2, 0.3],
       [2. , 3.2, nan, 1.8]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) object 'IA' 'IL' 'IN'
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
837097308 https://github.com/pydata/xarray/pull/5284#issuecomment-837097308 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDgzNzA5NzMwOA== dcherian 2448579 2021-05-10T18:25:29Z 2021-05-10T18:25:29Z MEMBER

Well, that was confusing! `if missing_vals.any():` will not be triggered if all the values in a block are valid. With `.chunk({"time": 1})`, some of the blocks are all valid.

I discovered this with print debugging and the single-threaded scheduler, which loops over blocks in a for-loop:

```python
# Define function to use in map_blocks
def _get_valid_values(da, other):
    da1, da2 = xr.align(da, other, join="inner", copy=False)

    # 2. Ignore the nans
    missing_vals = np.logical_or(da1.isnull(), da2.isnull())
    print(missing_vals)

    if missing_vals.any():
        da = da.where(missing_vals)
    return da

da_a.map_blocks(_get_valid_values, args=[da_b]).compute(scheduler="sync")
```

For better performance, we should try a `dask.array.map_blocks` approach with `duck_array_ops.isnull`.
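A rough sketch of that suggestion; `duck_array_ops.isnull` is xarray-internal, and the `_lazy_nan_mask` helper below is a made-up name for illustration, not the implementation that ended up in the PR:

```python
import dask.array as dsk
from xarray.core import duck_array_ops

def _lazy_nan_mask(arr_a, arr_b):
    # Hypothetical helper: build the "either input is nan" mask block-by-block,
    # without computing the underlying dask arrays.
    mask_a = dsk.map_blocks(duck_array_ops.isnull, arr_a, dtype=bool)
    mask_b = dsk.map_blocks(duck_array_ops.isnull, arr_b, dtype=bool)
    return mask_a | mask_b  # still lazy; only evaluated on .compute()

# e.g. on the raw dask arrays backing the DataArrays above:
# valid = ~_lazy_nan_mask(da_a.data, da_b.data)
```

Because the mask is itself a lazy dask array, negating it and feeding it to `where` keeps the whole pipeline lazy.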

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
837045902 https://github.com/pydata/xarray/pull/5284#issuecomment-837045902 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDgzNzA0NTkwMg== keewis 14808389 2021-05-10T17:53:05Z 2021-05-10T17:53:05Z MEMBER

#4559 and #4668 fixed a similar issue using `map_blocks`, maybe you can use that as a reference?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
837032429 https://github.com/pydata/xarray/pull/5284#issuecomment-837032429 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDgzNzAzMjQyOQ== AndrewILWilliams 56925856 2021-05-10T17:44:29Z 2021-05-10T17:44:29Z CONTRIBUTOR

Hi @dcherian, just thinking about your suggestion of using `map_blocks` on the actual valid_values check. I've tested this and was wondering if you could maybe point to where I'm going wrong? It does mask out some of the values in a lazy way, but not the correct ones.

```python3
da_a = xr.DataArray(
    np.array([[1, 2, 3, 4], [1, 0.1, 0.2, 0.3], [2, 3.2, 0.6, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk({'time': 1})

da_b = xr.DataArray(
    np.array([[0.2, 0.4, 0.6, 2], [15, 10, 5, 1], [1, 3.2, np.nan, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk({'time': 1})

print(da_a)
# <xarray.DataArray (space: 3, time: 4)>
# array([[1. , 2. , 3. , 4. ],
#        [1. , 0.1, 0.2, 0.3],
#        [2. , 3.2, 0.6, 1.8]])
# Coordinates:
#   * space    (space) <U2 'IA' 'IL' 'IN'
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04

print(da_b)
# <xarray.DataArray (space: 3, time: 4)>
# array([[ 0.2,  0.4,  0.6,  2. ],
#        [15. , 10. ,  5. ,  1. ],
#        [ 1. ,  3.2,  nan,  1.8]])
# Coordinates:
#   * space    (space) <U2 'IA' 'IL' 'IN'
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04

# Define function to use in map_blocks
def _get_valid_values(da, other):
    da1, da2 = xr.align(da, other, join="inner", copy=False)

    # 2. Ignore the nans
    missing_vals = np.logical_or(da1.isnull(), da2.isnull())

    if missing_vals.any():
        da = da.where(missing_vals)
        return da
    else:
        return da

# test
outp = da_a.map_blocks(_get_valid_values, args=[da_b])

print(outp.compute())
# <xarray.DataArray (space: 3, time: 4)>
# array([[1. , 2. , nan, 4. ],
#        [1. , 0.1, nan, 0.3],
#        [2. , 3.2, 0.6, 1.8]])
# Coordinates:
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
#   * space    (space) object 'IA' 'IL' 'IN'
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
835907226 https://github.com/pydata/xarray/pull/5284#issuecomment-835907226 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDgzNTkwNzIyNg== max-sixty 5635139 2021-05-09T22:15:25Z 2021-05-09T22:15:25Z MEMBER

Hi @AndrewWilliams3142, thanks for another PR!

This looks good. Could we add a test like https://github.com/pydata/xarray/pull/4559/files#diff-74d2dc289aa601b2de094fb3a3b687fd65963401b51b95cc5e0afcd06cc4cb82R45? And maybe reference this or #4559 as an explanation of what's going on.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);