
issue_comments


3 rows where issue = 882876804 and user = 56925856, sorted by updated_at descending

Comment 848612330 · AndrewILWilliams (user 56925856) · CONTRIBUTOR
created_at 2021-05-26T09:19:50Z · updated_at 2021-05-26T09:19:50Z
https://github.com/pydata/xarray/pull/5284#issuecomment-848612330

Hey both, I've added a test to check that dask doesn't compute when calling either `xr.corr()` or `xr.cov()`, and also that the end result is still a dask array. Let me know if there's anything I've missed, though! Thanks for the help :)

@dcherian, regarding the `apply_ufunc` approach, I might leave that for now, but as you said it can always be a future PR.
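For readers following along: a minimal sketch of what such a laziness test can look like, using a scheduler that raises on any compute (xarray's test suite has a similar `raise_if_dask_computes` helper; the actual test added in this PR may differ):

```python
import dask
import numpy as np
import xarray as xr


def test_corr_does_not_compute():
    da_a = xr.DataArray(np.random.rand(3, 4), dims=("space", "time")).chunk({"time": 1})
    da_b = xr.DataArray(np.random.rand(3, 4), dims=("space", "time")).chunk({"time": 1})

    # Every dask computation goes through the scheduler, so a scheduler
    # that always raises catches any eager compute inside xr.corr().
    def raise_on_compute(dsk, keys, **kwargs):
        raise AssertionError("dask computed eagerly")

    with dask.config.set(scheduler=raise_on_compute):
        actual = xr.corr(da_a, da_b, dim="time")

    # The result should still be a lazy, dask-backed DataArray.
    assert dask.is_dask_collection(actual)
```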

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Dask-friendly nan check in xr.corr() and xr.cov() (882876804)
Comment 838231568 · AndrewILWilliams (user 56925856) · CONTRIBUTOR
created_at 2021-05-11T10:28:08Z · updated_at 2021-05-12T20:45:00Z
https://github.com/pydata/xarray/pull/5284#issuecomment-838231568

Thanks for that, @dcherian! I didn't know you could use print debugging on chunked operations like this!

One thing, actually: if I change `da = da.where(missing_vals)` to `da = da.where(~missing_vals)`, then we get the results we'd expect. Do you think this fixes the problem?

```python
import numpy as np
import xarray as xr


def _get_valid_values(da, other):
    # 1. Align the two DataArrays
    da1, da2 = xr.align(da, other, join="outer", copy=False)

    # 2. Ignore the nans
    missing_vals = np.logical_or(da1.isnull(), da2.isnull())

    if missing_vals.any():
        da = da.where(~missing_vals)
        return da
    else:
        return da
```

```python
print(da_a.map_blocks(_get_valid_values, args=[da_b]).compute())
# <xarray.DataArray (space: 3, time: 4)>
# array([[1. , 2. , 3. , 4. ],
#        [1. , 0.1, 0.2, 0.3],
#        [2. , 3.2, nan, 1.8]])
# Coordinates:
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
#   * space    (space) object 'IA' 'IL' 'IN'
```
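Why the inversion matters: `DataArray.where(cond)` keeps values where `cond` is True and masks everything else with NaN, so passing the missing-value mask directly keeps exactly the wrong entries. A quick self-contained illustration:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0])
mask = da.isnull()               # True where values are missing

print(da.where(mask).values)     # [nan nan nan]  -> keeps only the missing slots
print(da.where(~mask).values)    # [ 1. nan  3.]  -> keeps the valid values
```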

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Dask-friendly nan check in xr.corr() and xr.cov() (882876804)
Comment 837032429 · AndrewILWilliams (user 56925856) · CONTRIBUTOR
created_at 2021-05-10T17:44:29Z · updated_at 2021-05-10T17:44:29Z
https://github.com/pydata/xarray/pull/5284#issuecomment-837032429

Hi @dcherian, just thinking about your suggestion of using `map_blocks` for the actual valid-values check. I've tested this and was wondering if you could maybe point to where I'm going wrong? It does mask out some of the values in a lazy way, but not the correct ones.

```python3
import numpy as np
import pandas as pd
import xarray as xr

da_a = xr.DataArray(
    np.array([[1, 2, 3, 4], [1, 0.1, 0.2, 0.3], [2, 3.2, 0.6, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk({"time": 1})

da_b = xr.DataArray(
    np.array([[0.2, 0.4, 0.6, 2], [15, 10, 5, 1], [1, 3.2, np.nan, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk({"time": 1})

print(da_a)
# <xarray.DataArray (space: 3, time: 4)>
# array([[1. , 2. , 3. , 4. ],
#        [1. , 0.1, 0.2, 0.3],
#        [2. , 3.2, 0.6, 1.8]])
# Coordinates:
#   * space    (space) <U2 'IA' 'IL' 'IN'
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04

print(da_b)
# <xarray.DataArray (space: 3, time: 4)>
# array([[ 0.2,  0.4,  0.6,  2. ],
#        [15. , 10. ,  5. ,  1. ],
#        [ 1. ,  3.2,  nan,  1.8]])
# Coordinates:
#   * space    (space) <U2 'IA' 'IL' 'IN'
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04

# Define function to use in map_blocks
def _get_valid_values(da, other):
    da1, da2 = xr.align(da, other, join="inner", copy=False)

    # 2. Ignore the nans
    missing_vals = np.logical_or(da1.isnull(), da2.isnull())

    if missing_vals.any():
        da = da.where(missing_vals)
        return da
    else:
        return da

# test
outp = da_a.map_blocks(_get_valid_values, args=[da_b])
print(outp.compute())
# <xarray.DataArray (space: 3, time: 4)>
# array([[1. , 2. , nan, 4. ],
#        [1. , 0.1, nan, 0.3],
#        [2. , 3.2, 0.6, 1.8]])
# Coordinates:
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
#   * space    (space) object 'IA' 'IL' 'IN'
```
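The masking above is lazy but inverted: as the 2021-05-11 comment earlier on this page notes, `da.where(missing_vals)` keeps the missing positions, so the mask needs a `~`. A self-contained sketch of the corrected function, also checking that `map_blocks` keeps the result lazy until `.compute()` (hypothetical example code, not the PR's final implementation):

```python
import dask
import numpy as np
import xarray as xr


def _get_valid_values_fixed(da, other):
    da1, da2 = xr.align(da, other, join="inner", copy=False)
    missing_vals = np.logical_or(da1.isnull(), da2.isnull())
    # Keep the entries that are *not* missing in either input.
    return da.where(~missing_vals)


da = xr.DataArray(
    np.array([[1.0, 2.0], [np.nan, 4.0]]), dims=("x", "y")
).chunk({"y": 1})

# map_blocks applies the function block-by-block and stays lazy.
out = da.map_blocks(_get_valid_values_fixed, args=[da])
print(dask.is_dask_collection(out))  # True: nothing computed yet
print(out.compute())
```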

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Dask-friendly nan check in xr.corr() and xr.cov() (882876804)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
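For reference, the row filter at the top of this page (issue = 882876804, user = 56925856, sorted by updated_at descending) corresponds to a query like the following sketch against a local copy of the database; the filename `github.db` is an assumption:

```python
import sqlite3

# Hypothetical local copy of the database behind this Datasette instance.
conn = sqlite3.connect("github.db")

rows = conn.execute(
    """
    SELECT id, created_at, updated_at, author_association
    FROM issue_comments
    WHERE issue = ? AND [user] = ?
    ORDER BY updated_at DESC
    """,
    (882876804, 56925856),
).fetchall()

for row in rows:
    print(row)  # expect the 3 comments shown above
```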