issue_comments: 837097308

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/5284#issuecomment-837097308 https://api.github.com/repos/pydata/xarray/issues/5284 837097308 MDEyOklzc3VlQ29tbWVudDgzNzA5NzMwOA== 2448579 2021-05-10T18:25:29Z 2021-05-10T18:25:29Z MEMBER

Well that was confusing! `if missing_vals.any():` will not be triggered if all the values in a block are valid. With `.chunk({"time": 1})`, some of the blocks are all valid.

I discovered this with print debugging and the single-threaded scheduler, which loops over blocks in a for-loop:

```
# Define function to use in map_blocks
def _get_valid_values(da, other):
    da1, da2 = xr.align(da, other, join="inner", copy=False)

    # 2. Ignore the nans
    missing_vals = np.logical_or(da1.isnull(), da2.isnull())
    print(missing_vals)

    if missing_vals.any():
        da = da.where(missing_vals)
    return da


da_a.map_blocks(_get_valid_values, args=[da_b]).compute(scheduler="sync")
```
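
To make the per-block behaviour concrete, here is a library-free sketch (plain Python lists stand in for the dask blocks; `block_has_missing` and the toy data are hypothetical, not from the PR). It shows that an `any()` check evaluated block by block is `False` for blocks that happen to be fully valid, even though the same check on the whole array is `True`:

```python
import math


def block_has_missing(block_a, block_b):
    """Return True if either block holds a NaN at any position."""
    return any(math.isnan(x) or math.isnan(y) for x, y in zip(block_a, block_b))


nan = float("nan")
a = [1.0, nan, 3.0, 4.0]
b = [1.0, 2.0, nan, 4.0]

# One block per element, mimicking .chunk({"time": 1}).
per_block = [block_has_missing([x], [y]) for x, y in zip(a, b)]

# A single block covering the whole array.
whole = block_has_missing(a, b)

print(per_block)  # [False, True, True, False]
print(whole)      # True
```

With one-element blocks, the first and last blocks never enter the `if` branch, which matches what the print debugging showed.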

For better performance, we should try a `dask.array.map_blocks` approach with `duck_array_ops.isnull`.
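
A rough sketch of what that could look like, under several assumptions: `np.isnan` stands in for `duck_array_ops.isnull`, the helper `_mask_invalid` and the toy inputs are hypothetical, and the mask is applied unconditionally so that no per-block `any()` check is needed. This is only an illustration of the `dask.array.map_blocks` call shape, not the code proposed in the PR:

```python
import numpy as np
import dask.array as dsa


def _mask_invalid(a, b):
    # Hypothetical helper: blank out positions missing in either block.
    # np.isnan is a stand-in for xarray's duck_array_ops.isnull.
    missing = np.isnan(a) | np.isnan(b)
    return np.where(missing, np.nan, a)


# Toy inputs with one element per chunk.
a = dsa.from_array(np.array([1.0, np.nan, 3.0, 4.0]), chunks=1)
b = dsa.from_array(np.array([1.0, 2.0, np.nan, 4.0]), chunks=1)

# Apply the helper block by block on the raw dask arrays.
masked = dsa.map_blocks(_mask_invalid, a, b, dtype=a.dtype)
result = masked.compute(scheduler="sync")
print(result)
```

Working on the underlying arrays avoids building an xarray object for every block, which is presumably where the performance gain would come from.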

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  882876804