issue_comments: 508853564

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/3068#issuecomment-508853564 https://api.github.com/repos/pydata/xarray/issues/3068 508853564 MDEyOklzc3VlQ29tbWVudDUwODg1MzU2NA== 1217238 2019-07-05T20:15:27Z 2019-07-05T20:15:27Z MEMBER

> For the long term, I also understand that there isn't really a good way to check equality of two dask arrays. I wonder if dask's graph optimization could be used to "simplify" two dask arrays' graph separately and check the graph equality. For example, two dask arrays created by doing da.zeros((10, 10), chunks=2) + 5 should be theoretically equal because their dask graphs are made up of the same tasks.

Dask actually already does this canonicalization. If two arrays have the same name, they use the same dask graph, e.g.,

```
In [5]: x = da.zeros((10, 10), chunks=2) + 5

In [6]: y = da.zeros((10, 10), chunks=2) + 5

In [7]: x.name
Out[7]: 'add-f7441a0f46f5cf40458391cd08406c23'

In [8]: y.name
Out[8]: 'add-f7441a0f46f5cf40458391cd08406c23'
```

So xarray could safely look at `.name` on dask arrays (e.g., inside `Variable.equals` or `duck_array_ops.array_equiv`) to determine that two dask arrays are the same, rather than merely using `is` to check whether they are the same object.
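A minimal sketch of what such a check might look like. The helper name `same_dask_name` is hypothetical and not part of xarray or dask; it relies only on dask arrays exposing `.dask` and `.name`. A `False` result means "not known to be the same", so a caller would still fall back to a real comparison.

```python
def same_dask_name(a, b):
    """Return True if a and b are the same object, or dask-like
    collections with equal names. False means "unknown", not "unequal".

    Hypothetical helper for illustration only.
    """
    if a is b:
        return True
    # Duck-typed check: dask arrays expose both `.dask` (the graph) and `.name`.
    if hasattr(a, "dask") and hasattr(b, "dask"):
        name_a = getattr(a, "name", None)
        name_b = getattr(b, "name", None)
        return name_a is not None and name_a == name_b
    return False


# Optional demo; skipped when dask is not installed.
try:
    import dask.array as da
except ImportError:
    da = None

if da is not None:
    x = da.zeros((10, 10), chunks=2) + 5
    y = da.zeros((10, 10), chunks=2) + 5
    print(x is y)                # distinct Python objects
    print(same_dask_name(x, y))  # True whenever the names match, as in the session above
```

Nothing is computed here: only the names are compared, so the check stays cheap even for large arrays.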

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  462859457