issue_comments: 249011817

html_url: https://github.com/pydata/xarray/issues/585#issuecomment-249011817
issue_url: https://api.github.com/repos/pydata/xarray/issues/585
id: 249011817
node_id: MDEyOklzc3VlQ29tbWVudDI0OTAxMTgxNw==
user: 1217238
created_at: 2016-09-22T20:00:57Z
updated_at: 2016-09-22T20:00:57Z
author_association: MEMBER

I think #964 provides a viable path forward here.

Previously, I was imagining that the user provides a function that maps xarray.DataArray -> xarray.DataArray. Such functions are tricky to parallelize with dask.array because we need to run them to figure out the result dimensions/coordinates.

In contrast, a user-defined function that maps ndarray -> ndarray is fairly straightforward to parallelize with dask.array (e.g., using dask.array.elemwise or dask.array.map_blocks). Then we could add the metadata back in afterwards with #964.
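For concreteness, here is a minimal sketch of that route done by hand with dask.array.map_blocks: apply an elementwise ndarray -> ndarray function block by block, then rebuild the DataArray from the original dims/coords. The function and variable names are made up for illustration, and this is not the #964 implementation.

```python
import numpy as np
import dask.array as da
import xarray as xr

def squash(block):
    # plain ndarray -> ndarray, elementwise; knows nothing about labels
    return np.tanh(block)

data = xr.DataArray(np.random.rand(4, 6), dims=("x", "y"),
                    coords={"x": np.arange(4), "y": np.arange(6)})

# chunk the underlying ndarray and map the function over each block
chunked = da.from_array(data.values, chunks=(2, 3))
result = da.map_blocks(squash, chunked, dtype=data.dtype)

# add the metadata (dims/coords) back in afterwards
out = xr.DataArray(result.compute(), dims=data.dims, coords=data.coords)
```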

In principle, we could do this automatically -- especially if dask had a way to parallelize arbitrary NumPy generalized universal functions. Then the user could write something like xarray.apply(func, data, signature=signature, dask_array='auto') to automatically parallelize func over their data. In fact, I had this in some previous commits for #964, but took it out for now, just to reduce the scope of the change.
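To make the "signature" argument concrete: it refers to the gufunc-style core-dimension spec that NumPy already uses. The xarray.apply call above is the proposed interface, not an existing one; the self-contained illustration below just uses numpy.vectorize to show the same signature syntax.

```python
import numpy as np

def inner_product(a, b):
    # operates only on the core dimension; loop dimensions are handled outside
    return np.sum(a * b, axis=-1)

# "(i),(i)->()": both inputs share a core dimension i, the output is a scalar
gufunc = np.vectorize(inner_product, signature="(i),(i)->()")

x = np.random.rand(3, 5)
y = np.random.rand(3, 5)
print(gufunc(x, y).shape)  # (3,) -- the leading dimension is looped over
```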

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 107424151