issue_comments: 241231815
html_url: https://github.com/pydata/xarray/issues/978#issuecomment-241231815
issue_url: https://api.github.com/repos/pydata/xarray/issues/978
id: 241231815
node_id: MDEyOklzc3VlQ29tbWVudDI0MTIzMTgxNQ==
user: 1217238
created_at: 2016-08-21T00:31:11Z
updated_at: 2016-08-21T00:31:11Z
author_association: MEMBER

Oops -- let's add a fix for this and a regression test in test_dask.py.

We should fix broadcast as you mention, but also fix the as_compatible_data function to try coercing data via the .data attribute before using .values: https://github.com/pydata/xarray/blob/584e70378c64e3fa861e5b4b4fd61d21639661c6/xarray/core/variable.py#L146
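A minimal sketch of the proposed coercion order — this is a hypothetical simplification, not xarray's actual `as_compatible_data`, which handles many more cases (masked arrays, datetimes, 0-d data, etc.):

```python
import numpy as np

def as_compatible_data(data):
    """Sketch: prefer ``.data`` (which keeps a dask array lazy) over
    ``.values`` (which would compute it into memory)."""
    if isinstance(data, np.ndarray):
        # Plain ndarrays also expose a ``.data`` buffer attribute,
        # so handle them first.
        return data
    if hasattr(data, "data"):
        # e.g. xarray objects wrapping dask arrays
        return data.data
    if hasattr(data, "values"):
        # e.g. pandas objects
        return data.values
    return np.asarray(data)
```

The point is only the ordering: checking `.data` before `.values` lets dask-backed objects pass through without being computed.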

> After that, however, there's a new issue: whenever broadcast adds a dimension to an array, it creates it as a single chunk, as opposed to copying the chunking of the other arrays. This can easily cause a host to run out of memory, and makes it harder to work with the arrays afterwards because chunks won't match.

This is sort of, but not completely, right. We use dask.array.broadcast_to to expand dimensions for dask arrays, which under the hood uses numpy.broadcast_to on each chunk. Broadcasting uses a view to insert new dimensions with stride 0, so it adds no storage cost for the original array. But any arrays resulting from arithmetic will indeed require the full space.
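The stride-0 view behavior is easy to see directly with numpy:

```python
import numpy as np

x = np.arange(4, dtype=np.int64)     # shape (4,), 32 bytes of real data
y = np.broadcast_to(x, (1000, 4))    # shape (1000, 4), read-only view

# The new leading dimension has stride 0, so every "row" of y points at
# the same 32 bytes -- no additional storage is allocated.
print(y.strides)      # (0, 8)
print(y.base is x)    # True: y shares x's memory

# Arithmetic, however, materializes a full-size result array:
z = y + 1
print(z.nbytes)       # 32000
```

This is why the broadcast itself is cheap but downstream arithmetic is not, and why chunk alignment still matters for the dask case.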

issue: 172290413