issue_comments: 330701517


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/1517#issuecomment-330701517 https://api.github.com/repos/pydata/xarray/issues/1517 330701517 MDEyOklzc3VlQ29tbWVudDMzMDcwMTUxNw== 1217238 2017-09-19T23:25:08Z 2017-09-19T23:25:08Z MEMBER

I have a design question here: how should we handle cases where a core dimension exists in multiple chunks? For example, suppose you are applying a function that needs access to every point along the "time" axis at once (e.g., an auto-correlation function).

Should we:

1. Automatically rechunk along "time" into a single chunk, or
2. Raise an error, and require the user to rechunk manually (xref https://github.com/dask/dask/issues/2689 for API on this)?

Currently we do behavior 1, but behavior 2 might be more user-friendly. Otherwise it would be pretty easy to inadvertently pass in a dask array (e.g., one with small chunks along "time") that apply_ufunc would effectively load into memory by merging it into a single chunk.
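
To make the two options concrete, here is a minimal sketch in plain Python that works only on chunk-size tuples (the same shape of data as a dask array's `.chunks`, keyed by dimension name). The helper `resolve_core_chunks` and its `allow_rechunk` flag are hypothetical names for illustration, not existing xarray or dask API.

```python
from typing import Dict, Iterable, Tuple

Chunks = Dict[str, Tuple[int, ...]]


def resolve_core_chunks(
    chunks: Chunks,
    core_dims: Iterable[str],
    allow_rechunk: bool = False,
) -> Chunks:
    """Hypothetical helper: decide what to do with chunked core dimensions.

    allow_rechunk=True models behavior 1 (merge into a single chunk).
    allow_rechunk=False models behavior 2 (raise and let the user rechunk).
    """
    resolved = dict(chunks)
    for dim in core_dims:
        sizes = chunks[dim]
        if len(sizes) > 1:
            if not allow_rechunk:
                raise ValueError(
                    f"core dimension {dim!r} is split into {len(sizes)} chunks; "
                    "rechunk it into a single chunk before applying the function"
                )
            # Behavior 1: one chunk spanning the whole dimension.
            resolved[dim] = (sum(sizes),)
    return resolved


chunks = {"time": (10, 10, 10), "x": (5, 5)}
merged = resolve_core_chunks(chunks, ["time"], allow_rechunk=True)
```

With behavior 2 as the default, the silent merge only happens when the caller opts in, which is the point of the question above.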

dask.array has some heuristics to protect against this in rechunk() but I'm not sure they are effective enough to catch this. (@mrocklin?)
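
As a rough illustration of why this matters, the sketch below estimates the size of the largest block before and after merging a core dimension. It is plain arithmetic on chunk sizes and says nothing about what dask's own rechunk() heuristics actually check; `largest_block_nbytes` and `merged_block_nbytes` are hypothetical helpers, and the example chunking is made up.

```python
from math import prod
from typing import Dict, Iterable, Tuple

Chunks = Dict[str, Tuple[int, ...]]


def largest_block_nbytes(chunks: Chunks, itemsize: int) -> int:
    """Bytes in the largest block: product of the largest chunk per dimension."""
    return itemsize * prod(max(sizes) for sizes in chunks.values())


def merged_block_nbytes(chunks: Chunks, core_dims: Iterable[str], itemsize: int) -> int:
    """Bytes in the largest block once every core dimension is a single chunk."""
    core = set(core_dims)
    merged = {
        dim: ((sum(sizes),) if dim in core else sizes)
        for dim, sizes in chunks.items()
    }
    return largest_block_nbytes(merged, itemsize)


# Assumed example: 100,000 time steps in chunks of 100, 1,000 points along x, float64.
chunks = {"time": (100,) * 1000, "x": (1000,)}
before = largest_block_nbytes(chunks, itemsize=8)
after = merged_block_nbytes(chunks, ["time"], itemsize=8)
```

Here a block grows from 800,000 bytes to 800,000,000 bytes, a factor equal to the number of chunks along "time".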

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  252358450