issue_comments: 255286001
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/pull/1024#issuecomment-255286001 | https://api.github.com/repos/pydata/xarray/issues/1024 | 255286001 | MDEyOklzc3VlQ29tbWVudDI1NTI4NjAwMQ== | 1217238 | 2016-10-21T03:36:01Z | 2016-10-21T03:36:01Z | MEMBER |
I'm nervous about eager loading, especially for non-index coordinates. They can have more than one dimension, and thus can contain a lot of data, so eagerly loading non-index coordinates could break existing use cases. On the other hand, non-index coordinates are indeed checked for equality in most xarray operations (e.g., for the coordinate merge in align), so it is useful not to have to recompute them all the time. Even eagerly loading indexes is potentially problematic if loading the index values is expensive. So I'm conflicted:
- I like the current caching behavior for

I'm going to start throwing out ideas for how to deal with this:

Option A
Add two new (public?) methods, something like
Hypothetically, we could even have options for turning this caching systematically on/off (e.g.,
Your proposal is basically an extreme version of this, where we call
Advantages:
- It's fairly predictable when caching happens (especially if we opt for calling
Downsides:
- Caching is more aggressive than necessary -- we cache indexes even if that coord isn't actually indexed.

Option B
Like Option A, but somehow infer the full set of variables that need to be cached (e.g., in a
This solves the downside of A, but diminishes the predictability. We're basically back to how things work now.

Option C
Cache dask.array in
Advantages:
- Much simpler and easier to implement than the alternatives.
- Implicit conversions are greatly diminished.
Downsides:
- Non-index coordinates get thrown away after being evaluated once. If you're doing lots of operations of the form

Option D
Load the contents of an
This has the most predictable performance, but might cause trouble for some edge use cases?

I need to think about this a little more, but right now I am leaning towards Option C or D. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
180451196 |
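
The trade-off weighed in the comment above (load everything up front, cache on first access, or recompute on every access) can be sketched in plain Python. This is only an illustration under stated assumptions: `LazyValues`, its `policy` argument, and `count_loads` are hypothetical names invented for this sketch. They are not xarray or dask API, and they do not correspond one-to-one to Options A through D.

```python
from typing import Callable, List, Optional


class LazyValues:
    """Hypothetical stand-in for a lazily computed coordinate (not xarray API)."""

    def __init__(self, loader: Callable[[], List[int]], policy: str = "cache"):
        if policy not in ("eager", "cache", "recompute"):
            raise ValueError(f"unknown policy: {policy!r}")
        self._loader = loader
        self._policy = policy
        self._cached: Optional[List[int]] = None
        if policy == "eager":
            # Pay the full loading cost up front, even if the values are never used.
            self._cached = loader()

    @property
    def values(self) -> List[int]:
        if self._policy == "recompute":
            # Nothing is kept: every access re-runs the loader.
            return self._loader()
        if self._cached is None:
            # First access under "cache": load once and keep the result.
            self._cached = self._loader()
        return self._cached


def count_loads(policy: str, accesses: int) -> int:
    """Return how many times the loader runs for a given policy and access count."""
    calls = 0

    def loader() -> List[int]:
        nonlocal calls
        calls += 1
        return [1, 2, 3]

    coord = LazyValues(loader, policy=policy)
    for _ in range(accesses):
        _ = coord.values
    return calls
```

Calling `count_loads` with the same number of accesses under each policy shows where the cost lands: eager loading pays once even with zero accesses, caching pays once only if the values are actually used, and recomputing pays on every access.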