issue_comments: 306688091

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1440#issuecomment-306688091 https://api.github.com/repos/pydata/xarray/issues/1440 306688091 MDEyOklzc3VlQ29tbWVudDMwNjY4ODA5MQ== 12229877 2017-06-07T05:09:06Z 2017-06-07T05:09:06Z CONTRIBUTOR

I'd certainly support a warning when dask chunks do not align with the on-disk chunks.

This sounds like a very good idea to me 👍
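To make the idea concrete, here is a rough sketch of what such an alignment check might look like. The function names and the plain-dict inputs are hypothetical, not xarray or dask API, and it assumes "aligned" means each requested chunk size is a whole multiple of the on-disk chunk size for that dimension (ignoring a smaller final chunk at the edge of the array).

```python
import warnings


def misaligned_dims(requested, on_disk):
    """Return the dimensions whose requested chunk size is not a whole
    multiple of the on-disk chunk size.

    Both arguments are hypothetical plain dicts mapping dimension name
    to a nominal chunk size, e.g. {"time": 10, "lat": 100}.
    """
    bad = []
    for dim, size in requested.items():
        disk = on_disk.get(dim)
        # Dimensions with no known on-disk chunking are skipped.
        if disk and size % disk != 0:
            bad.append(dim)
    return bad


def warn_if_misaligned(requested, on_disk):
    """Emit a warning listing any misaligned dimensions, then return them."""
    bad = misaligned_dims(requested, on_disk)
    if bad:
        warnings.warn(
            f"requested chunks for {bad} do not align with on-disk chunks",
            stacklevel=2,
        )
    return bad
```

For example, requesting `{"time": 10, "lat": 100}` against on-disk chunks of `{"time": 1, "lat": 64}` would flag only `lat`, since 100 is not a multiple of 64.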

I think it's unavoidable that users understand how their data will be processed (e.g., whether operations will be mapped over time or space). But maybe some sort of heuristics (if not a fully automated solution) are possible.

I think that depends on the size of the data - a very common workflow in our group is to open some national-scale collection, select a small (MB to low GB) section, and proceed with that. At this scale we only use chunks because many of the input files are larger than memory, and shape is basically irrelevant - chunks avoid loading anything until after selecting the subset (I think this is related to #1396).

It's certainly good to know the main processing dimensions though, and user-guided chunk selection heuristics could take us a long way - I actually think a dimension hint and good heuristics are likely to perform better than most users (who are not experts and have not profiled their performance).

The set notation is also very elegant, but I wonder about the interpretation. With chunks=, I specify how to break up the data - and any omitted dimensions are not chunked. For the hint, I'd expect to express which dimension(s) to keep - i.e. {'lat', 'lon'} should indicate that my analysis is mostly spatial, rather than mostly not. Maybe we could use a string (e.g. 'time' for timeseries or 'lat lon' for spatial) instead of a set to specify large chunk dimensions?
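As a minimal sketch of the string-hint interpretation, the function below keeps the hinted dimensions whole and splits the remaining ones to stay near a target chunk size. Everything here is an assumption for illustration: `chunks_from_hint`, its parameters, and the default target of one million elements are hypothetical, not an existing or proposed xarray API, and a real heuristic would also need to account for dtype size and on-disk chunking.

```python
import math


def chunks_from_hint(sizes, hint, target=1_000_000):
    """Turn a dimension hint into a chunks mapping.

    sizes:  dict of dimension name -> length, e.g. {"time": 1000, "lat": 200}
    hint:   space-separated names of dimensions to keep in large chunks,
            e.g. "time" or "lat lon"
    target: approximate number of elements per chunk (an arbitrary default)
    """
    keep = set(hint.split())
    unknown = keep - set(sizes)
    if unknown:
        raise ValueError(f"hint names unknown dimensions: {sorted(unknown)}")

    # Hinted dimensions are not split at all.
    chunks = {dim: n for dim, n in sizes.items() if dim in keep}

    # Spend whatever budget is left on the other dimensions, in order.
    budget = max(1, target // math.prod(chunks.values()))
    for dim, n in sizes.items():
        if dim in keep:
            continue
        chunks[dim] = min(n, budget)
        budget = max(1, budget // chunks[dim])
    return chunks


sizes = {"time": 1000, "lat": 200, "lon": 200}
timeseries_chunks = chunks_from_hint(sizes, "time")
spatial_chunks = chunks_from_hint(sizes, "lat lon")
```

With these sizes, the `"time"` hint keeps all 1000 time steps in each chunk and splits `lon`, while `"lat lon"` keeps full 200 x 200 spatial slices and splits `time` into blocks of 25.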

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  233350060