issue_comments: 306688091


html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1440#issuecomment-306688091	https://api.github.com/repos/pydata/xarray/issues/1440	306688091	MDEyOklzc3VlQ29tbWVudDMwNjY4ODA5MQ==	12229877	2017-06-07T05:09:06Z	2017-06-07T05:09:06Z	CONTRIBUTOR	I'd certainly support a warning when dask chunks do not align with the on-disk chunks. This sounds like a very good idea to me 👍 I think it's unavoidable that users understand how their data will be processed (e.g., whether operations will be mapped over time or space). But maybe some sort of heuristics (if not a fully automated solution) are possible. I think that depends on the size of the data: a very common workflow in our group is to open some national-scale collection, select a small (MB to low GB) section, and proceed with that. At this scale we only use chunks because many of the input files are larger than memory, and chunk shape is basically irrelevant; chunks just avoid loading anything until after the subset has been selected (I think this is related to #1396). It's certainly good to know the main processing dimensions though, and user-guided chunk selection heuristics could take us a long way. I actually think a dimension hint plus good heuristics is likely to perform better than most users (who are not experts and have not profiled their performance). The set notation is also very elegant, but I wonder about the interpretation. With `chunks=`, I specify how to break up the data, and any omitted dimensions are not chunked. For the hint, I'd expect to express which dimension(s) to keep, i.e. `{'lat', 'lon'}` should indicate that my analysis is mostly spatial, rather than mostly non-spatial. Maybe we could use a string (e.g. `time` for timeseries or `lat lon` for spatial) instead of a set to specify large chunk dimensions?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		233350060
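The comment proposes two ideas that are easy to prototype outside xarray: a warning when dask chunk boundaries cut across on-disk chunks, and a string hint naming the dimension(s) to keep in large chunks (`"time"` for timeseries work, `"lat lon"` for spatial work). Below is a minimal pure-Python sketch of both, assuming dimension sizes and chunk lengths are available as plain dicts. `chunks_from_hint`, `warn_if_misaligned`, and the `target_elements` budget are hypothetical names chosen for illustration; they are not part of the xarray or dask API.

```python
import math
import warnings


def chunks_from_hint(sizes, hint, target_elements=1_000_000):
    """Hypothetical heuristic: keep hinted dimensions whole, split the rest.

    sizes: dimension name -> length, e.g. {"time": 365, "lat": 180, "lon": 360}
    hint: space-separated names of dimensions to keep in large chunks,
          e.g. "time" for timeseries work or "lat lon" for spatial work
    target_elements: assumed rough budget of array elements per chunk
    """
    keep = hint.split()
    unknown = [d for d in keep if d not in sizes]
    if unknown:
        raise ValueError(f"hint names unknown dimensions: {unknown}")

    # Hinted dimensions are never split.
    chunks = {d: sizes[d] for d in keep}
    kept_elements = max(1, math.prod(chunks.values()))

    # Share the remaining element budget evenly among the other dimensions
    # (the per-dimension length is rounded down and at least 1).
    others = [d for d in sizes if d not in keep]
    if others:
        budget = max(1, target_elements // kept_elements)
        per_dim = max(1, int(budget ** (1 / len(others))))
        for d in others:
            chunks[d] = min(sizes[d], per_dim)
    return chunks


def warn_if_misaligned(dask_chunks, disk_chunks, sizes):
    """Warn when dask chunk boundaries fall inside on-disk chunks.

    All arguments map dimension name -> length. A dask chunk is treated as
    aligned if it spans the whole dimension or is a multiple of the on-disk
    chunk length; otherwise some disk chunks may be read more than once.
    Returns {dim: (dask_chunk, disk_chunk)} for the misaligned dimensions.
    """
    misaligned = {
        d: (dask_chunks[d], disk_chunks[d])
        for d in dask_chunks
        if d in disk_chunks
        and dask_chunks[d] < sizes[d]
        and dask_chunks[d] % disk_chunks[d] != 0
    }
    if misaligned:
        warnings.warn(
            f"dask chunks do not align with on-disk chunks: {misaligned}",
            UserWarning,
            stacklevel=2,
        )
    return misaligned


sizes = {"time": 365, "lat": 180, "lon": 360}
print(chunks_from_hint(sizes, "lat lon"))  # {'lat': 180, 'lon': 360, 'time': 15}
print(chunks_from_hint(sizes, "time"))     # {'time': 365, 'lat': 52, 'lon': 52}
```

The dict returned by `chunks_from_hint` has the same form as the `chunks=` argument accepted by `xarray.open_dataset`, so in this sketch the hint `"lat lon"` keeps each spatial slice in one chunk and splits only along `time`. Dividing the leftover budget evenly across the unhinted dimensions is an arbitrary choice; a real heuristic would probably also weigh dtype size and the on-disk chunking checked by `warn_if_misaligned`.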