issue_comments: 328724595
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/1279#issuecomment-328724595 | https://api.github.com/repos/pydata/xarray/issues/1279 | 328724595 | MDEyOklzc3VlQ29tbWVudDMyODcyNDU5NQ== | 4992424 | 2017-09-12T03:29:29Z | 2017-09-12T03:29:29Z | NONE | @shoyer - This output is usually provided as a sequence of daily netCDF files, each on a ~2 degree global grid with 24 timesteps per file (so shape 24 x 96 x 144). For convenience, I usually concatenate these files into yearly datasets, so they have a shape of (8736 x 96 x 144). I haven't played much with how to chunk the data, but it's not uncommon for me to load 20-50 of these files simultaneously (each holding a year's worth of data) and treat each year as an "ensemble member" dimension, so my data has shape (50 x 8736 x 96 x 144). Yes, keeping everything in dask array land is preferable, I suppose. @jhamman - Wow, that worked pretty much perfectly! There's a handful of typos (you switch from `a` to `x` halfway through), and there's a lot of room for optimization by chunk size. But it just works, which is absolutely ridiculous. I just pushed a ~200 GB dataset through my cluster with ~50 cores and it screamed through the calculation. Is there any way this could be pushed before 0.10.0? It's a killer enhancement. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |  | 208903781 |
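
A minimal sketch of the ensemble-concatenation workflow described in the comment above, assuming hypothetical yearly file names and dimension names (`time`, `lat`, `lon`); the chunk sizes are only illustrative and would need tuning:

```python
import xarray as xr

# Hypothetical yearly files, each with dims (time: 8736, lat: 96, lon: 144).
yearly_files = [f"aerosol_{year}.nc" for year in range(1960, 2010)]

# Open each year lazily as a dask-backed dataset; the time chunk size here
# (~30 days of hourly output) is just an illustrative starting point.
datasets = [
    xr.open_dataset(path, chunks={"time": 24 * 30})
    for path in yearly_files
]

# Stack the years along a new "ensemble" dimension, giving arrays of shape
# (ensemble: 50, time: 8736, lat: 96, lon: 144). In practice the per-year
# time coordinates may need to be replaced by a common hour-of-year index
# before concatenating so they line up across members.
ens = xr.concat(datasets, dim="ensemble")

# Everything stays lazy until .compute()/.load() is called, so the full
# ~200 GB never has to fit in memory at once.
```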