issues: 188517316

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
188517316	MDU6SXNzdWUxODg1MTczMTY=	1103	add dask optimization tips to docs	1197350	closed	0			0	2016-11-10T14:08:39Z	2016-11-10T16:49:06Z	2016-11-10T16:49:06Z	MEMBER				We should add the optimization tips that @shoyer describes in this mailing list thread to @karenamckinnon. https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/xarray/11lDGSeza78/lR1uj9yWDAAJ Specific things to try (we should add similar guidelines to xarray's docs): (1) Do your spatial and temporal indexing with .sel() earlier in the pipeline, specifically before you resample. Resample triggers some computation on all the blocks, which in theory should commute with indexing, but we haven't implemented this optimization in dask yet: https://github.com/dask/dask/issues/746 (2) Save the temporal mean to disk as a netCDF file (and then load it again with open_dataset) before subtracting it. Again, in theory, dask should be able to do the computation in a streaming fashion, but in practice this is a fail case for the dask scheduler, because it tries to keep every chunk of an array that it computes in memory: https://github.com/dask/dask/issues/874 (3) Specify smaller chunks across space when using open_mfdataset, e.g., chunks={'latitude': 10, 'longitude': 10}. This makes spatial subsetting easier, because there's no risk you will load chunks of data referring to different chunks (probably not necessary if you do my suggestion 1).	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1103/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed	13221727	issue
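
The indexing and chunking suggestions in the issue body could look roughly like the sketch below. It is only an illustration, not code from the mailing list thread: the dataset is synthetic and built in memory as a stand-in for one opened with open_mfdataset, and the variable name "temperature", the coordinate values, and the date range are all assumptions. The suggestion about saving the temporal mean to a netCDF file is not shown here, because it needs a netCDF backend and a file on disk.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic data standing in for files opened with open_mfdataset.
time = pd.date_range("2000-01-01", periods=60, freq="D")
lat = np.arange(0.0, 40.0, 1.0)
lon = np.arange(0.0, 40.0, 1.0)
data = np.random.default_rng(0).normal(size=(time.size, lat.size, lon.size))
ds = xr.Dataset(
    {"temperature": (("time", "latitude", "longitude"), data)},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# Small spatial chunks, mirroring chunks={'latitude': 10, 'longitude': 10}
# from the issue (requires dask to be installed).
ds = ds.chunk({"latitude": 10, "longitude": 10})

# Do the spatial and temporal indexing with .sel() first ...
subset = ds.sel(
    time=slice("2000-01-01", "2000-01-31"),
    latitude=slice(0, 9),
    longitude=slice(0, 9),
)

# ... and only then resample, so the resampling touches just the selected blocks.
monthly = subset.resample(time="MS").mean()

# Nothing is computed until this point; compute() runs the dask graph.
result = monthly.compute()
```

With these assumed coordinates, the selection lines up exactly with one 10 x 10 spatial chunk, so the resample only has to read that chunk.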

Links from other tables

0 rows from issues_id in issues_labels
0 rows from issue in issue_comments