issue_comments: 290106782

This data as json

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1092#issuecomment-290106782	https://api.github.com/repos/pydata/xarray/issues/1092	290106782	MDEyOklzc3VlQ29tbWVudDI5MDEwNjc4Mg==	4992424	2017-03-29T14:26:15Z	2017-03-29T14:26:15Z	NONE	Would the domain for this just be to simulate the tree-like structure that NetCDF permits, or could it extend to multiple datasets on disk? One of the ideas that we had during the aospy hackathon involved some sort of idiom based on xarray for packing multiple, similar datasets together. For instance, it's very common in climate science to re-run a model multiple times nearly identically, but changing a parameter or boundary condition. So you end up with large archives of data on disk which are identical in shape and metadata, and you want to be able to quickly analyze across them. As an example, I built a helper tool during my dissertation to automate much of this, allowing you to dump your processed output in some sort of directory structure and consistent naming scheme, and then easily ingest what you need for a given analysis. It's actually working great for a much larger, Monte Carlo set of model simulations right now (3 factor levels with 3-5 values at each level, for a total of 1500 years of simulation). My tool works by concatenating each experimental factor as a new dimension, which lets you use xarray's selection tools to perform analyses across the ensemble. You can pre-process things before concatenating too, if the data ends up being too big to fit in memory (e.g. for every simulation in the experiment, compute time-zonal averages before concatenation). Going back to @shoyer's comment, it still seems as though there is room to build some sort of collection of `Dataset`s, in the same way that a `Dataset` is a collection of `DataArray`s. Maybe this is different than @lamorton's grouping example, but it would be really, really cool if you could use the same sort of syntactic sugar to select across multiple `Dataset`s with like-dimensions just as you could slice into groups inside a `Dataset` as proposed here. It would certainly make things much more manageable than concatenating huge combinations of `Dataset`s in memory!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		187859705