html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1092#issuecomment-290133148,https://api.github.com/repos/pydata/xarray/issues/1092,290133148,MDEyOklzc3VlQ29tbWVudDI5MDEzMzE0OA==,4992424,2017-03-29T15:47:57Z,2017-03-29T15:48:17Z,NONE,"Ah, thanks for the heads-up @benbovy! I see the difference now, and I agree both approaches could co-exist. I may play around with building some of your proposed `DatasetNode` functionality into my `Experiment` tool.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187859705
https://github.com/pydata/xarray/issues/1092#issuecomment-290106782,https://api.github.com/repos/pydata/xarray/issues/1092,290106782,MDEyOklzc3VlQ29tbWVudDI5MDEwNjc4Mg==,4992424,2017-03-29T14:26:15Z,2017-03-29T14:26:15Z,NONE,"Would the scope of this just be to simulate the tree-like structure that NetCDF permits, or could it extend to multiple datasets on disk? One of the ideas we had [during the aospy hackathon](https://aospy.hackpad.com/Data-StorageDiscovery-Design-Document-fM6LgfwrJ2K) involved some sort of idiom based on xarray for packing multiple similar datasets together. For instance, it's very common in climate science to re-run a model multiple times with nearly identical configurations, changing only a parameter or boundary condition. You end up with large archives of data on disk that are identical in shape and metadata, and you want to be able to quickly analyze across them.
As an example, I built [a helper tool](https://github.com/darothen/experiment/blob/master/experiment/experiment.py) during my dissertation to automate much of this: it lets you dump your processed output into a directory structure with a consistent naming scheme, and then easily ingest what you need for a given analysis. It's working great right now for a much larger Monte Carlo set of model simulations (three experimental factors with 3-5 values at each level, for a total of 1500 years of simulation). My tool works by concatenating each experimental factor as a new dimension, which lets you use xarray's selection tools to perform analyses across the ensemble. You can also pre-process things before concatenating, if the data ends up being too big to fit in memory (e.g. for every simulation in the experiment, compute time-zonal averages before concatenation).
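To make that concrete, here is a minimal sketch of the concatenation idiom; the directory layout and the `aerosol_scale` factor name are made up for illustration, not part of my tool:

```python
# Minimal sketch of the concatenate-along-a-factor idiom, assuming one
# processed NetCDF file per run. The file layout and the 'aerosol_scale'
# factor name are hypothetical.
import pandas as pd
import xarray as xr

factor_values = [0.5, 1.0, 2.0]  # values of one experimental factor
runs = [xr.open_dataset('run_aerosol_%s/output.nc' % v) for v in factor_values]

# Optional pre-processing before concatenation, e.g. time-zonal averages,
# so the combined ensemble still fits in memory.
runs = [ds.mean(dim=['time', 'lon']) for ds in runs]

# Concatenate along a brand-new 'aerosol_scale' dimension; the factor
# becomes an ordinary coordinate you can select on.
ens = xr.concat(runs, dim=pd.Index(factor_values, name='aerosol_scale'))

# Ordinary xarray selection now works across the whole ensemble:
control = ens.sel(aerosol_scale=1.0)  # pull out the control run
anomalies = ens - control             # every run minus the control
```

With more than one factor you just repeat the trick, nesting one `xr.concat` per factor, to build an n-dimensional hypercube of runs.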
Going back to @shoyer's [comment](https://github.com/pydata/xarray/issues/1092#issuecomment-259206339), it still seems as though there is room to build some sort of collection of `Dataset`s, in the same way that a `Dataset` is a collection of `DataArray`s. Maybe this is different from @lamorton's grouping example, but it would be really cool if you could use the same sort of syntactic sugar to select across multiple `Dataset`s with like dimensions, just as you could slice into groups inside a `Dataset` as proposed here. It would certainly make things much more manageable than concatenating huge combinations of `Dataset`s in memory!
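To illustrate the kind of sugar I mean (purely hypothetical; nothing called `DatasetCollection` exists in xarray):

```python
# Purely hypothetical sketch: a thin mapping of names to Datasets that
# share dimensions, where one .sel() fans out to every member instead of
# requiring everything to be concatenated into one big object in memory.
class DatasetCollection:
    def __init__(self, datasets):
        self.datasets = dict(datasets)  # name -> xarray.Dataset

    def __getitem__(self, name):
        return self.datasets[name]

    def sel(self, **indexers):
        # Apply the same label-based selection to every member Dataset.
        return DatasetCollection(
            {name: ds.sel(**indexers) for name, ds in self.datasets.items()}
        )

# Usage: runs = DatasetCollection({'control': ds0, 'double_co2': ds1})
#        january = runs.sel(time='2000-01')  # selects from every run at once
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187859705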