issue_comments: 290106782

html_url: https://github.com/pydata/xarray/issues/1092#issuecomment-290106782
issue_url: https://api.github.com/repos/pydata/xarray/issues/1092
id: 290106782
node_id: MDEyOklzc3VlQ29tbWVudDI5MDEwNjc4Mg==
user: 4992424
created_at: 2017-03-29T14:26:15Z
updated_at: 2017-03-29T14:26:15Z
author_association: NONE

Would the domain for this just be to simulate the tree-like structure that NetCDF permits, or could it extend to multiple datasets on disk? One of the ideas we had during the aospy hackathon involved an xarray-based idiom for packing multiple, similar datasets together. For instance, in climate science it's very common to re-run a model several times nearly identically, changing only a parameter or boundary condition. You end up with large archives of data on disk that are identical in shape and metadata, and you want to be able to analyze across them quickly.

As an example, during my dissertation I built a helper tool to automate much of this: you dump your processed output into a directory structure with a consistent naming scheme, and then easily ingest what you need for a given analysis. It's working great right now for a much larger Monte Carlo set of model simulations (3 factor levels with 3-5 values at each level, for a total of 1500 years of simulation). The tool concatenates along each experimental factor as a new dimension, which lets you use xarray's selection tools to perform analyses across the ensemble. If the data end up too big to fit in memory, you can also pre-process each run before concatenating (e.g. for every simulation in the experiment, compute time-zonal averages first).
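A minimal sketch of the "one new dimension per experimental factor" idea, assuming a full factorial design. This is not the author's actual tool: `load_run`, `time_zonal_mean`, `build_ensemble`, and the factor names `co2` and `sst_offset` are all hypothetical, and `load_run` fabricates a tiny Dataset where real code would call something like `xr.open_dataset` on a path built from the naming scheme.

```python
import numpy as np
import pandas as pd
import xarray as xr


def load_run(co2, sst_offset):
    """Hypothetical stand-in for opening one simulation's output from disk."""
    time = np.arange(4)
    lat = np.array([-45.0, 0.0, 45.0])
    lon = np.array([0.0, 120.0, 240.0])
    # Synthetic field that depends on the experimental factors.
    temp = np.full((time.size, lat.size, lon.size), 280.0 + 0.01 * co2 + sst_offset)
    return xr.Dataset(
        {"temp": (("time", "lat", "lon"), temp)},
        coords={"time": time, "lat": lat, "lon": lon},
    )


def time_zonal_mean(ds):
    """Reduce each run (time and zonal mean) before concatenation to save memory."""
    return ds.mean(dim=["time", "lon"])


def build_ensemble(factors, loader, reduce_fn=None):
    """Nest xr.concat calls so each factor in `factors` becomes a new dimension.

    factors: dict mapping factor name -> list of values (full factorial).
    """
    names = list(factors)

    def recurse(level, fixed):
        if level == len(names):
            ds = loader(**fixed)
            return reduce_fn(ds) if reduce_fn else ds
        name = names[level]
        values = factors[name]
        parts = [recurse(level + 1, {**fixed, name: v}) for v in values]
        # A named pandas Index gives the new dimension both a name and coordinate labels.
        return xr.concat(parts, dim=pd.Index(values, name=name))

    return recurse(0, {})


ensemble = build_ensemble(
    {"co2": [280, 560], "sst_offset": [-1.0, 0.0, 1.0]}, load_run, time_zonal_mean
)
# Ordinary label-based selection now works across the whole ensemble.
warm_runs = ensemble["temp"].sel(co2=560)
```

Reducing each run before concatenating keeps only the small, already-averaged arrays in memory, which is the trade-off the paragraph above describes.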

Going back to @shoyer's comment, it still seems as though there is room to build some sort of collection of Datasets, in the same way that a Dataset is a collection of DataArrays. Maybe this is different from @lamorton's grouping example, but it would be really, really cool if you could use the same sort of syntactic sugar to select across multiple Datasets that share dimensions, just as you could slice into groups inside a Dataset as proposed here. It would certainly make things much more manageable than concatenating huge combinations of Datasets in memory!
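As a rough illustration of what such a collection could look like, here is a minimal sketch of a hypothetical `DatasetCollection` class (not an existing xarray API). It holds named Datasets that share dimensions and forwards `.sel`/`.isel` to every member, so slicing happens per run and concatenation is deferred until explicitly requested. The method names and the `run` dimension label are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr


class DatasetCollection:
    """Hypothetical mapping of names to Datasets that share dimensions."""

    def __init__(self, members):
        self.members = dict(members)

    def __getitem__(self, key):
        return self.members[key]

    def __len__(self):
        return len(self.members)

    def sel(self, **indexers):
        # Forward label-based selection to every member Dataset.
        return DatasetCollection({k: ds.sel(**indexers) for k, ds in self.members.items()})

    def isel(self, **indexers):
        # Forward positional selection to every member Dataset.
        return DatasetCollection({k: ds.isel(**indexers) for k, ds in self.members.items()})

    def map(self, func):
        """Apply func to each member Dataset and return a new collection."""
        return DatasetCollection({k: func(ds) for k, ds in self.members.items()})

    def to_dataset(self, dim="run"):
        """Concatenate only when needed, along a new dimension labeled by member keys."""
        keys = list(self.members)
        return xr.concat([self.members[k] for k in keys], dim=pd.Index(keys, name=dim))


def make_run(offset):
    """Hypothetical synthetic run; real members would come from files on disk."""
    lat = np.array([-45.0, 0.0, 45.0])
    return xr.Dataset({"temp": ("lat", 280.0 + offset + np.zeros(lat.size))}, coords={"lat": lat})


runs = DatasetCollection({"control": make_run(0.0), "warm": make_run(2.0)})
tropics = runs.sel(lat=0.0)      # selection applied to each run separately
combined = tropics.to_dataset()  # small result concatenated at the end
```

Selecting first and concatenating last means only the sliced pieces are ever combined, which is the memory benefit the comment is after.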

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 187859705