
issue_comments: 520741706


html_url: https://github.com/pydata/xarray/issues/3213#issuecomment-520741706
issue_url: https://api.github.com/repos/pydata/xarray/issues/3213
id: 520741706
node_id: MDEyOklzc3VlQ29tbWVudDUyMDc0MTcwNg==
user: 1634164
created_at: 2019-08-13T08:31:30Z
updated_at: 2019-08-13T08:31:30Z
author_association: NONE

This is very exciting! In energy-economic research (unlike, e.g., earth systems research), data are almost always sparse, so first-class sparse support will be broadly useful.

I'm leaving a comment here (since this seems to be a meta-issue; please link from wherever else, if needed) with two example use-cases. For the moment, #3206 seems to cover them, so I can't name any specific additional features.

  1. MESSAGEix is an energy-systems optimization model framework, formulated as a linear program.
    • Some variables have many dimensions; for instance, the input coefficient for a technology has the dimensions (node_loc, technology, year_vintage, year_active, mode, node_origin, commodity, level, time, time_origin).
    • In the global version of our model, the technology dimension has over 400 labels.
    • Often two or more dimensions are tied, e.g. technology='coal power plant' will only take input from (commodity='coal', level='primary energy'); all other combinations of (commodity, level) are empty for this technology.
    • So, these data are inherently sparse.
    • For modeling research, specifying quantities in this way is a good design because (a) it is intuitive to researchers in this domain, and (b) the optimization model is solved using various LP solvers via GAMS, which automatically prune zero rows in the resulting matrices.
    • When we were developing a dask/DAG-based system for post-processing model results, we wanted to use xarray, but had some quantities with tens of millions of elements that were less than 1% full. Here is some test code that triggered MemoryErrors using xarray. We chose to fall back on using a pd.Series subclass that mocks xarray methods.
  2. In transportation research, stock models of vehicle fleets are often used.

    • These models always have at least two time dimensions: cohort (the time period in which a vehicle was sold) and period(s) in which it is used (and thus consumes fuel, etc.).
    • Since a vehicle sold in 2020 can't be used in 2015, these data are always triangular w.r.t. these two dimensions. (The dimensions year_vintage and year_active in example #1 above have the same relationship.)
    • Once multiplied by other dimensions (technology; fuel; size or shape or market segment; embodied materials; different variables; model runs across various scenarios or input assumptions) the overhead of dense arrays can become problematic.
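A rough sketch of the sparsity described above, assuming invented dimension sizes (only the ~400 technology labels and the "<1% full" figure come from the comment; everything else is illustrative). It uses a plain MultiIndexed pd.Series, in the spirit of the fallback mentioned in example 1, to hold only the populated triangular (cohort, period) cells, and then estimates the footprint a fully dense array would need:

```python
import numpy as np
import pandas as pd

# Triangular (cohort, period) pairs from the vehicle-stock example:
# a vehicle sold in `cohort` can only be used in periods >= cohort.
years = list(range(2000, 2051, 5))  # 11 hypothetical five-year periods
index = pd.MultiIndex.from_tuples(
    [(c, p) for c in years for p in years if p >= c],
    names=["cohort", "period"],
)
# Sparse-style storage: the Series holds only the valid cells.
stock = pd.Series(1.0, index=index)

dense_cells = len(years) ** 2
print(len(stock), dense_cells)  # 66 populated cells vs. 121 dense cells
# Density is ~0.55 with two dimensions; each additional dense dimension
# multiplies the dense size but not necessarily the populated count,
# so density falls quickly.

# Scale sketch for a many-dimensional coefficient (all sizes except
# technology's ~400 labels are hypothetical):
dims = {"node_loc": 12, "technology": 400, "year_vintage": 28,
        "year_active": 28, "mode": 2, "commodity": 10}
dense_elements = int(np.prod(list(dims.values()), dtype=np.int64))
print(f"{dense_elements:,} float64 cells = "
      f"{dense_elements * 8 / 2**20:,.0f} MiB dense")
# → tens of millions of cells; at <1% density, under ~1 million carry data.
```

This is only a back-of-envelope illustration of why a dense xarray representation hits MemoryErrors while a sparse one stays small.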
reactions: none (all counts 0)
issue: 479942077