html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3213#issuecomment-1534695467,https://api.github.com/repos/pydata/xarray/issues/3213,1534695467,IC_kwDOAMm_X85beZgr,1634164,2023-05-04T12:31:22Z,2023-05-04T12:31:22Z,NONE,"That's a totally valid scope limitation for the sparse package, and I understand the motivation.
I'm just saying that the [principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment) is not being followed: at the moment, a user cannot read either the xarray or sparse docs and learn which portions of the xarray API will work when passing `…, sparse=True`, which portions instead require a deliberate choice to densify, or how best to mix the two. It would be helpful to clarify; that's all.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077
https://github.com/pydata/xarray/issues/3213#issuecomment-1534231523,https://api.github.com/repos/pydata/xarray/issues/3213,1534231523,IC_kwDOAMm_X85bcoPj,1634164,2023-05-04T07:40:26Z,2023-05-04T07:40:26Z,NONE,"@jbbutler please also see this comment et seq. https://github.com/pydata/sparse/issues/1#issuecomment-792342987 and related pydata/sparse#438.
To add to @rabernat's point about sparse support being ""not well documented"", I suspect (but don't know, as I'm just a user of xarray, not a developer) that it's also not thoroughly *tested*. I expected to be able to use e.g. `DataArray.cumprod` when the underlying data was sparse, but could not.
IMHO, I/O to/from sparse-backed objects is less valuable if only a small subset of xarray functionality is available on those objects. Perhaps explicitly testing/confirming which parts of the API do/do not currently work with sparse would support the improvements to the docs that Ryan mentioned, and reveal the work remaining to provide full(er) support.
","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077
https://github.com/pydata/xarray/issues/3213#issuecomment-520741706,https://api.github.com/repos/pydata/xarray/issues/3213,520741706,MDEyOklzc3VlQ29tbWVudDUyMDc0MTcwNg==,1634164,2019-08-13T08:31:30Z,2019-08-13T08:31:30Z,NONE,"This is very exciting! In energy-economic research (unlike, e.g., earth systems research), data are almost *always* sparse, so first-class sparse support will be broadly useful.
I'm leaving a comment here (since this seems to be a meta-issue; please link from wherever else, if needed) with two example use-cases. For the moment, #3206 seems to cover them, so I can't name any specific additional features.
1. [MESSAGEix](http://message.iiasa.ac.at/en/stable/) is an energy systems optimization model framework, formulated as a linear program.
- Some variables have many dimensions, for instance, the input coefficient for a technology has the dimensions `(node_loc, technology, year_vintage, year_active, mode, node_origin, commodity, level, time, time_origin)`.
- In the global version of our model, the `technology` dimension has over 400 labels.
- Often two or more dimensions are tied, e.g. `technology='coal power plant'` will only take input from `(commodity='coal', level='primary energy')`; all other combinations of `(commodity, level)` are empty for this `technology`.
- So, this data is inherently sparse.
- For modeling research, specifying quantities in this way is a good design because (a) it is intuitive to researchers in this domain, and (b) the optimization model is solved using various LP solvers via GAMS, which automatically prune zero rows in the resulting matrices.
- When we were developing a dask/DAG-based [system for model results post-processing](http://message.iiasa.ac.at/en/stable/reporting.html), we wanted to use xarray, but had some quantities with tens of millions of elements that were less than 1% full. [Here is some test code](https://github.com/iiasa/ixmp/blob/82ae6c92a2076a25d54d64a04c20aff653f4309b/tests/test_reporting.py#L430-L489) that triggered MemoryErrors using xarray. We chose to fall back on using a pd.Series subclass that mocks xarray methods.
2. In transportation research, stock models of vehicle fleets are often used.
- These models always have at least two time dimensions: `cohort` (the time period in which a vehicle was sold) and the `period`(s) in which it is used (and thus consumes fuel, etc.).
- Since a vehicle sold in 2020 can't be used in 2015, these data are always triangular w.r.t. these two dimensions. (The dimensions `year_vintage` and `year_active` in example #1 above have the same relationship.)
- Once multiplied by other dimensions (technology; fuel; size or shape or market segment; embodied materials; different variables; model runs across various scenarios or input assumptions) the overhead of dense arrays can become problematic.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077