issue_comments

3 rows where issue = 479942077 and user = 1634164 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1534695467 https://github.com/pydata/xarray/issues/3213#issuecomment-1534695467 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X85beZgr khaeru 1634164 2023-05-04T12:31:22Z 2023-05-04T12:31:22Z NONE

That's a totally valid scope limitation for the sparse package, and I understand the motivation.

I'm just saying that the principle of least astonishment is not being followed: the user cannot at the moment read either the xarray or sparse docs and know which portions of the xarray API will work when giving …, sparse=True, and which instead require a deliberate choice to densify, or see examples of how best to mix the two. It would be helpful to clarify; that's all.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
1534231523 https://github.com/pydata/xarray/issues/3213#issuecomment-1534231523 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X85bcoPj khaeru 1634164 2023-05-04T07:40:26Z 2023-05-04T07:40:26Z NONE

@jbbutler please also see this comment (https://github.com/pydata/sparse/issues/1#issuecomment-792342987) et seq., and the related pydata/sparse#438.

To add to @rabernat's point about sparse support being "not well documented", I suspect (but don't know, as I'm just a user of xarray, not a developer) that it's also not thoroughly tested. I expected to be able to use e.g. DataArray.cumprod when the underlying data was sparse, but could not.

IMHO, I/O to/from sparse-backed objects is less valuable if only a small subset of xarray functionality is available on those objects. Perhaps explicitly testing/confirming which parts of the API do/do not currently work with sparse would support the improvements to the docs that Ryan mentioned, and reveal the work remaining to provide full(er) support.
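Explicitly confirming which parts of the API work could start from a small harness that calls methods and records which ones raise. This is a minimal sketch, assuming a hypothetical helper `probe_methods` (not part of xarray or sparse); it only checks that a zero-argument call does not raise, not that the result is numerically correct, so a real test suite would still need to compare against the dense result.

```python
from typing import Any, Dict, Iterable


def probe_methods(obj: Any, method_names: Iterable[str]) -> Dict[str, str]:
    """Call each named method on `obj` with no arguments and record the outcome.

    Returns a mapping of method name to "ok", "missing", or "error: <ExceptionType>".
    """
    results: Dict[str, str] = {}
    for name in method_names:
        method = getattr(obj, name, None)
        if method is None:
            results[name] = "missing"
            continue
        try:
            method()
        except Exception as exc:  # any exception counts as "not supported" for this probe
            results[name] = f"error: {type(exc).__name__}"
        else:
            results[name] = "ok"
    return results


# Hypothetical usage with xarray and pydata/sparse (not executed here):
#   import numpy as np, sparse, xarray as xr
#   da = xr.DataArray(sparse.COO.from_numpy(np.eye(3)), dims=("x", "y"))
#   print(probe_methods(da, ["sum", "mean", "cumsum", "cumprod"]))
```

Running the same probe on a dense and a sparse-backed DataArray and diffing the two result dictionaries would give a first list of gaps to document.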

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
520741706 https://github.com/pydata/xarray/issues/3213#issuecomment-520741706 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyMDc0MTcwNg== khaeru 1634164 2019-08-13T08:31:30Z 2019-08-13T08:31:30Z NONE

This is very exciting! In energy-economic research (unlike, e.g., earth systems research), data are almost always sparse, so first-class sparse support will be broadly useful.

I'm leaving a comment here (since this seems to be a meta-issue; please link from wherever else, if needed) with two example use-cases. For the moment, #3206 seems to cover them, so I can't name any specific additional features.

  1. MESSAGEix is an energy systems optimization model framework, formulated as a linear program.
    • Some variables have many dimensions; for instance, the input coefficient for a technology has the dimensions (node_loc, technology, year_vintage, year_active, mode, node_origin, commodity, level, time, time_origin).
      • In the global version of our model, the technology dimension has over 400 labels.
      • Often two or more dimensions are tied, e.g. technology='coal power plant' will only take input from (commodity='coal', level='primary energy'); all other combinations of (commodity, level) are empty for this technology.
      • So, this data is inherently sparse.
    • For modeling research, specifying quantities in this way is a good design because (a) it is intuitive to researchers in this domain, and (b) the optimization model is solved using various LP solvers via GAMS, which automatically prune zero rows in the resulting matrices.
    • When we were developing a dask/DAG-based system for model results post-processing, we wanted to use xarray, but had some quantities with tens of millions of elements that were less than 1% full. Here is some test code that triggered MemoryErrors using xarray. We chose to fall back on using a pd.Series subclass that mocks xarray methods.
  2. In transportation research, stock models of vehicle fleets are often used.
    • These models always have at least two time dimensions: cohort (the time period in which a vehicle was sold) and period(s) in which it is used (and thus consumes fuel, etc.).
    • Since a vehicle sold in 2020 can't be used in 2015, these data are always triangular w.r.t. these two dimensions. (The dimensions year_vintage and year_active in example #1 above have the same relationship.)
    • Once multiplied by other dimensions (technology; fuel; size or shape or market segment; embodied materials; different variables; model runs across various scenarios or input assumptions) the overhead of dense arrays can become problematic.
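For a rough sense of scale in the MESSAGEix case, the back-of-the-envelope arithmetic below compares a dense float64 array with a coordinate-list (COO) layout, which stores one value plus one index per dimension for each non-empty cell. It is a sketch under stated assumptions: the dimension lengths are hypothetical (only the 400-label technology dimension comes from the text above), the 1% fill comes from the description of the post-processing quantities, and indices are assumed to take 8 bytes each.

```python
from math import prod


def dense_bytes(shape, itemsize=8):
    """Bytes to store every cell of a dense array (8 bytes per float64 value)."""
    return prod(shape) * itemsize


def coo_bytes(nnz, ndim, value_size=8, index_size=8):
    """Rough COO footprint: per stored element, one value plus one index per dimension.

    Assumes 8-byte (int64) indices; a given library may use a smaller index dtype.
    """
    return nnz * (value_size + ndim * index_size)


# Hypothetical dimension lengths for a 10-dimensional coefficient like `input`
# (node_loc, technology, year_vintage, year_active, mode, node_origin,
#  commodity, level, time, time_origin); NOT actual MESSAGEix sizes.
shape = (5, 400, 10, 10, 2, 5, 5, 4, 1, 1)
cells = prod(shape)  # 40,000,000 cells
nnz = cells // 100   # assume 1% of cells are non-empty

print(f"dense: {dense_bytes(shape) / 1e6:.1f} MB")           # dense: 320.0 MB
print(f"COO:   {coo_bytes(nnz, len(shape)) / 1e6:.1f} MB")  # COO:   35.2 MB
```

Tied dimensions push the fill down further: with the hypothetical 5 commodities and 4 levels above, a technology that draws on a single (commodity, level) pair uses at most 1 of those 20 combinations.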
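The triangular (cohort, period) structure of the vehicle stock models can be made concrete by enumerating the valid pairs. This minimal sketch assumes a hypothetical 10-year horizon and a hypothetical helper `vintage_pairs`; it counts only the constraint stated above (a vehicle cannot be used before it is sold).

```python
def vintage_pairs(years):
    """All (cohort, period) pairs in which a vehicle sold in `cohort` can be used in `period`."""
    years = list(years)
    return [(cohort, period) for cohort in years for period in years if period >= cohort]


years = range(2015, 2025)  # hypothetical 10-year horizon
pairs = vintage_pairs(years)
dense_cells = len(years) ** 2
print(len(pairs), dense_cells, len(pairs) / dense_cells)  # 55 100 0.55
```

For n periods, n(n+1)/2 of the n² cells are valid, a fraction of (n+1)/(2n) that tends to 1/2. Triangularity alone therefore wastes about half of each cohort-by-period slab, and that empty half is repeated for every combination of labels along the other dimensions.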
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette