issue_comments

1 row where issue = 520815068 and user = 12912489 sorted by updated_at descending

Issue: NEP 18, physical units, uncertainties, and the scipp library?

id: 552756428
html_url: https://github.com/pydata/xarray/issues/3509#issuecomment-552756428
issue_url: https://api.github.com/repos/pydata/xarray/issues/3509
node_id: MDEyOklzc3VlQ29tbWVudDU1Mjc1NjQyOA==
user: SimonHeybrock (12912489)
created_at: 2019-11-12T06:37:20Z
updated_at: 2022-09-09T13:08:45Z
author_association: NONE

@jthielen Thanks for your reply! I am not familiar with pint and uncertainties, so I cannot go into much detail there. What follows is just generally speaking:

Units

I do not see any advantage in using scipp here. The current unit system in scipp is based on boost::units, which is very powerful (supporting custom units, heterogeneous systems, ...) but is unfortunately a compile-time library (EDIT 2022: This no longer applies, since we switched to a runtime units library long ago). I imagine we would need to wrap another library to become more flexible (we could even consider wrapping something like pint's unit implementation).

Uncertainties

There are two routes to take here:

1. Store a single array of value/variance pairs

  • Propagation of uncertainties is "fast by default".
  • Probably harder to vectorize (SIMD), since the data layout implies interleaved values. In practice this is unlikely to matter: many workloads are limited by memory bandwidth and cache sizes, so in my experience vectorization is not crucial.
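
As a rough illustration of layout 1, here is a minimal sketch assuming NumPy and an (N, 2) float64 array whose rows are (value, variance) pairs. `multiply_pairs` is a made-up helper for this comment, not part of scipp, pint, or uncertainties, and it uses first-order propagation for uncorrelated operands.

```python
import numpy as np

def multiply_pairs(a, b):
    """Multiply two (N, 2) arrays of interleaved (value, variance) pairs.

    Assumes uncorrelated operands and first-order propagation:
    var(a*b) = var(a) * b**2 + var(b) * a**2
    """
    out = np.empty_like(a)
    out[:, 0] = a[:, 0] * b[:, 0]
    out[:, 1] = a[:, 1] * b[:, 0] ** 2 + b[:, 1] * a[:, 0] ** 2
    return out

a = np.array([[2.0, 0.1], [3.0, 0.2]])
b = np.array([[4.0, 0.3], [5.0, 0.4]])
product = multiply_pairs(a, b)

# The values alone are reachable as a strided view (no copy), so existing
# NumPy code can still operate on them, at the cost of non-contiguous access.
values = a[:, 0]
```

The value and variance of each element sit next to each other in memory, which is what makes a single pass over the data the natural way to implement an operation in this layout.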

2. Store two arrays (values array and uncertainties array)

  • This is what scipp does.
  • Special care must be taken when implementing propagation of uncertainties: a naive implementation based on whole-array operations leads to a massive performance loss (I have seen 10x or more) for operations like multiplication (there is no penalty for addition and subtraction).
  • In practice this is not hard to avoid: we simply compute the result's values and variances in a single loop rather than in two steps. This avoids allocating temporaries and loading from / storing to memory multiple times.
  • Scipp does this, and does not sacrifice any performance.
  • Saves 2x in performance when operating only on the values, even if variances are present.
  • Can add/remove variances independently, e.g., if no longer needed, avoiding copies.
  • Can use existing numpy code to operate directly with values and variances (could probably be done in case 1., with a stride, losing some efficiency).
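
To make the difference between the two-step and single-loop approaches concrete, here is a small sketch for multiplication with separate value and variance arrays. Plain Python lists stand in for arrays, and both function names are made up for illustration. It assumes uncorrelated operands and first-order propagation. In Python itself the explicit loop is of course not the fast variant; the point is only the structure that a compiled implementation would use.

```python
def multiply_naive(a_val, a_var, b_val, b_var):
    """Whole-array steps: every line is a separate pass that builds a temporary."""
    out_val = [x * y for x, y in zip(a_val, b_val)]
    a_sq = [x * x for x in a_val]
    b_sq = [y * y for y in b_val]
    term_a = [v * s for v, s in zip(a_var, b_sq)]
    term_b = [v * s for v, s in zip(b_var, a_sq)]
    out_var = [p + q for p, q in zip(term_a, term_b)]
    return out_val, out_var

def multiply_fused(a_val, a_var, b_val, b_var):
    """Single loop: each input element is read once and no temporaries are built."""
    n = len(a_val)
    out_val = [0.0] * n
    out_var = [0.0] * n
    for i in range(n):
        x, y = a_val[i], b_val[i]
        out_val[i] = x * y
        out_var[i] = a_var[i] * (y * y) + b_var[i] * (x * x)
    return out_val, out_var

a_val, a_var = [2.0, 3.0], [0.1, 0.2]
b_val, b_var = [4.0, 5.0], [0.3, 0.4]
naive = multiply_naive(a_val, a_var, b_val, b_var)
fused = multiply_fused(a_val, a_var, b_val, b_var)
```

Addition and subtraction do not have this problem, because the result's values depend only on the input values and the result's variances only on the input variances.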

Other aspects

Scipp supports a generic transform-type operation that can apply an arbitrary lambda to variables (units + values array + variances array).

  • This is done at compile-time and therefore static. It does however allow for very quick addition of new compound operations that propagate units and uncertainties.
  • For example, we could generate an operation sqrt(a*a + b*b) that:
      • is automatically written using a single loop => fast
      • gives the correct output units
      • propagates uncertainties
      • does all the broadcasting and transposing
  • Not using expression templates, in case anyone asks.
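
The transform idea can be sketched in a few lines of plain Python. This is not how scipp does it (scipp resolves the lambda at compile time in C++). The sketch swaps in forward-mode derivatives to get first-order variance propagation, and `Dual`, `sqrt`, and `transform` are hypothetical names. It assumes the inputs are uncorrelated with each other and leaves out units, broadcasting, and transposing.

```python
import math

class Dual:
    """A value together with its partial derivatives w.r.t. each input."""
    def __init__(self, value, grad):
        self.value = value
        self.grad = grad  # tuple: d(value)/d(input_i)

    def __add__(self, other):
        return Dual(self.value + other.value,
                    tuple(g + h for g, h in zip(self.grad, other.grad)))

    def __mul__(self, other):
        # Product rule, applied per input.
        return Dual(self.value * other.value,
                    tuple(g * other.value + h * self.value
                          for g, h in zip(self.grad, other.grad)))

def sqrt(x):
    root = math.sqrt(x.value)
    return Dual(root, tuple(g / (2.0 * root) for g in x.grad))

def transform(func, *inputs):
    """Apply func elementwise in one loop.

    Each input is a (values, variances) pair of equal-length lists.
    Output variance is sum_i (df/dx_i)**2 * var(x_i).
    """
    n = len(inputs[0][0])
    k = len(inputs)
    out_values, out_variances = [], []
    for i in range(n):
        args = [Dual(vals[i], tuple(1.0 if j == m else 0.0 for m in range(k)))
                for j, (vals, _) in enumerate(inputs)]
        result = func(*args)
        out_values.append(result.value)
        out_variances.append(sum(g * g * var[i]
                                 for g, (_, var) in zip(result.grad, inputs)))
    return out_values, out_variances

a = ([3.0, 6.0], [0.1, 0.2])
b = ([4.0, 8.0], [0.2, 0.1])
values, variances = transform(lambda x, y: sqrt(x * x + y * y), a, b)
```

Tracking derivatives rather than multiplying (value, variance) pairs directly means that a*a is treated as the square of one quantity, not as a product of two independent ones.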

Other

  • scipp.Variable includes the dimension labels, and operations can do broadcasting and transposition, yielding good performance. I am not sure if this is an advantage or a drawback in this case? I would need to look more into the inner workings of xarray and the __array_function__ protocol.

  • Scipp is written in C++ with performance in mind. That being said, it is not terribly difficult to achieve good performance in these cases since many workloads are bound by memory bandwidth (and probably dozens of other libraries have done so).
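
On the __array_function__ question, here is a minimal sketch of how a container holding values and variances could opt in to NEP 18 dispatch, assuming NumPy 1.17 or later. `Measured`, `HANDLED`, and `implements` are hypothetical names for illustration, not scipp or xarray API, and only `np.sum` is handled.

```python
import numpy as np

HANDLED = {}  # maps NumPy functions to our implementations

def implements(np_function):
    """Register an implementation of a NumPy function for Measured."""
    def decorator(func):
        HANDLED[np_function] = func
        return func
    return decorator

class Measured:
    """Values plus variances, stored as two separate arrays."""
    def __init__(self, values, variances):
        self.values = np.asarray(values, dtype=float)
        self.variances = np.asarray(variances, dtype=float)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            return NotImplemented
        if not all(issubclass(t, Measured) for t in types):
            return NotImplemented
        return HANDLED[func](*args, **kwargs)

@implements(np.sum)
def _sum(x, axis=None):
    # For uncorrelated elements the variance of a sum is the sum of variances.
    return Measured(np.sum(x.values, axis=axis),
                    np.sum(x.variances, axis=axis))

total = np.sum(Measured([1.0, 2.0, 3.0], [0.1, 0.2, 0.3]))
```

Whether a type that also carries dimension labels fits well behind this protocol is exactly the open question in the first bullet above.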

Questions

  • What is pint's approach to uncertainties?
  • Have you looked at the performance? Is performance relevant for you in these cases?
