issue_comments


3 rows where issue = 520815068 sorted by updated_at descending


Issue: NEP 18, physical units, uncertainties, and the scipp library? (#3509, id 520815068)
Comment 552756428 by SimonHeybrock (NONE)
created 2019-11-12T06:37:20Z · updated 2022-09-09T13:08:45Z
https://github.com/pydata/xarray/issues/3509#issuecomment-552756428

@jthielen Thanks for your reply! I am not familiar with pint and uncertainties so I cannot go in much detail there, so this is just generally speaking:

Units

I do not see any advantage in using scipp here. The current unit system in scipp is based on boost::units, which is very powerful (supporting custom units, heterogeneous systems, ...), but it is unfortunately a compile-time library (EDIT 2022: this no longer applies, since we have long since switched to a runtime units library). I imagine we would need to wrap another library to become more flexible (we could even consider wrapping something like pint's unit implementation).

Uncertainties

There are two routes to take here:

1. Store a single array of value/variance pairs

  • Propagation of uncertainties is "fast by default".
  • Probably harder to vectorize (SIMD) since data layout implies interleaved values. In practice this is unlikely to be relevant, since many workloads are just limited by memory bandwidth and cache sizes, so vectorization is not crucial in my experience.
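Route 1 can be sketched with a NumPy structured dtype (an illustrative model only; the field names and layout are assumptions, not how any of the libraries mentioned here store data). It also shows why the values alone end up strided, which is what makes SIMD vectorization over them harder:

```python
import numpy as np

# Sketch of route 1: a single array of interleaved (value, variance) pairs,
# modelled here with a NumPy structured dtype.
pair = np.dtype([("value", np.float64), ("variance", np.float64)])

data = np.zeros(3, dtype=pair)
data["value"] = [1.0, 2.0, 3.0]
data["variance"] = [0.1, 0.1, 0.1]

# Each element stores its value next to its variance in memory, so a single
# pass over `data` touches both -- but a view of the values alone is strided
# (16-byte steps through 8-byte floats), not contiguous.
print(data["value"].strides)  # -> (16,)
```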

2. Store two arrays (values array and uncertainties array)

  • This is what scipp does.
  • Special care must be taken when implementing propagation of uncertainties: a naive implementation that operates on whole arrays leads to a massive performance loss (I have seen 10x or more) for operations like multiplication (there is no penalty for addition and subtraction).
  • In practice this is not hard to do, we simply need to avoid computing the result's values and variances in two steps and put everything into a single loop. This avoids allocation of temporaries and loading / storing from memory multiple times.
  • Scipp does this, and does not sacrifice any performance.
  • Save 2x in performance when operating only with values, even if variances are present.
  • Can add/remove variances independently, e.g., if no longer needed, avoiding copies.
  • Can use existing numpy code to operate directly on the values and variances (this could probably also be done in case 1., with a stride, losing some efficiency).
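The multiplication case above can be made concrete. For independent operands, the variance of c = a*b propagates as var_c = b²·var_a + a²·var_b; the naive array-at-a-time version allocates temporaries and re-reads a and b from memory for each expression, while the fused version computes both outputs in one pass per element. This is a sketch of the idea, not scipp's actual implementation (which does the fused loop in compiled C++):

```python
import numpy as np

def mul_naive(a, va, b, vb):
    # Array-at-a-time propagation: each expression below allocates a
    # temporary array and streams a and b through memory again.
    c = a * b
    vc = b * b * va + a * a * vb
    return c, vc

def mul_fused(a, va, b, vb):
    # Fused propagation: values and variances computed together in a single
    # pass, no temporaries. Shown as a Python loop for clarity; in practice
    # this loop would live in compiled code.
    c = np.empty_like(a)
    vc = np.empty_like(a)
    for i in range(a.size):
        x, y = a[i], b[i]
        c[i] = x * y
        vc[i] = y * y * va[i] + x * x * vb[i]
    return c, vc

a = np.array([1.0, 2.0]); va = np.array([0.1, 0.2])
b = np.array([3.0, 4.0]); vb = np.array([0.3, 0.4])
cn, vcn = mul_naive(a, va, b, vb)
cf, vcf = mul_fused(a, va, b, vb)
assert np.allclose(cn, cf) and np.allclose(vcn, vcf)
```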

Other aspects

Scipp supports a generic transform-type operation that can apply an arbitrary lambda to variables (units + values array + variances array).

  • This is done at compile-time and is therefore static. It does however allow for very quick addition of new compound operations that propagate units and uncertainties.
  • For example, we could generate an operation sqrt(a*a + b*b) that:
      - is automatically written using a single loop => fast
      - gives the correct output units
      - propagates uncertainties
      - does all the broadcasting and transposing
  • Not using expression templates, in case anyone asks.
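What such a generated compound operation would compute can be sketched for the sqrt(a*a + b*b) example using first-order (linear) error propagation, all in a single pass. The function name and signature here are hypothetical, purely for illustration:

```python
import numpy as np

def hypot_with_variances(a, va, b, vb):
    # Compound operation f = sqrt(a*a + b*b) with first-order uncertainty
    # propagation, computed in one fused pass over the arrays.
    # df/da = a/f, df/db = b/f  =>  var_f = (a^2*va + b^2*vb) / f^2
    f2 = a * a + b * b
    f = np.sqrt(f2)
    vf = (a * a * va + b * b * vb) / f2
    return f, vf

f, vf = hypot_with_variances(np.array([3.0]), np.array([0.1]),
                             np.array([4.0]), np.array([0.2]))
# f[0] is 5.0; vf[0] is (9*0.1 + 16*0.2)/25 = 0.164
```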

Other

  • scipp.Variable includes the dimension labels, and operations can do broadcasting and transposition, yielding good performance. I am not sure if this is an advantage or a drawback in this case? Would need to look more into the inner workings of xarray and the __array_function__ protocol.

  • Scipp is written in C++ with performance in mind. That being said, it is not terribly difficult to achieve good performance in these cases since many workloads are bound by memory bandwidth (and probably dozens of other libraries have done so).
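For reference, the __array_function__ protocol mentioned above (NEP 18) lets a wrapper class intercept numpy's public functions. A minimal illustrative duck array (the class and its behavior are invented for this sketch, not scipp's or xarray's actual implementation) looks like this:

```python
import numpy as np

class Labeled:
    """Minimal duck array: wraps an ndarray plus dimension labels and
    opts in to NEP 18 so numpy functions can be intercepted."""

    def __init__(self, values, dims):
        self.values = np.asarray(values)
        self.dims = tuple(dims)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.mean:
            # Delegate to numpy on the raw values; a real implementation
            # would dispatch per function and handle dims/broadcasting.
            return np.mean(self.values, **kwargs)
        return NotImplemented

x = Labeled([1.0, 2.0, 3.0], dims=("time",))
print(np.mean(x))  # numpy dispatches to Labeled.__array_function__
```

With NumPy 1.17+ the dispatch is on by default, so `np.mean(x)` reaches the wrapper without any changes to user code; this is what makes integrating a "duck array type" inside xarray feasible.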

Questions

  • What is pint's approach to uncertainties?
  • Have you looked at the performance? Is performance relevant for you in these cases?
Comment 573102940 by jthielen (CONTRIBUTOR)
created 2020-01-10T16:21:08Z · updated 2020-01-10T16:22:20Z
https://github.com/pydata/xarray/issues/3509#issuecomment-573102940

@SimonHeybrock So sorry I neglected to reply back in November! https://github.com/hgrecco/pint/issues/982 and https://github.com/hgrecco/pint/issues/918 pinged my recollection of this issue. In short, Pint actually doesn't currently support uncertainties with arrays, only scalars, so my earlier wrapping comment was mistaken. So, moving forward with __array_function__ in Scipp in order to integrate it as a "duck array type" within xarray might be a good way to get physical units and uncertainties working together in xarray.

Comment 552594045 by jthielen (CONTRIBUTOR)
created 2019-11-11T20:06:49Z · updated 2019-11-11T20:06:49Z
https://github.com/pydata/xarray/issues/3509#issuecomment-552594045

With regards to physical units (and to a lesser extent propagation of uncertainties), this would have overlap with pint. Efforts have been ongoing towards integration with xarray through NEP-18 (xref https://github.com/pydata/xarray/issues/525, https://github.com/hgrecco/pint/pull/764, https://github.com/hgrecco/pint/issues/845, https://github.com/hgrecco/pint/issues/849, as well as https://github.com/pydata/xarray/pull/3238 and following test implementation PRs), but are still not quite there yet...hopefully very soon though!

Would you be able to describe any advantages/disadvantages you would see with xarray wrapping scipp, versus something like xarray > pint > uncertainties > numpy?


Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 3942.624ms · About: xarray-datasette