Comment on pydata/xarray#3509 (https://github.com/pydata/xarray/issues/3509#issuecomment-552756428), posted 2019-11-12, last edited 2022-09-09:

@jthielen Thanks for your reply! I am not familiar with `pint` and `uncertainties`, so I cannot go into much detail there; the following is just general commentary:

### Units

Regarding units, I do not see any advantage in using scipp. The current unit system in scipp is based on `boost::units`, which is very powerful (supporting custom units, heterogeneous systems, ...), but unfortunately it is a compile-time library (EDIT 2022: this no longer applies, since we have long since switched to a runtime units library). I imagine we would need to wrap another library to become more flexible (we could even consider wrapping something like pint's unit implementation).

### Uncertainties

There are two routes to take here:

#### 1. Store a single array of value/variance pairs

- Propagation of uncertainties is "fast by default".
- Probably harder to vectorize (SIMD), since the data layout interleaves values and variances (see the layout sketch below). In practice this is unlikely to matter, since many workloads are limited by memory bandwidth and cache sizes, so vectorization is not crucial in my experience.

#### 2. Store two arrays (a values array and an uncertainties array)

- This is what `scipp` does.
- *Special care must be taken when implementing propagation of uncertainties*: a naive implementation that composes whole-array operations leads to a **massive performance loss** (I have seen 10x or more) for operations like multiplication (there is no penalty for addition and subtraction); see the sketches after this list.
- In practice this is not hard to avoid: we simply need to stop computing the result's values and variances in two steps and instead put everything into a single loop. This avoids allocating temporaries and loading/storing from memory multiple times.
- Scipp does this and does not sacrifice any performance.
- Saves a factor of 2 in performance when operating only on the values, even if variances are present.
- Variances can be added/removed independently, e.g., when no longer needed, avoiding copies.
- Existing `numpy` code can operate directly on the values and variances (this could probably also be done in case 1, with a stride, losing some efficiency).
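To make the two storage layouts concrete, here is a minimal C++ sketch (type names are made up for illustration, not scipp's actual types):

```cpp
#include <vector>

// Route 1: one array of interleaved value/variance pairs
// ("array of structs"; values sit at every other position in memory,
// which is what makes SIMD vectorization awkward).
struct ValueVariancePair {
  double value;
  double variance;
};
using PairArray = std::vector<ValueVariancePair>;

// Route 2: two separate contiguous arrays ("struct of arrays"; the
// layout scipp uses). The values are contiguous, and the variances
// can be dropped or added without touching the values.
struct SplitArrays {
  std::vector<double> values;
  std::vector<double> variances;
};
```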
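For reference, the first-order propagation rules (assuming uncorrelated Gaussian errors) show why multiplication is the expensive case: its variance needs both operands' values as well as their variances, while addition and subtraction touch only the variances:

```latex
% multiplication needs the operands' values:
c = a \cdot b \;\Rightarrow\; \operatorname{var}(c) \approx b^2 \operatorname{var}(a) + a^2 \operatorname{var}(b)
% addition/subtraction do not:
c = a \pm b \;\Rightarrow\; \operatorname{var}(c) = \operatorname{var}(a) + \operatorname{var}(b)
```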
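And a sketch of the single-loop point (illustrative C++, not scipp's actual code): the naive version makes two passes, streaming every input array from memory twice; the fused version computes both outputs in one pass:

```cpp
#include <cstddef>
#include <vector>

struct SplitArrays {  // two-array layout from the sketch above
  std::vector<double> values;
  std::vector<double> variances;
};

// Naive multiply: two separate passes, which is what you get when
// composing whole-array operations. a.values and b.values are read
// from memory in both passes, and a real array library would also
// allocate temporaries for the intermediate products.
SplitArrays multiply_naive(const SplitArrays &a, const SplitArrays &b) {
  const std::size_t n = a.values.size();
  SplitArrays out{std::vector<double>(n), std::vector<double>(n)};
  for (std::size_t i = 0; i < n; ++i)
    out.values[i] = a.values[i] * b.values[i];
  for (std::size_t i = 0; i < n; ++i)
    out.variances[i] = a.variances[i] * b.values[i] * b.values[i] +
                       b.variances[i] * a.values[i] * a.values[i];
  return out;
}

// Fused multiply: a single loop loads each input element once and
// writes both outputs, with no intermediate arrays.
SplitArrays multiply_fused(const SplitArrays &a, const SplitArrays &b) {
  const std::size_t n = a.values.size();
  SplitArrays out{std::vector<double>(n), std::vector<double>(n)};
  for (std::size_t i = 0; i < n; ++i) {
    const double av = a.values[i], bv = b.values[i];
    out.values[i] = av * bv;
    out.variances[i] = a.variances[i] * bv * bv + b.variances[i] * av * av;
  }
  return out;
}
```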
#### Other aspects

Scipp supports a generic `transform`-type operation that can apply an arbitrary lambda to variables (units + values array + variances array).

- This is done at compile time and is therefore static. It does, however, allow for very quick addition of new compound operations that propagate units and uncertainties.
- For example, we could generate an operation `sqrt(a*a + b*b)` that is:
  - automatically written as a *single loop* => fast
  - gives the correct output units
  - propagates uncertainties
  - does all the broadcasting and transposing
  - *not* using expression templates, in case anyone asks

(A hypothetical sketch of such a `transform` is at the end of this comment.)

### Other

- `scipp.Variable` includes the dimension labels, and operations can do broadcasting and transposition with good performance. I am not sure whether this is an advantage or a drawback in this case? I would need to look more into the inner workings of xarray and the `__array_function__` protocol.
- Scipp is written in C++ with performance in mind. That being said, it is not terribly difficult to achieve good performance in these cases, since many workloads are bound by memory bandwidth (and probably dozens of other libraries have done so).

### Questions

- What is pint's approach to uncertainties?
- Have you looked at the performance? Is performance relevant for you in these cases?
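To illustrate the `transform` idea, here is a minimal, hypothetical C++ sketch (not scipp's actual API; all names are made up, and unit propagation and broadcasting are omitted): operator overloads on a value/variance pair propagate uncertainties, so a compound kernel like `sqrt(a*a + b*b)` is written once and compiles down to a single fused loop:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical value/variance element type; the operator overloads apply
// first-order error propagation. Illustrative only, not scipp's types.
struct ValVar {
  double value;
  double variance;
};

inline ValVar operator*(ValVar a, ValVar b) {
  // Caveat: assumes uncorrelated operands, so `x * x` underestimates
  // the variance of x^2; a real implementation has to consider this.
  return {a.value * b.value,
          a.variance * b.value * b.value + b.variance * a.value * a.value};
}

inline ValVar operator+(ValVar a, ValVar b) {
  return {a.value + b.value, a.variance + b.variance};
}

inline ValVar sqrt(ValVar a) {
  // d(sqrt(x))/dx = 1/(2*sqrt(x)), so the variance scales by 1/(4x).
  return {std::sqrt(a.value), a.variance / (4.0 * a.value)};
}

// Generic transform: applies an arbitrary binary kernel element-wise in
// a single loop.
template <class Op>
std::vector<ValVar> transform(const std::vector<ValVar> &a,
                              const std::vector<ValVar> &b, Op op) {
  std::vector<ValVar> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = op(a[i], b[i]);
  return out;
}

// Usage: the compound operation is written once as a lambda; the
// compiler inlines the kernel and the overloads into one fused loop
// that propagates the variances alongside the values.
std::vector<ValVar> hypot_with_uncertainties(const std::vector<ValVar> &a,
                                             const std::vector<ValVar> &b) {
  return transform(a, b,
                   [](ValVar x, ValVar y) { return sqrt(x * x + y * y); });
}
```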