home / github / issue_comments

Menu
  • Search all tables
  • GraphQL API

issue_comments: 1216123818

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4285#issuecomment-1216123818 https://api.github.com/repos/pydata/xarray/issues/4285 1216123818 IC_kwDOAMm_X85IfJOq 12912489 2022-08-16T04:15:24Z 2022-08-16T04:15:24Z NONE

5. Neutron scattering data

Scipp is an xarray-like labelled data structure for neutron scattering experiment data. On their FAQ Q titled "Why is xarray not enough", one of the things they quote is

Support for event data, a particular form of sparse data. More concretely, this is essentially a 1-D (or N-D) array of random-length lists, with very small list entries. This type of data arises in time-resolved detection of neutrons in pixelated detectors.

Would a RaggedArray class that's wrappable in xarray help with this? (cc @SimonHeybrock)

Partially, but the bigger challenge may be the related algorithms, e.g., for getting data into this layout, and for switching to other ragged layouts.

For context, one of the main reasons for our data layout is the ability to make cuts/slices quickly. We frequently deal with 2-D, 3-D, and 4-D data. For example, a 3-D case may be be the momentum transfer $\vec Q$ in a scattering process, with a "record" for every detected neutron. Desired final resolution may exceed 1000 per dimension (of the 3 components of $\vec Q$). On top of this there may be additional dimensions relating to environment parameters of the sample under study, such as temperature, pressure, or strain. This would lead to bin-counts that cannot be handled easily (in single-node memory).

A naive solution could be to simply work with something like pandas.DataFrame, with columns for the components of $\vec Q$ as well as the sample environment parameters. Those could then be used for grouping/histogramming to the desired 2-D cuts or slices. However, as frequently many such slices or required this can quickly become inefficient (though there is certainly cases where it would work well, providing a simpler solution that scipp).

Scipp's ragged data can be considered a "partial sorting", to build a sort of "index". Based on all this we can then, e.g., quickly compute high-resolution cuts. Say we are in 3-D (Qx, Qy, Qz). We would not have bin sizes that match the final resolution required by the science. Instead we could use 50x50x50 bins. Then we can very quickly produce a high-res 2-D plot (say (1000x1000), Qx, Qz or whatever), since our binned data format reduces the data/memory you have to load and consider by a factor of up to 50 (in this example).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  667864088
Powered by Datasette · Queries took 79.446ms · About: xarray-datasette