html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4285#issuecomment-1288374461,https://api.github.com/repos/pydata/xarray/issues/4285,1288374461,IC_kwDOAMm_X85Mywi9,12912489,2022-10-24T03:44:44Z,2022-11-03T17:04:15Z,NONE,Also note the [Ragged Array Summit](https://discuss.scientific-python.org/t/ragged-array-summit/465/2) on Scientific Python.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1283416324,https://api.github.com/repos/pydata/xarray/issues/4285,1283416324,IC_kwDOAMm_X85Mf2EE,12912489,2022-10-19T04:39:06Z,2022-10-19T04:39:06Z,NONE,"A possibly relevant distinction that had not occurred to me previously is illustrated by the example from @milancurcic: if I understand correctly, this type of data is essentially an array of variable-length time series (a list of lists?), i.e., there is an *order* within each inner list. This is conceptually different from the data I typically deal with, where each inner list is a list of records without any specific ordering.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1216208075,https://api.github.com/repos/pydata/xarray/issues/4285,1216208075,IC_kwDOAMm_X85IfdzL,12912489,2022-08-16T06:38:32Z,2022-08-16T06:42:28Z,NONE,"@jpivarski
> > > Support for event data, a particular form of sparse data.
>
> I might have been misinterpreting the word ""sparse data"" in conversations about this. I had thought that ""sparse data"" is logically rectilinear but represented in memory with the zeros removed, so the internal machinery has to deal with irregular structures, but the outward API it presents is regular (dimensionality is completely described by a `shape: tuple[int]`).
You are right that ""sparse"" is misleading. Since the term is most commonly used for sparse matrix/array representations, we now usually avoid it (and refer to binned data or ragged data instead). Obviously our title page needs an update 😬.
> logically rectilinear
This *does* actually apply to Scipp's binned data. A `scipp.Variable` may have `shape=(N,M)` and be ""ragged"". But the ""ragged"" dimension is in addition to the two regular dimensions. That is, in this case we have (conceptually) a 2-D array of lists.
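
A minimal sketch of this idea, in plain Python so that it has no dependencies. It is only an illustration and not Scipp's actual implementation: the names `sizes`, `events`, `begin`, `end` and `bin_contents` are hypothetical. The regular `(N, M)` shape is kept, and each bin refers to a variable-length slice of one flat event buffer:

```python
from itertools import accumulate

# Hypothetical illustration, not Scipp's actual implementation:
# a regular (N, M) grid of bins, each holding a variable-length list of events.
N, M = 2, 3
sizes = [[2, 0, 1],
         [3, 1, 0]]                      # events per bin, regular shape (N, M)
events = [float(k) for k in range(7)]    # flat buffer holding all 7 events

# One begin/end offset pair per bin, in row-major order.
flat_sizes = [s for row in sizes for s in row]
end = list(accumulate(flat_sizes))
begin = [e - s for e, s in zip(end, flat_sizes)]

def bin_contents(i, j):
    # Events of bin (i, j), sliced out of the flat buffer.
    k = i * M + j
    return events[begin[k]:end[k]]

shape = (len(sizes), len(sizes[0]))
print(shape)                 # the outward shape stays regular
print(bin_contents(1, 0))    # a bin holding three events
print(bin_contents(0, 1))    # an empty bin
```

Only the list lengths vary from bin to bin. The two outer dimensions remain ordinary, aligned dimensions.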
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1216107702,https://api.github.com/repos/pydata/xarray/issues/4285,1216107702,IC_kwDOAMm_X85IfFS2,12912489,2022-08-16T03:43:29Z,2022-08-16T05:11:50Z,NONE,"> 1. **Generalise xarray to allow for variable-length dimensions**
>
>
> This seems hard. Xarray's whole model is built assuming that `dims` has type `Mapping[Hashable, int]`. It also breaks our normal concept of alignment, which we need to put coordinate variables in DataArrays alongside data variables.
Anecdotal evidence that this is indeed not a good solution:
[scipp's](https://scipp.github.io/) ""ragged data"" support was originally implemented with such variable-length dimensions. This led to a whole series of problems, including significantly complicating `scipp.DataArray`, both in terms of code and conceptually. After this experience we switched to the [current model](https://scipp.github.io/user-guide/binned-data/binned-data.html), which exposes only the regular, aligned dimensions.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1216144957,https://api.github.com/repos/pydata/xarray/issues/4285,1216144957,IC_kwDOAMm_X85IfOY9,12912489,2022-08-16T04:54:25Z,2022-08-16T04:54:25Z,NONE,Is anyone here going to EuroSciPy (two weeks from now) and interested in having a chat/discussion about ragged data?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1216125098,https://api.github.com/repos/pydata/xarray/issues/4285,1216125098,IC_kwDOAMm_X85IfJiq,12912489,2022-08-16T04:17:52Z,2022-08-16T04:17:52Z,NONE,"@danielballan mentioned that the photon community (synchrotrons/X-ray scattering) is starting to talk more and more about ragged data related to ""event mode"" data collection as well.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1216123818,https://api.github.com/repos/pydata/xarray/issues/4285,1216123818,IC_kwDOAMm_X85IfJOq,12912489,2022-08-16T04:15:24Z,2022-08-16T04:15:24Z,NONE,"> 5\. **Neutron scattering data**
>
>
> [Scipp](https://github.com/scipp/scipp) is an xarray-like labelled data structure for neutron scattering experiment data. On their FAQ Q titled [""Why is xarray not enough""](https://scipp.github.io/getting-started/faq.html#why-is-xarray-not-enough), one of the things they quote is
>
> > Support for event data, a particular form of sparse data. More concretely, this is essentially a 1-D (or N-D) array of random-length lists, with very small list entries. This type of data arises in time-resolved detection of neutrons in pixelated detectors.
>
> Would a `RaggedArray` class that's wrappable in xarray help with this? (cc @SimonHeybrock)
Partially, but the bigger challenge may be the related algorithms, e.g., for getting data *into* this layout, and for *switching* to other ragged layouts.
For context, one of the main reasons for our data layout is the ability to make cuts/slices quickly. We frequently deal with 2-D, 3-D, and 4-D data. For example, a 3-D case may be the momentum transfer $\vec Q$ in a scattering process, with a ""record"" for every detected neutron. The desired final resolution may exceed 1000 per dimension (of the 3 components of $\vec Q$). On top of this there may be additional dimensions relating to environment parameters of the sample under study, such as temperature, pressure, or strain. This would lead to bin counts that cannot be handled easily (in single-node memory).
A naive solution could be to simply work with something like `pandas.DataFrame`, with columns for the components of $\vec Q$ as well as the sample environment parameters. Those could then be used for grouping/histogramming to the desired 2-D cuts or slices. However, as frequently *many* such slices are required, this can quickly become inefficient (though there are certainly cases where it would work well, providing a simpler solution than scipp).
Scipp's ragged data can be considered a ""partial sorting"" that builds a sort of ""index"". With it we can, e.g., quickly compute high-resolution cuts. Say we are in 3-D (Qx, Qy, Qz). We would not use bin sizes that match the final resolution required by the science. Instead we could use 50x50x50 bins. Then we can very quickly produce a high-res 2-D plot (say 1000x1000 in Qx and Qz), since our binned data format reduces the data/memory you have to load and consider by a factor of up to 50 (in this example).
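
The idea of a coarse binning acting as an index can be sketched in plain Python. This is an illustration under stated assumptions, not Scipp's algorithm or API: the helper names (`coarse`, `flat_bin`, `cut_qx_qz`) are hypothetical, the events are random, and 10 coarse bins per dimension stand in for the 50 mentioned above. Events are sorted by coarse bin only, so a cut at fixed Qy touches one contiguous slab instead of the whole event list:

```python
import random

random.seed(0)
N_COARSE = 10        # coarse bins per dimension (hypothetical, smaller than 50)
N_EVENTS = 20_000    # small stand-in for a real event list

# Each event is a (Qx, Qy, Qz) tuple with components in [0, 1).
events = [(random.random(), random.random(), random.random())
          for _ in range(N_EVENTS)]

def coarse(value):
    # Index of the coarse bin containing value.
    return min(int(value * N_COARSE), N_COARSE - 1)

def flat_bin(event):
    # Row-major flat bin index with Qy as the slowest dimension.
    qx, qy, qz = event
    return (coarse(qy) * N_COARSE + coarse(qx)) * N_COARSE + coarse(qz)

# Partial sort: events are ordered by bin only, not within a bin.
events_sorted = sorted(events, key=flat_bin)
counts = [0] * N_COARSE**3
for ev in events_sorted:
    counts[flat_bin(ev)] += 1
# offsets[k] is the position of the first event of flat bin k.
offsets = [0]
for c in counts:
    offsets.append(offsets[-1] + c)

def cut_qx_qz(qy_bin, resolution=100):
    # High-resolution (Qx, Qz) histogram of one coarse Qy slab.
    start = offsets[qy_bin * N_COARSE**2]
    stop = offsets[(qy_bin + 1) * N_COARSE**2]
    hist = [[0] * resolution for _ in range(resolution)]
    for qx, _, qz in events_sorted[start:stop]:   # only this slab is touched
        ix = min(int(qx * resolution), resolution - 1)
        iz = min(int(qz * resolution), resolution - 1)
        hist[ix][iz] += 1
    return hist, stop - start

hist, n_touched = cut_qx_qz(3)
print(n_touched, 'of', N_EVENTS, 'events touched')
```

With 10 coarse bins along Qy, a single cut reads roughly a tenth of the events, while the histogram resolution (100x100 here) is chosen independently of the coarse bin size.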
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088