issue_comments

7 rows where author_association = "NONE", issue = 667864088 and user = 12912489 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1288374461 https://github.com/pydata/xarray/issues/4285#issuecomment-1288374461 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85Mywi9 SimonHeybrock 12912489 2022-10-24T03:44:44Z 2022-11-03T17:04:15Z NONE

Also note the Ragged Array Summit on Scientific Python.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1283416324 https://github.com/pydata/xarray/issues/4285#issuecomment-1283416324 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85Mf2EE SimonHeybrock 12912489 2022-10-19T04:39:06Z 2022-10-19T04:39:06Z NONE

A possibly relevant distinction that had not occurred to me previously is raised by the example from @milancurcic: if I understand it correctly, this type of data is essentially an array of variable-length time series (a list of lists?), i.e., there is an order within each inner list. This is conceptually different from the data I typically deal with, where each inner list is a list of records without any specific ordering.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1216208075 https://github.com/pydata/xarray/issues/4285#issuecomment-1216208075 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfdzL SimonHeybrock 12912489 2022-08-16T06:38:32Z 2022-08-16T06:42:28Z NONE

@jpivarski

Support for event data, a particular form of sparse data.

I might have been misinterpreting the word "sparse data" in conversations about this. I had thought that "sparse data" is logically rectilinear but represented in memory with the zeros removed, so the internal machinery has to deal with irregular structures, but the outward API it presents is regular (dimensionality is completely described by a shape: tuple[int]).

You are right that "sparse" is misleading. Since it is indeed most commonly used for sparse matrix/array representations, we now usually avoid this term and refer to binned data or ragged data instead. Obviously our title page needs an update 😬.

logically rectilinear

This does actually apply to Scipp's binned data. A scipp.Variable may have shape=(N,M) and be "ragged". But the "ragged" dimension is in addition to the two regular dimensions. That is, in this case we have (conceptually) a 2-D array of lists.
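To make "a 2-D array of lists" concrete, here is a minimal pure-Python sketch. It is not Scipp's actual implementation; the class name `BinnedArray2D`, its fields, and its methods are hypothetical and chosen only for illustration. The idea is that the outward shape stays regular, `(N, M)`, while the variable-length lists live in one flat buffer addressed through offsets.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BinnedArray2D:
    """Hypothetical sketch: a regular (N, M) grid whose elements are variable-length lists."""

    shape: Tuple[int, int]  # the regular, outward-facing dimensions
    offsets: List[int]      # N*M + 1 entries; bin k is content[offsets[k]:offsets[k + 1]]
    content: List[float]    # records of all bins, stored back to back

    def bin(self, i: int, j: int) -> List[float]:
        """Return the list stored at grid position (i, j)."""
        k = i * self.shape[1] + j  # row-major flattening of the regular dims
        return self.content[self.offsets[k]:self.offsets[k + 1]]

    def bin_sizes(self) -> List[List[int]]:
        """Length of every inner list, arranged on the regular (N, M) grid."""
        n, m = self.shape
        return [
            [self.offsets[i * m + j + 1] - self.offsets[i * m + j] for j in range(m)]
            for i in range(n)
        ]


# A 2x2 grid holding lists of length 3, 0, 1, and 2.
data = BinnedArray2D(
    shape=(2, 2),
    offsets=[0, 3, 3, 4, 6],
    content=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
)
print(data.bin(1, 1))
print(data.bin_sizes())
```

In this sketch the "ragged" dimension never appears in `shape`, so alignment and slicing along the regular dimensions work as for an ordinary 2-D array.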

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1216107702 https://github.com/pydata/xarray/issues/4285#issuecomment-1216107702 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfFS2 SimonHeybrock 12912489 2022-08-16T03:43:29Z 2022-08-16T05:11:50Z NONE
  1. Generalise xarray to allow for variable-length dimensions

This seems hard. Xarray's whole model is built assuming that dims has type Mapping[Hashable, int]. It also breaks our normal concept of alignment, which we need to put coordinate variables in DataArrays alongside data variables.

Anecdotal evidence that this is indeed not a good solution:

scipp's "ragged data" implementation originally supported such variable-length dimensions. This led to a whole series of problems, including significantly complicating scipp.DataArray, both in terms of code and conceptually. After this experience we switched to the current model, which exposes only the regular, aligned dimensions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1216144957 https://github.com/pydata/xarray/issues/4285#issuecomment-1216144957 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfOY9 SimonHeybrock 12912489 2022-08-16T04:54:25Z 2022-08-16T04:54:25Z NONE

Is anyone here going to EuroScipy (two weeks from now) and interested in having a chat/discussion about ragged data?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1216125098 https://github.com/pydata/xarray/issues/4285#issuecomment-1216125098 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfJiq SimonHeybrock 12912489 2022-08-16T04:17:52Z 2022-08-16T04:17:52Z NONE

@danielballan mentioned that the photon community (synchrotrons/X-ray scattering) is starting to talk more and more about ragged data related to "event mode" data collection as well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1216123818 https://github.com/pydata/xarray/issues/4285#issuecomment-1216123818 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfJOq SimonHeybrock 12912489 2022-08-16T04:15:24Z 2022-08-16T04:15:24Z NONE

5. Neutron scattering data

Scipp is an xarray-like labelled data structure for neutron scattering experiment data. In their FAQ, under the question titled "Why is xarray not enough", one of the things they cite is

Support for event data, a particular form of sparse data. More concretely, this is essentially a 1-D (or N-D) array of random-length lists, with very small list entries. This type of data arises in time-resolved detection of neutrons in pixelated detectors.

Would a RaggedArray class that's wrappable in xarray help with this? (cc @SimonHeybrock)

Partially, but the bigger challenge may be the related algorithms, e.g., for getting data into this layout, and for switching to other ragged layouts.

For context, one of the main reasons for our data layout is the ability to make cuts/slices quickly. We frequently deal with 2-D, 3-D, and 4-D data. For example, a 3-D case may be the momentum transfer $\vec Q$ in a scattering process, with a "record" for every detected neutron. The desired final resolution may exceed 1000 bins per dimension (for each of the 3 components of $\vec Q$). On top of this there may be additional dimensions relating to environment parameters of the sample under study, such as temperature, pressure, or strain. This would lead to bin counts that cannot be handled easily (in single-node memory).

A naive solution could be to simply work with something like pandas.DataFrame, with columns for the components of $\vec Q$ as well as the sample environment parameters. Those could then be used for grouping/histogramming to the desired 2-D cuts or slices. However, as many such slices are frequently required, this can quickly become inefficient (though there are certainly cases where it would work well, providing a simpler solution than scipp).
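The flat-table approach can be sketched roughly as follows. This uses plain Python dictionaries as a stand-in for pandas.DataFrame rows; the function names `bin_index` and `naive_cut`, the column names, and the uniform bin range are assumptions made for illustration only. The point is that every cut scans every record.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]


def bin_index(value: float, lo: float, hi: float, nbins: int) -> Optional[int]:
    """Index of the uniform bin over [lo, hi) that contains value, or None if outside."""
    if not (lo <= value < hi):
        return None
    return min(int((value - lo) / (hi - lo) * nbins), nbins - 1)


def naive_cut(
    records: List[Record],
    x: str,
    y: str,
    where: Callable[[Record], bool],
    lo: float,
    hi: float,
    nbins: int,
) -> List[List[int]]:
    """2-D histogram over columns x and y of the records selected by `where`."""
    counts = [[0] * nbins for _ in range(nbins)]
    for rec in records:  # full scan of the table for every requested cut
        if not where(rec):
            continue
        ix = bin_index(rec[x], lo, hi, nbins)
        iy = bin_index(rec[y], lo, hi, nbins)
        if ix is not None and iy is not None:
            counts[ix][iy] += 1
    return counts


records = [
    {"Qx": 0.1, "Qy": 0.2, "Qz": 0.1, "temperature": 4.0},
    {"Qx": 0.6, "Qy": 0.7, "Qz": 0.1, "temperature": 4.0},
    {"Qx": 0.6, "Qy": 0.1, "Qz": 0.9, "temperature": 4.0},
    {"Qx": 0.1, "Qy": 0.2, "Qz": 0.1, "temperature": 300.0},
]

# Cut: Qx-Qy histogram of the low-Qz, low-temperature records.
cut = naive_cut(
    records, "Qx", "Qy",
    where=lambda r: r["Qz"] < 0.5 and r["temperature"] < 10.0,
    lo=0.0, hi=1.0, nbins=2,
)
print(cut)
```

With a single cut this is perfectly adequate; the cost shows up when many different slices are requested from the same large table.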

Scipp's ragged data can be considered a "partial sorting" that builds a sort of "index". Based on this we can then, e.g., quickly compute high-resolution cuts. Say we are in 3-D (Qx, Qy, Qz). We would not choose bin sizes that match the final resolution required by the science. Instead we could use 50x50x50 bins. Then we can very quickly produce a high-resolution 2-D plot (say 1000x1000 in Qx and Qz), since our binned data format reduces the data/memory that has to be loaded and considered by a factor of up to 50 (in this example).
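The coarse-binning idea can be illustrated with a reduced pure-Python sketch. It is not Scipp's API or algorithm: the function names `build_coarse_index` and `high_res_cut` are hypothetical, records carry only `(qx, qz)`, and both coordinates are assumed to share the range `[lo, hi)`. Records are grouped once by a coarse `qz` bin; a later thin slice in `qz` then only visits the coarse bins it overlaps.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rec = Tuple[float, float]  # (qx, qz)


def build_coarse_index(records: List[Rec], lo: float, hi: float, ncoarse: int) -> Dict[int, List[Rec]]:
    """Group records by coarse qz bin: the 'partial sorting' step, done once."""
    width = (hi - lo) / ncoarse
    bins: Dict[int, List[Rec]] = defaultdict(list)
    for qx, qz in records:
        if lo <= qz < hi:
            k = min(int((qz - lo) / width), ncoarse - 1)
            bins[k].append((qx, qz))
    return bins


def high_res_cut(bins, lo, hi, ncoarse, qz_min, qz_max, nfine):
    """Fine qx histogram for qz_min <= qz < qz_max, visiting only overlapping coarse bins."""
    width = (hi - lo) / ncoarse
    first = max(int((qz_min - lo) / width), 0)
    last = min(int((qz_max - lo) / width), ncoarse - 1)
    counts = [0] * nfine
    visited = 0  # number of records actually inspected
    for k in range(first, last + 1):
        for qx, qz in bins.get(k, []):
            visited += 1
            if qz_min <= qz < qz_max and lo <= qx < hi:
                i = min(int((qx - lo) / (hi - lo) * nfine), nfine - 1)
                counts[i] += 1
    return counts, visited


# 1000 synthetic records: 100 qx values for each of 10 qz values.
records = [(i / 100, j / 10 + 0.05) for i in range(100) for j in range(10)]
index = build_coarse_index(records, 0.0, 1.0, ncoarse=10)
counts, visited = high_res_cut(index, 0.0, 1.0, 10, qz_min=0.32, qz_max=0.38, nfine=20)
print(sum(counts), visited, len(records))
```

Here the fine resolution (`nfine`) is independent of the coarse grid, and the number of records inspected depends on how many coarse bins the slice overlaps rather than on the total table size.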

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);