issue_comments
7 rows where issue = 667864088 and user = 12912489 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1288374461 | https://github.com/pydata/xarray/issues/4285#issuecomment-1288374461 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85Mywi9 | SimonHeybrock 12912489 | 2022-10-24T03:44:44Z | 2022-11-03T17:04:15Z | NONE | Also note the Ragged Array Summit on Scientific Python. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Awkward array backend? 667864088 |
1283416324 | https://github.com/pydata/xarray/issues/4285#issuecomment-1283416324 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85Mf2EE | SimonHeybrock 12912489 | 2022-10-19T04:39:06Z | 2022-10-19T04:39:06Z | NONE | A possibly relevant distinction that had not occurred to me previously is the example by @milancurcic: If I understand this correctly then this type of data is essentially an array of variable-length time-series (essentially a list of lists?), i.e., there is an order within each inner list. This is conceptually different from the data I am typically dealing with, where each inner list is a list of records without specific ordering. | { "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Awkward array backend? 667864088 |
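The ordered-vs-unordered distinction drawn in the comment above can be made concrete with a small sketch in plain Python. This is a hypothetical illustration (the data and the `row` helper are invented here); the flat-buffer-plus-offsets layout it uses is the one Awkward Array calls a ListOffsetArray:

```python
# Ragged data stored as one flat buffer plus offsets: row i is
# content[offsets[i]:offsets[i+1]].
content = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
offsets = [0, 3, 4, 6]  # three rows of lengths 3, 1, 2

def row(i):
    return content[offsets[i]:offsets[i + 1]]

# Time-series reading (as in @milancurcic's example): each row is an
# ordered sequence, so reversing a row would change its meaning.
# Record-bag reading (as in the scattering data described later in
# this thread): each row is an unordered collection of records, so
# any permutation of a row is an equally valid representation.
assert row(0) == [1.0, 2.0, 3.0]
assert row(1) == [4.0]
assert sorted(row(2)) == [5.0, 6.0]
```

Both interpretations share the identical storage layout; the difference is purely in which operations (e.g. reversing or sorting within a row) are meaning-preserving.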
1216208075 | https://github.com/pydata/xarray/issues/4285#issuecomment-1216208075 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IfdzL | SimonHeybrock 12912489 | 2022-08-16T06:38:32Z | 2022-08-16T06:42:28Z | NONE | @jpivarski You are right that "sparse" is misleading. Since it is indeed most commonly used for sparse matrix/array representations we are now usually avoiding this term (and refer to it as binned data, or ragged data instead). Obviously our title page needs an update 😬. This does actually apply to Scipp's binned data. A | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Awkward array backend? 667864088 |
1216107702 | https://github.com/pydata/xarray/issues/4285#issuecomment-1216107702 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IfFS2 | SimonHeybrock 12912489 | 2022-08-16T03:43:29Z | 2022-08-16T05:11:50Z | NONE | Anecdotal evidence that this is indeed not a good solution: scipp's "ragged data" implementation was originally implemented with such variable-length dimension support. This led to a whole series of problems, including significantly complicating | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Awkward array backend? 667864088 |
1216144957 | https://github.com/pydata/xarray/issues/4285#issuecomment-1216144957 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IfOY9 | SimonHeybrock 12912489 | 2022-08-16T04:54:25Z | 2022-08-16T04:54:25Z | NONE | Is anyone here going to EuroScipy (two weeks from now) and interested in having a chat/discussion about ragged data? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Awkward array backend? 667864088 |
1216125098 | https://github.com/pydata/xarray/issues/4285#issuecomment-1216125098 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IfJiq | SimonHeybrock 12912489 | 2022-08-16T04:17:52Z | 2022-08-16T04:17:52Z | NONE | @danielballan mentioned that the photon community (synchrotrons/X-ray scattering) is starting to talk more and more about ragged data related to "event mode" data collection as well. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Awkward array backend? 667864088 |
1216123818 | https://github.com/pydata/xarray/issues/4285#issuecomment-1216123818 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IfJOq | SimonHeybrock 12912489 | 2022-08-16T04:15:24Z | 2022-08-16T04:15:24Z | NONE | Partially, but the bigger challenge may be the related algorithms, e.g., for getting data into this layout, and for switching to other ragged layouts. For context, one of the main reasons for our data layout is the ability to make cuts/slices quickly. We frequently deal with 2-D, 3-D, and 4-D data. For example, a 3-D case may be the momentum transfer $\vec Q$ in a scattering process, with a "record" for every detected neutron. The desired final resolution may exceed 1000 per dimension (for the 3 components of $\vec Q$). On top of this there may be additional dimensions relating to environment parameters of the sample under study, such as temperature, pressure, or strain. This would lead to bin counts that cannot be handled easily (in single-node memory). A naive solution could be to simply work with a flat list of records; Scipp's ragged data can instead be considered a "partial sorting" of such records, used to build a sort of "index". Based on all this we can then, e.g., quickly compute high-resolution cuts. Say we are in 3-D (Qx, Qy, Qz). We would not use bin sizes that match the final resolution required by the science; instead we could use 50x50x50 bins. Then we can very quickly produce a high-res 2-D plot (say 1000x1000, in Qx and Qz, or whatever), since our binned data format reduces the data/memory you have to load and consider by a factor of up to 50 (in this example). | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Awkward array backend? 667864088 |
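The coarse-binning scheme described in the comment above can be sketched in a few lines of NumPy. This is a hypothetical illustration, not Scipp's actual implementation: random records stand in for detected neutrons, and the names `q`, `ijk`, `order`, and `in_layer` are invented here. The point it demonstrates is that binning records into a 50x50x50 grid amounts to a partial sort, after which a cut at one coarse layer touches only about 1/50 of the data:

```python
import numpy as np

# Hypothetical stand-in for detected-neutron records: each row is a
# momentum transfer Q = (Qx, Qy, Qz), here drawn uniformly from [0, 1).
rng = np.random.default_rng(0)
q = rng.uniform(0.0, 1.0, size=(100_000, 3))

# Coarse 50x50x50 binning, far below the ~1000-per-dimension final
# resolution mentioned in the comment.
nbins = 50
ijk = np.minimum((q * nbins).astype(np.intp), nbins - 1)
flat = np.ravel_multi_index((ijk[:, 0], ijk[:, 1], ijk[:, 2]),
                            (nbins, nbins, nbins))

# "Partial sorting": order the records by coarse bin once, up front;
# every coarse bin is then a contiguous run of the reordered data.
order = np.argsort(flat, kind="stable")
assert np.all(np.diff(flat[order]) >= 0)

# A high-resolution 2-D cut at one coarse Qy layer only needs records
# from that layer: roughly a 50x reduction in data considered.
layer = 25
in_layer = order[ijk[order, 1] == layer]
assert abs(len(in_layer) / len(q) - 1 / nbins) < 0.01
```

Within the selected layer one could then histogram (Qx, Qz) at the full 1000x1000 resolution; the coarse bins serve purely as the index that limits how much data the fine histogram has to load.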
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
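The schema above can be exercised with Python's built-in sqlite3 module. A minimal sketch (in-memory database; the REFERENCES clauses are dropped since the users/issues tables are not shown on this page) reproducing the query this page describes, comments where issue = 667864088 and user = 12912489 sorted by updated_at descending, against two toy rows whose ids and timestamps are taken from the table above:

```python
import sqlite3

# In-memory copy of the issue_comments schema (foreign keys omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

# Two toy rows using ids/timestamps from the table above.
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, issue) "
    "VALUES (?, ?, ?, ?)",
    [
        (1288374461, 12912489, "2022-11-03T17:04:15Z", 667864088),
        (1283416324, 12912489, "2022-10-19T04:39:06Z", 667864088),
    ],
)

# The query behind this page: filter by issue and user, newest first.
# ISO 8601 timestamps sort correctly as plain strings.
rows = conn.execute(
    "SELECT id FROM issue_comments "
    "WHERE issue = ? AND user = ? ORDER BY updated_at DESC",
    (667864088, 12912489),
).fetchall()
assert [r[0] for r in rows] == [1288374461, 1283416324]
```

The idx_issue_comments_issue and idx_issue_comments_user indexes exist precisely so that this WHERE clause does not require a full table scan.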