issue_comments
14 rows where user = 1852447 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1302254100 | https://github.com/pydata/xarray/issues/4285#issuecomment-1302254100 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85NntIU | jpivarski 1852447 | 2022-11-03T15:07:36Z | 2022-11-03T15:07:36Z | NONE | Send me an email address, and I'll send you the Zoom URL. The email that you have listed here: http://tom-nicholas.com/contact/ doesn't work (bounced back). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1297615976 | https://github.com/pydata/xarray/issues/4285#issuecomment-1297615976 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85NWAxo | jpivarski 1852447 | 2022-10-31T20:04:12Z | 2022-10-31T20:04:12Z | NONE | @milancurcic, @joshmoore, and I are all available on Thursday, November 3 at 11am U.S. Central (12pm U.S. Eastern/Florida, 5pm Central European/Germany: note the unusual U.S.-Europe difference this week, 16:00 UTC). Let's meet then! I sent a Google calendar invitation to both of you at that time, which contains a Zoom URL. If anyone else is interested, let me know and I'll send you the Zoom URL as well (just not on a public GitHub comment). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1295475130 | https://github.com/pydata/xarray/issues/4285#issuecomment-1295475130 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85NN2G6 | jpivarski 1852447 | 2022-10-28T21:15:55Z | 2022-10-28T21:15:55Z | NONE |
Everyone who is interested in this, but particularly @milancurcic, please fill out this poll: https://www.when2meet.com/?17481732-uGwNn and we'll meet by Zoom (URL to be distributed later) to talk about RaggedArray. I'll pick a best time from these responses on Monday. Thanks! |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1283043390 | https://github.com/pydata/xarray/issues/4285#issuecomment-1283043390 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85MebA- | jpivarski 1852447 | 2022-10-18T21:46:29Z | 2022-10-18T21:46:29Z | NONE | This sounds good to me! To represent a (Same for I'm in favor of a video call meeting to discuss this. In general, I'm busiest on U.S. mornings, on Wednesday and Thursday, but perhaps you can send a when2meet or equivalent poll? One thing that could be discussed in writing (maybe more easily) is what data types you would consider in scope for That is,
You don't want record-types or union-types, so the only questions are how to implement (2) and whether you want (3) and (4). Including a type, such as missing data, allows for more function return values but obliges you to consider that type for all function arguments. You'll want to choose carefully how you close your system. (Maybe this block of details can be copied to an issue where you're doing the development of |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1211328405 | https://github.com/pydata/xarray/issues/4285#issuecomment-1211328405 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IM2eV | jpivarski 1852447 | 2022-08-10T22:01:55Z | 2022-08-10T22:01:55Z | NONE | This is a wonderful list; thank you!
I believe that this use-case benefits from being able to mix regular and ragged dimensions, that the data have 3 regular dimensions and 1 ragged dimension, with the ragged one as the innermost. (The RaggedArray described above has this feature.)
I might have been misinterpreting the word "sparse data" in conversations about this. I had thought that "sparse data" is logically rectilinear but represented in memory with the zeros removed, so the internal machinery has to deal with irregular structures, but the outward API it presents is regular (dimensionality is completely described by a
is definitely what we mean by a ragged array (again with the ragged dimension potentially within zero or more regular dimensions). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1210718350 | https://github.com/pydata/xarray/issues/4285#issuecomment-1210718350 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IKhiO | jpivarski 1852447 | 2022-08-10T14:01:48Z | 2022-08-10T14:01:48Z | NONE | Also on the digression, I just want to clarify where we're coming from, why we did the things we did.
I can see how minimal extensions of the NumPy array model to include ragged arrays represent the majority of use-cases, though it wouldn't have been enough for our first use-case in particle physics, which looks roughly like this (with made-up numbers):
We needed "records with differently typed fields" and "variable-length lists" to be nestable within each other. It's even sometimes the case that one of the inner records representing a particle has another variable-length list within it, identifying the indexes of particles in the collision event that it's close to. We deliberated on whether those cross-links should allow the structure to be non-tree-like, either a DAG or to actually have cycles (https://github.com/scikit-hep/awkward/issues/178). The prior art is a C++ infrastructure that does have a full graph model: collision events represented as arbitrary C++ class instances, and those arbitrary C++ data are serialized to disk in exabytes of ROOT files. Our first problem was to get a high-performance representation of these data in Python. For that, we didn't need missing data or heterogeneous unions ( Another consideration is that this scope exactly matches Apache Arrow (including the lack of cross-references). As such, we can use Arrow as an interchange format and Parquet as a disk format without having to exclude a subspace of types in either direction. We don't use Arrow as an internal format for performance reasons—we have node types that are lazier than Arrow's so they're better as intermediate arrays in a multi-step calculation—but it's important to have one-to-one, minimal computation (sometimes zero-copy) transformations to and from Arrow. That said, as we've been looking for use-cases beyond particle physics, most of them would be handled well by simple ragged arrays. Also, we've found the "just ragged arrays" part of Arrow to be the most developed or at least the first to be developed, driven by SQL-like applications. Our unit tests in Awkward Array have revealed a lot of unhandled cases in Arrow, particularly the Parquet serialization, that we report in JIRA (and they quickly get resolved). Two possible conclusions:
If it turns out that conclusion (1) is right or more right than (2), then at least a subset of what we're working on is going to be useful to the wider community. If it's (2), though, then it's a big opportunity. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1210023820 | https://github.com/pydata/xarray/issues/4285#issuecomment-1210023820 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IH3-M | jpivarski 1852447 | 2022-08-10T00:36:42Z | 2022-08-10T00:36:42Z | NONE |
It shouldn't be a subclass because it doesn't satisfy a substitution principle: Since
Oh....... I hadn't been thinking that RaggedArray is something we'd put in the general Awkward Array library. I was thinking of it only as a way to define "the subset of Awkward Arrays that xarray uses," which would live in xarray. I don't want to introduce another level of type-specificity to the system, since that would make things harder to understand. (Imagine reading the docs and it says, "You can apply this function to ak.Array, but not to ak.RaggedArray." Or "this is an ak.Array that happens to be ragged, but not an ak.RaggedArray.") So let me rethink your original idea of adding If that point is negotiable, I could introduce an
Or maybe the best way to present it is with a Anyway, you can see why I'm loath to add a property to ak.Array that's just named " But if I'm providing it as an extra function, or as a trio of properties named So in the end, I just came back to where we started: xarray would own the RaggedArray wrapper. Or it could be a third package, as awkward-pandas is to awkward and pandas.
No, I initialized it incorrectly: it should have started as
and then recurse from there. My previous example also had the wrong output, but I didn't count square brackets carefully enough to have caught it. (By the way, not copying the context is why it's called "lateral"; if a copied dict is needed, it's "depth_context". I just went back and checked: yes, they're being handled appropriately.) I fixed the code that I wrote in the comments above for posterity. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
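The distinction drawn in the comment above between a "lateral" context (not copied) and a "depth" context (copied) can be illustrated with a minimal plain-Python sketch. It walks nested lists rather than Awkward layouts and does not call `recursively_apply`; the names `walk` and `visited` are hypothetical and chosen only for illustration.

```python
def walk(node, depth_context, lateral_context):
    # The depth context is copied on the way down, so a change made here
    # is visible only to this node's descendants, never to its siblings.
    depth_context = dict(depth_context)
    depth_context["depth"] += 1

    # The lateral context is the very same dict at every node, so it
    # accumulates information across siblings and across levels.
    lateral_context["visited"].append(depth_context["depth"])

    if isinstance(node, list):
        for child in node:
            walk(child, depth_context, lateral_context)


depth_ctx = {"depth": 0}
lateral_ctx = {"visited": []}
walk([[1, 2], [3]], depth_ctx, lateral_ctx)

print(depth_ctx)    # the caller's dict is untouched
print(lateral_ctx)  # one entry per visited node
```

Because only the lateral dict is shared, it is the natural place to collect results such as a shape list, which is how the `shape_dtype` function in this thread uses it.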
1208723159 | https://github.com/pydata/xarray/issues/4285#issuecomment-1208723159 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IC6bX | jpivarski 1852447 | 2022-08-08T23:30:12Z | 2022-08-10T00:02:44Z | NONE | Given that you have an array of only list-type, regular-type, and numpy-type (which the
```python
def shape_dtype(layout, lateral_context, **kwargs):
    if layout.is_RegularType:
        lateral_context["shape"].append(layout.size)
    elif layout.is_ListType:
        max_size = ak.max(ak.num(layout))
        lateral_context["shape"].append(max_size)
    elif layout.is_NumpyType:
        lateral_context["dtype"] = layout.dtype
    else:
        raise AssertionError(f"what? {layout.form.type}")

context = {"shape": [len(array)]}
array.layout.recursively_apply(
    shape_dtype, lateral_context=context, return_array=False
)
# check context for "shape" and "dtype"
```
Here's the application on an array of mixed regular and irregular lists: ```python
(This To answer your question about monkey-patching, I think it would be best to make a wrapper. You don't want to give all Here's a start of a wrapper:
```python
class RaggedArray:
    def __init__(self, array_like):
        layout = ak.to_layout(array_like, allow_record=False, allow_other=False)
        behavior = None
        if isinstance(array_like, ak.Array):
            behavior = array_like.behavior
        self._array = ak.Array(layout.recursively_apply(prepare), behavior=behavior)
```
It keeps an Thus, it can act as a gatekeeper of what kinds of operations are allowed:
I meant to say something earlier about why we go for full generality in types: it's because some of the things we want to do, such as ak.cartesian, require more complex types, and as soon as one function needs it, the whole space needs to be enlarged. For the first year of Awkward Array use, most users wanted it for plain ragged arrays (based on their bug-reports and questions), but after about a year, they were asking about missing values and records, too, because you eventually need them unless you intend to work within a narrow set of functions.
Union arrays are still not widely used, but they can come from some file formats. Some GeoJSON files that I looked at had longitude, latitude points in different list depths because some were points and some were polygons, disambiguated by a string label. That's not good to work with (we can't handle that in Numba, for instance), but if you select all points with some slice, put them in one array, and select all polygons with another slice, putting them in their own array, these each become trivial unions, and that's why I added the squashing of trivial unions to the |
{ "total_count": 3, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 3, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
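The idea behind `shape_dtype` in the comment above, reporting the maximum list length for each ragged dimension, can be sketched on plain nested Python lists. This is only an illustration under that assumption: it does not use Awkward's layout API, and the function name `ragged_shape` is hypothetical.

```python
def ragged_shape(data):
    """Return a shape tuple for nested lists, using the longest list
    found at each depth as the size of that dimension."""
    shape = []
    level = [data]  # every list that lives at the current depth
    while level and all(isinstance(x, list) for x in level):
        shape.append(max(len(x) for x in level))
        # Descend one level: gather the elements of all lists at this depth.
        level = [item for x in level for item in x]
    return tuple(shape)


example = [[[1, 2], [3]], [[4, 5, 6]], []]
print(ragged_shape(example))
```

For a fully regular input the result matches the usual rectangular shape; for ragged input it is an upper bound per dimension, which is the convention discussed in the thread.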
1208646168 | https://github.com/pydata/xarray/issues/4285#issuecomment-1208646168 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85ICnoY | jpivarski 1852447 | 2022-08-08T21:46:57Z | 2022-08-08T21:46:57Z | NONE | The passing on of behavior is just to not break applications that depend on it. I did that just for correctness. Monkey-patching will add the desired properties to the Ragged array is not a specialized subset of types within Awkward Array. There are More later... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1208568777 | https://github.com/pydata/xarray/issues/4285#issuecomment-1208568777 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85ICUvJ | jpivarski 1852447 | 2022-08-08T20:18:30Z | 2022-08-08T20:18:30Z | NONE | You mentioned union arrays, but for completeness, the type system in Awkward Array has
You're interested in a subset of this type system, but that subset doesn't just exclude unions, it also excludes records. If you have an xarray, you don't need top-level records since those could just be the columns of an xarray, but some data source might provide records nested within variable-length lists (very common in HEP) or other nesting. It would have to be explicitly excluded. That leaves the possibility of missing lists and missing numeric primitives. Missing lists could be turned into empty lists (Google projects like Protocol Buffers often make that equivalence) and missing numbers could be turned into NaN if you're willing to lose integer-ness. Here's a way to determine if an
```python
import awkward._v2 as ak
import numpy as np

def prepare(layout, continuation, **kwargs):
    if layout.is_RecordType:
        raise NotImplementedError("no records!")
    elif layout.is_UnionType:
        # A union is trivial if it is empty or every tag is the same.
        if len(layout) == 0 or np.all(layout.tags == layout.tags[0]):
            return layout.project(layout.tags[0]).recursively_apply(prepare)
        else:
            raise NotImplementedError("no non-trivial unions!")
    elif layout.is_OptionType:
        next_layout = continuation()  # fully recurse
        content_type = next_layout.content.form.type
        if isinstance(content_type, ak.types.NumpyType):
            # missing numbers become NaN
            return ak.fill_none(next_layout, np.nan, axis=0, highlevel=False)
        elif isinstance(content_type, ak.types.ListType):
            # missing lists become empty lists
            return ak.fill_none(next_layout, [], axis=0, highlevel=False)
        elif isinstance(content_type, ak.types.RegularType):
            return ak.fill_none(next_layout.toListOffsetArray64(False), [], axis=0, highlevel=False)
        else:
            raise AssertionError(f"what? {content_type}")

# `array` is the ak.Array to be normalized
ak.Array(array.layout.recursively_apply(prepare), behavior=array.behavior)
```
It should catch all the cases and doesn't rely on string-processing the type's DataShape representation. Given that you're working within that subset, it would be possible to define Oh, if you're replacing variable-length dimensions with the maximum length in that dimension, what about actually padding the array with ak.pad_none?
The above would have to be expanded to get every |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
Awkward array backend? 667864088 | |
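The two substitutions described in the comment above (a missing list becomes an empty list, a missing number becomes NaN at the cost of integer-ness) can be sketched without Awkward on plain nested lists. This is an illustration only: the function name `fill_missing` is hypothetical, and it assumes the nesting depth `ndim` is known in advance, since a bare `None` does not say whether it stands for a list or a number.

```python
import math


def fill_missing(data, ndim):
    """Replace None in nested lists whose numbers sit `ndim` lists deep."""
    if ndim == 0:
        # Leaf position: a missing number becomes NaN, and present
        # numbers are converted to float (integer-ness is lost).
        return math.nan if data is None else float(data)
    if data is None:
        # List position: a missing list is treated as an empty list.
        return []
    return [fill_missing(item, ndim - 1) for item in data]


filled = fill_missing([[1, None], None, [3]], ndim=2)
print(filled)
```

The conversion to `float` mirrors the trade-off named in the comment: once NaN stands in for missing values, the data can no longer stay integer-typed.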
1203295236 | https://github.com/pydata/xarray/issues/4285#issuecomment-1203295236 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85HuNQE | jpivarski 1852447 | 2022-08-02T23:03:16Z | 2022-08-02T23:03:16Z | NONE | Hi! I will be looking deeply into this when I get back from traveling (next week). Just to let you know that I saw this and I'm interested. Thanks! |
{ "total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 2, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
890260105 | https://github.com/pydata/xarray/issues/5648#issuecomment-890260105 | https://api.github.com/repos/pydata/xarray/issues/5648 | IC_kwDOAMm_X841EEqJ | jpivarski 1852447 | 2021-07-31T00:06:14Z | 2021-07-31T00:06:14Z | NONE |
I'm interested. Let us know when the time will be or if there's a poll for picking a time. Thanks! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Duck array compatibility meeting 956103236 | |
707321343 | https://github.com/pydata/xarray/issues/4285#issuecomment-707321343 | https://api.github.com/repos/pydata/xarray/issues/4285 | MDEyOklzc3VlQ29tbWVudDcwNzMyMTM0Mw== | jpivarski 1852447 | 2020-10-12T20:08:32Z | 2020-10-12T20:08:32Z | NONE | Copied from https://gitter.im/pangeo-data/Lobby : I've been using Xarray with argopy recently, and the immediate value I see is the documentation of columns, which is semi-lacking in Awkward (one user has been passing this information through an Awkward tree as a scikit-hep/awkward-1.0#422). I should also look into Xarray's indexing, which I've always seen as being the primary difference between NumPy and Pandas; Awkward Array has no indexing, though every node has an optional Identities which would be used to track such information through Awkward manipulations—Identities would have a bijection with externally supplied indexes. They haven't been used for anything yet. Although the elevator pitch for Xarray is "n-dimensional Pandas," it's rather different, isn't it? The contextual metadata is more extensive than anything I've seen in Pandas, and Xarray can be partitioned for out-of-core analysis: Xarray wraps Dask, unlike Dask's array collection, which wraps NumPy. I had troubles getting Pandas to wrap Awkward array (scikit-hep/awkward-1.0#350 ), but maybe these won't be issues for Xarray. One last thing (in this very rambly message): the main difficulty I think we would have in that is that Awkward Arrays don't have shape and dtype, since those define a rectilinear array of numbers. The data model is Datashape plus union types. There is a sense in which ndim is defined: the number of nested lists before reaching the first record, which may split it into different depths for each field, but even this can be ill-defined with union types: ```python
So if we wanted to have an Xarray of Awkward Arrays, we'd have to take stock of all the assumptions Xarray makes about the arrays it contains. |
{ "total_count": 5, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 5, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
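The point in the comment above that `ndim` can be ill-defined once union types are allowed can be seen with ordinary nested Python lists. This is a minimal sketch, not Awkward's own definition of dimensionality; the helper name `depths` is hypothetical.

```python
def depths(data, depth=0):
    """Return the set of nesting depths at which non-list values occur."""
    if isinstance(data, list):
        found = set()
        for item in data:
            found |= depths(item, depth + 1)
        return found
    return {depth}


print(depths([[1, 2], [3]]))  # a single depth: ndim is well defined
print(depths([1, [2, 3]]))    # two depths: a union of number and list
```

When the result contains more than one depth, no single integer describes the dimensionality, which is one of the assumptions a container such as Xarray would have to account for.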
665740365 | https://github.com/pydata/xarray/issues/4285#issuecomment-665740365 | https://api.github.com/repos/pydata/xarray/issues/4285 | MDEyOklzc3VlQ29tbWVudDY2NTc0MDM2NQ== | jpivarski 1852447 | 2020-07-29T15:40:24Z | 2020-07-29T15:40:24Z | NONE | I'm linking myself here, to follow this: @jpivarski. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 |
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);