issue_comments: 1208568777
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/4285#issuecomment-1208568777 | https://api.github.com/repos/pydata/xarray/issues/4285 | 1208568777 | IC_kwDOAMm_X85ICUvJ | 1852447 | 2022-08-08T20:18:30Z | 2022-08-08T20:18:30Z | NONE | You mentioned union arrays, but for completeness, the type system in Awkward Array has
You're interested in a subset of this type system, but that subset doesn't just exclude unions; it also excludes records. If you have an xarray, you don't need top-level records, since those could just be the columns of an xarray, but some data source might provide records nested within variable-length lists (very common in HEP) or other nesting. Those would have to be explicitly excluded. That leaves the possibility of missing lists and missing numeric primitives. Missing lists could be turned into empty lists (Google projects like Protocol Buffers often make that equivalence), and missing numbers could be turned into NaN if you're willing to lose integer-ness.

Here's a way to determine if an

```python
import awkward._v2 as ak
import numpy as np

def prepare(layout, continuation, **kwargs):
    if layout.is_RecordType:
        raise NotImplementedError("no records!")

    elif layout.is_UnionType:
        # A union is trivial if it is empty or every element carries the same tag.
        tags = np.asarray(layout.tags)
        if len(layout) == 0 or np.all(tags == tags[0]):
            return layout.project(layout.tags[0]).recursively_apply(prepare)
        else:
            raise NotImplementedError("no non-trivial unions!")

    elif layout.is_OptionType:
        next = continuation()  # fully recurse
        content_type = next.content.form.type
        if isinstance(content_type, ak.types.NumpyType):
            # missing numbers become NaN (this loses integer-ness)
            return ak.fill_none(next, np.nan, axis=0, highlevel=False)
        elif isinstance(content_type, ak.types.ListType):
            # missing lists become empty lists
            return ak.fill_none(next, [], axis=0, highlevel=False)
        elif isinstance(content_type, ak.types.RegularType):
            # regular lists are converted to variable-length lists first
            return ak.fill_none(next.toListOffsetArray64(False), [], axis=0, highlevel=False)
        else:
            raise AssertionError(f"what? {content_type}")

# `array` is the ak.Array being pre-processed
ak.Array(array.layout.recursively_apply(prepare), behavior=array.behavior)
```

It should catch all the cases, and it doesn't rely on string-processing the type's DataShape representation.

Given that you're working within that subset, it would be possible to define

Oh, if you're replacing variable-length dimensions with the maximum length in that dimension, what about actually padding the array with `ak.pad_none`?
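
As a rough illustration of what padding to the maximum length means, here is a plain-Python sketch for a single level of nesting. It deliberately does not use the Awkward Array API; `pad_to_max` is a hypothetical helper, and using NaN as the fill value is an assumption that mirrors the missing-number handling described earlier in this comment.

```python
import math

def pad_to_max(rows, fill=math.nan):
    """Pad each inner list to the length of the longest one (one nesting level only)."""
    # The longest inner list defines the size of the padded dimension.
    max_len = max((len(row) for row in rows), default=0)
    # Append `fill` to the shorter lists so that every row has max_len entries.
    return [list(row) + [fill] * (max_len - len(row)) for row in rows]

ragged = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
padded = pad_to_max(ragged)
print([len(row) for row in padded])
```

After padding, every row has the same length, so the result can be treated as a rectangular block in which the fill value marks the entries that did not exist in the ragged input.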
The above would have to be expanded to get every |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
667864088 |