issue_comments: 1208723159

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/4285#issuecomment-1208723159	https://api.github.com/repos/pydata/xarray/issues/4285	1208723159	IC_kwDOAMm_X85IC6bX	1852447	2022-08-08T23:30:12Z	2022-08-10T00:02:44Z	NONE	Given that you have an array of only list-type, regular-type, and numpy-type (which the `prepare` function, above, guarantees), here's a one-pass function to get the dtype and shape:

```python
def shape_dtype(layout, lateral_context, **kwargs):
    if layout.is_RegularType:
        # regular dimension: its fixed size is the extent
        lateral_context["shape"].append(layout.size)
    elif layout.is_ListType:
        # ragged dimension: use the longest list as the extent
        max_size = ak.max(ak.num(layout))
        lateral_context["shape"].append(max_size)
    elif layout.is_NumpyType:
        lateral_context["dtype"] = layout.dtype
    else:
        raise AssertionError(f"what? {layout.form.type}")

context = {"shape": [len(array)]}
array.layout.recursively_apply(
    shape_dtype, lateral_context=context, return_array=False
)
# check context for "shape" and "dtype"
```

Here's the application on an array of mixed regular and irregular lists:

```python
>>> array = ak.to_regular(ak.Array([[[[1, 2, 3], []]], [[[4], [5]], [[6, 7], [8]]]]), axis=2)
>>> print(array.type)
2 * var * 2 * var * int64
>>> context = {"shape": [len(array)]}
>>> array.layout.recursively_apply(
...     shape_dtype, lateral_context=context, return_array=False
... )
>>> context
{'shape': [2, 2, 2, 3], 'dtype': dtype('int64')}
```

(This `recursively_apply` is a Swiss Army knife for restructuring or getting data out of layouts. We use it internally all over the codebase and intend to make it public in v2: https://github.com/scikit-hep/awkward/issues/516.)

To answer your question about monkey-patching, I think it would be best to make a wrapper. You don't want to give all `ak.Array` instances properties named `shape` and `dtype`, since those properties won't make sense for general types.
This is exactly the reason we had to back off on making `ak.Array` inherit from `pandas.api.extensions.ExtensionArray`: Pandas wanted it to have methods with names and behaviors that would have been misleading for Awkward Arrays. We think we'll be able to reintroduce Awkward Arrays as Pandas columns by wrapping them; that's what we're doing differently this time.

Here's a start of a wrapper:

```python
class RaggedArray:
    def __init__(self, array_like):
        layout = ak.to_layout(array_like, allow_record=False, allow_other=False)
        behavior = None
        if isinstance(array_like, ak.Array):
            behavior = array_like.behavior
        self._array = ak.Array(layout.recursively_apply(prepare), behavior=behavior)

        context = {"shape": [len(self._array)]}
        self._array.layout.recursively_apply(
            shape_dtype, lateral_context=context, return_array=False
        )
        self._shape = context["shape"]
        self._dtype = context["dtype"]

    def __repr__(self):
        # this is pretty cheesy
        return "<Ragged" + repr(self._array)[1:]

    @property
    def dtype(self):
        return self._dtype

    @property
    def shape(self):
        return self._shape

    def __getitem__(self, where):
        # unwrap any RaggedArray used as an index before slicing
        if isinstance(where, RaggedArray):
            where = where._array
        if isinstance(where, tuple):
            where = tuple(x._array if isinstance(x, RaggedArray) else x for x in where)

        out = self._array[where]
        if isinstance(out, ak.Array):
            return RaggedArray(out)
        else:
            return out

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        inputs = [x._array if isinstance(x, RaggedArray) else x for x in inputs]
        out = self._array.__array_ufunc__(ufunc, method, *inputs, **kwargs)
        return RaggedArray(out)

    def sum(self, axis=None, keepdims=False, mask_identity=False):
        out = ak.sum(self._array, axis=axis, keepdims=keepdims, mask_identity=mask_identity)
        if isinstance(out, ak.Array):
            return RaggedArray(out)
        else:
            return out
```

It keeps an `_array` (`ak.Array`), performs all internal operations on the `ak.Array` level (unwrapping RaggedArrays if necessary), but returns RaggedArrays (if the output is not scalar).
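To make the wrap, delegate, re-wrap pattern concrete without depending on Awkward Array, here is a minimal sketch on plain nested Python lists. `ragged_shape` and `NestedList` are hypothetical names invented for this illustration, not part of any library. The shape is taken as the maximum length at each depth, which is an assumption meant to mirror what `shape_dtype` collects above.

```python
def ragged_shape(nested):
    """Return the maximum length found at each nesting depth of a list of lists."""
    shape = []
    level = [nested]  # every list that lives at the current depth
    while level and all(isinstance(x, list) for x in level):
        shape.append(max(len(x) for x in level))
        level = [item for x in level for item in x]  # descend one depth
    return shape


class NestedList:
    """Hypothetical wrapper: holds a raw list and exposes only a few operations."""

    def __init__(self, data):
        # Unwrap if handed another wrapper, so _data is always a raw list.
        self._data = data._data if isinstance(data, NestedList) else data
        self._shape = ragged_shape(self._data)

    def __repr__(self):
        return f"NestedList({self._data!r})"

    @property
    def shape(self):
        return self._shape

    def __getitem__(self, where):
        out = self._data[where]  # delegate to the inner object
        # Re-wrap list results; scalars pass through unchanged.
        return NestedList(out) if isinstance(out, list) else out


x = NestedList([[[[1, 2, 3], []]], [[[4], [5]], [[6, 7], [8]]]])
print(x.shape)        # [2, 2, 2, 3]
print(x[1].shape)     # [2, 2, 2]
print(x[1][1][0][1])  # 7
```

Because `NestedList` defines no `append` or `sort`, callers cannot reach list behavior that the wrapper has not deliberately exposed, which is the gatekeeping role described for `RaggedArray`.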
The wrapper handles only the operations you want it to: this one handles all the complex slicing, NumPy ufuncs, and one reducer, `sum`. Thus, it can act as a gatekeeper of what kinds of operations are allowed: `ak.*` functions won't recognize RaggedArray, which is good because some `ak.*` functions would take you out of this "ragged array" subset of types. You can add some non-ufunc NumPy functions with `__array_function__`, but only the ones that make sense for this subset of types.

I meant to say something earlier about why we go for full generality in types: some of the things we want to do, such as `ak.cartesian`, require more complex types, and as soon as one function needs them, the whole space needs to be enlarged. For the first year of Awkward Array use, most users wanted it for plain ragged arrays (based on their bug reports and questions), but after about a year, they were asking about missing values and records, too, because you eventually need them unless you intend to work within a narrow set of functions.

Union arrays are still not widely used, but they can come from some file formats. Some GeoJSON files that I looked at had longitude, latitude points at different list depths because some were points and some were polygons, disambiguated by a string label. That's not good to work with (we can't handle that in Numba, for instance), but if you select all points with one slice and put them in one array, then select all polygons with another slice and put them in their own array, each of these becomes a trivial union. That's why I added the squashing of trivial unions to the `prepare` function example above.	{ "total_count": 3, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 3, "rocket": 0, "eyes": 0 }		667864088
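The GeoJSON situation described in the comment above can be sketched in plain Python. The sample features are made up for illustration, and `depth` is a hypothetical helper. The only facts assumed about GeoJSON are that a Point's `coordinates` is a single position and a Polygon's `coordinates` is a list of rings of positions. This is not how Awkward Array does it internally; it only shows why splitting by the string label leaves each group with one consistent nesting depth.

```python
# Made-up sample features; real GeoJSON files carry many more fields.
features = [
    {"type": "Point", "coordinates": [-87.6, 41.9]},
    {"type": "Polygon",
     "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]},
    {"type": "Point", "coordinates": [2.35, 48.86]},
]


def depth(value):
    """Nesting depth of a list: 0 for a scalar, 1 for a flat list, and so on."""
    if not isinstance(value, list):
        return 0
    return 1 + max((depth(item) for item in value), default=0)


# Mixed together, the coordinates do not share a single depth.
mixed = [f["coordinates"] for f in features]
print(sorted({depth(c) for c in mixed}))  # [1, 3]

# Selecting by the string label gives two collections, each with one depth.
points = [f["coordinates"] for f in features if f["type"] == "Point"]
polygons = [f["coordinates"] for f in features if f["type"] == "Polygon"]
print(depth(points), depth(polygons))     # 2 4
```

Once each group has a single depth, it fits the "ragged array" subset of types that the wrapper is meant to enforce.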
