issue_comments: 1210023820

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4285#issuecomment-1210023820 https://api.github.com/repos/pydata/xarray/issues/4285 1210023820 IC_kwDOAMm_X85IH3-M 1852447 2022-08-10T00:36:42Z 2022-08-10T00:36:42Z NONE

If you want a RaggedArray class that is more specific (i.e. defines more attributes) than awkward.Array, then surely the "correct" thing to do would be to subclass though?

It shouldn't be a subclass because it doesn't satisfy a substitution principle: ak.combinations(array: ak.Array, n: int) -> ak.Array, but ak.combinations(array: RaggedArray, n: int) -> ⊥ (at best, would raise an exception because RaggedArray isn't closed under ak.combinations).

Since RaggedArray can't be used everywhere that an ak.Array can be used, it shouldn't be a subclass.
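To make the closure argument concrete, here is a minimal pure-Python sketch. It does not use Awkward Array: nested lists stand in for ak.Array, the `combinations` function is only a toy imitation of ak.combinations, and the `RaggedArray` class and `is_ragged_of_numbers` helper are hypothetical names introduced for illustration.

```python
from itertools import combinations as tuple_combinations


def is_ragged_of_numbers(node):
    """True if node is a number or an arbitrarily nested list of numbers."""
    if isinstance(node, list):
        return all(is_ragged_of_numbers(child) for child in node)
    return isinstance(node, (int, float)) and not isinstance(node, bool)


class RaggedArray:
    """Hypothetical wrapper (not a subclass) that only admits ragged numeric data."""

    def __init__(self, data):
        if not is_ragged_of_numbers(data):
            raise TypeError("not a ragged array of numbers")
        self.data = data


def combinations(data, n):
    """Toy stand-in for ak.combinations: n-tuples drawn from each inner list."""
    return [list(tuple_combinations(inner, n)) for inner in data]


data = [[1, 2, 3], [], [4, 5]]
ragged = RaggedArray(data)            # accepted: lists of numbers only
pairs = combinations(ragged.data, 2)  # inner items are now tuples, not numbers

try:
    RaggedArray(pairs)
    closed = True
except TypeError:
    closed = False  # the result cannot be re-wrapped as a RaggedArray
```

The operation is well defined on the general container but its result falls outside what the restricted type admits, which is the sense in which the restricted type cannot stand in for the general one.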

I mean for eventual integration of RaggedArray within awkward's codebase.

Oh.......

I hadn't been thinking that RaggedArray is something we'd put in the general Awkward Array library. I was thinking of it only as a way to define "the subset of Awkward Arrays that xarray uses," which would live in xarray.

I don't want to introduce another level of type-specificity to the system, since that would make things harder to understand. (Imagine reading the docs and it says, "You can apply this function to ak.Array, but not to ak.RaggedArray." Or "this is an ak.Array that happens to be ragged, but not an ak.RaggedArray.")

So let me rethink your original idea of adding shape and dtype properties to all ak.Arrays. Perhaps they should raise exceptions when the array is not a ragged array? People don't usually expect properties to raise exceptions, and you really need them to be properties with the exact spellings "shape" and "dtype" to get what you want.

If that point is negotiable, I could introduce an ak.shape_dtype(array) function that returns a shape and dtype if array has the right properties and raises an exception if it doesn't. That would be more normal: you're asking if it satisfies a specific constraint, and if so, to get some information about it. Then we would also be able to deal with the fact that

  • some people are going to want the shape to specify the maximum of "var" dimensions (what you asked for): "virtually padding",
  • some people are going to want the shape to specify the minimum of "var" dimensions because that tells you what upper bounds are legal to slice: "virtually truncating",
  • and some people are going to want the string "var" or maybe None or maybe np.nan in place of "var" dimensions because no integer is correct. Then they would have to deal with the fact that this shape is not a tuple of integers.

Or maybe the best way to present it is with a min_shape and a max_shape, whose items are equal where the array is regular.
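As a rough illustration of the min_shape / max_shape idea, the sketch below computes both from nested Python lists, which stand in for an Awkward Array here. The function names `min_max_shape` and `var_shape` are hypothetical, not part of any library, and a level that mixes lists with non-lists is simply treated as the leaf level.

```python
def min_max_shape(array):
    """Return (min_shape, max_shape) for nested lists, one entry per list depth."""
    min_shape = [len(array)]
    max_shape = [len(array)]
    level = list(array)
    # Descend one depth at a time while every item at this depth is a list.
    while level and all(isinstance(item, list) for item in level):
        lengths = [len(item) for item in level]
        min_shape.append(min(lengths))  # "virtually truncating"
        max_shape.append(max(lengths))  # "virtually padding"
        level = [child for item in level for child in item]
    return tuple(min_shape), tuple(max_shape)


def var_shape(array):
    """Shape with the string "var" wherever min and max disagree."""
    lo, hi = min_max_shape(array)
    return tuple(a if a == b else "var" for a, b in zip(lo, hi))


ragged = [[1, 2, 3], [], [4, 5]]
regular = [[1, 2], [3, 4], [5, 6]]
```

For `ragged`, the two shapes differ in the second dimension, while for `regular` they coincide, so the pair carries all three of the conventions listed above.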

Anyway, you can see why I'm loath to add a property to ak.Array that's just named "shape"? That has the potential for misinterpretation. (Pandas wanted arrays to have a shape that is always equal to (len(array),); if we satisfied that constraint, we couldn't satisfy yours!) In fact, "dtype" of a general array would be misleading, too, though a list of unique "dtypes" of all the leaf-nodes could be a useful thing to have. (2 shapes and n dtypes!)
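The "list of unique dtypes of all the leaf-nodes" could look something like the following sketch. It is only an analogy: nested Python lists and dicts stand in for an Awkward Array with record fields, Python type names stand in for NumPy dtypes, and `leaf_dtypes` is a hypothetical helper name.

```python
def leaf_dtypes(node):
    """Collect the type names of every leaf in nested lists, tuples, and dicts."""
    if isinstance(node, dict):
        children = node.values()  # record-like: one child per field
    elif isinstance(node, (list, tuple)):
        children = node           # list-like: one child per item
    else:
        return {type(node).__name__}
    found = set()
    for child in children:
        found |= leaf_dtypes(child)
    return found


records = [{"x": 1, "y": [1.5, 2.5]}, {"x": 2, "y": []}]
numbers = [[1, 2, 3], [], [4, 5]]
```

A purely numeric ragged array yields a single entry, which is the only case where a scalar "dtype" would be unambiguous.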

But if I'm providing it as an extra function, or as a trio of properties named min_shape, max_shape, and dtypes which are all spelled differently from the shape and dtype you want, you'd then be forced to wrap it as a RaggedArray type within xarray again, anyway. Which, as a reminder, is what we're doing for Pandas: https://github.com/intake/awkward-pandas lives outside the Awkward codebase and it wraps ak.Array to put them in Series.

So in the end, I just came back to where we started: xarray would own the RaggedArray wrapper. Or it could be a third package, as awkward-pandas is to awkward and pandas.


(I expected 2 and (3, 2) respectively). I think perhaps context["shape"] is being overwritten as it recurses through the data structure, when it should be being appended?

No, I initialized it incorrectly: it should have started as

    context = {"shape": [len(array)]}

and then recurse from there. My previous example also had the wrong output, but I didn't count square brackets carefully enough to have caught it. (By the way, not copying the context is why it's called "lateral"; if a copied dict is needed, it's "depth_context". I just went back and checked: yes, they're being handled appropriately.)
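The difference between the two kinds of context, together with the corrected initialization, can be sketched in plain Python. This is not the code from the earlier comments: nested lists stand in for the array, `walk` and `max_shape` are hypothetical names, and the only things taken from the discussion are the starting value `{"shape": [len(array)]}` and the rule that the lateral context is shared while the depth context is copied.

```python
def walk(node, depth_context, lateral_context):
    """Visit nested lists, recording the longest list seen at each depth."""
    if not isinstance(node, list):
        return
    depth_context = dict(depth_context)  # depth context: private copy per branch
    depth = depth_context["depth"]
    shape = lateral_context["shape"]     # lateral context: same dict everywhere
    if depth == len(shape):
        shape.append(len(node))          # first list seen at this depth
    else:
        shape[depth] = max(shape[depth], len(node))
    depth_context["depth"] = depth + 1
    for child in node:
        walk(child, depth_context, lateral_context)


def max_shape(array):
    lateral_context = {"shape": [len(array)]}  # start from the outer length
    depth_context = {"depth": 1}
    for item in array:
        walk(item, depth_context, lateral_context)
    return tuple(lateral_context["shape"]), depth_context


shape_a, top_a = max_shape([[1, 2, 3], [], [4, 5]])
shape_b, top_b = max_shape([[[1, 2], [3]], [], [[4]]])
```

Because every branch writes into the same lateral dict, sibling lists can update one shared shape, whereas the caller's depth dict is left untouched by the recursion.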

I fixed the code that I wrote in the comments above for posterity.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  667864088