html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4285#issuecomment-1302293898,https://api.github.com/repos/pydata/xarray/issues/4285,1302293898,IC_kwDOAMm_X85Nn22K,35968931,2022-11-03T15:34:57Z,2022-11-03T15:34:57Z,MEMBER,"> The email that you have listed here doesn't work (bounced back).
Oops - use thomas dot nicholas at columbia dot edu please!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1302240686,https://api.github.com/repos/pydata/xarray/issues/4285,1302240686,IC_kwDOAMm_X85Nnp2u,35968931,2022-11-03T14:58:11Z,2022-11-03T14:58:11Z,MEMBER,I should be able to join today as well @jpivarski ! Will need the zoom address,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1287028512,https://api.github.com/repos/pydata/xarray/issues/4285,1287028512,IC_kwDOAMm_X85Mtn8g,35968931,2022-10-21T14:15:44Z,2022-10-21T14:15:44Z,MEMBER,"That sounds extremely exciting @milancurcic ! Someone dedicated who wants to make a widely-useful tool is exactly what is needed.
I think there are many technical questions (and tbh I didn't really follow a lot of the details of your last comment @jpivarski), but the answers to those will likely depend on intended use cases.
I'm happy to attend a video call to discuss this, and think that organising one with people interested in ragged arrays and xarray across disciplines would be a sensible next step. (You should also advertise such a meeting on the pangeo discourse - we could start a new pangeo working group [like this](https://discourse.pangeo.io/t/new-working-group-for-distributed-array-computing/2734) if it goes well.)","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1211197176,https://api.github.com/repos/pydata/xarray/issues/4285,1211197176,IC_kwDOAMm_X85IMWb4,35968931,2022-08-10T19:51:43Z,2022-08-10T19:56:02Z,MEMBER,"> Also on the digression, I just want to clarify where we're coming from, why we did the things we did.
Very interesting @jpivarski - that would make a good blog post / think piece if you ever felt like it.
> Two possible conclusions:
I'm biased in thinking that (1) is true, but then I'm not a particle physicist - the closest I came was using ROOT in undergrad extremely briefly :smile: .
> If it turns out that conclusion (1) is right or more right than (2), then at least a subset of what we're working on is going to be useful to the wider community.
> That said, as we've been looking for use-cases beyond particle physics, most of them would be handled well by simple ragged arrays.
> Either way, I would definitely encourage figuring out some actual use-cases before building this out :)
> Does anyone see any other potential use case?
Now seems like a good time to list some potential use cases for a `RaggedArray` that's wrappable by xarray, and tag people who might be interested in taking the development on as a project.
1) **Oceanography observation data**
NOAA's [Global Drifter Program](https://www.aoml.noaa.gov/phod/gdp/) tracks the movement of floating buoys, each of which takes measurements at specified time intervals as it moves along. As each drifter may take a completely different path across the ocean, the length of their trajectories is variable.

@dhruvbalwada pointed me to [this notebook](https://github.com/Cloud-Drift/earthcube-meeting-2022) which compares analyzing drifter data using
1) xarray wrapping rectilinear arrays
2) pandas
3) `awkward.Array`
Reading the notebook it seems that a new option (4) of ragged data within xarray might well be the best of both worlds for this particular use case.
@selipot @philippemiron is creating a `RaggedArray` class, in order to wrap awkward data in xarray, something that could be tackled as part of the @Cloud-Drift project? (cc @Marioherreroglez too)
2) **Alleles in Genomics**
Allele data can have a wide variation in the number of alt alleles (most variants will have one, but a few could have thousands), as mentioned by @tomwhite in https://github.com/pystatgen/sgkit/issues/634.
I'm not sure whether the `RaggedArray` class being proposed here would work for that use case?
I'm also unclear if this would be useful for ANNData https://github.com/scverse/anndata/issues/744 (cc @ivirshup)
3) **Neutron scattering data**
[Scipp](https://github.com/scipp/scipp) is an xarray-like labelled data structure for neutron scattering experiment data. In their FAQ, under the question titled [""Why is xarray not enough""](https://scipp.github.io/getting-started/faq.html#why-is-xarray-not-enough), one of the things they list is
> Support for event data, a particular form of sparse data. More concretely, this is essentially a 1-D (or N-D) array of random-length lists, with very small list entries. This type of data arises in time-resolved detection of neutrons in pixelated detectors.
Would a `RaggedArray` class that's wrappable in xarray help with this? (cc @simonheybrock)
4) **Other ""Record""-like data**
A ""Record"" is for when you want to store multiple pieces of information (of possibly different types) about an ""event"".
In `awkward` a [`Record`](https://awkward-array.org/how-to-create-records.html) can be contained within an `awkward.Array`.
Whilst I don't think we can store awkward arrays containing Records directly in xarray (though after @shoyer's [comment](https://github.com/pydata/xarray/issues/4285#issuecomment-1210190649) I'm not so sure...), what we could do is have multiple named data variables, each of which contains a `RaggedArray` of the same shape. This should be roughly equivalent IIUC.
As an example of a quirky use case for record-like data, a biologist friend recently showed me a dataset of hummingbird feeding patterns. He had strapped RFID tags to hundreds of hummingbirds, then set up feeder stations equipped with radio antennae. When the birds came to feed an event would be recorded. As the resulting data varied with bird ID, date, and feeder, but each individual bird could visit any particular feeder any number of times on a given day, I thought he could store this data in a Ragged array within xarray with the dimension representing number of visits having variable length.
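A minimal pure-Python sketch of that idea, assuming made-up event tuples (the bird names, feeder numbers and durations below are invented for illustration, and plain lists of lists stand in for a real `RaggedArray`): each field of the record becomes its own named ragged variable, and all variables share the same ragged layout.

```python
from collections import defaultdict

# Hypothetical feeding events: (bird id, feeder id, visit duration in seconds).
events = [
    ('bird_a', 1, 4.0),
    ('bird_a', 2, 2.5),
    ('bird_b', 1, 7.0),
    ('bird_a', 1, 3.0),
]

# One ragged list-of-lists per record field, indexed as [bird][visit].
feeder_by_bird = defaultdict(list)
duration_by_bird = defaultdict(list)
for bird, feeder, duration in events:
    feeder_by_bird[bird].append(feeder)
    duration_by_bird[bird].append(duration)

birds = sorted(feeder_by_bird)
variables = {
    'feeder': [feeder_by_bird[b] for b in birds],
    'duration_s': [duration_by_bird[b] for b in birds],
}

# Every variable has the same number of visits per bird, i.e. the same ragged layout.
layouts = {name: [len(row) for row in rows] for name, rows in variables.items()}
```

Here the visit axis has variable length per bird, while the variable names play the role of the record's fields.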
---
There are probably a lot more possible use cases for a `RaggedArray` in xarray that I'm not currently aware of!","{""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 3, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1209967070,https://api.github.com/repos/pydata/xarray/issues/4285,1209967070,IC_kwDOAMm_X85IHqHe,35968931,2022-08-09T22:47:24Z,2022-08-10T05:50:40Z,MEMBER,"Thanks for the huge response there @jpivarski !
> Ragged array is a specialized subset of types within Awkward Array. There are `ak.*` functions that would take you out of this subset. However (thinking it through...) I don't think slices, ufuncs, or reducers would take you out of this subset.
This is an important point which I meant to ask about earlier. We need a `RaggedArray` class which always returns other `RaggedArray` instances (i.e. the set of ragged arrays is closed under the set of numpy-like methods / functions that xarray might call upon it).
> To answer your question about monkey-patching, I think it would be best to make a wrapper. You don't want to give all `ak.Array` instances properties named shape and dtype, since those properties won't make sense for general types. This is exactly the reason [we had to back off](https://github.com/scikit-hep/awkward/issues/350) on making `ak.Array` inherit from `pandas.api.extensions.ExtensionArray`: Pandas wanted it to have methods with names and behaviors that would have been misleading for Awkward Arrays.
If you want a `RaggedArray` class that is more specific (i.e. defines more attributes) than `awkward.Array`, then surely the ""correct"" thing to do would be to subclass though? I mean for eventual integration of `RaggedArray` within awkward's codebase.
> Thus, it can act as a gatekeeper of what kinds of operations are allowed: `ak.*` won't recognize `RaggedArray`, which is good because some `ak.*` functions would take you out of this ""ragged array"" subset of types. You can add some non-ufunc NumPy functions with `__array_function__`, but only the ones that make sense for this subset of types.
That makes sense. And if you subclassed then I guess you would also need to change those `ak.*` functions to not accept `RaggedArray`, so maybe wrapping is better...
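To illustrate the gatekeeper idea without depending on awkward, here is a toy sketch (the `RaggedList` class is hypothetical and wraps plain lists of lists, it is not the wrapper from the thread): only operations that keep the result inside the ragged subset are exposed, and each of them returns another wrapper instance.

```python
class RaggedList:
    # Toy stand-in for the proposed RaggedArray: it wraps a list of lists
    # and only exposes operations whose result is again a RaggedList.
    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def __mul__(self, scalar):
        # Elementwise arithmetic keeps the ragged layout, so the result is wrapped again.
        return RaggedList([[value * scalar for value in row] for row in self._rows])

    def __getitem__(self, index):
        # Anything other than a slice along the first axis is rejected in this sketch.
        if not isinstance(index, slice):
            raise TypeError('only slices along the first axis are supported in this sketch')
        return RaggedList(self._rows[index])

    def tolist(self):
        return [list(row) for row in self._rows]


ra = RaggedList([[1, 2, 3], [4, 5]])
doubled = ra * 2
head = ra[:1]
```

Because the wrapper never hands back the underlying container, every supported operation stays closed over the wrapper type.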
Thanks for the wrapping example! I think there is a bug with your `.shape` method though - if I put your code snippets in a file then they return the wrong results:
```python
In [1]: from ragged import RaggedArray
In [2]: ra = RaggedArray([[1, 2, 3], [4, 5]])
In [3]: ra.ndim
Out[3]: 1
In [4]: ra.shape
Out[4]: [3]
```
(I expected `2` and `(2, 3)` respectively). I think perhaps `context[""shape""]` is being overwritten as it recurses through the data structure, when it should be being appended?
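For reference, this is the behaviour I have in mind, as a pure-Python sketch on nested lists (`ragged_shape` is a hypothetical helper, not the awkward `recursively_apply` code): one entry is appended per nesting level rather than overwritten, and a variable-length axis is reported as its maximum length.

```python
def ragged_shape(data):
    # Hypothetical helper: works on plain nested lists, not on awkward layouts.
    shape = []
    level = [data]  # all lists found at the current nesting depth
    while level and all(isinstance(item, list) for item in level):
        # Append (rather than overwrite) one length per depth,
        # taking the longest list as the length of a variable axis.
        shape.append(max(len(item) for item in level))
        level = [child for item in level for child in item]
    return tuple(shape)


shape = ragged_shape([[1, 2, 3], [4, 5]])
ndim = len(shape)
```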
I would really like to try testing the `RaggedArray` class with our WIP public framework for testing duck array compatibility (#6894). If we can get a very basic wrapper then I could make a PR to add `RaggedArray` to awkward, and import xarray's new tests to test it with.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1210175870,https://api.github.com/repos/pydata/xarray/issues/4285,1210175870,IC_kwDOAMm_X85IIdF-,35968931,2022-08-10T05:25:17Z,2022-08-10T05:32:13Z,MEMBER,"> Since `RaggedArray` can't be used everywhere that an `ak.Array` can be used, it shouldn't be a subclass.
I see, makes sense.
> I hadn't been thinking that RaggedArray is something we'd put in the general Awkward Array library.
Oh I was just thinking if we're building a new class that is tightly coupled to `awkward.Array` then it should live in `awkward`. (I also would like someone else to maintain it ideally! :sweat_smile: )
> I was thinking of it only as a way to define ""the subset of Awkward Arrays that xarray uses,"" which would live in xarray.
I don't think it's within scope of xarray to offer a numpy-like array class in our main library - we don't do this for any other case!
> Or it could be a third package, as awkward-pandas is to awkward and pandas.
However we could definitely have a separate `awkward-xarray` package that lives in [xarray-contrib](https://github.com/xarray-contrib/) and provides a `RaggedArray` class. (see [pint-xarray](https://github.com/xarray-contrib/pint-xarray) for something sort of similar.) That seems fine, all it takes is some keen bean to take our prototypes here and turn them into something usable...
> (Imagine reading the docs and it says, ""You can apply this function to ak.Array, but not to ak.RaggedArray."" Or ""this is an ak.Array that happens to be ragged, but not a ak.RaggedArray."")
Yeah that wouldn't be ideal.
(Digression: From my perspective part of the problem is that *merely* generalising numpy arrays to be ragged would have been useful for lots of people, but `awkward.Array` goes a lot further. It also generalises the type system, adds things like Records, and possibly adds [xarray-like features](https://github.com/scikit-hep/awkward/issues/1391). That puts `awkward.Array` in a somewhat ill-defined place within the wider scientific python ecosystem: it's kind of a numpy-like duck array, but can't be treated as one, it's also a more general type system, and it might even get features of higher-level data structures like xarray.)
> - some people are going to want the `shape` to specify the maximum of ""var"" dimensions (what you asked for): ""virtually padding"",
> - some people are going to want the `shape` to specify the minimum of ""var"" dimensions because that tells you what upper bounds are legal to slice: ""virtually truncating"",
> - and some people are going to want the string ""var"" or maybe `None` or maybe `np.nan` in place of ""var"" dimensions because no integer is correct. Then they would have to deal with the fact that this shape is not a tuple of integers.
That's very interesting. I'm not immediately sure which of those would be best for xarray wrapping - I think it's plausible that we could eventually support any of those options... ((3) through the issues Deepak linked to (#5168, #2801).)
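The three conventions can be compared side by side with a small pure-Python sketch (the helper names and the convention labels 'max', 'min' and 'var' are made up for illustration, and nested lists stand in for an awkward array):

```python
def axis_lengths(data, depth):
    # Lengths of every list found at the given nesting depth (0 = outermost).
    level = [data]
    for _ in range(depth):
        level = [child for item in level for child in item]
    return [len(item) for item in level]


def shape_with(data, ndim, convention):
    # convention is one of 'max', 'min' or 'var' (labels chosen for this sketch).
    shape = []
    for depth in range(ndim):
        lengths = axis_lengths(data, depth)
        if len(set(lengths)) == 1:
            shape.append(lengths[0])    # regular axis: a single well-defined length
        elif convention == 'max':
            shape.append(max(lengths))  # virtually padding
        elif convention == 'min':
            shape.append(min(lengths))  # virtually truncating
        else:
            shape.append('var')         # no integer is correct
    return tuple(shape)


ragged = [[1, 2, 3, 100], [4, 5, 6]]
padded = shape_with(ragged, 2, 'max')
truncated = shape_with(ragged, 2, 'min')
symbolic = shape_with(ragged, 2, 'var')
```

Only the last convention yields a shape that is not a tuple of integers, which is the case that would need changes on the xarray side.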
> I fixed the code that I wrote in the comments above for posterity.
Thanks for fixing that, and for all the explanations!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1208617600,https://api.github.com/repos/pydata/xarray/issues/4285,1208617600,IC_kwDOAMm_X85ICgqA,35968931,2022-08-08T21:15:01Z,2022-08-08T21:15:27Z,MEMBER,"> You mentioned union arrays, but for completeness, the type system in Awkward Array has
> ...
> Here's a way to determine if an array (Python type ak.Array) is in that subset
That's very helpful, thank you!
> ```python
> ak.Array(array.layout.recursively_apply(prepare), behavior=array.behavior)
> ```
(FWIW I find the ""behavior"" stuff very confusing in general, even after reading the [docs page](https://awkward-array.readthedocs.io/en/latest/ak.behavior.html) on it. I don't really understand why I can't just reimplement my monkey-patched example above by [subclassing `ak.Array`](https://github.com/scikit-hep/awkward/discussions/1177), or should I be wrapping it?)
> it would be possible to define shape with some token for the variable-length dimensions and dtype.
How would I do this without monkey-patching?
All I really want (and I hazard all that most xarray users want) is to be able to import some class from `awkward` that offers only the simplest possible Ragged Array, that conforms to the data API standard (i.e. defines `shape` and `dtype`).
> Oh, if you're replacing variable-length dimensions with the maximum length in that dimension, what about actually padding the array with [ak.pad_none](https://awkward-array.readthedocs.io/en/latest/_auto/ak.pad_none.html)?
What's the benefit of doing this over just using `ak.num` on each axis like I did above?
> That uses all the memory of a padded array, but it's what people use now if they want to convert Awkward data into non-Awkward data (maybe passing the final step to [ak.to_numpy](https://awkward-array.readthedocs.io/en/latest/_auto/ak.to_numpy.html)).
I can see that this might be useful in [xarray's `.to_numpy` methods](https://github.com/pydata/xarray/blob/c8607e19ba57537326114edcb1d56ab70fd05583/xarray/core/variable.py#L1145) though.
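To make the trade-off concrete, here is a rough pure-Python sketch (the `pad_rows` helper is hypothetical and only imitates the idea of padding with `None`, it does not call awkward): reading off the per-row lengths is cheap, whereas padding materialises a full rectangular copy.

```python
def pad_rows(rows, fill=None):
    # Pad every row up to the length of the longest row.
    target = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (target - len(row)) for row in rows]


ragged = [[1, 2, 3, 100], [4, 5, 6]]

# Cheap: only the lengths are computed, the data is left untouched.
lengths = [len(row) for row in ragged]

# Costly: a rectangular copy holding a fill value for every missing element.
padded = pad_rows(ragged)
```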
This is exciting though @jpivarski !","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,667864088
https://github.com/pydata/xarray/issues/4285#issuecomment-1200110315,https://api.github.com/repos/pydata/xarray/issues/4285,1200110315,IC_kwDOAMm_X85HiDrr,35968931,2022-07-30T07:40:59Z,2022-07-30T07:40:59Z,MEMBER,"**So I actually think we can do this, with some caveats.**
I recently found a cool dataset with ragged-like data which has rekindled my interest in this interfacing, and given me a real example to try it out with.
As far as I understand it the main problem is that awkward arrays don't define a `shape` or `dtype` attribute. Instead they follow a different model [(the ""datashape"" model)](https://datashape.readthedocs.io/en/latest/overview.html). Xarray expects `shape` and `dtype` to be defined, and given that those attributes are in the data API standard, this is a pretty reasonable expectation for most cases.
(There is a [useful discussion here](https://github.com/data-apis/consortium-feedback/discussions/6) on the data-apis consortium repo about why awkward arrays don't define these attributes in general.)
Conceptually though, it seems to me that `shape` and `dtype` do make sense for Awkward arrays, at least for some subset of them, because Awkward's ""type"" is clearly related to the normal notion of `shape` and `dtype`.
Let's take an Awkward array that can be coerced directly to a numpy array:
```python
In [27]: rect = ak.Array([[1, 2, 3], [4, 5, 6]])
...: rect
Out[27]:
In [28]: np.array(rect)
Out[28]:
array([[1, 2, 3],
[4, 5, 6]])
```
Here there is a clear correspondence: the first axis of the awkward array has length 2, and because *in this case* the second axis has a consistent length of 3, we can coerce this to a numpy array with `shape=(2,3)`. The dtype also makes sense, because *in this case* the awkward array only contains data of one type, an `int64`.
Now imagine a ""ragged"" (or ""jagged"") array, which is like a numpy array except that the lengths along one (or more) of the axes can be variable. Awkward allows this, e.g.
```python
In [29]: ragged = ak.Array([[1, 2, 3, 100], [4, 5, 6]])
...: ragged
Out[29]:
```
but a direct coercion to numpy will fail.
However **we still conceptually have a ""shape""**. It's either `(2, ""var"")`, where ""var"" means a variable length across the other axes, or alternatively we could say the shape is `(2, 4)`, where `4` is simply the maximum length along the variable-length axis. The latter interpretation is kind of similar to sparse arrays.
In the second case **you can still read off the dtype too**. However awkward also allows ""Union types"", which basically means that one array can contain data of multiple numpy dtypes. Unfortunately this seems to completely break the numpy / xarray model, but we can completely ignore this problem if we simply say that **xarray should only try to wrap awkward arrays with non-Union types**. I think that's okay - a ragged-length array with a fixed dtype would still be extremely useful!
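As a rough illustration of that restriction, the following pure-Python sketch (the helpers are hypothetical, and Python types on nested lists stand in for numpy dtypes) checks whether every leaf value has the same type, which is the non-Union condition described above:

```python
def leaf_types(data):
    # Collect the Python types of all non-list values in a nested list.
    if isinstance(data, list):
        found = set()
        for item in data:
            found |= leaf_types(item)
        return found
    return {type(data)}


def has_single_dtype(data):
    # An empty structure is treated as trivially uniform in this sketch.
    return len(leaf_types(data)) <= 1


uniform = has_single_dtype([[1, 2, 3, 100], [4, 5, 6]])
mixed = has_single_dtype([[1, 2.5], ['three']])
```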
---
So if we want to wrap an (non-union type) awkward array instance like `ragged` in xarray we have to do one of two things:
1) **Generalise xarray to allow for variable-length dimensions**
This seems hard. Xarray's whole model is built assuming that `dims` has type `Mapping[Hashable, int]`. It also breaks our normal concept of alignment, which we need to put coordinate variables in DataArrays alongside data variables.
It would also mean a big change to xarray in order to support one unusual type of array, that goes beyond the data API standard. That breaks xarray's general design philosophy of providing a general wrapper and delegating to domain-specific array implementations / backends / etc. for specificity.
2) **Expose a version of `shape` and `dtype` on Awkward arrays**
This doesn't seem as hard, at least for non-union type awkward arrays. In fact this crude monkey-patching seems to mostly work:
```python
In [1]: from awkward import Array, num
   ...: import numpy as np
In [2]: def get_dtype(self) -> np.dtype:
   ...:     if ""Union"" in str(self.type):
   ...:         raise ValueError(""awkward arrays with Union types can't be expressed in terms of a single numpy dtype"")
   ...:
   ...:     datatype = str(self.type).split("" * "")[-1]
   ...:
   ...:     if datatype == ""string"":
   ...:         return np.dtype(""str"")
   ...:     else:
   ...:         return np.dtype(datatype)
   ...:
In [3]: def get_shape(self):
   ...:     if ""Union"" in str(self.type):
   ...:         raise ValueError(""awkward arrays with Union types can't be expressed in terms of a single numpy dtype"")
   ...:
   ...:     lengths = str(self.type).split("" * "")[:-1]
   ...:
   ...:     for axis in range(self.ndim):
   ...:         if lengths[axis] == ""var"":
   ...:             lengths[axis] = np.max(num(self, axis))
   ...:         else:
   ...:             lengths[axis] = int(lengths[axis])
   ...:
   ...:     return tuple(lengths)
   ...:
In [4]: def get_size(self):
   ...:     return np.prod(get_shape(self))
   ...:
In [5]: setattr(Array, 'dtype', property(get_dtype))
   ...: setattr(Array, 'shape', property(get_shape))
   ...: setattr(Array, 'size', property(get_size))
```
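The string handling is the fragile part of that patch, so here it is in isolation as a standalone sketch (`parse_type_string` is a hypothetical helper, and it assumes the type prints in the simple form `2 * var * int64`; option or record types would need more careful handling):

```python
def parse_type_string(type_string):
    # Split e.g. '2 * var * int64' into axis tokens and a trailing item type.
    *axes, item_type = type_string.split(' * ')
    # Keep the 'var' token as-is and turn fixed lengths into integers.
    lengths = [token if token == 'var' else int(token) for token in axes]
    return lengths, item_type


lengths, item_type = parse_type_string('2 * var * int64')
```

The `var` entries are the ones that the patch then replaces with the maximum length along that axis.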
Now if we make the same ragged array but with the monkey-patched class, we have a sensible return value for `dtype`, `shape`, and `size`, which means that the xarray constructors will accept our Array now!
```python
In [6]: ragged = Array([[1, 2, 3, 100], [4, 5, 6]])
In [7]: import xarray as xr
In [8]: da = xr.DataArray(ragged, dims=['x', 't'])
In [17]: da
Out[17]:
Dimensions without coordinates: x, t
In [18]: da.dtype
Out[18]: dtype('int64')
In [19]: da.size
Out[19]: 8
In [20]: da.shape
Out[20]: (2, 4)
```
Promising...
Let's try indexing:
```python
In [21]: da.isel(t=2)
Out[21]:
Dimensions without coordinates: x
In [22]: da.isel(t=4)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [22], in ()
----> 1 da.isel(t=4)
...
File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/highlevel.py:991, in Array.__getitem__(self, where)
579 """"""
580 Args:
581 where (many types supported; see below): Index of positions to
(...)
988 have the same dimension as the array being indexed.
989 """"""
990 if not hasattr(self, ""_tracers""):
--> 991 tmp = ak._util.wrap(self.layout[where], self._behavior)
992 else:
993 tmp = ak._connect._jax.jax_utils._jaxtracers_getitem(self, where)
ValueError: in ListOffsetArray64 attempting to get 4, index out of range
(https://github.com/scikit-hep/awkward-1.0/blob/1.8.0/src/cpu-kernels/awkward_NumpyArray_getitem_next_at.cpp#L21)
```
That's what should happen - xarray delegates the indexing to the underlying array, which throws an error if there is a problem.
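The same delegation pattern, reduced to a pure-Python sketch (the `Wrapper` class and its `isel_last` method are made up for illustration and are not xarray's implementation): the container does no bounds checking of its own, so an out-of-range position surfaces as an error from the wrapped rows.

```python
class Wrapper:
    # Toy container that, like xarray, forwards indexing to the wrapped data.
    def __init__(self, rows):
        self.rows = rows

    def isel_last(self, position):
        # No bounds checking here: each row raises IndexError by itself.
        return [row[position] for row in self.rows]


wrapped = Wrapper([[1, 2, 3, 100], [4, 5, 6]])
third = wrapped.isel_last(2)

try:
    wrapped.isel_last(4)
    failed = False
except IndexError:
    failed = True
```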
Arithmetic also seems to work
```python
In [23]: da * 2
Out[23]:
Dimensions without coordinates: x, t
```
But we hit snags with numpy functions
```python
In [24]: np.mean(da)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [24], in ()
----> 1 np.mean(da)
File <__array_function__ internals>:180, in mean(*args, **kwargs)
File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3430, in mean(a, axis, dtype, out, keepdims, where)
3428 pass
3429 else:
-> 3430 return mean(axis=axis, dtype=dtype, out=out, **kwargs)
3432 return _methods._mean(a, axis=axis, dtype=dtype,
3433 out=out, **kwargs)
File ~/Documents/Work/Code/xarray/xarray/core/_reductions.py:1478, in DataArrayReductions.mean(self, dim, skipna, keep_attrs, **kwargs)
1403 def mean(
1404 self,
1405 dim: None | Hashable | Sequence[Hashable] = None,
(...)
1409 **kwargs: Any,
1410 ) -> DataArray:
1411 """"""
1412 Reduce this DataArray's data by applying ``mean`` along some dimension(s).
1413
(...)
1476 array(nan)
1477 """"""
-> 1478 return self.reduce(
1479 duck_array_ops.mean,
1480 dim=dim,
1481 skipna=skipna,
1482 keep_attrs=keep_attrs,
1483 **kwargs,
1484 )
File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:2930, in DataArray.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
2887 def reduce(
2888 self: T_DataArray,
2889 func: Callable[..., Any],
(...)
2895 **kwargs: Any,
2896 ) -> T_DataArray:
2897 """"""Reduce this array by applying `func` along some dimension(s).
2898
2899 Parameters
(...)
2927 summarized data and the indicated dimension(s) removed.
2928 """"""
-> 2930 var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)
2931 return self._replace_maybe_drop_dims(var)
File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1854, in Variable.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
1852 data = func(self.data, axis=axis, **kwargs)
1853 else:
-> 1854 data = func(self.data, **kwargs)
1856 if getattr(data, ""shape"", ()) == self.shape:
1857 dims = self.dims
File ~/Documents/Work/Code/xarray/xarray/core/duck_array_ops.py:579, in mean(array, axis, skipna, **kwargs)
577 return _to_pytimedelta(mean_timedeltas, unit=""us"") + offset
578 else:
--> 579 return _mean(array, axis=axis, skipna=skipna, **kwargs)
File ~/Documents/Work/Code/xarray/xarray/core/duck_array_ops.py:341, in _create_nan_agg_method..f(values, axis, skipna, **kwargs)
339 with warnings.catch_warnings():
340 warnings.filterwarnings(""ignore"", ""All-NaN slice encountered"")
--> 341 return func(values, axis=axis, **kwargs)
342 except AttributeError:
343 if not is_duck_dask_array(values):
File <__array_function__ internals>:180, in mean(*args, **kwargs)
File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/highlevel.py:1434, in Array.__array_function__(self, func, types, args, kwargs)
1417 def __array_function__(self, func, types, args, kwargs):
1418 """"""
1419 Intercepts attempts to pass this Array to those NumPy functions other
1420 than universal functions that have an Awkward equivalent.
(...)
1432 See also #__array_ufunc__.
1433 """"""
-> 1434 return ak._connect._numpy.array_function(func, types, args, kwargs)
File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/_connect/_numpy.py:43, in array_function(func, types, args, kwargs)
41 return out
42 else:
---> 43 return function(*args, **kwargs)
TypeError: mean() got an unexpected keyword argument 'dtype'
```
This seems fixable though.
In fact I think if we changed https://github.com/pydata/xarray/issues/6845 (@dcherian) then this alternative would already work
```python
In [25]: import awkward as ak
In [26]: ak.mean(da)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [26], in ()
----> 1 ak.mean(da)
File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/operations/reducers.py:971, in mean(x, weight, axis, keepdims, mask_identity)
969 with np.errstate(invalid=""ignore""):
970 if weight is None:
--> 971 sumw = count(x, axis=axis, keepdims=keepdims, mask_identity=mask_identity)
972 sumwx = sum(x, axis=axis, keepdims=keepdims, mask_identity=mask_identity)
973 else:
File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/operations/reducers.py:79, in count(array, axis, keepdims, mask_identity)
10 def count(array, axis=None, keepdims=False, mask_identity=False):
11 """"""
12 Args:
13 array: Data in which to count elements.
(...)
77 to turn the None values into something that would be counted.
78 """"""
---> 79 layout = ak.operations.convert.to_layout(
80 array, allow_record=False, allow_other=False
81 )
82 if axis is None:
84 def reduce(xs):
File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/operations/convert.py:1917, in to_layout(array, allow_record, allow_other, numpytype)
1914 return from_iter([array], highlevel=False)
1916 elif isinstance(array, Iterable):
-> 1917 return from_iter(array, highlevel=False)
1919 elif not allow_other:
1920 raise TypeError(
1921 f""{array} cannot be converted into an Awkward Array""
1922 + ak._util.exception_suffix(__file__)
1923 )
File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/operations/convert.py:891, in from_iter(iterable, highlevel, behavior, allow_record, initial, resize)
889 out = ak.layout.ArrayBuilder(initial=initial, resize=resize)
890 for x in iterable:
--> 891 out.fromiter(x)
892 layout = out.snapshot()
893 return ak._util.maybe_wrap(layout, behavior, highlevel)
ValueError: cannot convert
array(1) (type DataArray) to an array element
(https://github.com/scikit-hep/awkward-1.0/blob/1.8.0/src/python/content.cpp#L974)
```
---
Suggestion: How about awkward offering a specialized array class which uses the same fast code underneath but disallows Union types, and follows the array API standard, implementing `shape`, `dtype` etc. as described above? That should then ""just work"" in xarray, in the same way that `sparse` arrays already do.
Am I missing anything here? @jpivarski
---
tl;dr We probably could support awkward arrays, at least instances where all values have the same dtype.
","{""total_count"": 4, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 1, ""eyes"": 2}",,667864088