issue_comments
8 rows where issue = 667864088 and user = 35968931 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1302293898 | https://github.com/pydata/xarray/issues/4285#issuecomment-1302293898 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85Nn22K | TomNicholas 35968931 | 2022-11-03T15:34:57Z | 2022-11-03T15:34:57Z | MEMBER |
Oops - use thomas dot nicholas at columbia dot edu please! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1302240686 | https://github.com/pydata/xarray/issues/4285#issuecomment-1302240686 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85Nnp2u | TomNicholas 35968931 | 2022-11-03T14:58:11Z | 2022-11-03T14:58:11Z | MEMBER | I should be able to join today as well @jpivarski ! Will need the zoom address |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1287028512 | https://github.com/pydata/xarray/issues/4285#issuecomment-1287028512 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85Mtn8g | TomNicholas 35968931 | 2022-10-21T14:15:44Z | 2022-10-21T14:15:44Z | MEMBER | That sounds extremely exciting @milancurcic ! Someone dedicated who wants to make a widely-useful tool is exactly what is needed. I think there are many technical questions (and tbh I didn't really follow a lot of the details of your last comment @jpivarski), but the answers to those will likely depend on intended use cases. I'm happy to attend a video call to discuss this, and think that organising one with people interested in ragged arrays and xarray across disciplines would be a sensible next step. (You should also advertise such a meeting on the pangeo discourse - we could start a new pangeo working group like this if it goes well.) |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1211197176 | https://github.com/pydata/xarray/issues/4285#issuecomment-1211197176 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IMWb4 | TomNicholas 35968931 | 2022-08-10T19:51:43Z | 2022-08-10T19:56:02Z | MEMBER |
Very interesting @jpivarski - that would make a good blog post / think piece if you ever felt like it.
I'm biased in thinking that (1) is true, but then I'm not a particle physicist - the closest I came was using ROOT in undergrad extremely briefly :smile: .
Now seems like a good time to list some potential use cases for a

1) Oceanography observation data

NOAA's Global Drifter Program tracks the movement of floating buoys, each of which takes measurements at specified time intervals as it moves along. As each drifter may take a completely different path across the ocean, the length of their trajectories is variable.

@dhruvbalwada pointed me to this notebook which compares analyzing drifter data using
1) xarray wrapping rectilinear arrays
2) pandas
3)

Reading the notebook it seems that a new option (4) of ragged data within xarray might well be the best of both worlds for this particular use case.

@selipot @philippemiron is creating a

2) Alleles in Genomics

Allele data can have a wide variation in the number of alt alleles (most variants will have one, but a few could have thousands), as mentioned by @tomwhite in https://github.com/pystatgen/sgkit/issues/634. I'm not sure whether the

I'm also unclear if this would be useful for ANNData https://github.com/scverse/anndata/issues/744 (cc @ivirshup)

3) Neutron scattering data

Scipp is an xarray-like labelled data structure for neutron scattering experiment data. On their FAQ Q titled "Why is xarray not enough", one of the things they quote is

Would a

4) Other "Record"-like data

A "Record" is for when you want to store multiple pieces of information (of possibly different types) about an "event". In

Whilst I don't think we can store awkward arrays containing Records directly in xarray (though after @shoyer's comment I'm not so sure...), what we could do is have multiple named data variables, each of which contains a

As an example of a quirky use case for record-like data, a biologist friend recently showed me a dataset of hummingbird feeding patterns. He had strapped RFID tags to hundreds of hummingbirds, then set up feeder stations equipped with radio antennae. When the birds came to feed an event would be recorded. As the resulting data varied with bird ID, date, and feeder, but each individual bird could visit any particular feeder any number of times on a given day, I thought he could store this data in a Ragged array within xarray with the dimension representing number of visits having variable length.

There are probably a lot more possible use cases for a |
{ "total_count": 3, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 3, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1209967070 | https://github.com/pydata/xarray/issues/4285#issuecomment-1209967070 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IHqHe | TomNicholas 35968931 | 2022-08-09T22:47:24Z | 2022-08-10T05:50:40Z | MEMBER | Thanks for the huge response there @jpivarski !
This is an important point which I meant to ask about earlier. We need a
If you want a
That makes sense. And if you subclassed then I guess you would also need to change those

Thanks for the wrapping example! I think there is a bug with your

```python
In [1]: from ragged import RaggedArray

In [2]: ra = RaggedArray([[1, 2, 3], [4, 5]])

In [3]: ra.ndim
Out[3]: 1

In [4]: ra.shape
Out[4]: [3]
```
I would really like to try testing the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1210175870 | https://github.com/pydata/xarray/issues/4285#issuecomment-1210175870 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85IIdF- | TomNicholas 35968931 | 2022-08-10T05:25:17Z | 2022-08-10T05:32:13Z | MEMBER |
I see, makes sense.
Oh I was just thinking if we're building a new class that is tightly coupled to
I don't think it's within scope of xarray to offer a numpy-like array class in our main library - we don't do this for any other case!
However we could definitely have a separate
Yeah that wouldn't be ideal. (Digression: From my perspective part of the problem is that merely generalising numpy arrays to be ragged would have been useful for lots of people, but
That's very interesting. I'm not immediately sure which of those would be best for xarray wrapping - I think it's plausible that we could eventually support any of those options... ((3) through the issues Deepak linked to (#5168, #2801).)
Thanks for fixing that, and for all the explanations! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1208617600 | https://github.com/pydata/xarray/issues/4285#issuecomment-1208617600 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85ICgqA | TomNicholas 35968931 | 2022-08-08T21:15:01Z | 2022-08-08T21:15:27Z | MEMBER |
That's very helpful, thank you!
(FWIW I find the "behavior" stuff very confusing in general, even after reading the docs page on it. I don't really understand why I can't just reimplement my monkey-patched example above by subclassing
How would I do this without monkey-patching? All I really want (and I hazard all that most xarray users want) is to be able to import some class from
What's the benefit of doing this over just using
I can see that this might be useful in xarray's
This is exciting though @jpivarski ! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Awkward array backend? 667864088 | |
1200110315 | https://github.com/pydata/xarray/issues/4285#issuecomment-1200110315 | https://api.github.com/repos/pydata/xarray/issues/4285 | IC_kwDOAMm_X85HiDrr | TomNicholas 35968931 | 2022-07-30T07:40:59Z | 2022-07-30T07:40:59Z | MEMBER | So I actually think we can do this, with some caveats.

I recently found a cool dataset with ragged-like data which has rekindled my interest in this interfacing, and given me a real example to try it out with.

As far as I understand it the main problem is that awkward arrays don't define a

Conceptually though, it seems to me that

Let's take an Awkward array that can be coerced directly to a numpy array:

```python
In [27]: rect = ak.Array([[1, 2, 3], [4, 5, 6]])
    ...: rect
Out[27]: <Array [[1, 2, 3], [4, 5, 6]] type='2 * var * int64'>

In [28]: np.array(rect)
Out[28]:
array([[1, 2, 3],
       [4, 5, 6]])
```
Now imagine a "ragged" (or "jagged") array, which is like a numpy array except that the lengths along one (or more) of the axes can be variable. Awkward allows this, e.g.

However we still conceptually have a "shape". It's either

In the second case you can still read off the dtype too.

However awkward also allows "Union types", which basically means that one array can contain data of multiple numpy dtypes. Unfortunately this seems to completely break the numpy / xarray model, but we can completely ignore this problem if we simply say that xarray should only try to wrap awkward arrays with non-Union types. I think that's okay - a ragged-length array with a fixed dtype would still be extremely useful!

So if we want to wrap a (non-union type) awkward array instance like

1) Generalise xarray to allow for variable-length dimensions

This seems hard. Xarray's whole model is built assuming that

It would also mean a big change to xarray in order to support one unusual type of array, that goes beyond the data API standard. That breaks xarray's general design philosophy of providing a general wrapper and delegating to domain-specific array implementations / backends / etc. for specificity.

2) Expose a version of

This doesn't seem as hard, at least for non-union type awkward arrays.
In fact this crude monkey-patching seems to mostly work:

```python
In [1]: from awkward import Array, num
   ...: import numpy as np

In [2]: def get_dtype(self) -> np.dtype:
   ...:     if "Union" in str(self.type):
   ...:         raise ValueError("awkward arrays with Union types can't be expressed in terms of a single numpy dtype")
   ...:
   ...:     datatype = str(self.type).split(" * ")[-1]
   ...:
   ...:     if datatype == "string":
   ...:         return np.dtype("str")
   ...:     else:
   ...:         return np.dtype(datatype)
   ...:

In [3]: def get_shape(self):
   ...:     if "Union" in str(self.type):
   ...:         raise ValueError("awkward arrays with Union types can't be expressed in terms of a single numpy dtype")
   ...:
   ...:     lengths = str(self.type).split(" * ")[:-1]
   ...:
   ...:     for axis in range(self.ndim):
   ...:         if lengths[axis] == "var":
   ...:             lengths[axis] = np.max(num(self, axis))
   ...:         else:
   ...:             lengths[axis] = int(lengths[axis])
   ...:
   ...:     return tuple(lengths)
   ...:

In [4]: def get_size(self):
   ...:     return np.prod(get_shape(self))
   ...:

In [5]: setattr(Array, 'dtype', property(get_dtype))
   ...: setattr(Array, 'shape', property(get_shape))
   ...: setattr(Array, 'size', property(get_size))
```
```python
In [6]: ragged = Array([[1, 2, 3, 100], [4, 5, 6]])

In [7]: import xarray as xr

In [8]: da = xr.DataArray(ragged, dims=['x', 't'])

In [17]: da
Out[17]:
<xarray.DataArray (x: 2, t: 4)>
<Array [[1, 2, 3, 100], [4, 5, 6]] type='2 * var * int64'>
Dimensions without coordinates: x, t

In [18]: da.dtype
Out[18]: dtype('int64')

In [19]: da.size
Out[19]: 8

In [20]: da.shape
Out[20]: (2, 4)
```

Promising... Let's try indexing:

```python
In [21]: da.isel(t=2)
Out[21]:
<xarray.DataArray (x: 2)>
<Array [3, 6] type='2 * int64'>
Dimensions without coordinates: x

In [22]: da.isel(t=4)
ValueError                                Traceback (most recent call last)
Input In [22], in <cell line: 1>()
----> 1 da.isel(t=4)

...

File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/highlevel.py:991, in Array.__getitem__(self, where)
    579 """
    580 Args:
    581     where (many types supported; see below): Index of positions to
   (...)
    988 have the same dimension as the array being indexed.
    989 """
    990 if not hasattr(self, "_tracers"):
--> 991     tmp = ak._util.wrap(self.layout[where], self._behavior)
    992 else:
    993     tmp = ak._connect._jax.jax_utils._jaxtracers_getitem(self, where)

ValueError: in ListOffsetArray64 attempting to get 4, index out of range

(https://github.com/scikit-hep/awkward-1.0/blob/1.8.0/src/cpu-kernels/awkward_NumpyArray_getitem_next_at.cpp#L21)
```

That's what should happen - xarray delegates the indexing to the underlying array, which throws an error if there is a problem.

Arithmetic also seems to work
But we hit snags with numpy functions

```python
In [24]: np.mean(da)
TypeError                                 Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 np.mean(da)

File <__array_function__ internals>:180, in mean(*args, **kwargs)

File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3430, in mean(a, axis, dtype, out, keepdims, where)
   3428         pass
   3429     else:
-> 3430         return mean(axis=axis, dtype=dtype, out=out, **kwargs)
   3432 return _methods._mean(a, axis=axis, dtype=dtype,
   3433                       out=out, **kwargs)

File ~/Documents/Work/Code/xarray/xarray/core/_reductions.py:1478, in DataArrayReductions.mean(self, dim, skipna, keep_attrs, **kwargs)
   1403 def mean(
   1404     self,
   1405     dim: None | Hashable | Sequence[Hashable] = None,
   (...)
   1409     **kwargs: Any,
   1410 ) -> DataArray:
   1411     """
   1412     Reduce this DataArray's data by applying

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:2930, in DataArray.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
   2887 def reduce(
   2888     self: T_DataArray,
   2889     func: Callable[..., Any],
   (...)
   2895     **kwargs: Any,
   2896 ) -> T_DataArray:
   2897     """Reduce this array by applying

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1854, in Variable.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
   1852     data = func(self.data, axis=axis, **kwargs)
   1853 else:
-> 1854     data = func(self.data, **kwargs)
   1856 if getattr(data, "shape", ()) == self.shape:
   1857     dims = self.dims

File ~/Documents/Work/Code/xarray/xarray/core/duck_array_ops.py:579, in mean(array, axis, skipna, **kwargs)
    577     return _to_pytimedelta(mean_timedeltas, unit="us") + offset
    578 else:
--> 579     return _mean(array, axis=axis, skipna=skipna, **kwargs)

File ~/Documents/Work/Code/xarray/xarray/core/duck_array_ops.py:341, in _create_nan_agg_method.<locals>.f(values, axis, skipna, **kwargs)
    339     with warnings.catch_warnings():
    340         warnings.filterwarnings("ignore", "All-NaN slice encountered")
--> 341         return func(values, axis=axis, **kwargs)
    342 except AttributeError:
    343     if not is_duck_dask_array(values):

File <__array_function__ internals>:180, in mean(*args, **kwargs)

File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/highlevel.py:1434, in Array.__array_function__(self, func, types, args, kwargs)
   1417 def __array_function__(self, func, types, args, kwargs):
   1418     """
   1419     Intercepts attempts to pass this Array to those NumPy functions other
   1420     than universal functions that have an Awkward equivalent.
   (...)
   1432     See also #__array_ufunc__.
   1433     """
-> 1434     return ak._connect._numpy.array_function(func, types, args, kwargs)

File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/_connect/_numpy.py:43, in array_function(func, types, args, kwargs)
     41     return out
     42 else:
---> 43     return function(*args, **kwargs)

TypeError: mean() got an unexpected keyword argument 'dtype'
```

This seems fixable though.
In fact I think if we changed https://github.com/pydata/xarray/issues/6845 (@dcherian) then this alternative would already work

```python
In [25]: import awkward as ak

In [26]: ak.mean(da)
ValueError                                Traceback (most recent call last)
Input In [26], in <cell line: 1>()
----> 1 ak.mean(da)

File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/operations/reducers.py:971, in mean(x, weight, axis, keepdims, mask_identity)
    969 with np.errstate(invalid="ignore"):
    970     if weight is None:
--> 971         sumw = count(x, axis=axis, keepdims=keepdims, mask_identity=mask_identity)
    972         sumwx = sum(x, axis=axis, keepdims=keepdims, mask_identity=mask_identity)
    973     else:

File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/operations/reducers.py:79, in count(array, axis, keepdims, mask_identity)
     10 def count(array, axis=None, keepdims=False, mask_identity=False):
     11     """
     12     Args:
     13         array: Data in which to count elements.
   (...)
     77     to turn the None values into something that would be counted.
     78     """
---> 79     layout = ak.operations.convert.to_layout(
     80         array, allow_record=False, allow_other=False
     81     )
     82     if axis is None:
     84         def reduce(xs):

File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/operations/convert.py:1917, in to_layout(array, allow_record, allow_other, numpytype)
   1914     return from_iter([array], highlevel=False)
   1916 elif isinstance(array, Iterable):
-> 1917     return from_iter(array, highlevel=False)
   1919 elif not allow_other:
   1920     raise TypeError(
   1921         f"{array} cannot be converted into an Awkward Array"
   1922         + ak._util.exception_suffix(__file__)
   1923     )

File ~/miniconda3/envs/hummingbirds/lib/python3.10/site-packages/awkward/operations/convert.py:891, in from_iter(iterable, highlevel, behavior, allow_record, initial, resize)
    889 out = ak.layout.ArrayBuilder(initial=initial, resize=resize)
    890 for x in iterable:
--> 891     out.fromiter(x)
    892 layout = out.snapshot()
    893 return ak._util.maybe_wrap(layout, behavior, highlevel)

ValueError: cannot convert <xarray.DataArray ()>
array(1) (type DataArray) to an array element

(https://github.com/scikit-hep/awkward-1.0/blob/1.8.0/src/python/content.cpp#L974)
```

Suggestion: How about awkward offer a specialized array class which uses the same fast code underneath but disallows Union types, and follows the array API standard, implementing

Am I missing anything here? @jpivarski

tl;dr We probably could support awkward arrays, at least instances where all values have the same dtype. |
{ "total_count": 4, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 1, "eyes": 2 } |
Awkward array backend? 667864088 |
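
Editorial sketch (not part of any comment above): the 2022-07-30 comment reports a "shape" for a ragged array by taking the maximum length along each variable-length axis. The snippet below illustrates that convention on plain nested Python lists, so it runs without Awkward or xarray installed. The helper name `bounding_shape` is hypothetical and is not an API of either library.

```python
import math


def bounding_shape(data: list) -> tuple:
    """Return the smallest rectangular shape that could hold `data`.

    Each axis length is the maximum length found at that nesting depth,
    mirroring the "max along var axes" idea from the comment's get_shape.
    """
    lengths = []
    level = [data]  # every sub-list found at the current depth
    while level and all(isinstance(item, list) for item in level):
        lengths.append(max(len(item) for item in level))
        # Descend one level: gather the children of every list at this depth.
        level = [child for item in level for child in item]
    return tuple(lengths)


ragged = [[1, 2, 3, 100], [4, 5, 6]]
shape = bounding_shape(ragged)
size = math.prod(shape)  # counts padded slots, not actual elements
print(shape, size)  # (2, 4) 8
```

The result matches the `da.shape` of `(2, 4)` and `da.size` of `8` shown in the comment. The size counts padded slots, so it overstates the 7 values actually stored.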
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
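
As a rough illustration of how the filter in the page header ("issue = 667864088 and user = 35968931 sorted by updated_at descending") maps onto this schema, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` and `issues` tables are stand-ins with assumed columns, only a few columns are populated, and the last sample row is hypothetical; the other ids and timestamps are taken from the rows above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [users] ([id] INTEGER PRIMARY KEY);   -- stand-in, columns assumed
CREATE TABLE [issues] ([id] INTEGER PRIMARY KEY);  -- stand-in, columns assumed
CREATE TABLE [issue_comments] (
    [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT,
    [author_association] TEXT, [body] TEXT, [reactions] TEXT,
    [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""

# (id, user, updated_at, issue)
ROWS = [
    (1302293898, 35968931, "2022-11-03T15:34:57Z", 667864088),
    (1210175870, 35968931, "2022-08-10T05:32:13Z", 667864088),
    (1209967070, 35968931, "2022-08-10T05:50:40Z", 667864088),
    (1, 999, "2022-12-01T00:00:00Z", 667864088),  # hypothetical other user
]


def comment_ids(conn: sqlite3.Connection, issue: int, user: int) -> list:
    """Return comment ids for one issue and user, newest update first."""
    cursor = conn.execute(
        "SELECT [id] FROM [issue_comments] "
        "WHERE [issue] = ? AND [user] = ? "
        "ORDER BY [updated_at] DESC",
        (issue, user),
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [updated_at], [issue]) "
    "VALUES (?, ?, ?, ?)",
    ROWS,
)
ids = comment_ids(conn, 667864088, 35968931)
print(ids)  # [1302293898, 1209967070, 1210175870]
```

Because `updated_at` is stored as ISO 8601 text, plain string ordering gives chronological order, which is why comment 1209967070 (edited later) sorts ahead of 1210175870 despite its smaller id.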