issue_comments: 1211197176


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4285#issuecomment-1211197176 https://api.github.com/repos/pydata/xarray/issues/4285 1211197176 IC_kwDOAMm_X85IMWb4 35968931 2022-08-10T19:51:43Z 2022-08-10T19:56:02Z MEMBER

Also on the digression, I just want to clarify where we're coming from, why we did the things we did.

Very interesting @jpivarski - that would make a good blog post / think piece if you ever felt like it.

Two possible conclusions:

I'm biased in thinking that (1) is true, but then I'm not a particle physicist - the closest I came was using ROOT in undergrad extremely briefly :smile: .

If it turns out that conclusion (1) is right or more right than (2), then at least a subset of what we're working on is going to be useful to the wider community.

That said, as we've been looking for use-cases beyond particle physics, most of them would be handled well by simple ragged arrays.

Either way, I would definitely encourage figuring out some actual use-cases before building this out :)

Does anyone see any other potential use case?

Now seems like a good time to list some potential use cases for a RaggedArray that's wrappable by xarray, and tag people who might be interested in taking the development on as a project.

1) Oceanography observation data

NOAA's Global Drifter Program tracks the movement of floating buoys, each of which takes measurements at specified time intervals as it moves along. As each drifter may take a completely different path across the ocean, the length of their trajectories is variable.

@dhruvbalwada pointed me to this notebook which compares analyzing drifter data using

  1) xarray wrapping rectilinear arrays
  2) pandas
  3) awkward.Array

Reading the notebook it seems that a new option (4) of ragged data within xarray might well be the best of both worlds for this particular use case.
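
To make the difference between the rectilinear and ragged options concrete, here is a minimal sketch in plain Python. It is not taken from the notebook: the drifter names and values are made up, and the flat-values-plus-offsets layout is just one common way to store ragged data, not necessarily what a RaggedArray class in xarray would use.

```python
# Hypothetical drifter trajectories: each drifter reports a different
# number of samples (names and values are invented for illustration).
trajectories = {
    "drifter_a": [12.1, 12.4, 12.9],
    "drifter_b": [8.3],
    "drifter_c": [15.0, 15.2, 15.1, 14.8, 14.7],
}

# Rectilinear option: pad every trajectory with NaN up to the longest one.
longest = max(len(t) for t in trajectories.values())
padded = [t + [float("nan")] * (longest - len(t)) for t in trajectories.values()]

# Ragged option: one flat list of values plus offsets marking where each
# trajectory starts and ends.
values = []
offsets = [0]
for t in trajectories.values():
    values.extend(t)
    offsets.append(len(values))


def row(i):
    """Return the i-th trajectory from the flat ragged layout."""
    return values[offsets[i]:offsets[i + 1]]


padded_cells = len(padded) * longest  # cells stored, including NaN padding
ragged_cells = len(values)            # cells stored with no padding
```

With these made-up numbers the padded layout stores 15 cells for 9 real samples, and the gap grows quickly when a few trajectories are much longer than the rest.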

@selipot @philippemiron is creating a RaggedArray class in order to wrap awkward data in xarray, something that could be tackled as part of the @Cloud-Drift project? (cc @Marioherreroglez too)

2) Alleles in Genomics

Allele data can have a wide variation in the number of alt alleles (most variants will have one, but a few could have thousands), as mentioned by @tomwhite in https://github.com/pystatgen/sgkit/issues/634.

I'm not sure whether the RaggedArray class being proposed here would work for that use case?

I'm also unclear whether this would be useful for AnnData https://github.com/scverse/anndata/issues/744 (cc @ivirshup)

3) Neutron scattering data

Scipp is an xarray-like labelled data structure for neutron scattering experiment data. In their FAQ, under the question titled "Why is xarray not enough", one of the things they cite is

Support for event data, a particular form of sparse data. More concretely, this is essentially a 1-D (or N-D) array of random-length lists, with very small list entries. This type of data arises in time-resolved detection of neutrons in pixelated detectors.

Would a RaggedArray class that's wrappable in xarray help with this? (cc @simonheybrock)

4) Other "Record"-like data

A "Record" is for when you want to store multiple pieces of information (of possibly different types) about an "event".

In awkward, a Record can be contained within an awkward.Array.

Whilst I don't think we can store awkward arrays containing Records directly in xarray (though after @shoyer's comment I'm not so sure...), what we could do is have multiple named data variables, each of which contains a RaggedArray of the same shape. This should be roughly equivalent IIUC.
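
A rough sketch of that equivalence, in plain Python rather than awkward or xarray (the field names `time` and `energy` and all values are hypothetical): a ragged array of records can be split into one ragged array per field, and zipped back together as long as the ragged shapes match.

```python
# Record-style layout: a ragged list of events, each event a small record.
# Field names and values are invented for illustration.
events = [
    [{"time": 0.1, "energy": 5.0}, {"time": 0.4, "energy": 3.2}],
    [],
    [{"time": 0.2, "energy": 7.7}],
]

# Split into parallel ragged variables, one per field. In xarray terms,
# each of these would be its own named data variable.
time = [[e["time"] for e in row] for row in events]
energy = [[e["energy"] for e in row] for row in events]


def same_ragged_shape(a, b):
    """True if both ragged lists have the same number of rows and row lengths."""
    return len(a) == len(b) and all(len(x) == len(y) for x, y in zip(a, b))


# Zipping the variables back together recovers the original records.
rebuilt = [
    [{"time": t, "energy": en} for t, en in zip(ts, es)]
    for ts, es in zip(time, energy)
]
```

The shape check is the part that matters: the separate variables only stand in for records if every variable has the same row lengths.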

As an example of a quirky use case for record-like data, a biologist friend recently showed me a dataset of hummingbird feeding patterns. He had strapped RFID tags to hundreds of hummingbirds, then set up feeder stations equipped with radio antennae. When a bird came to feed, an event would be recorded. The resulting data varied with bird ID, date, and feeder, but each individual bird could visit any particular feeder any number of times on a given day, so I thought he could store this data in a RaggedArray within xarray, with the dimension representing the number of visits having variable length.
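
As a sketch of what that layout might look like (plain Python with an invented RFID log, not my friend's actual data or any existing xarray API): bird, date, and feeder act as regular dimensions, while the number of visits in each cell is the variable-length one.

```python
from collections import defaultdict

# Hypothetical RFID log: (bird_id, date, feeder, time_of_day).
log = [
    ("bird_1", "2022-06-01", "feeder_A", "06:12"),
    ("bird_2", "2022-06-01", "feeder_A", "06:15"),
    ("bird_1", "2022-06-01", "feeder_A", "09:40"),
    ("bird_1", "2022-06-02", "feeder_B", "07:05"),
    ("bird_1", "2022-06-01", "feeder_A", "17:30"),
]

# Group visit times by (bird, date, feeder); each group has its own length.
visits = defaultdict(list)
for bird, date, feeder, time_of_day in log:
    visits[(bird, date, feeder)].append(time_of_day)

# Regular coordinates along the three fixed-length dimensions.
birds = sorted({r[0] for r in log})
dates = sorted({r[1] for r in log})
feeders = sorted({r[2] for r in log})

# Length of the ragged "visit" dimension for every (bird, date, feeder) cell.
counts = [
    [[len(visits.get((b, d, f), [])) for f in feeders] for d in dates]
    for b in birds
]
```

Most cells are empty and a few hold several visits, which is the pattern a padded rectangular array handles poorly.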


There are probably a lot more possible use cases for a RaggedArray in xarray that I'm not currently aware of!

{
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 3,
    "rocket": 0,
    "eyes": 0
}
  667864088