
issue_comments


7 rows where author_association = "MEMBER", issue = 628719058 and user = 1217238 sorted by updated_at descending


user 1

  • shoyer · 7 ✖

issue 1

  • Feature Request: Hierarchical storage and processing in xarray · 7 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1042660100 https://github.com/pydata/xarray/issues/4118#issuecomment-1042660100 https://api.github.com/repos/pydata/xarray/issues/4118 IC_kwDOAMm_X84-JbsE shoyer 1217238 2022-02-17T07:45:24Z 2022-02-17T07:45:24Z MEMBER

One thing that came up in our discussion about this in the developer meeting today is that we could also pretty easily expose a "low level" API for IO using dictionaries of xarray.Variable objects. This intermediate representation could be useful for cleaning up data into a form suitable for conversion into Dataset objects.
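
As a rough illustration of that idea (not an existing xarray IO API), a group could be read into a plain dict of `xarray.Variable` objects, cleaned up, and only then converted into a `Dataset`. The variable names and the renaming rule below are made up for this sketch; only `xarray.Variable` and `xarray.Dataset` are real.

```python
import numpy as np
import xarray as xr

# Hypothetical result of a "low level" read: one flat dict of Variables.
# Nothing forces the dimension sizes to agree at this stage.
raw = {
    "temperature": xr.Variable(("x",), np.arange(4.0)),
    "temperature_coarse": xr.Variable(("x",), np.arange(2.0)),
}

# Building a Dataset directly fails, because "x" has two different sizes.
try:
    xr.Dataset(raw)
    direct_conversion_ok = True
except ValueError:
    direct_conversion_ok = False

# Cleanup step (an assumption for this sketch): rename the clashing dimension.
cleaned = dict(raw)
cleaned["temperature_coarse"] = xr.Variable(
    ("x_coarse",), raw["temperature_coarse"].values
)
ds = xr.Dataset(cleaned)
```

The dict stage tolerates inconsistent dimensions, which is what makes it usable as an intermediate form before the stricter `Dataset` model applies.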

On Wed, Feb 16, 2022 at 11:39 PM Alessandro Amici wrote:

@TomNicholas https://github.com/TomNicholas (cc @mraspaud https://github.com/mraspaud)

Do you have use cases which one of these designs could handle but the other couldn't?

The two main classes of on-disk formats that I know of which cannot always be represented in the "group is a Dataset" approach are:

  • in netCDF following the CF conventions for groups https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#groups, it is legal for an array to refer to a dimension or a coordinate in a different group and so arrays in the same group may have dimensions with the same name, but different size / coordinate values,
  • the current spec for the Next-generation file formats (NGFF) https://ngff.openmicroscopy.org for bio-imaging has all scales of the same 5D data in the same group.

I don't have an example at hand, but my impression is that satellite products that use HDF5 file format also place arrays with inconsistent dimensions / coordinates in the same group.


{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
901598698 https://github.com/pydata/xarray/issues/4118#issuecomment-901598698 https://api.github.com/repos/pydata/xarray/issues/4118 IC_kwDOAMm_X841vU3q shoyer 1217238 2021-08-19T04:23:15Z 2021-08-19T04:23:15Z MEMBER

> However, if one of the variables has the same name as one of the groups (which I think is permitted in the netCDF format), then there is no easy way to access all the elements whilst retaining the nice syntax.

NetCDF does not allow variables and groups with the same name, e.g.,

```python
import netCDF4

nc = netCDF4.Dataset('testing.nc', 'w')
nc.createVariable('foo', float)
nc.createGroup('foo')
# RuntimeError: NetCDF: String match to name in use
```

I'm pretty sure this is also prohibited for all HDF5 files, just like how you can't have a directory and file with the same name on most filesystems.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
873316602 https://github.com/pydata/xarray/issues/4118#issuecomment-873316602 https://api.github.com/repos/pydata/xarray/issues/4118 MDEyOklzc3VlQ29tbWVudDg3MzMxNjYwMg== shoyer 1217238 2021-07-03T00:40:55Z 2021-07-03T00:40:55Z MEMBER

> if you used tags wouldn't you lose the ability to round-trip a netCDF file with groups?

That sounds right to me -- a downside of tags is that they can't be (uniquely) expressed in a hierarchical arrangement like those found in HDF5/netCDF4 files.

But if this is a better way to organize data in memory, we could consider how to make an adapter layer for on disk storage.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
873227326 https://github.com/pydata/xarray/issues/4118#issuecomment-873227326 https://api.github.com/repos/pydata/xarray/issues/4118 MDEyOklzc3VlQ29tbWVudDg3MzIyNzMyNg== shoyer 1217238 2021-07-02T19:55:31Z 2021-07-02T19:55:31Z MEMBER

@martinitus raises a really interesting point about tags vs hierarchical structures over in https://github.com/pydata/xarray/issues/1092#issuecomment-868324949

> However, one point I didn't see in the discussion is the following:
>
> Hierarchical structures often force a user to come up with some arbitrary order of hierarchy levels. The classical example is document filing: do you put your health insurance documents under /insurance/health/2021, 2021/health/insurance,....?
>
> One solution to that is a tagging of documents instead of putting them into a hierarchy. This would give the full flexibility to retrieve any flat DataSet out of a TaggedDataSet by specifying the set of tags that the individual DataArrays must be listed under.

I think using tags is a really interesting alternative to hierarchies. I don't have a clear sense of the overall tradeoffs, though.
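
To make the tagging idea concrete, here is a minimal sketch in plain Python. It is not an xarray feature: the `tagged` mapping and the `select` helper are hypothetical, and strings stand in for the arrays so the example has no dependencies.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical tagged collection: each array name maps to a set of tags.
tagged: Dict[str, FrozenSet[str]] = {
    "policy_2021": frozenset({"insurance", "health", "2021"}),
    "policy_2020": frozenset({"insurance", "health", "2020"}),
    "car_2021": frozenset({"insurance", "car", "2021"}),
}


def select(collection: Dict[str, FrozenSet[str]], required: Set[str]) -> List[str]:
    """Return the names whose tags include every required tag."""
    return sorted(name for name, tags in collection.items() if required <= tags)


# The order in which tags are given does not matter, unlike a path.
health_2021 = select(tagged, {"health", "2021"})
same_query = select(tagged, {"2021", "health"})
all_2021 = select(tagged, {"2021"})
```

Because a selection is just a set of tags, there is no single canonical path per item, which is the same property that makes a unique mapping onto HDF5/netCDF4 groups hard.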

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
807908489 https://github.com/pydata/xarray/issues/4118#issuecomment-807908489 https://api.github.com/repos/pydata/xarray/issues/4118 MDEyOklzc3VlQ29tbWVudDgwNzkwODQ4OQ== shoyer 1217238 2021-03-26T03:24:48Z 2021-03-26T03:24:48Z MEMBER

I'm excited to see this coming together! I would be happy to advise as well...

Side note: at some point, this would probably be worth adding to Xarray's official roadmap.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
638481215 https://github.com/pydata/xarray/issues/4118#issuecomment-638481215 https://api.github.com/repos/pydata/xarray/issues/4118 MDEyOklzc3VlQ29tbWVudDYzODQ4MTIxNQ== shoyer 1217238 2020-06-03T21:52:53Z 2020-06-03T23:08:47Z MEMBER

The data model you sketch out here looks very similar to what we discussed in #1092. I agree that the semantics are well defined.

The main question in my mind is whether it would make more sense to make an entirely new data structure (e.g., xarray.TreeDataset) or add in a new feature like groups to the existing xarray.Dataset.

Probably a new data structure would be easier at this point, because it would keep Dataset simpler and wouldn't break existing code that works on xarray.Dataset.
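
A minimal sketch of the "separate data structure" option, in plain Python: a tree node that holds one flat collection of variables plus named children, leaving the flat container itself untouched. The `TreeNode` class and its `get` method are hypothetical and are not the proposed `xarray.TreeDataset`; plain dicts stand in for `Dataset` objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TreeNode:
    """Hypothetical tree node: one flat 'dataset' plus named child nodes."""

    data: Dict[str, Any] = field(default_factory=dict)
    children: Dict[str, "TreeNode"] = field(default_factory=dict)

    def get(self, path: str) -> "TreeNode":
        """Walk a '/'-separated path such as 'a/b' down from this node."""
        node = self
        for part in filter(None, path.split("/")):
            node = node.children[part]
        return node


root = TreeNode(data={"time": [0, 1, 2]})
# Sibling groups may hold arrays with the same name but different sizes.
root.children["fine"] = TreeNode(data={"temp": [1.0, 2.0, 3.0, 4.0]})
root.children["coarse"] = TreeNode(data={"temp": [1.5, 3.5]})

coarse_temp = root.get("/coarse").data["temp"]
```

Code written against the flat container keeps working on `node.data`, while only code that needs the hierarchy has to know about the wrapper.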

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
638478790 https://github.com/pydata/xarray/issues/4118#issuecomment-638478790 https://api.github.com/repos/pydata/xarray/issues/4118 MDEyOklzc3VlQ29tbWVudDYzODQ3ODc5MA== shoyer 1217238 2020-06-03T21:46:48Z 2020-06-03T21:46:48Z MEMBER

I would be open to exploring adding a hierarchical data structure into xarray (on an experimental basis, to start), but it would need someone with serious interest and time to make it happen. Certainly there are plenty of use cases across various fields.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
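
The filter described at the top of this page can be reproduced against a table of this shape with Python's standard `sqlite3` module. This is a sketch under assumptions: the table is trimmed to the columns the filter needs, foreign keys are omitted, and the last inserted row is made up to show that non-matching rows are excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        user INTEGER,
        updated_at TEXT,
        author_association TEXT,
        issue INTEGER
    )
    """
)
conn.execute("CREATE INDEX idx_issue_comments_issue ON issue_comments (issue)")

rows = [
    (638478790, 1217238, "2020-06-03T21:46:48Z", "MEMBER", 628719058),
    (638481215, 1217238, "2020-06-03T23:08:47Z", "MEMBER", 628719058),
    (1042660100, 1217238, "2022-02-17T07:45:24Z", "MEMBER", 628719058),
    (1, 42, "2021-01-01T00:00:00Z", "NONE", 628719058),  # made-up row
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
query = """
    SELECT id FROM issue_comments
    WHERE author_association = ? AND issue = ? AND user = ?
    ORDER BY updated_at DESC
"""
ids = [r[0] for r in conn.execute(query, ("MEMBER", 628719058, 1217238))]
```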