issue_comments


3 rows where author_association = "MEMBER" and issue = 202260275 sorted by updated_at descending



Comment 274230041 · shoyer (MEMBER) · created 2017-01-21T03:18:38Z · updated 2017-01-21T03:21:19Z · https://github.com/pydata/xarray/issues/1223#issuecomment-274230041

@martindurant thanks for posting this as an issue -- I didn't get a notification from your ping in the gist.

I agree that serializing xarray objects to zarr should be pretty straightforward and seems quite useful.

To properly handle edge cases like strange data types (e.g., datetime64 or object) and Dataset objects, we probably want to integrate this with xarray's existing conventions handling and DataStore interface. This will be good motivation for me to finish up my refactor in https://github.com/pydata/xarray/pull/1087 -- right now the interface is a bit more complex than needed, and doesn't do a good job of abstracting details like whether file formats need locking.

So we could either directly write a DataStore or write a separate "znetcdf" or "netzdf" module that implements an interface similar to h5netcdf (which itself is a thin wrapper on top of h5py). All things being equal, I would prefer the latter approach, because people seem to find these intermediate interfaces useful, and it would help clarify the specification of the file format vs. the details of how xarray uses it.
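The "thin wrapper" idea can be sketched in a few lines: a group object that declares named dimensions up front and enforces them when variables are created, independent of how xarray consumes it. All names here (ZGroup, ZVariable, create_variable) are invented for illustration; this is not a real znetcdf API.

```python
# Hypothetical sketch of an h5netcdf-style thin-wrapper interface:
# the wrapper, not xarray, owns the dimension bookkeeping.

class ZVariable:
    def __init__(self, name, dimensions, data):
        self.name = name
        self.dimensions = tuple(dimensions)  # e.g. ("time", "lat")
        self.data = data

class ZGroup:
    def __init__(self):
        self.dimensions = {}  # name -> size, declared before use
        self.variables = {}

    def create_dimension(self, name, size):
        self.dimensions[name] = size

    def create_variable(self, name, dimensions, data):
        # Refuse variables that reference undeclared dimensions.
        for dim in dimensions:
            if dim not in self.dimensions:
                raise KeyError(f"undeclared dimension: {dim!r}")
        var = ZVariable(name, dimensions, data)
        self.variables[name] = var
        return var

g = ZGroup()
g.create_dimension("time", 3)
v = g.create_variable("temperature", ["time"], [280.1, 280.5, 281.0])
```

The point of such an intermediate layer is exactly what the comment argues: the file-format rules live in the wrapper, and xarray becomes just one consumer of it.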

As far as the spec goes, I agree that JSON is the sensible file format. Really, all we need on top of zarr is:
  • specified dimension sizes, stored at the group level (Dict[str, int])
  • a list of dimension names associated with each array (List[str])
  • a small amount of validation logic to ensure that dimensions used on an array exist (on the array's group or one of its parents) and are consistent
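The three requirements above can be sketched with plain dicts: dimension sizes at the group level, dimension names per array, and a check that every name resolves on the array's group or one of its parents. The dict layout and function names below are illustrative assumptions, not the real zarr metadata spec.

```python
# Sketch of the proposed metadata and its validation, using nested
# dicts in place of real zarr groups.

group_tree = {
    "dimensions": {"time": 4},          # Dict[str, int] at group level
    "groups": {
        "ocean": {
            "dimensions": {"depth": 10},
            "arrays": {
                "temp": {
                    "dimension_names": ["time", "depth"],  # List[str]
                    "shape": [4, 10],
                },
            },
        },
    },
}

def resolve_dimension(name, lineage):
    # Walk from the innermost group outwards, like lexical scoping.
    for group in reversed(lineage):
        if name in group.get("dimensions", {}):
            return group["dimensions"][name]
    raise KeyError(f"dimension {name!r} not found in group or parents")

def validate(group, lineage=()):
    lineage = lineage + (group,)
    for arr_name, arr in group.get("arrays", {}).items():
        for dim, size in zip(arr["dimension_names"], arr["shape"]):
            if resolve_dimension(dim, lineage) != size:
                raise ValueError(f"{arr_name}: size of {dim!r} inconsistent")
    for child in group.get("groups", {}).values():
        validate(child, lineage)

validate(group_tree)  # "time" resolves through the parent group
```

Note that "temp" lives in a subgroup but uses "time" from the root group, which is the parent-lookup behavior the third bullet asks for.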

This could make sense either as part of zarr or a separate library. I would lean towards putting it in zarr only because that would be slightly more convenient, as we could safely make use of subclassing to add the extra functionality. zarr already handles hierarchies, arrays and metadata, which is most of the hard work.

I'm certainly quite open to integrate experimental data formats like this one into xarray, but ultimately of course it depends on interest from the community. This wouldn't even necessarily need to live in xarray proper (though that would be fine, too). For example, @rabernat wrote a DataStore for loading MIT GCM outputs (https://github.com/xgcm/xmitgcm).

Comment 274209930 · mrocklin (MEMBER) · created 2017-01-20T23:47:29Z · https://github.com/pydata/xarray/issues/1223#issuecomment-274209930

Also cc @alimanfoo

Comment 274200419 · mrocklin (MEMBER) · created 2017-01-20T22:46:44Z · https://github.com/pydata/xarray/issues/1223#issuecomment-274200419

This looks pretty cool to me. I expected it to be harder to encode xarray into zarr. Some thoughts/comments:

  1. Is it harder to encode a full xarray into zarr? Are there cases that are not covered by this example but are likely to occur in the wild? (Mostly a question for @shoyer.)
  2. I guess one major thing missing is storing full Dataset objects rather than just DataArrays. I suspect that scientific users want to keep all of the variables and coordinates in a single artifact.
  3. It would be nice to avoid using pickle if possible, so that the data could be cross-language.
  4. How open is the XArray community to adding experimental to/from_zarr methods?
  5. Eventually we probably want to do lazy_value = da.store(..., compute=False) and then compute all of them at once
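Point 5 describes the pattern of building deferred writes and triggering them together; in dask this corresponds to da.store(..., compute=False), which returns a delayed object to be computed later. Below is a stdlib-only sketch of that pattern, with invented names (lazy_store, compute_all) standing in for the dask machinery.

```python
# Sketch of deferred stores: each call records a pending write instead
# of writing immediately, and a single compute step runs them all.
from functools import partial

def lazy_store(array, target):
    # Return a zero-argument task; the write happens only when called.
    return partial(target.update, enumerate(array))

def compute_all(tasks):
    # A real scheduler (dask) could run these concurrently and share
    # intermediate results; here we simply run them in sequence.
    for task in tasks:
        task()

targets = [{}, {}]
tasks = [
    lazy_store([1, 2, 3], targets[0]),
    lazy_store([4, 5], targets[1]),
]
compute_all(tasks)  # nothing was written before this call
```

The payoff in the real dask version is that deferring the stores lets one compute() traverse the whole task graph at once, instead of recomputing shared inputs per array.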

@pwolfram @rabernat @jhamman


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 3680.429ms · About: xarray-datasette