issue_comments: 325226495

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/1528#issuecomment-325226495 https://api.github.com/repos/pydata/xarray/issues/1528 325226495 MDEyOklzc3VlQ29tbWVudDMyNTIyNjQ5NQ== 1197350 2017-08-27T21:38:35Z 2017-08-27T21:38:35Z MEMBER

Could you comment more on the difference between your approach and mine?

Your functions are a great proof of concept for the relative ease of interoperability between xarray and zarr. What I have done here is to implement an xarray "backend" (i.e. DataStore) that uses zarr as its storage medium. This puts zarr on the same level as netCDF and HDF5 as a "first class" storage format for xarray data, as suggested by @shoyer in the comment on that thread. My hope is that this will enable the magical performance benefits that you have anticipated.

Digging deeper into that thread, I see @shoyer makes the following proposition:

So we could either directly write a DataStore or write a separate "znetcdf" or "netzdf" module that implements an interface similar to h5netcdf (which itself is a thin wrapper on top of h5py).

With this PR, I have started to do the former (write a DataStore). However, I can already see the wisdom of what he says next:

All things being equal, I would prefer the latter approach, because people seem to find these intermediate interfaces useful, and it would help clarify the specification of the file format vs. details of how xarray uses it.

I have already implemented my own custom DataStore for a different project, so I felt comfortable diving into this. But I might end up reinventing the wheel several times over if I continue down this road. In particular, I can see that my HiddenKeyDict is very similar to h5netcdf's treatment of attributes. (I had never looked at the h5netcdf code until just now!)
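To make the comparison concrete, here is a minimal sketch of what a hidden-key mapping could look like. It is an illustration only, not the code in this PR or in h5netcdf: the constructor signature, the `_raise_if_hidden` helper, and the `_dimensions` bookkeeping key are all assumptions chosen for the example. The idea is that the store keeps some internal metadata alongside user attributes, and the wrapper exposes only the user-facing keys.

```python
from collections.abc import MutableMapping


class HiddenKeyDict(MutableMapping):
    """Dict-like view of `data` that hides the keys listed in `hidden_keys`."""

    def __init__(self, data, hidden_keys):
        self._data = data  # underlying mapping, e.g. a store's attributes
        self._hidden_keys = frozenset(hidden_keys)

    def _raise_if_hidden(self, key):
        # Hidden keys behave as if they were absent.
        if key in self._hidden_keys:
            raise KeyError(f"Key {key!r} is hidden.")

    def __getitem__(self, key):
        self._raise_if_hidden(key)
        return self._data[key]

    def __setitem__(self, key, value):
        self._raise_if_hidden(key)
        self._data[key] = value

    def __delitem__(self, key):
        self._raise_if_hidden(key)
        del self._data[key]

    def __iter__(self):
        return (k for k in self._data if k not in self._hidden_keys)

    def __len__(self):
        return sum(1 for _ in self)


# Hypothetical usage: "_dimensions" is an invented internal key.
raw_attrs = {"units": "m", "_dimensions": ["x", "y"]}
attrs = HiddenKeyDict(raw_attrs, hidden_keys=["_dimensions"])
attrs["long_name"] = "height"  # writes pass through to the underlying dict
visible = dict(attrs)          # contains only the non-hidden keys
```

Because writes go straight to the wrapped mapping, the internal key stays in storage while user code never sees it, which is roughly the role such a wrapper would play between a storage format's raw attributes and the attributes exposed to xarray.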

On the other hand, zarr is so simple to use that a separate wrapper package might be overkill.

So I am still not sure whether the approach I am taking here is worth pursuing further. I consider this a highly experimental PR, and I'm really looking for feedback.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  253136694