issue_comments: 255800363


html_url: https://github.com/pydata/xarray/issues/798#issuecomment-255800363
issue_url: https://api.github.com/repos/pydata/xarray/issues/798
user: 306380 · author_association: MEMBER
created_at: 2016-10-24T17:00:58Z · updated_at: 2016-10-24T17:00:58Z

One alternative would be to define custom serialization for netCDF4.Dataset objects.

I've been toying with the idea of custom serialization for dask.distributed recently. This was originally intended to let Dask make some opinionated serialization choices for common formats (usually so that we can serialize numpy arrays and pandas dataframes faster than their generic pickle implementations allow), but it might also be helpful here, allowing us to serialize netCDF4.Dataset objects and friends.

We would define custom dumps and loads functions for netCDF4.Dataset objects that would presumably encode them as a filename and datapath. This would get around the open-many-files issue because the dataset would stay in the worker's .data dictionary while it was needed.
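The dumps/loads idea above can be sketched as follows. This is a hypothetical illustration: `FakeDataset`, `dumps_dataset`, and `loads_dataset` are made-up names standing in for `netCDF4.Dataset` and for whatever functions would be registered with dask.distributed's serialization machinery, so the example runs without netCDF4 installed.

```python
# Hypothetical sketch: serialize an open-dataset handle as a filename plus
# group path, and reopen it on the receiving worker, instead of pickling the
# open handle itself. FakeDataset stands in for netCDF4.Dataset.

class FakeDataset:
    """Minimal stand-in for an open netCDF4.Dataset handle."""
    def __init__(self, filepath, group="/"):
        self.filepath = filepath  # path to the file on disk
        self.group = group        # path to the group within the file

def dumps_dataset(ds):
    # Encode only what is needed to reopen the dataset, not the
    # (unpicklable) open file handle.
    return {"filepath": ds.filepath, "group": ds.group}

def loads_dataset(state):
    # Reopen the dataset on the receiving worker; the reopened handle
    # then lives in that worker's .data dictionary while it is needed.
    return FakeDataset(state["filepath"], state["group"])

# Round trip: the deserialized object points at the same file and group.
ds = FakeDataset("/tmp/example.nc", group="/model/run1")
restored = loads_dataset(dumps_dataset(ds))
print(restored.filepath, restored.group)
```

In dask.distributed, functions like these would be registered against the `netCDF4.Dataset` type through the library's custom-serialization hooks (the exact registration API has varied across versions), so any task argument or result of that type would be encoded this way automatically.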

One concern is that there are reasons why netCDF4.Dataset objects are not serializable (see https://github.com/h5py/h5py/issues/531). I'm not sure if this would affect XArray workloads.
