issue_comments: 255800363
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/798#issuecomment-255800363 | https://api.github.com/repos/pydata/xarray/issues/798 | 255800363 | MDEyOklzc3VlQ29tbWVudDI1NTgwMDM2Mw== | 306380 | 2016-10-24T17:00:58Z | 2016-10-24T17:00:58Z | MEMBER | One alternative would be to define custom serialization for netCDF4.Dataset objects. I've been toying with the idea of custom serialization for dask.distributed recently. This was originally intended to let Dask make some opinionated serialization choices for common formats (usually so that we can serialize numpy arrays and pandas dataframes faster than their generic pickle implementations allow), but it might also be helpful here, allowing us to serialize netCDF4.Dataset objects and friends. We would define custom dumps and loads functions for netCDF4.Dataset objects that would presumably encode them as a filename and datapath. This would get around the open-many-files issue because the dataset would stay in the worker's memory. One concern is that there are reasons why netCDF4.Dataset objects are not serializable (see https://github.com/h5py/h5py/issues/531); I'm not sure whether this would affect XArray workloads. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | 142498006 |
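
A minimal sketch of the dumps/loads idea described in the comment, using only the standard library's copyreg module rather than any dask.distributed-specific registration hook. The helper names are hypothetical, the Dataset is encoded as its file path alone (group/datapath handling is omitted), and it assumes the file is readable from every worker:

```python
import copyreg

import netCDF4


def _reopen_dataset(filepath):
    # "loads": recreate the Dataset on the receiving worker by reopening
    # the underlying file, assumed to live on a filesystem all workers can see.
    return netCDF4.Dataset(filepath, mode="r")


def _reduce_dataset(ds):
    # "dumps": encode the Dataset as nothing more than its file path.
    # Open handles and any in-memory state are deliberately discarded.
    return _reopen_dataset, (ds.filepath(),)


# Register the reducer so pickle -- and therefore dask.distributed's default
# serialization -- represents netCDF4.Dataset objects as a filename instead of
# trying (and failing) to pickle the open netCDF/HDF5 handle.
copyreg.pickle(netCDF4.Dataset, _reduce_dataset)
```

This sidesteps the pickling error itself, but not the thread-safety and stale-handle concerns raised in the linked h5py issue, so it is a sketch of the approach rather than a drop-in fix.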