issues: 663148659

| Field | Value |
| --- | --- |
| id | 663148659 |
| node_id | MDU6SXNzdWU2NjMxNDg2NTk= |
| number | 4242 |
| title | Expose xarray's h5py serialization capabilities as public API? |
| user | 1197350 |
| state | open |
| locked | 0 |
| comments | 5 |
| created_at | 2020-07-21T16:27:45Z |
| updated_at | 2024-03-20T13:33:15Z |
| author_association | MEMBER |

Xarray has a magic ability to serialize h5py datasets. We should expose this somehow and allow it to be used outside of xarray.

Consider the following example:

```python
import s3fs
import h5py
import dask.array as dsa
import xarray as xr
import cloudpickle

url = 'noaa-goes16/ABI-L2-RRQPEF/2020/001/00/OR_ABI-L2-RRQPEF-M6_G16_s20200010000216_e20200010009524_c20200010010034.nc'

# Open the object anonymously from S3 and hand the file object to h5py
fs = s3fs.S3FileSystem(anon=True)
f = fs.open(url)
ds = h5py.File(f, mode='r')

# Wrap the h5py dataset in a dask array, then try to pickle it
data = dsa.from_array(ds['RRQPE'])
_ = cloudpickle.dumps(data)
```

This raises `TypeError: h5py objects cannot be pickled`.

However, if I read the file with xarray...

```python
ds = xr.open_dataset(f, chunks={})
data = ds['RRQPE'].data
_ = cloudpickle.dumps(data)
```

It works just fine. This has come up in several places (e.g. https://github.com/dask/s3fs/issues/337, https://github.com/dask/distributed/issues/2787).

It seems like the ability to pickle these arrays is broadly useful, beyond xarray.

  1. How does our magic work?
  2. What would it look like to break this magic out and expose it as a public API (or inside another package)?
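
To make the first question more concrete, here is a minimal sketch of one general pattern that makes a file-backed object picklable: store the recipe for opening the file (an opener plus its arguments) and drop the live handle when pickling, then reopen lazily after unpickling. This is not xarray's actual code. `ReopenableFile` and its `acquire` method are hypothetical names invented for illustration, and whether xarray's backends work exactly this way is an assumption on my part.

```python
class ReopenableFile:
    """Hypothetical wrapper that pickles *how* to open a file, not the handle.

    The opener and its arguments must themselves be picklable.
    """

    def __init__(self, opener, *args, **kwargs):
        self._opener = opener
        self._args = args
        self._kwargs = kwargs
        self._handle = None  # opened lazily on first acquire()

    def acquire(self):
        """Return the open handle, opening the file on first use."""
        if self._handle is None:
            self._handle = self._opener(*self._args, **self._kwargs)
        return self._handle

    def close(self):
        """Close the handle if one is open; a later acquire() reopens it."""
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        # Deliberately leave out the live handle: only the recipe is pickled.
        return {
            "opener": self._opener,
            "args": self._args,
            "kwargs": self._kwargs,
        }

    def __setstate__(self, state):
        self._opener = state["opener"]
        self._args = state["args"]
        self._kwargs = state["kwargs"]
        self._handle = None  # the unpickled copy reopens on demand
```

With a wrapper like this, something along the lines of `pickle.loads(pickle.dumps(ReopenableFile(h5py.File, path, mode='r')))` should give each worker its own freshly opened file, assuming `path` is reachable from wherever the object is unpickled. An array wrapper that indexes through `acquire()` instead of holding an `h5py.Dataset` directly would then be picklable for the same reason.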

Reactions: 0 total (https://api.github.com/repos/pydata/xarray/issues/4242/reactions)
Repo: 13221727 · Type: issue
