issues: 412623833
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
412623833 | MDU6SXNzdWU0MTI2MjM4MzM= | 2781 | enable reading of file-like HDF5 objects | 3924836 | closed | 0 | 2 | 2019-02-20T20:55:15Z | 2019-03-16T00:35:57Z | 2019-03-16T00:35:57Z | MEMBER | xarray 0.11.3 currently won't read HDF5 file-like objects:

```python
import xarray as xr
import gcsfs

fs = gcsfs.GCSFileSystem()
images = fs.ls('pangeo-data/grfn-v2/137/')
fileObj = fs.open('pangeo-data/grfn-v2/137/S1-GUNW-A-R-137-tops-20181129_20181123-020010-43220N_41518N-PP-e2c7-v2_0_0.nc')
# but, can we open this w/ xarray anyway? Yes! with modifications to xarray and h5netcdf
da = xr.open_dataset(fileObj, group='/science/grids/data', engine='h5netcdf')
da
```

```pytb
ValueError                                Traceback (most recent call last)
<ipython-input-3-22e0010de1f2> in <module>()
      1 # but, can we open this w/ xarray anyway? Yes! with modifications to xarray and h5netcdf
----> 2 da = xr.open_dataset(fileObj, group='/science/grids/data', engine='h5netcdf')
      3 da

/srv/conda/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs)
    347         else:
    348             if engine is not None and engine != 'scipy':
--> 349                 raise ValueError('can only read file-like objects with '
    350                                  "default engine or engine='scipy'")
    351             # assume filename_or_obj is a file-like object

ValueError: can only read file-like objects with default engine or engine='scipy'
```

Problem description

It is now possible to do this with h5py >= 2.9.0; see https://github.com/h5py/h5py/pull/1105. This would be a useful feature because a lot of NASA data is distributed as HDF5. It would allow reading such files without first writing them to disk (possibly to translate them to Zarr or other formats).
There seem to be many issues related to this:
https://github.com/dask/s3fs/issues/144
https://github.com/pydata/xarray/issues/2535

I'm guessing adding this functionality doesn't fix many of the performance issues related to HDF5 and Dask:
https://github.com/dask/dask/issues/2488
https://github.com/dask/distributed/issues/2319

Expected Output
Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2781/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
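
The traceback in the issue body comes from an engine check that rejects file-like objects unless the engine is the default or `'scipy'`. As a rough illustration of the requested change, the sketch below restates that check in plain Python and adds `'h5netcdf'` to the accepted engines. It is not xarray's actual code: the function name `check_file_like_engine` and the constant `FILE_LIKE_ENGINES` are hypothetical, and the sketch only models the validation step, not the reading itself.

```python
import io

# Hypothetical: engines assumed able to read from a file-like object.
# None stands for "default engine", as in the original error message.
FILE_LIKE_ENGINES = {None, "scipy", "h5netcdf"}


def check_file_like_engine(filename_or_obj, engine=None):
    """Return `engine` if it may be used for this input, else raise.

    Paths (str) are accepted with any engine; file-like objects are
    accepted only with an engine listed in FILE_LIKE_ENGINES.
    """
    if isinstance(filename_or_obj, str):
        return engine
    if not hasattr(filename_or_obj, "read"):
        raise TypeError("expected a path or a file-like object")
    if engine not in FILE_LIKE_ENGINES:
        raise ValueError(
            "can only read file-like objects with default engine, "
            "engine='scipy' or engine='h5netcdf'"
        )
    return engine


# A file-like object now passes the check with engine='h5netcdf'.
accepted = check_file_like_engine(io.BytesIO(b""), engine="h5netcdf")
```

With a check of this shape, the `xr.open_dataset(fileObj, ..., engine='h5netcdf')` call from the issue would get past validation; whether the read then succeeds depends on the installed h5py and h5netcdf versions.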