issues: 412623833

id: 412623833
node_id: MDU6SXNzdWU0MTI2MjM4MzM=
number: 2781
title: enable reading of file-like HDF5 objects
user: 3924836
state: closed
locked: 0
comments: 2
created_at: 2019-02-20T20:55:15Z
updated_at: 2019-03-16T00:35:57Z
closed_at: 2019-03-16T00:35:57Z
author_association: MEMBER

xarray 0.11.3 currently won't read file-like HDF5 objects:

```python
import xarray as xr
import gcsfs

fs = gcsfs.GCSFileSystem()
# List the GRFN products under this bucket prefix
images = fs.ls('pangeo-data/grfn-v2/137/')
# Open one product as a file-like object, without downloading it to disk
fileObj = fs.open('pangeo-data/grfn-v2/137/S1-GUNW-A-R-137-tops-20181129_20181123-020010-43220N_41518N-PP-e2c7-v2_0_0.nc')

# but, can we open this w/ xarray anyway? Yes! with modifications to xarray and h5netcdf
# (with xarray 0.11.3 this raises the ValueError shown below)
da = xr.open_dataset(fileObj, group='/science/grids/data', engine='h5netcdf')
da
```

```pytb
ValueError                                Traceback (most recent call last)
<ipython-input-3-22e0010de1f2> in <module>()
      1 # but, can we open this w/ xarray anyway? Yes! with modifications to xarray and h5netcdf
----> 2 da = xr.open_dataset(fileObj, group='/science/grids/data', engine='h5netcdf')
      3 da

/srv/conda/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs)
    347     else:
    348         if engine is not None and engine != 'scipy':
--> 349             raise ValueError('can only read file-like objects with '
    350                              "default engine or engine='scipy'")
    351         # assume filename_or_obj is a file-like object

ValueError: can only read file-like objects with default engine or engine='scipy'
```
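The check that fails is in `open_dataset`: a file-like object is rejected for any engine other than `scipy`. One way a backend could instead be picked for a file-like object is to sniff its leading bytes, since netCDF-3 files start with `CDF` plus a version byte and HDF5 files (including netCDF-4) start with a fixed 8-byte signature. The sketch below assumes a seekable binary stream; `guess_engine` is a hypothetical helper, not part of xarray 0.11.3, and not necessarily how xarray resolves this issue.

```python
import io

# File signatures ("magic numbers") at the start of each format:
# netCDF-3 files begin with b"CDF" followed by a version byte, while
# HDF5 files (including netCDF-4 files) begin with this 8-byte signature.
NETCDF3_MAGIC = b"CDF"
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"


def guess_engine(fileobj):
    """Hypothetical helper: infer an engine name from a binary file-like object.

    Reads the first 8 bytes, then seeks back so the same object can be
    handed to the chosen backend unchanged.
    """
    start = fileobj.tell()
    header = fileobj.read(8)
    fileobj.seek(start)
    if header.startswith(NETCDF3_MAGIC):
        return "scipy"      # scipy's netCDF reader handles netCDF-3 only
    if header.startswith(HDF5_MAGIC):
        return "h5netcdf"   # relies on h5py accepting file-like objects
    raise ValueError("unrecognized header %r; cannot infer an engine" % header)


print(guess_engine(io.BytesIO(b"CDF\x01" + bytes(16))))  # scipy
print(guess_engine(io.BytesIO(HDF5_MAGIC + bytes(16))))  # h5netcdf
```

A file with an HDF5 user block stores the signature at a later offset (512, 1024, ...), so a header-only check like this would not recognize it.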

Problem description

It is now possible to do this with h5py >= 2.9.0; see https://github.com/h5py/h5py/pull/1105. This would be a useful feature because a lot of NASA data is distributed as HDF5. It would make it possible to read such data without first writing it to disk (for example, to translate it to Zarr or other formats). There seem to be several related issues: https://github.com/dask/s3fs/issues/144 and https://github.com/pydata/xarray/issues/2535
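Roughly, h5py's file-like support routes HDF5's low-level reads through the Python object's methods instead of a local path. As an illustration of why a seekable remote object (such as the gcsfs file opened above) can be enough, the sketch below locates an HDF5 superblock using only `seek()` and small `read()` calls, following the signature-search rule in the HDF5 file format specification (offset 0, then 512, 1024, 2048, ...). `find_hdf5_superblock` is a hypothetical name and this is not how h5py or the HDF5 library is implemented; it assumes a binary stream whose `seek()` returns the new position, as `io` objects do.

```python
import io

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_superblock(fileobj):
    """Hypothetical helper: return the byte offset of the HDF5 superblock, or None.

    The HDF5 file format specification places the signature at offset 0,
    512, 1024, 2048, ... (each candidate twice the previous one), so only
    seek() and small read() calls are needed, never a full download.
    """
    size = fileobj.seek(0, io.SEEK_END)  # io streams return the new position
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= size:
        fileobj.seek(offset)
        if fileobj.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
        # Next candidate: 0 -> 512 -> 1024 -> 2048 -> ...
        offset = 512 if offset == 0 else offset * 2
    return None


plain = io.BytesIO(HDF5_SIGNATURE + bytes(100))
with_user_block = io.BytesIO(bytes(512) + HDF5_SIGNATURE + bytes(100))
print(find_hdf5_superblock(plain))                   # 0
print(find_hdf5_superblock(with_user_block))         # 512
print(find_hdf5_superblock(io.BytesIO(bytes(100))))  # None
```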

I suspect that adding this functionality would not, by itself, fix most of the performance issues with HDF5 and Dask, such as https://github.com/dask/dask/issues/2488 and https://github.com/dask/distributed/issues/2319

Expected Output

<xarray.Dataset>
Dimensions:              (latitude: 2045, longitude: 4158)
Coordinates:
  * longitude            (longitude) float64 -123.1 -123.1 ... -119.6 -119.6
  * latitude             (latitude) float64 43.22 43.22 43.22 ... 41.52 41.52
Data variables:
    crs                  int32 ...
    unwrappedPhase       (latitude, longitude) float32 ...
    coherence            (latitude, longitude) float32 ...
    connectedComponents  (latitude, longitude) float32 ...
    amplitude            (latitude, longitude) float32 ...

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 03:09:43) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.65+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.11.3
pandas: 0.24.1
numpy: 1.16.1
scipy: 1.2.0
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.6.2
h5py: 2.9.0
Nio: None
zarr: 2.2.0
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: 1.0.18
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.0
distributed: 1.25.2
matplotlib: 3.0.2
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.7.1
pip: 19.0.2
conda: 4.6.3
pytest: None
IPython: 7.1.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2781/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue