issue_comments: 42840763

html_url: https://github.com/pydata/xarray/issues/66#issuecomment-42840763
issue_url: https://api.github.com/repos/pydata/xarray/issues/66
id: 42840763
node_id: MDEyOklzc3VlQ29tbWVudDQyODQwNzYz
user: 703554
created_at: 2014-05-12T14:45:57Z
updated_at: 2014-05-12T14:45:57Z
author_association: CONTRIBUTOR

Thanks @akleeman for the info, much appreciated.

A couple of other points that I thought might be worth mentioning if you're considering wrapping h5py.

First, I've been using lzf as the compression filter in my HDF5 files. I believe h5py bundles the source for lzf. I don't know whether lzf would be supported when accessing the files through the Python netCDF API.

Second, I have a situation where I have multiple datasets, each stored in a separate group and each with two dimensions (genome position and biological sample). The genome position scale is different for each dataset (there's one dataset per chromosome), but the biological sample scale is common to all of the datasets. So at the moment I have a variable in the root group with the "samples" dimension scale, and each dataset group has its own "position" dimension scale. You can represent all this with HDF5 dimension scales, but I've no idea whether it is accommodated by NetCDF4 or could fit into the xray model. I could work around this by copying the samples variable into each dataset, but I just thought I'd mention this pattern as something to be aware of.
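
To make the layout concrete, here is a minimal sketch in plain Python of the pattern described above: one "samples" scale in the root group and a separate "position" scale in each chromosome group. The Group class, its find_dim method, the group names, and all values are hypothetical and exist only for illustration. This is not h5py, netCDF4, or xray code, and looking a dimension up through parent groups is an assumed convention, not a statement of what any of those libraries does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Group:
    """Hypothetical stand-in for an HDF5/netCDF4 group (not a real library class)."""
    name: str
    parent: Optional["Group"] = None
    # Dimension scales defined directly in this group, keyed by dimension name.
    dims: Dict[str, List] = field(default_factory=dict)

    def child(self, name: str) -> "Group":
        """Create a sub-group whose parent is this group."""
        return Group(name=name, parent=self)

    def find_dim(self, dim: str) -> List:
        """Look for a dimension scale here, then in each ancestor group."""
        group: Optional["Group"] = self
        while group is not None:
            if dim in group.dims:
                return group.dims[dim]
            group = group.parent
        raise KeyError(dim)


# The sample scale is defined once, in the root group (made-up values).
root = Group("/")
root.dims["samples"] = ["sample_a", "sample_b", "sample_c"]

# Each chromosome group defines only its own position scale (made-up values).
chr1 = root.child("chr1")
chr1.dims["position"] = [101, 250, 397]
chr2 = root.child("chr2")
chr2.dims["position"] = [12, 88]

# Both groups resolve "samples" to the same root-level scale,
# while "position" stays local to each group.
shared = chr1.find_dim("samples") is chr2.find_dim("samples")
```

With a lookup like this, the workaround of copying the samples variable into every dataset group would not be needed, because each group can reach the single root-level copy.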

On Mon, May 12, 2014 at 3:04 PM, akleeman <notifications@github.com> wrote:

@alimanfoo https://github.com/alimanfoo

Glad you're enjoying xray!

From your description it sounds like it should be relatively simple for you to get xray working with your dataset. NetCDF4 is a subset of h5py and simply adding dimension scales should get you most of the way there.

Re: groups, each xray.Dataset corresponds to one HDF5 group. So while xray doesn't currently support groups, you could split your HDF5 dataset into separate files for each group and load those files using xray. Alternatively (if you feel ambitious) it shouldn't be too hard to get xray's NetCDF4DataStore (backends.netCDF4_.py) to work with groups, allowing you to do something like:

dataset = xray.open_dataset('multiple_groups.h5', group='/one_group')

This (http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html) gives some good examples of how groups work within netCDF4.

Also, as @shoyer (https://github.com/shoyer) mentioned, it might make sense to modify xray so that NetCDF4 support is obtained by wrapping h5py instead of netCDF4, which might make your life even easier.

Reply to this email directly or view it on GitHub: https://github.com/xray-pydata/xray/issues/66#issuecomment-42835510

Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: alimanfoo@gmail.com
Tel: +44 (0)1865 287721 (new number)

reactions: {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 29453809