issue_comments

15 rows where issue = 29453809 sorted by updated_at descending

user 5

  • shoyer 7
  • alimanfoo 5
  • akleeman 1
  • ToddSmall 1
  • tomchor 1

author_association 3

  • CONTRIBUTOR 7
  • MEMBER 7
  • NONE 1

issue 1

  • HDF5 backend for xray · 15 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
485291578 https://github.com/pydata/xarray/issues/66#issuecomment-485291578 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQ4NTI5MTU3OA== shoyer 1217238 2019-04-21T23:55:02Z 2019-04-21T23:55:02Z MEMBER

Xarray will never be able to read arbitrary HDF5 files. The full HDF5 data model is far more complicated than any data structure xarray supports.

Using h5py directly is your best bet for HDF5 files that aren’t also netcdf files.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
485287247 https://github.com/pydata/xarray/issues/66#issuecomment-485287247 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQ4NTI4NzI0Nw== tomchor 13205162 2019-04-21T22:37:48Z 2019-04-21T22:37:48Z CONTRIBUTOR

Have there been any developments for HDF5 support? I've been trying to read HDF5 data from Dedalus but I've been having a hard time.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
338782661 https://github.com/pydata/xarray/issues/66#issuecomment-338782661 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDMzODc4MjY2MQ== shoyer 1217238 2017-10-23T20:14:36Z 2017-10-23T20:14:36Z MEMBER

> I've been looking at the h5netcdf code recently to understand better how dimensions are plumbed in netcdf4.

It's pretty messy, to be honest :). The HDF5 dimension scale API is highly flexible, and netCDF4 only uses a small part of it.

> I'm exploring refactoring all my data model classes in scikit-allel to build on xarray, I think the time is right, especially if xarray gets a Zarr backend too.

Interesting -- I'd love to hear how this goes! Please don't hesitate to file issues when problems come up (though you're already off to a good start).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
338459385 https://github.com/pydata/xarray/issues/66#issuecomment-338459385 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDMzODQ1OTM4NQ== alimanfoo 703554 2017-10-22T08:02:29Z 2017-10-22T08:02:29Z CONTRIBUTOR

Just to say thanks for the work on this, I've been looking at the h5netcdf code recently to understand better how dimensions are plumbed in netcdf4. I'm exploring refactoring all my data model classes in scikit-allel to build on xarray, I think the time is right, especially if xarray gets a Zarr backend too.

On Sun, 22 Oct 2017 at 02:01, Stephan Hoyer notifications@github.com wrote:

Closed #66 https://github.com/pydata/xarray/issues/66.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
90828627 https://github.com/pydata/xarray/issues/66#issuecomment-90828627 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDkwODI4NjI3 shoyer 1217238 2015-04-08T07:20:32Z 2015-04-08T07:20:32Z MEMBER

Note that h5netcdf won't (yet) let you read any HDF5 files you couldn't already read with netCDF4-python -- it just gives us an alternative backend to use. One thing we could do that's not supported by netCDF is potentially read HDF5 dimension labels. The original netCDF4 library only understands dimension scales -- which, to be honest, seems like a less natural fit to me than reading dimension labels.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
90813596 https://github.com/pydata/xarray/issues/66#issuecomment-90813596 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDkwODEzNTk2 alimanfoo 703554 2015-04-08T06:04:53Z 2015-04-08T06:04:53Z CONTRIBUTOR

Thanks Stephan, I'll take a look.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
90798866 https://github.com/pydata/xarray/issues/66#issuecomment-90798866 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDkwNzk4ODY2 shoyer 1217238 2015-04-08T04:21:15Z 2015-04-08T04:21:15Z MEMBER

I wrote a little library to read and write netCDF4 files via h5py the other day: https://github.com/shoyer/h5netcdf

I also merged a preliminary backend for it into xray that should work if you use engine='h5netcdf'. So I think we can consider this issue resolved!

I've also been looking into the netCDF4 data model in a bit more detail, and the good news is that it looks like it does, at least theoretically, support hierarchical dimension scales. This doesn't work in h5netcdf yet, but would be easy to add. Read support into xray would also be straightforward.

Figuring out how to write a hierarchy of xray datasets into the format is less obvious, however. We might need something like a HierarchicalDataset object. I guess using / with variable names in normal Dataset objects would work, though it would help to have something like a pandas MultiIndex to make it easier to actually work with all those variable names.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
42872192 https://github.com/pydata/xarray/issues/66#issuecomment-42872192 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQyODcyMTky shoyer 1217238 2014-05-12T18:51:21Z 2014-05-12T18:51:21Z MEMBER

In principle, I think dimension scales are all we need to interpret HDF5 files as xray Datasets. That's also most of what you need to make a netCDF4 file, but I would not be surprised if NetCDF libraries have issues with HDF5 files that don't conform to every last NetCDF convention. For reference, here is the full NetCDF4 spec (pretty short!): https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/NetCDF_002d4-Format.html

We don't yet support reading from groups or subgroups (other than the root group '/'), but I agree this would be a nice feature. It would seem straightforward enough to add some option to read variables from subgroups recursively, although I'm sure there are some subtleties to get the API right. Yours is an interesting use of dimension scales (and it makes complete sense), but I'm not sure if the NetCDF4 model supports that sort of thing.

To support HDF5 properly, including interesting use cases like yours, I think we should probably write our own interface to h5py, instead of reading everything through the NetCDF libraries. Ideally, we could set this up to write HDF5 as (mostly) valid NetCDF4, at least in the simpler cases where that makes sense.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
42869488 https://github.com/pydata/xarray/issues/66#issuecomment-42869488 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQyODY5NDg4 alimanfoo 703554 2014-05-12T18:29:57Z 2014-05-12T18:29:57Z CONTRIBUTOR

One other detail, I have an HDF5 group for each conceptual dataset, but then variables may be organised into subgroups. It would be nice if this could be accommodated, e.g., when opening an HDF5 group as an xray dataset, assume the dataset contains all variables in the group and any subgroups searched recursively. Again apologies I don't know if this is allowed in NetCDF4, will do the research.
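To make the recursive idea concrete, here is a minimal sketch of flattening a group and its subgroups into one flat set of variables, with "/" joining the path components. It uses plain nested dicts as a stand-in for HDF5 groups (an assumption; a real version would walk h5py groups instead), and the `flatten_groups` helper and the example layout are hypothetical, not part of xray:

```python
def flatten_groups(group, prefix=""):
    """Return {path: variable} for every variable in `group` and its subgroups."""
    flat = {}
    for name, item in group.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(item, dict):
            # A nested dict plays the role of an HDF5 subgroup: recurse into it.
            flat.update(flatten_groups(item, path))
        else:
            # Anything else is treated as a variable (here, a plain list).
            flat[path] = item
    return flat


# Hypothetical layout: one group per dataset, variables split into subgroups.
chrom_group = {
    "position": [101, 205, 309],
    "calldata": {"genotype": [[0, 1], [1, 1], [0, 0]]},
    "variants": {"qual": [30.0, 12.5, 48.1]},
}

flat = flatten_groups(chrom_group)
print(sorted(flat))
```

Keeping the subgroup path in the variable name avoids collisions when two subgroups hold variables with the same name.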

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
42840763 https://github.com/pydata/xarray/issues/66#issuecomment-42840763 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQyODQwNzYz alimanfoo 703554 2014-05-12T14:45:57Z 2014-05-12T14:45:57Z CONTRIBUTOR

Thanks @akleeman for the info, much appreciated.

A couple of other points I thought maybe worth mentioning if you're considering wrapping h5py.

First I've been using lzf as the compression filter in my HDF5 files. I believe h5py bundles the source for lzf. I don't know if lzf would be supported if accessing through the python netcdf API.

Second, I have a situation where I have multiple datasets, each of which is stored in a separate group and has two dimensions (genome position and biological sample). The genome position scale is different for each dataset (there's one dataset per chromosome); however, the biological sample scale is common to all of the datasets. So at the moment I have a variable in the root group with the "samples" dimension scale, and each dataset group has its own "position" dimension scale. You can represent all this with HDF5 dimension scales, but I've no idea if this is accommodated by NetCDF4 or could fit into the xray model. I could work around this by copying the samples variable into each dataset, but I just thought I'd mention this pattern as something to be aware of.
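One way to picture this layout is a lookup that searches for a dimension name in the current group and then in each parent group in turn. The following is only a sketch under stated assumptions: plain dicts stand in for HDF5 groups, the `resolve_dimension` helper and the group names are hypothetical, and it does not claim to be how xray or the netCDF4 library resolve dimensions.

```python
def resolve_dimension(groups, group_path, dim_name):
    """Find the nearest enclosing group that defines `dim_name`.

    `groups` maps a group path such as "/chr1" to the dimension scales
    defined directly in that group.
    """
    path = group_path
    while True:
        if dim_name in groups.get(path, {}):
            return path, groups[path][dim_name]
        if path == "/":
            raise KeyError(f"dimension {dim_name!r} not visible from {group_path!r}")
        # Step up to the parent group: "/chr1" -> "/", "/a/b" -> "/a".
        path = path.rsplit("/", 1)[0] or "/"


# Hypothetical file layout: shared samples in the root, positions per group.
groups = {
    "/": {"samples": ["s1", "s2", "s3"]},
    "/chr1": {"position": [101, 205, 309]},
    "/chr2": {"position": [57, 88]},
}

samples_owner, samples = resolve_dimension(groups, "/chr2", "samples")
position_owner, position = resolve_dimension(groups, "/chr2", "position")
print(samples_owner, position_owner)
```

With a lookup like this, the samples variable would not need to be copied into every dataset group.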

On Mon, May 12, 2014 at 3:04 PM, akleeman notifications@github.com wrote:

@alimanfoo https://github.com/alimanfoo

Glad you're enjoying xray!

From your description it sounds like it should be relatively simple for you to get xray working with your dataset. NetCDF4 is a subset of HDF5 and simply adding dimension scales should get you most of the way there.

Re: groups, each xray.Dataset corresponds to one HDF5 group. So while xray doesn't currently support groups, you could split your HDF5 dataset into separate files for each group and load those files using xray. Alternatively (if you feel ambitious) it shouldn't be too hard to get xray's NetCDF4DataStore (backends.netCDF4_.py) to work with groups, allowing you to do something like:

dataset = xray.open_dataset('multiple_groups.h5', group='/one_group')

This http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html gives some good examples of how groups work within the netCDF4.

Also, as @shoyer https://github.com/shoyer mentioned, it might make sense to modify xray so that NetCDF4 support is obtained by wrapping h5py instead of netCDF4 which might make your life even easier.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
42835510 https://github.com/pydata/xarray/issues/66#issuecomment-42835510 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQyODM1NTEw akleeman 514053 2014-05-12T14:04:11Z 2014-05-12T14:04:11Z CONTRIBUTOR

@alimanfoo

Glad you're enjoying xray!

From your description it sounds like it should be relatively simple for you to get xray working with your dataset. NetCDF4 is a subset of HDF5 and simply adding dimension scales should get you most of the way there.

Re: groups, each xray.Dataset corresponds to one HDF5 group. So while xray doesn't currently support groups, you could split your HDF5 dataset into separate files for each group and load those files using xray. Alternatively (if you feel ambitious) it shouldn't be too hard to get xray's NetCDF4DataStore (backends.netCDF4_.py) to work with groups, allowing you to do something like:

dataset = xray.open_dataset('multiple_groups.h5', group='/one_group')

The netCDF4-python documentation (http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html) gives some good examples of how groups work within netCDF4.

Also, as @shoyer mentioned, it might make sense to modify xray so that NetCDF4 support is obtained by wrapping h5py instead of netCDF4 which might make your life even easier.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
42805550 https://github.com/pydata/xarray/issues/66#issuecomment-42805550 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQyODA1NTUw alimanfoo 703554 2014-05-12T08:08:37Z 2014-05-12T08:08:37Z CONTRIBUTOR

I'm really enjoying working with xray, it's so nice to be able to think of my dimensions as named and labeled dimensions, no more remembering which axis is which!

I'm not sure if this is relevant to this specific issue, but I am working for the most part with HDF5 files created using h5py. I'm only just learning about NetCDF-4, but I have datasets that comprise a number of 1D and 2D variables with shared dimensions, so I think my data is already very close to the right model. I have a couple of questions:

(1) If I have multiple datasets within an HDF5 file, each within a separate group, can I access those through xray?

(2) What would I need to add to my HDF5 to make it fully compliant with the xray/NetCDF4 model? Is it just a question of creating and attaching dimension scales or would I need to do something else as well?

Thanks in advance.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
40737375 https://github.com/pydata/xarray/issues/66#issuecomment-40737375 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQwNzM3Mzc1 shoyer 1217238 2014-04-17T17:03:36Z 2014-04-17T17:03:36Z MEMBER

I did a little bit of research last night into the HDF5 file format and how it maps onto the NetCDF data model: https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/NetCDF_002d4-Format.html

HDF5 has a notion of "dimension scales" which implement shared dimensions. The bad news is that pytables does not support them, although h5py does. As @ToddSmall shows in his example above, pytables supports getting file images for HDF5 files, but unfortunately h5py does not implement file image operations. So it looks like there are not currently any existing solutions that will let us implement our data model in HDF5 with file images :(.

On the plus side, it does look like it would be pretty simple to implement the NetCDF4 file format directly via h5py. This is something worth considering, because the codebase for the h5py project looks much cleaner than netCDF4-python and has better test coverage. I can also verify that it is straightforward to open and interpret NetCDF4 files via pytables or h5py.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
38005951 https://github.com/pydata/xarray/issues/66#issuecomment-38005951 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDM4MDA1OTUx shoyer 1217238 2014-03-19T00:32:38Z 2014-03-19T00:32:38Z MEMBER

Thanks @ToddSmall!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
38004488 https://github.com/pydata/xarray/issues/66#issuecomment-38004488 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDM4MDA0NDg4 ToddSmall 4194485 2014-03-19T00:08:03Z 2014-03-19T00:08:03Z NONE

Here's an elementary example of using HDF5 memory images to pass self-describing binary data between processes using pytables:

```python
import numpy as np
import os
import tables

pipe_name = '/tmp/my-pipe'
driver = "H5FD_CORE"
my_array_name = "My array"
my_attribute = "Drummer is cool!"
my_title = "My title"


def child():
    # Build the HDF5 file entirely in memory (no backing store on disk).
    h5_file = tables.open_file("in-memory", title=my_title, mode="w",
                               driver=driver, driver_core_backing_store=0)
    h5_file.create_array(h5_file.root, "array",
                         np.array([0., -1., 1., -2., 2.]),
                         title=my_array_name)
    h5_file.root.array.attrs.my_attribute = my_attribute
    image = h5_file.get_file_image()
    h5_file.close()
    # The file image is bytes, so open the pipe in binary mode.
    with open(pipe_name, 'wb') as pipeout:
        pipeout.write(image)


def parent():
    with open(pipe_name, 'rb') as pipein:
        image = pipein.read()
    h5_file = tables.open_file("in-memory", mode="r", driver=driver,
                               driver_core_image=image,
                               driver_core_backing_store=0)
    print("my_title is \"%s\"." % h5_file.title)
    print("my_attribute is \"%s\"." % h5_file.root.array.attrs.my_attribute)
    print("my_array_name is \"%s\"." % h5_file.root.array.title)
    print("array data is \"%s\"." % str(h5_file.root.array[:]))
    h5_file.close()


if not os.path.exists(pipe_name):
    os.mkfifo(pipe_name)

pid = os.fork()
if pid:
    parent()
else:
    child()
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
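
The filter shown at the top of this page ("rows where issue = 29453809 sorted by updated_at descending") can be reproduced against this schema with Python's built-in sqlite3 module. This is a minimal sketch, assuming an in-memory database and a trimmed copy of the table: the REFERENCES clauses and several columns are left out, and the row with issue 1 is made up for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [id] INTEGER PRIMARY KEY,
   [user] INTEGER,
   [updated_at] TEXT,
   [author_association] TEXT,
   [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
""")

comments = [
    (485291578, 1217238, "2019-04-21T23:55:02Z", "MEMBER", 29453809),
    (38004488, 4194485, "2014-03-19T00:08:03Z", "NONE", 29453809),
    (90798866, 1217238, "2015-04-08T04:21:15Z", "MEMBER", 29453809),
    (1, 1217238, "2020-01-01T00:00:00Z", "MEMBER", 1),  # made-up row
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?, ?)", comments)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
rows = conn.execute(
    "SELECT id, updated_at FROM issue_comments "
    "WHERE issue = ? ORDER BY updated_at DESC",
    (29453809,),
).fetchall()

for comment_id, updated_at in rows:
    print(comment_id, updated_at)
```

The index on `issue` is what lets this kind of per-issue filter avoid scanning the whole table.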