issue_comments


8 rows where author_association = "MEMBER" and issue = 140291221 sorted by updated_at descending


Issue: dask.async.RuntimeError: NetCDF: HDF error on xarray to_netcdf (8 comments: mrocklin 5, shoyer 3)
Comment 199547343 · shoyer (MEMBER) · 2016-03-22T00:01:52Z
https://github.com/pydata/xarray/issues/793#issuecomment-199547343

This should be pretty easy -- we'll just need to add lock=threading.Lock() to this line: https://github.com/pydata/xarray/blob/v0.7.2/xarray/backends/common.py#L165

The only subtlety is that this needs to be done in a way that is dependent on the version of dask, because the keyword argument is new -- something like if dask.__version__ > '0.8.1'.
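As an aside, the version gate suggested in that comment can be sketched in plain Python. A raw string comparison such as dask.__version__ > '0.8.1' breaks once a version component reaches two digits ('0.10.0' sorts before '0.8.1' lexicographically), so this sketch compares parsed tuples instead. The helper names store_kwargs and _version_tuple are hypothetical, not xarray API:

```python
import threading

def _version_tuple(version):
    """Parse 'X.Y.Z' into a tuple of ints so comparisons are numeric."""
    return tuple(int(part) for part in version.split('.')[:3])

def store_kwargs(dask_version):
    """Pass lock= only when dask is new enough to accept the keyword."""
    if _version_tuple(dask_version) > _version_tuple('0.8.1'):
        return {'lock': threading.Lock()}
    return {}
```

With this, store_kwargs('0.10.0') correctly includes the lock, whereas the naive string comparison would silently skip it.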
Comment 196924992 · shoyer (MEMBER) · 2016-03-15T17:04:57Z (edited 2016-03-15T17:27:29Z)
https://github.com/pydata/xarray/issues/793#issuecomment-196924992

I did a little digging into this and I'm pretty sure the issue here is that HDF5 cannot do multi-threading -- at all. Moreover, many HDF5 builds are not thread safe.

Right now, we use a single shared lock for all reads with xarray, but for writes we rely on dask.array.store, which uses a different lock for each array it writes. Because @pwolfram's HDF5 file includes multiple variables, each of these gets written with its own thread lock -- which means we end up writing to the same file simultaneously from multiple threads.

So what we could really use here is a lock argument to dask.array.store (like dask.array.from_array) that lets us insist on using a shared lock when we're writing HDF5 files. Also, we may need to share that same lock between reading and writing data -- I'm not 100% sure. But at the very least we definitely need a lock to stop HDF5 from trying to do multi-threaded writes, whether that's to the same or different files.
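The single-shared-lock behavior described in that comment can be illustrated with the stdlib alone; no dask or HDF5 is involved, and write_chunk merely stands in for writing one variable's chunk. One lock shared by every writer is the invariant that a lock argument to dask.array.store would let xarray request:

```python
import threading

write_lock = threading.Lock()  # one lock shared by every writer thread
results = []

def write_chunk(chunk_id):
    # Serialize all "file" access: only one thread writes at a time,
    # which is what a non-thread-safe HDF5 build requires.
    with write_lock:
        results.append(chunk_id)

threads = [threading.Thread(target=write_chunk, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Giving each array its own lock, as described above, is the same picture with eight separate locks -- and no mutual exclusion between writers targeting the same file.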
Comment 196935638 · mrocklin (MEMBER) · 2016-03-15T17:26:41Z
https://github.com/pydata/xarray/issues/793#issuecomment-196935638

https://github.com/dask/dask/pull/1053
Comment 195811381 · mrocklin (MEMBER) · 2016-03-12T21:32:56Z
https://github.com/pydata/xarray/issues/793#issuecomment-195811381

To be clear, we ran into the NetCDF: HDF error when having multiple threads in the same process open-read-close many different files. I don't think there was any concurrent access of the same file. The problem went away when we switched to using processes rather than threads.
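The workaround described in that comment -- processes rather than threads -- can be sketched with the stdlib: each child process gets its own copy of any non-thread-safe C library state, so concurrent open-read-close of different files is safe. Everything here (read_one_file, the temp files) is illustrative, not the real dask scheduler switch; the 'fork' start method used is POSIX-only:

```python
import multiprocessing as mp
import os
import tempfile

def read_one_file(path, queue):
    # Stand-in for the real per-file open-read-close work.
    with open(path) as f:
        queue.put(f.read())

# 'fork' gives each child an inherited but independent copy of the
# parent's library state (POSIX-only).
ctx = mp.get_context('fork')

tmpdir = tempfile.mkdtemp()
paths = []
for i in range(4):
    path = os.path.join(tmpdir, '%d.txt' % i)
    with open(path, 'w') as f:
        f.write(str(i))
    paths.append(path)

queue = ctx.Queue()
procs = [ctx.Process(target=read_one_file, args=(p, queue)) for p in paths]
for proc in procs:
    proc.start()
contents = sorted(queue.get() for _ in paths)  # drain before join
for proc in procs:
    proc.join()
```

With real dask at the time, the equivalent move was running the graph on the multiprocessing scheduler instead of the default threaded one.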
Comment 195637636 · shoyer (MEMBER) · 2016-03-12T02:19:18Z
https://github.com/pydata/xarray/issues/793#issuecomment-195637636

I'm pretty sure we now have a thread lock around all writes to NetCDF files, but it's possible that isn't aggressive enough (maybe we can't safely read and write a different file at the same time?). If your script works with synchronous execution I'll take another look.
Comment 195573297 · mrocklin (MEMBER) · 2016-03-11T22:13:28Z
https://github.com/pydata/xarray/issues/793#issuecomment-195573297

Yes, my apologies for the typo.
Comment 195562924 · mrocklin (MEMBER) · 2016-03-11T21:29:46Z
https://github.com/pydata/xarray/issues/793#issuecomment-195562924

Sure. I'm not proposing any particular approach. I'm just supporting your previous idea that maybe the problem is having too many open file handles. It would be good to check this before diving into threading or concurrency issues.
Comment 195557013 · mrocklin (MEMBER) · 2016-03-11T21:16:41Z
https://github.com/pydata/xarray/issues/793#issuecomment-195557013

1024 might be a common open file handle limit. Some things to try to isolate the issue:
  1. Try this with dask.set_globals(get=dask.async.get_sync) to turn off threading
  2. Try just opening all of the files and see if the NetCDF error presents itself under normal operation
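The 1024 figure in step 1 of that comment is easy to check: on POSIX systems the stdlib resource module reports the per-process limit on open file descriptors, and 1024 is indeed a common default soft limit on Linux:

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors;
# exceeding it mid-run is one way to get opaque NetCDF/HDF5 errors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit: soft=%d, hard=%d" % (soft, hard))
```

If the soft limit is near the number of files the script touches, that supports the too-many-open-handles hypothesis before any threading investigation.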

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 5998.81ms · About: xarray-datasette