
issue_comments


10 rows where author_association = "MEMBER", issue = 427410885 and user = 5821660 sorted by updated_at descending



id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1010713645 https://github.com/pydata/xarray/issues/2857#issuecomment-1010713645 https://api.github.com/repos/pydata/xarray/issues/2857 IC_kwDOAMm_X848PkQt kmuehlbauer 5821660 2022-01-12T07:15:39Z 2022-01-12T07:15:39Z MEMBER

This issue is fixed to some extent since h5netcdf 0.12.0.

h5netcdf does not reach the timings of the netCDF4 engine, but the improvement is quite significant.

| Number of datasets in file | netCDF4 write (ms) | h5netcdf <= 0.11.0 write (ms) | h5netcdf >= 0.12.0 write (ms) |
|-----|------|-----|-----|
| 1 | 2 | 7 | 7 |
| 250 | 104 | 1710 | 164 |

The issue can be closed.

Ping @aldanor.
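A rough benchmark in the spirit of the table above can be sketched as follows (the table's numbers are from the issue; this script's group names, sizes, and iteration count are illustrative). Each loop iteration appends one small dataset as a new group, so the per-write cost can be observed as the file grows:

```python
# Sketch: time append-mode writes as the number of groups in the file grows.
# Requires xarray, numpy and h5netcdf; all names here are illustrative.
import os
import tempfile
import time

import numpy as np
import xarray as xr

path = os.path.join(tempfile.mkdtemp(), "many_groups.nc")
ds = xr.Dataset({"data": ("x", np.zeros(10))})

timings = []
for i in range(20):
    t0 = time.perf_counter()
    # first write creates the file, subsequent writes append a new group
    ds.to_netcdf(path, group=f"group_{i:03d}",
                 mode="w" if i == 0 else "a", engine="h5netcdf")
    timings.append(time.perf_counter() - t0)

# With h5netcdf <= 0.11.0 the per-write cost grows with the group count
# (quadratic in total); with >= 0.12.0 it stays roughly flat.
print(f"first write: {timings[0] * 1e3:.1f} ms, "
      f"last write: {timings[-1] * 1e3:.1f} ms")
```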

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885
999410802 https://github.com/pydata/xarray/issues/2857#issuecomment-999410802 https://api.github.com/repos/pydata/xarray/issues/2857 IC_kwDOAMm_X847kcxy kmuehlbauer 5821660 2021-12-22T09:11:05Z 2021-12-22T09:11:05Z MEMBER

FYI: h5netcdf has just merged a refactor of the dimension scale handling, which greatly improves the performance here. It will be released in the next version (0.13.0).

See https://github.com/h5netcdf/h5netcdf/pull/112

I'll come back once the release is out, so we can close this issue.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885
825579825 https://github.com/pydata/xarray/issues/2857#issuecomment-825579825 https://api.github.com/repos/pydata/xarray/issues/2857 MDEyOklzc3VlQ29tbWVudDgyNTU3OTgyNQ== kmuehlbauer 5821660 2021-04-23T11:01:04Z 2021-04-23T11:01:04Z MEMBER

@aldanor Could you please have a look at https://github.com/h5netcdf/h5netcdf/pull/101 for a fix? Any comments are very much appreciated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885
807344131 https://github.com/pydata/xarray/issues/2857#issuecomment-807344131 https://api.github.com/repos/pydata/xarray/issues/2857 MDEyOklzc3VlQ29tbWVudDgwNzM0NDEzMQ== kmuehlbauer 5821660 2021-03-25T19:34:55Z 2021-03-25T19:34:55Z MEMBER

@shoyer Could we move the entire issue? Or just open another one over at 'h5netcdf' and reference this one?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885
806982015 https://github.com/pydata/xarray/issues/2857#issuecomment-806982015 https://api.github.com/repos/pydata/xarray/issues/2857 MDEyOklzc3VlQ29tbWVudDgwNjk4MjAxNQ== kmuehlbauer 5821660 2021-03-25T15:48:35Z 2021-03-25T15:48:35Z MEMBER

OK, we might check whether that depends on the data size, on the number of groups, or both.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885
806853536 https://github.com/pydata/xarray/issues/2857#issuecomment-806853536 https://api.github.com/repos/pydata/xarray/issues/2857 MDEyOklzc3VlQ29tbWVudDgwNjg1MzUzNg== kmuehlbauer 5821660 2021-03-25T14:29:24Z 2021-03-25T14:29:24Z MEMBER

I wonder if it would help to use the same underlying h5py.File or h5netcdf.File when appending.

This should somehow be possible. I'll try to create a proof-of-concept script bypassing to_netcdf when I find the time. If there are other ideas or solutions, please comment here. Thanks @aldanor for the intensive testing and the minimal example.
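Such a proof of concept might look like the sketch below: keep a single h5netcdf.File open and write all groups through it, instead of letting to_netcdf reopen the file once per dataset. Group and variable names, the dimension size, and the group count are illustrative, not taken from the issue:

```python
# Sketch: write many groups through one open h5netcdf.File handle,
# bypassing to_netcdf's per-call reopen. Names/sizes are illustrative.
import os
import tempfile

import numpy as np
import h5netcdf

path = os.path.join(tempfile.mkdtemp(), "appended.nc")

with h5netcdf.File(path, "w") as f:
    for i in range(250):
        grp = f.create_group(f"group_{i:03d}")
        grp.dimensions["x"] = 10  # fixed-size dimension of length 10
        var = grp.create_variable("data", ("x",), dtype="f8")
        var[:] = np.zeros(10)
```

Because the file handle stays open, the dimension-scale bookkeeping is not redone from scratch for every dataset, which is exactly the cost that grows with the number of groups in the append-via-to_netcdf pattern.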

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885
806825379 https://github.com/pydata/xarray/issues/2857#issuecomment-806825379 https://api.github.com/repos/pydata/xarray/issues/2857 MDEyOklzc3VlQ29tbWVudDgwNjgyNTM3OQ== kmuehlbauer 5821660 2021-03-25T14:11:43Z 2021-03-25T14:11:43Z MEMBER

From my understanding, part of the problem is the use of CachingFileManager. Every call to to_netcdf(filename, ...) reopens this particular file (with all the downsides) and wraps it in CachingFileManager again. I wonder if it would help to use the same underlying h5py.File or h5netcdf.File when appending.
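A hedged sketch of that pattern, assuming xarray still exports CachingFileManager from xarray.backends (as current versions do): two separate to_netcdf calls on the same path build two independent managers, so the second acquire() is a fresh h5netcdf.File open rather than a reuse of the first handle:

```python
# Sketch: two independent CachingFileManager instances for the same path,
# mirroring what two to_netcdf() calls do internally. Illustrative only.
import os
import tempfile

import h5netcdf
from xarray.backends import CachingFileManager

path = os.path.join(tempfile.mkdtemp(), "demo.nc")

manager_w = CachingFileManager(h5netcdf.File, path, mode="w")
manager_w.acquire()   # opens the file for writing
manager_w.close()     # each manager closes its own handle

manager_a = CachingFileManager(h5netcdf.File, path, mode="a")
manager_a.acquire()   # a second, from-scratch open in append mode
manager_a.close()
```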

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885
806759522 https://github.com/pydata/xarray/issues/2857#issuecomment-806759522 https://api.github.com/repos/pydata/xarray/issues/2857 MDEyOklzc3VlQ29tbWVudDgwNjc1OTUyMg== kmuehlbauer 5821660 2021-03-25T13:39:02Z 2021-03-25T13:39:02Z MEMBER

@aldanor If I change your example to use engine=netcdf4, the times increase too, but not to the extent of the h5netcdf case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885
806741704 https://github.com/pydata/xarray/issues/2857#issuecomment-806741704 https://api.github.com/repos/pydata/xarray/issues/2857 MDEyOklzc3VlQ29tbWVudDgwNjc0MTcwNA== kmuehlbauer 5821660 2021-03-25T13:27:43Z 2021-03-25T13:27:43Z MEMBER

@aldanor Thanks, that's what I expected (that the new version doesn't change the behaviour you are showing).

I think your assessment of the situation is correct. It looks like to_netcdf is re-reading the whole file when in append mode. Or, better said, the underlying machinery re-reads the complete file. Would it be possible to use engine=netcdf4, just to see whether that engine is affected as well?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885
806697600 https://github.com/pydata/xarray/issues/2857#issuecomment-806697600 https://api.github.com/repos/pydata/xarray/issues/2857 MDEyOklzc3VlQ29tbWVudDgwNjY5NzYwMA== kmuehlbauer 5821660 2021-03-25T12:59:11Z 2021-03-25T12:59:11Z MEMBER

@aldanor Which h5netcdf version are you using? There have been changes to the _lookup_dimensions function (which should not change behaviour). I'd like to check this out; could you help with a minimal script to reproduce?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 427410885


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette