issue_comments
4 rows where author_association = "CONTRIBUTOR" and issue = 1581046647 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1450841385 | https://github.com/pydata/xarray/issues/7522#issuecomment-1450841385 | https://api.github.com/repos/pydata/xarray/issues/7522 | IC_kwDOAMm_X85WehUp | slevang 39069044 | 2023-03-01T21:01:48Z | 2023-03-01T21:01:48Z | CONTRIBUTOR | Yeah, that seems to be it. Dask's write neatly packs all the needed metadata at the beginning of the file, since we can scale this up to a multi-GB file with dozens of variables and still read it in ~100ms. xarray, by contrast, writes the metadata in a less organized way, so we have to go seeking in the middle of the byte range. FWIW, I inspected the actual bytes of the dask- and xarray-written files: they are identical for a single variable but diverge when multiple variables are written. So the important differences are probably associated with this step:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Differences in `to_netcdf` for dask and numpy backed arrays 1581046647 | |
1450727551 | https://github.com/pydata/xarray/issues/7522#issuecomment-1450727551 | https://api.github.com/repos/pydata/xarray/issues/7522 | IC_kwDOAMm_X85WeFh_ | martindurant 6042212 | 2023-03-01T19:22:54Z | 2023-03-01T19:22:54Z | CONTRIBUTOR | I generally recommend cache_type="first" for reading HDF5 files, because they tend to have most of the metadata in the header area of the file, with short pieces of metadata "elsewhere", so the default readahead cache doesn't perform very well. As to what the two writers might be doing differently, I only have guesses. I imagine xarray leaves it entirely to HDF to make whatever choices it likes. Dask does not write in parallel, since HDF does not support that, but it may order the writes more logically. It does set up the whole set of variables as an initialisation stage before writing any data; I don't know if xarray does this. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Differences in `to_netcdf` for dask and numpy backed arrays 1581046647 | |
1449302032 | https://github.com/pydata/xarray/issues/7522#issuecomment-1449302032 | https://api.github.com/repos/pydata/xarray/issues/7522 | IC_kwDOAMm_X85WYpgQ | slevang 39069044 | 2023-03-01T04:04:25Z | 2023-03-01T04:04:25Z | CONTRIBUTOR | The slow file:
And the fast file:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Differences in `to_netcdf` for dask and numpy backed arrays 1581046647 | |
1428872842 | https://github.com/pydata/xarray/issues/7522#issuecomment-1428872842 | https://api.github.com/repos/pydata/xarray/issues/7522 | IC_kwDOAMm_X85VKt6K | slevang 39069044 | 2023-02-13T23:49:31Z | 2023-02-13T23:49:31Z | CONTRIBUTOR | I did try many loops and different order of operations to make sure this isn't a caching or auth issue. You can see the std dev of the For my actual use case, the difference is very apparent, with I also inspected the actual header bytes of these two files and see they are indeed different. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Differences in `to_netcdf` for dask and numpy backed arrays 1581046647 |
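The comments above attribute the read-time gap to where the metadata sits in the file, and recommend a "first block" cache for HDF5. A toy model of that idea follows. It is only a sketch: `CountingBackend`, `FirstBlockCache`, `count_requests`, the byte offsets, and the 1024-byte block size are all hypothetical names and values chosen for illustration, and this is not fsspec's actual cache implementation. It counts how many range requests a reader would issue when the metadata is packed into the header versus scattered through the file.

```python
class CountingBackend:
    """Stands in for a remote object store and counts range requests."""

    def __init__(self, data: bytes):
        self.data = data
        self.requests = 0

    def fetch(self, start: int, end: int) -> bytes:
        self.requests += 1
        return self.data[start:end]


class FirstBlockCache:
    """Toy cache: keep the first `blocksize` bytes, fetch anything else directly."""

    def __init__(self, backend: CountingBackend, blocksize: int):
        self.backend = backend
        self.blocksize = blocksize
        self._first = None  # filled lazily on the first header read

    def read(self, start: int, end: int) -> bytes:
        if end <= self.blocksize:
            if self._first is None:
                self._first = self.backend.fetch(0, self.blocksize)
            return self._first[start:end]
        # Reads outside the cached header each cost one request.
        return self.backend.fetch(start, end)


DATA = bytes(range(256)) * 40  # 10240 bytes of stand-in file content

# Hypothetical metadata layouts, given as (start, end) byte ranges.
PACKED = [(0, 8), (64, 128), (500, 600)]         # everything in the header
SCATTERED = [(0, 8), (4000, 4064), (9000, 9100)]  # pieces mid-file


def count_requests(reads, blocksize=1024):
    backend = CountingBackend(DATA)
    cache = FirstBlockCache(backend, blocksize)
    for start, end in reads:
        assert cache.read(start, end) == DATA[start:end]
    return backend.requests


print("packed:", count_requests(PACKED))
print("scattered:", count_requests(SCATTERED))
```

In this model the packed layout is served by a single request for the header block, while every metadata piece outside the header costs an extra round trip. On a high-latency store that would be consistent with the difference described in the thread, though the real cause in the two files was not confirmed there.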
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);