issue_comments: 1450841385

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/7522#issuecomment-1450841385 https://api.github.com/repos/pydata/xarray/issues/7522 1450841385 IC_kwDOAMm_X85WehUp 39069044 2023-03-01T21:01:48Z 2023-03-01T21:01:48Z CONTRIBUTOR

Yeah, that seems to be it. Dask's write neatly packs all the needed metadata at the beginning of the file: we can scale this up to a many-GB file with dozens of variables and still read it in ~100ms. xarray, by contrast, writes the metadata in a less organized way, so we have to go seeking in the middle of the byte range. cache_type="first" does provide some improvement, but it is still not as good as on the dask-written file.
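To make the intuition concrete, here is a toy model of why metadata layout matters when only the first block of a remote file is cached. It is a sketch under stated assumptions, not fsspec's actual code: the class `FirstBlockCache`, the helper `count_requests`, and all offsets and sizes are hypothetical and chosen purely for illustration.

```python
class FirstBlockCache:
    """Toy model of a 'cache only the first block' read strategy.

    Hypothetical illustration only; not fsspec's implementation.
    """

    def __init__(self, data: bytes, blocksize: int):
        self.data = data
        self.blocksize = blocksize
        self.first_block = None  # fetched lazily, then reused
        self.requests = 0        # simulated remote range requests

    def _fetch(self, start: int, end: int) -> bytes:
        # Stand-in for one remote byte-range request.
        self.requests += 1
        return self.data[start:end]

    def read(self, start: int, end: int) -> bytes:
        # Reads fully inside the first block are served from the cache.
        if end <= self.blocksize:
            if self.first_block is None:
                self.first_block = self._fetch(0, self.blocksize)
            return self.first_block[start:end]
        # Anything else costs a fresh request.
        return self._fetch(start, end)


def count_requests(metadata_offsets, blocksize=1024, size=1 << 20, nbytes=16):
    """Count simulated requests needed to read small metadata records."""
    f = FirstBlockCache(bytes(size), blocksize)
    for off in metadata_offsets:
        f.read(off, off + nbytes)
    return f.requests


# Metadata packed at the front of the file: one request covers everything.
packed = count_requests([0, 64, 128, 256])

# Metadata scattered through the file: each record past the first block
# needs its own seek and request.
scattered = count_requests([0, 200_000, 500_000, 900_000])

print(packed, scattered)
```

In this model the front-packed layout needs a single request, while the scattered layout needs one request per record outside the first block, which matches the qualitative behaviour described above.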

FWIW, I inspected the actual bytes of the dask-written and xarray-written files: they are identical for a single variable, but diverge when multiple variables are written. So the important differences are probably associated with this step:

It does set up the whole set of variables as an initialisation stage before writing any data - I don't know if xarray does this.
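The byte-level comparison mentioned above can be reproduced with a few lines of Python. This is a minimal sketch, not the exact procedure used here: `first_divergence` is a hypothetical helper, and the two small temporary files stand in for the dask-written and xarray-written outputs.

```python
import os
import tempfile


def first_divergence(path_a, path_b, chunk_size=1 << 20):
    """Return the offset of the first differing byte, or None if identical.

    If one file is a strict prefix of the other, the length of the
    shorter file is returned.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact byte within this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the shared part: one chunk is shorter.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point with no difference.
                return None
            offset += len(a)


# Stand-in files (hypothetical contents) to demonstrate the helper.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.bin")
    path_b = os.path.join(tmp, "b.bin")
    with open(path_a, "wb") as f:
        f.write(b"HEADER" + b"\x00" * 10)
    with open(path_b, "wb") as f:
        f.write(b"HEADER" + b"\x00" * 4 + b"\x01" + b"\x00" * 5)

    offset = first_divergence(path_a, path_b)
    same = first_divergence(path_a, path_a)

print(offset, same)
```

Knowing the first diverging offset shows whether the two writers already differ in the header region or only later in the file.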

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1581046647